Hello,
I'm trying to set up Nutch + Solr to crawl a list of domains. I want to fetch 50 pages per seed in the list (following no external links) and store, with each page, the seed it came from. The goal is to be able to query for a word and get back every seed from my list that leads to a page containing it.

Example: I have a seed list with:

http://domainone.com
http://domaintwo.com
http://domainthree.com
http://domainfour.com
http://domainfive.com

I save 50 subpages from each of them to Solr (a total of 5 * 50 = 250 pages indexed in Solr).

Now I query for "foobar" and want to get back the seeds from the list that either contain the word "foobar" themselves or have subpages (e.g. http://domainthree.com/somepage.html) that contain it.

How would I store in Solr the seed each page originally came from?

Thanks,
binaryplease