Re: How to copy a solr index to another index with a different schema collapsing stored data?

2008-09-17 Thread Erick Erickson
You *might* be able to reconstruct enough of the original documents
from your indexes to create another without recrawling. I know Luke
can reconstruct documents from an index, but for unstored data it's
slow and may be lossy.

But it may suit your needs given how long it takes to make your index
in the first place.

Best
Erick

On Tue, Sep 16, 2008 at 9:14 PM, Gene Campbell [EMAIL PROTECTED] wrote:

 I was pretty sure you'd say that, but it means a lot that you took the
 time to confirm it.  Thanks, Otis.

 I don't want to give details, but we crawl for our data, and we don't
 save it in a DB or on disk.  It goes from download to index.  It
 seemed like a good idea at the time, when we thought our designs were
 done evolving.  :)

 cheers
 gene


 On Wed, Sep 17, 2008 at 12:51 PM, Otis Gospodnetic
 [EMAIL PROTECTED] wrote:
  You can't copy+merge+flatten indices like that.  Reindexing would be the
 easiest.  Indexing taking weeks sounds suspicious.  How much data are you
 reindexing and how big are your indices?
 
  Otis
  --
  Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
 
 
 
  ----- Original Message -----
  From: ristretto.rb [EMAIL PROTECTED]
  To: solr-user@lucene.apache.org
  Sent: Tuesday, September 16, 2008 8:14:16 PM
  Subject: How to copy a solr index to another index with a different
 schema collapsing stored data?
 
  Is it possible to copy stored index data from one index to another,
  concatenating it as you go?
 
  Suppose 2 categories A and B both with 20 docs, for a total of 40 docs
  in the index.  The index has a stored field for the content from the
  docs.
 
  I want a new index with only two docs in it, one for A and one for B.
  And it would have a stored field that is the sum of all the stored
  data for the 20 docs of A and of B respectively.
 
  So a query on this new index would then give me a relevant list of
  categories?
 
  Perhaps there's a Solr query to get that data out; then I can handle
  the concatenation myself and index the result in the new index.
 
  I'm hoping I don't have to reindex all this data from scratch.  It has
  taken weeks!
 
  thanks
  gene
 
 



Re: How to copy a solr index to another index with a different schema collapsing stored data?

2008-09-17 Thread Brian Carmalt
It wouldn't be that bad to merge the index externally and then reindex
the results, if it is as simple as your example. Search for id:[1 TO *]
with an fq for the category, and step through the results one slice at
a time until you have covered all of the docs in the category. Request
the content field, extract the values from the XML responses, and save
them somewhere. When you have all the info, reindex it.
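A minimal sketch of the approach described above, using only the Python standard library. It assumes the old index is reachable at `http://localhost:8983/solr/select`, that the category lives in a field called `category`, that the text lives in a stored field called `content`, and that each merged document can use the category name as its `id`; all of these names, and the helper functions themselves, are hypothetical and should be adjusted to the real schema. Brian's `id:[1 TO *]` query is kept as-is. Only stored fields come back in the responses, so unstored data cannot be recovered this way.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Hypothetical names: adjust to match the actual schema and server.
CATEGORY_FIELD = "category"
CONTENT_FIELD = "content"
SOLR_SELECT_URL = "http://localhost:8983/solr/select"  # assumed old-index URL


def http_fetch(params):
    """Send one select request to the old index and return the raw XML."""
    url = SOLR_SELECT_URL + "?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8")


def parse_page(xml_text, field):
    """Return (numFound, [stored values of `field`]) from a Solr XML response."""
    root = ET.fromstring(xml_text)
    result = root.find("result")
    num_found = int(result.get("numFound"))
    values = []
    for doc in result.findall("doc"):
        for node in doc:
            if node.get("name") != field:
                continue
            if node.tag == "arr":  # multi-valued field: <arr name="..."><str>..</str></arr>
                values.extend(child.text or "" for child in node)
            else:  # single-valued field, e.g. <str name="content">...</str>
                values.append(node.text or "")
    return num_found, values


def collect_category(category, fetch=http_fetch, rows=100):
    """Page through every doc in one category, collecting its stored content."""
    collected, start = [], 0
    while True:
        params = {
            "q": "id:[1 TO *]",                        # query from the post above
            "fq": f'{CATEGORY_FIELD}:"{category}"',    # restrict to one category
            "fl": CONTENT_FIELD,                       # only return the stored text
            "start": start,
            "rows": rows,
            "wt": "xml",
        }
        num_found, values = parse_page(fetch(params), CONTENT_FIELD)
        collected.extend(values)
        start += rows  # move on to the next slice of results
        if start >= num_found:
            return collected


def build_add_xml(merged):
    """Build a Solr <add> message with one concatenated document per category."""
    add = ET.Element("add")
    for category, texts in merged.items():
        doc = ET.SubElement(add, "doc")
        ET.SubElement(doc, "field", name="id").text = category
        ET.SubElement(doc, "field", name=CONTENT_FIELD).text = "\n".join(texts)
    return ET.tostring(add, encoding="unicode")
```

For example, `merged = {c: collect_category(c) for c in ["A", "B"]}` gathers both categories, and the string returned by `build_add_xml(merged)` could then be POSTed with a `text/xml` content type to the new index's update handler, followed by a commit. The `fetch` parameter is injectable so the paging logic can be exercised without a running server.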

On Wednesday, 17.09.2008, at 10:00 -0400, Erick Erickson wrote:



Re: How to copy a solr index to another index with a different schema collapsing stored data?

2008-09-16 Thread Otis Gospodnetic
You can't copy+merge+flatten indices like that.  Reindexing would be the 
easiest.  Indexing taking weeks sounds suspicious.  How much data are you 
reindexing and how big are your indices?

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch






Re: How to copy a solr index to another index with a different schema collapsing stored data?

2008-09-16 Thread Gene Campbell
I was pretty sure you'd say that, but it means a lot that you took the
time to confirm it.  Thanks, Otis.

I don't want to give details, but we crawl for our data, and we don't
save it in a DB or on disk.  It goes from download to index.  It
seemed like a good idea at the time, when we thought our designs were
done evolving.  :)

cheers
gene






Re: How to copy a solr index to another index with a different schema collapsing stored data?

2008-09-16 Thread ristretto.rb
Is it possible to query out the stored data as, uh, tokens, I suppose,
and then index those tokens in the next index?

thanks
gene

