Re: DataImport using last_indexed_id or getting max(id) quickly

2012-07-13 Thread Erick Erickson
You could also just keep a "special" document in your index with a known
ID that contains meta-data fields. If this document had no fields in common
with any other document it wouldn't satisfy searches (except the *:* search).

Or you could store this info somewhere else (file, DB, etc).

Or you can commit with "user data", although this isn't exposed
through Solr yet, see:
https://issues.apache.org/jira/browse/SOLR-2701
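
Editor's sketch of the "store this info somewhere else (file ...)" option above. It is only an illustration, not part of Solr or DIH: the class name LastIndexedIdStore and the key "last_indexed_id" are assumptions (the key is borrowed from the thread subject), and the file location is up to you.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

/** Hypothetical helper: keeps the last indexed id in a small properties file. */
final class LastIndexedIdStore {
    private static final String KEY = "last_indexed_id"; // assumed key name

    private final Path file;

    LastIndexedIdStore(Path file) {
        this.file = file;
    }

    /** Returns the stored id, or the fallback if the file or the key is missing. */
    long read(long fallback) {
        if (!Files.exists(file)) {
            return fallback;
        }
        Properties props = new Properties();
        try (InputStream in = Files.newInputStream(file)) {
            props.load(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String value = props.getProperty(KEY);
        return value == null ? fallback : Long.parseLong(value.trim());
    }

    /** Replaces the file content with the given id. */
    void write(long id) {
        Properties props = new Properties();
        props.setProperty(KEY, Long.toString(id));
        try (OutputStream out = Files.newOutputStream(file)) {
            props.store(out, "last id that was indexed successfully");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The intended use would be to call write(...) only after a successful commit, so that the stored id never runs ahead of the index, and to pass read(0) as the lower bound of the next import query.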

Best
Erick



Re: DataImport using last_indexed_id or getting max(id) quickly

2012-07-12 Thread karsten-solr
Hi Avenka,

you asked for a how-to for adding a field "inverseID" that lets you read
max(id) from its first term:
If you do not use Solr, you have to calculate "1 - id" yourself and store it in
an extra field "inverseID".
If you fill Solr with your own code, add a TrieLongField "inverseID" and fill
it with the value "-id".
If you only want to change schema.xml (and add some classes):
  * You need a new FieldType "inverseLongType" and a Field "inverseID" of type
    "inverseLongType"
  * You need a copyField line that copies "id" into "inverseID"
    (see http://wiki.apache.org/solr/SchemaXml#Copy_Fields)

For inverseLongType I see two possibilities:
 a) use a TextField and write your own filter to calculate "1 - id"
 b) extend TrieLongField to a new FieldType "InverseTrieLongField" with:

  // Negate the external value before it is converted to its indexed form.
  @Override
  public String readableToIndexed(String val) {
    return super.readableToIndexed(Long.toString(-Long.parseLong(val)));
  }

  // Index the negated id instead of the id itself.
  @Override
  public Fieldable createField(SchemaField field, String externalVal, float boost) {
    return super.createField(field, Long.toString(-Long.parseLong(externalVal)), boost);
  }

  // Negate again on the way out so that values read back as the original id.
  @Override
  public Object toObject(Fieldable f) {
    Object result = super.toObject(f);
    if (result instanceof Long) {
      return Long.valueOf(-((Long) result).longValue());
    }
    return result;
  }
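
Editor's sketch of why the negated field helps, independent of Solr. A TreeSet stands in for the sorted term dictionary of "inverseID"; the class and method names are hypothetical. It uses the "-id" variant; with "1 - id" the way back would be "1 - firstTerm" instead.

```java
import java.util.TreeSet;

/** Stand-alone illustration; a TreeSet plays the role of the sorted term list. */
final class InverseIdDemo {
    /** Value that would be stored in the "inverseID" field. */
    static long toInverse(long id) {
        return -id;
    }

    /** Recovers the original id from an "inverseID" value. */
    static long fromInverse(long inverse) {
        return -inverse;
    }

    /** The first (smallest) inverse term belongs to the largest id. */
    static long maxId(long... ids) {
        TreeSet<Long> inverseTerms = new TreeSet<>();
        for (long id : ids) {
            inverseTerms.add(toInverse(id));
        }
        // first() throws NoSuchElementException for an empty index.
        return fromInverse(inverseTerms.first());
    }

    public static void main(String[] args) {
        System.out.println(maxId(17L, 4L, 250L, 99L)); // prints 250
    }
}
```

Negation overflows only for Long.MIN_VALUE, which a positive auto-increment id never reaches.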

Best regards
   Karsten



Re: DataImport using last_indexed_id or getting max(id) quickly

2012-07-11 Thread avenka
Thanks. Can you explain the first TermsComponent option for obtaining max(id)
in more detail? Do I have to modify schema.xml to add a new field? How exactly
do I query for the lowest value of "1 - id"?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/DataImport-using-last-indexed-id-or-getting-max-id-quickly-tp3993763p3994560.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: DataImport using last_indexed_id or getting max(id) quickly

2012-07-11 Thread karsten-solr
Hi Avenka,

*DataImportHandler*
1.) there is no configuration to add the last uniqueKeyField-Values to 
dataimport.properties
2.) you can use LogUpdateProcessor to log all "schema.printableUniqueKey(doc)" 
to log.info( ""+toLog + " 0 " + (elapsed) )
3.) you can write your own LogUpdateProcessor to log only the last UniqueKey
4.) you can change DocBuilder#execute to store the uniqueKey in 
dataimport.properties

*max(id)*
With TermsComponent you can easily ask for the first term in a field (so you
could add a field with "1000 - id" to find the last term in id).
With Solr 4.0, some index codecs will support "give me the last term" in a field:
Fields#getUniqueTermCount() together with TermsEnum#seekExact(long).
With Solr 3.6 you can use TermsComponent together with a guessed "terms.lower"
to find the last term in a field. This should outperform a "*:*" search with
the function max(id).
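
Editor's sketch of the "guess a terms.lower" idea for Solr 3.6, under stated assumptions. Each probe asks for the first term at or above a lower bound (terms.lower with terms.limit=1 and terms.sort=index); an empty answer means the guess was too high. The class LastTermProbe, the handler URL and the probe function are hypothetical; in the demo a TreeSet replaces the HTTP round trip. How TermsComponent renders terms of a trie field is not covered here.

```java
import java.util.TreeSet;
import java.util.function.LongFunction;

/** Hypothetical helper: finds the last term of a numeric field by probing. */
final class LastTermProbe {
    /** Request for the first term >= lower; the handler URL is an assumption. */
    static String termsUrl(String handlerUrl, String field, long lower) {
        return handlerUrl + "?terms=true&terms.fl=" + field
                + "&terms.lower=" + lower + "&terms.limit=1&terms.sort=index";
    }

    /**
     * probe.apply(x) must return the first term >= x, or null if there is none.
     * start must not be above the last term (for example the last known max id).
     * Assumes ids stay far below Long.MAX_VALUE, so the sums cannot overflow.
     */
    static long findLastTerm(LongFunction<Long> probe, long start) {
        Long first = probe.apply(start);
        if (first == null) {
            throw new IllegalArgumentException("start is above the last term");
        }
        long best = first;
        long step = 1;
        // Gallop upwards with a doubling step until a probe comes back empty.
        while (true) {
            Long next = probe.apply(best + step);
            if (next == null) {
                break;
            }
            best = next;
            step *= 2;
        }
        // The last term now lies in [best, best + step - 1]; narrow it down.
        long lo = best;
        long hi = best + step - 1;
        while (lo < hi) {
            long mid = lo + (hi - lo + 1) / 2;
            Long term = probe.apply(mid);
            if (term == null) {
                hi = mid - 1;
            } else {
                lo = term;
            }
        }
        return lo;
    }

    public static void main(String[] args) {
        TreeSet<Long> terms = new TreeSet<>();
        for (long id : new long[] {3L, 7L, 42L, 1000L, 123456L}) {
            terms.add(id);
        }
        System.out.println(findLastTerm(x -> terms.ceiling(x), 0L)); // prints 123456
    }
}
```

The number of probes grows with the logarithm of the distance between the starting guess and the last term, so a good starting guess (such as the previous max id) keeps the number of requests small.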

Best regards
  Karsten


View this message in context:
http://lucene.472066.n3.nabble.com/DataImport-using-last-indexed-id-or-getting-max-id-quickly-tp3993763.html

 Original Message 
> Date: Sun, 8 Jul 2012 10:25:55 -0700 (PDT)
> From: avenka 
> To: solr-user@lucene.apache.org
> Subject: DataImport using last_indexed_id or getting max(id) quickly

> My understanding is that the DIH in Solr only enters last_indexed_time in
> dataimport.properties, but not, say, last_indexed_id for a primary key 'id'.
> How can I efficiently get the max(id) (note that 'id' is an auto-increment
> field in the database)? Maintaining max(id) outside of Solr is brittle, and
> calling max(id) before each dataimport can take several minutes when the
> index has several hundred million records.
>
> How can I either import based on ID or get max(id) quickly? I cannot use
> timestamp-based import because I get out-of-memory errors if/when Solr falls
> behind, and the suggested fixes online did not work for me.