Re: DataImport using last_indexed_id or getting max(id) quickly
You could also just keep a "special" document in your index with a known ID that contains metadata fields. If this document has no fields in common with any other document, it won't match searches (except the *:* search).

Or you could store this info somewhere else (file, DB, etc.).

Or you can commit with "user data", although this isn't exposed through Solr yet; see: https://issues.apache.org/jira/browse/SOLR-2701

Best
Erick

On Thu, Jul 12, 2012 at 5:22 AM, Karsten wrote:
> Hi Avenka,
>
> you asked for a HowTo to add a field "inverseID" which lets you calculate
> max(id) from its first term:
> [...]
Re: DataImport using last_indexed_id or getting max(id) quickly
Hi Avenka,

you asked for a HowTo to add a field "inverseID" which lets you calculate max(id) from its first term:

If you do not use Solr for this, you have to calculate "1 - id" yourself and store it in an extra field "inverseID".
If you fill Solr with your own code, add a TrieLongField "inverseID" and fill it with the value "-id".
If you only want to change schema.xml (and add some classes):
* You need a new FieldType "inverseLongType" and a Field "inverseID" of type "inverseLongType"
* You need a copyField line (see http://wiki.apache.org/solr/SchemaXml#Copy_Fields)

For inverseLongType I see two possibilities:
a) use TextField and write your own filter to calculate "1 - id"
b) extend TrieLongField to a new FieldType "InverseTrieLongField" with:

    // Index the negated value, so the largest id becomes the smallest term.
    @Override
    public String readableToIndexed(String val) {
        return super.readableToIndexed(Long.toString(-Long.parseLong(val)));
    }

    // Create the field from the negated external value.
    @Override
    public Fieldable createField(SchemaField field, String externalVal, float boost) {
        return super.createField(field, Long.toString(-Long.parseLong(externalVal)), boost);
    }

    // Negate again when reading, so callers see the original id.
    @Override
    public Object toObject(Fieldable f) {
        Object result = super.toObject(f);
        if (result instanceof Long) {
            return Long.valueOf(-((Long) result).longValue());
        }
        return result;
    }

Best regards
Karsten

Original message
> Date: Wed, 11 Jul 2012 20:59:10 -0700 (PDT)
> From: avenka
> To: solr-user@lucene.apache.org
> Subject: Re: DataImport using last_indexed_id or getting max(id) quickly

> Thanks. Can you explain more the first TermsComponent option to obtain
> max(id)? Do I have to modify schema.xml to add a new field? How exactly do
> I query for the lowest value of "1 - id"?
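To make the inverseID idea above concrete, here is a minimal plain-Java sketch of why the first term of a negated field gives max(id). It is only an illustration: the class InverseIdDemo and its methods are hypothetical names, no Solr classes are used, and a TreeSet stands in for the sorted term dictionary. How a real TrieLongField encodes and orders its terms is not modeled here.

```java
import java.util.TreeSet;

// Plain-Java illustration only; no Solr classes are involved.
// All names in this class are hypothetical.
final class InverseIdDemo {

    // Mirrors the suggestion to fill "inverseID" with the value "-id".
    // Assumes ids are positive auto-increment values (never Long.MIN_VALUE).
    static long toInverse(long id) {
        return -id;
    }

    // Recovers max(id) from the first (smallest) term of the inverse field.
    // The TreeSet stands in for the sorted terms of that field and must not be empty.
    static long maxIdFromFirstTerm(TreeSet<Long> inverseTerms) {
        return -inverseTerms.first();
    }

    public static void main(String[] args) {
        long[] ids = {17L, 4L, 250L, 99L};
        TreeSet<Long> inverseTerms = new TreeSet<>();
        for (long id : ids) {
            inverseTerms.add(toInverse(id));
        }
        // The smallest inverse term is -250, so this prints 250.
        System.out.println(maxIdFromFirstTerm(inverseTerms));
    }
}
```

The point of the trick is that asking for the first term of a field is cheap, while asking for the last one is not directly supported, so the ordering is flipped at index time instead.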
Re: DataImport using last_indexed_id or getting max(id) quickly
Thanks. Can you explain the first TermsComponent option to obtain max(id) in more detail? Do I have to modify schema.xml to add a new field? How exactly do I query for the lowest value of "1 - id"?

--
View this message in context: http://lucene.472066.n3.nabble.com/DataImport-using-last-indexed-id-or-getting-max-id-quickly-tp3993763p3994560.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: DataImport using last_indexed_id or getting max(id) quickly
Hi Avenka,

*DataImportHandler*
1.) There is no configuration to add the last uniqueKeyField values to dataimport.properties.
2.) You can use LogUpdateProcessor to log all "schema.printableUniqueKey(doc)" to log.info( ""+toLog + " 0 " + (elapsed) ).
3.) You can write your own LogUpdateProcessor to log only the last uniqueKey.
4.) You can change DocBuilder#execute to store the uniqueKey in dataimport.properties.

*max(id)*
With TermsComponent you can easily ask for the first term in a field (so you could add a field with "1000 - id" to find the last term in id).
With Solr 4.0 some index codecs will support "give me the last term" in a field: Fields#getUniqueTermCount() together with TermsEnum#seekExact(long).
With Solr 3.6 you can use TermsComponent together with guessing a "terms.lower" to find the last term in a field. This should outrun a "*:*" search with the function max(id).

Best regards
Karsten

Original message
> Date: Sun, 8 Jul 2012 10:25:55 -0700 (PDT)
> From: avenka
> To: solr-user@lucene.apache.org
> Subject: DataImport using last_indexed_id or getting max(id) quickly

> My understanding is that the DIH in Solr only enters last_indexed_time in
> dataimport.properties, but not, say, last_indexed_id for a primary key 'id'.
> How can I efficiently get the max(id) (note that 'id' is an auto-increment
> field in the database)? Maintaining max(id) outside of Solr is brittle, and
> calling max(id) before each dataimport can take several minutes when the
> index has several hundred million records.
>
> How can I either import based on ID or get max(id) quickly? I cannot use
> timestamp-based import because I get out-of-memory errors if/when Solr falls
> behind, and the suggested fixes online did not work for me.
>
> --
> View this message in context: http://lucene.472066.n3.nabble.com/DataImport-using-last-indexed-id-or-getting-max-id-quickly-tp3993763.html
> Sent from the Solr - User mailing list archive at Nabble.com.
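The Solr 3.6 suggestion above (guessing a "terms.lower" to find the last term) could be sketched roughly as follows. This is one possible reading of "guessing", not necessarily what Karsten had in mind: it doubles the guess until it overshoots and then narrows down with a binary search. The class LastTermProbe and the probe function are hypothetical. Against a real index each probe would presumably be one TermsComponent request with terms.lower set to the guess and terms.limit=1; here a TreeSet simulates the index, and the encoding of numeric terms is not modeled.

```java
import java.util.Arrays;
import java.util.TreeSet;
import java.util.function.LongFunction;

// Hypothetical helper; the probe stands in for a TermsComponent request.
final class LastTermProbe {

    // probe.apply(lower) must return the first term >= lower, or null if there is none.
    // Assumes all terms are far enough below Long.MAX_VALUE that lo + step cannot overflow.
    static long findMax(LongFunction<Long> probe, long start) {
        Long first = probe.apply(start);
        if (first == null) {
            throw new IllegalStateException("no term at or above " + start);
        }
        long lo = first; // a term that is known to exist
        long step = 1;
        long hi;
        // Phase 1: keep doubling the step until a guess lies beyond the last term.
        while (true) {
            hi = lo + step;
            Long hit = probe.apply(hi);
            if (hit == null) {
                break;
            }
            lo = hit;
            step *= 2;
        }
        // Now a term exists at lo and no term exists at or above hi.
        // Phase 2: binary search for the boundary between the two.
        while (hi - lo > 1) {
            long mid = lo + (hi - lo) / 2;
            Long hit = probe.apply(mid);
            if (hit == null) {
                hi = mid;
            } else {
                lo = hit;
            }
        }
        return lo;
    }

    public static void main(String[] args) {
        // Simulated sorted terms of the id field.
        TreeSet<Long> terms = new TreeSet<>(Arrays.asList(1L, 5L, 9L, 1000L, 123456L));
        // Prints 123456 after a handful of probes.
        System.out.println(findMax(lower -> terms.ceiling(lower), 0L));
    }
}
```

The number of probes grows roughly with the logarithm of the distance between the starting guess and the last term, so starting from the previously known max(id) should need only a few requests per import.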