Re: Solr 4.x auto-increment/sequence/counter functionality.
A slightly different approach.

* I noticed that I can sort by the internal Lucene _docid_.
  - http://wiki.apache.org/solr/CommonQueryParameters
    "You can sort by index id using sort=_docid_ asc or sort=_docid_ desc"

* I have also read the docid is represented by a sequential number.
  - http://lucene.472066.n3.nabble.com/Get-DocID-after-Document-insert-td556278.html
    "Your document IDs may change, and in fact *will* change if you delete a document and then optimize. Say you index 100 docs, delete number 50 and optimize. Documents that originally had IDs 51-100 will now have IDs 50-99 and your hierarchy will be messed up."

So there is a slight chance that the _docid_ might represent document creation order. Does anyone have knowledge and experience with the internals of the Lucene _docid_ field?

--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-4-x-auto-increment-sequence-counter-functionality-tp4045125p4046137.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr 4.x auto-increment/sequence/counter functionality.
So I think I took the easiest option by creating an UpdateRequestProcessor implementation (I was unsure of the performance implications and object model of ScriptUpdateProcessor). The DocumentCreationDetailsProcessorFactory class below seems to achieve my aim of allowing me to sort my Solr documents by creation order (to an extent - I don't think it is exactly the commit order), though the auto-increment/sequence/counter functionality is not continuous.

Solr sort parameter string:

    sort=created_time_stamp_l asc, created_processing_sequence_number_l asc, created_by_solr_thread_id_l asc, created_by_solr_core_name_s asc, created_by_solr_shard_id_s asc

Any comments or feedback would be appreciated.

    //
    // UpdateRequestProcessor implementation
    //
    package com.test.solr.plugins;

    import java.io.IOException;
    import java.util.concurrent.atomic.AtomicLong;

    import org.apache.solr.cloud.CloudDescriptor;
    import org.apache.solr.common.SolrInputDocument;
    import org.apache.solr.core.CoreDescriptor;
    import org.apache.solr.core.SolrCore;
    import org.apache.solr.request.SolrQueryRequest;
    import org.apache.solr.response.SolrQueryResponse;
    import org.apache.solr.update.AddUpdateCommand;
    import org.apache.solr.update.processor.UpdateRequestProcessor;
    import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

    public class DocumentCreationDetailsProcessorFactory extends UpdateRequestProcessorFactory {

        // One counter shared by every processor instance created in this JVM.
        private static final AtomicLong processingSequenceNumber = new AtomicLong();

        @Override
        public UpdateRequestProcessor getInstance(SolrQueryRequest req, SolrQueryResponse rsp,
                UpdateRequestProcessor next) {
            return new DocumentCreationDetailsProcessor(req, rsp, next, processingSequenceNumber);
        }
    }

    class DocumentCreationDetailsProcessor extends UpdateRequestProcessor {

        private final SolrQueryRequest req;
        @SuppressWarnings("unused")
        private final SolrQueryResponse rsp;
        private final AtomicLong processingSequenceNumber;

        public DocumentCreationDetailsProcessor(SolrQueryRequest req, SolrQueryResponse rsp,
                UpdateRequestProcessor next, AtomicLong processingSequenceNumber) {
            super(next); // the superclass keeps the reference to the next processor
            this.req = req;
            this.rsp = rsp;
            this.processingSequenceNumber = processingSequenceNumber;
        }

        @Override
        public void processAdd(AddUpdateCommand cmd) throws IOException {
            SolrInputDocument solrInputDocument = cmd.getSolrInputDocument();

            solrInputDocument.addField("created_time_stamp_l", System.currentTimeMillis());
            solrInputDocument.addField("created_processing_sequence_number_l",
                    processingSequenceNumber.incrementAndGet());

            String solrCoreName = null;
            String solrShardId = null;

            if (req != null && req.getCore() != null
                    && req.getCore().getCoreDescriptor() != null) {
                SolrCore solrCore = req.getCore();
                CoreDescriptor coreDesc = null;
                CloudDescriptor cloudDesc = null;
                if (solrCore != null) {
                    solrCoreName = solrCore.getName();
                    coreDesc = req.getCore().getCoreDescriptor();
                    if (coreDesc != null) {
                        cloudDesc = coreDesc.getCloudDescriptor();
                    }
                    if (cloudDesc != null) {
                        solrShardId = cloudDesc.getShardId();
                    }
                }
            }

            solrInputDocument.addField("created_by_solr_thread_id_l", Thread.currentThread().getId());
            solrInputDocument.addField("created_by_solr_core_name_s", solrCoreName);
            solrInputDocument.addField("created_by_solr_shard_id_s", solrShardId);

            // pass it up the chain
            super.processAdd(cmd);
        }
    }

    //
    // Added the below for a bit of context (http://wiki.apache.org/solr/SolrPlugins)
    //

    mkdir /opt/solr/instances/test/collection1/lib
    cp /home/user/download/test-solr-plugins-0.0.1.jar /opt/solr/instances/test/collection1/lib/
    chown root:tomcat7 /opt/solr/instances/test/collection1/lib/*

    vim /opt/solr/instances/test/collection1/conf/solrconfig.xml

    <updateRequestProcessorChain name="mychain">
      <processor class="com.test.solr.plugins.DocumentCreationDetailsProcessorFactory"></processor>
      <processor class="solr.LogUpdateProcessorFactory" />
      <processor class="solr.RunUpdateProcessorFactory" />
    </updateRequestProcessorChain>

    vim /opt/solr/instances/test/collection1/conf/solrconfig.xml

    <requestHandler name="/update" class="solr.UpdateRequestHandler">
      <lst name="defaults">
        <str name="update.chain">mychain</str>
      </lst>
    </requestHandler>
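To make the tie-breaking in that sort parameter a bit more concrete, here is a minimal plain-Java sketch of the same precedence (timestamp first, then sequence number, then thread id). It is only an illustration: the CreationDetails class and its field names are hypothetical stand-ins for the stored fields, and none of this is Solr code.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for the creation fields stamped on each document.
final class CreationDetails {
    final String id;
    final long timeStamp;       // mirrors created_time_stamp_l
    final long sequenceNumber;  // mirrors created_processing_sequence_number_l
    final long threadId;        // mirrors created_by_solr_thread_id_l

    CreationDetails(String id, long timeStamp, long sequenceNumber, long threadId) {
        this.id = id;
        this.timeStamp = timeStamp;
        this.sequenceNumber = sequenceNumber;
        this.threadId = threadId;
    }

    // Same precedence as the first three clauses of the sort parameter:
    // later clauses only matter when all earlier ones are equal.
    static final Comparator<CreationDetails> CREATION_ORDER =
            Comparator.comparingLong((CreationDetails d) -> d.timeStamp)
                    .thenComparingLong(d -> d.sequenceNumber)
                    .thenComparingLong(d -> d.threadId);

    // Returns the ids in creation order without modifying the input list.
    static List<String> sortedIds(List<CreationDetails> docs) {
        List<CreationDetails> copy = new ArrayList<>(docs);
        copy.sort(CREATION_ORDER);
        List<String> ids = new ArrayList<>();
        for (CreationDetails d : copy) {
            ids.add(d.id);
        }
        return ids;
    }
}
```

Two documents stamped in the same millisecond are separated by the sequence number, so within one JVM the first two clauses should already give a total order; the remaining clauses only help when several cores or shards stamp documents independently.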
Re: Solr 4.x auto-increment/sequence/counter functionality.
Hi,

How about a custom UpdateRequestProcessor that uses milliseconds or even nanoseconds and stores them in some field? If that is enough resolution and you still want to avoid collision, append a random letter/string/number to it, a la <millis or nanos>_<extra stuff>, to make it unique.

Otis
--
Solr ElasticSearch Support
http://sematext.com/

On Wed, Mar 6, 2013 at 2:31 AM, marks1900-pos...@yahoo.com.au wrote:

> I am looking into how to add auto-increment/sequence/counter functionality to Solr 4.x. I specifically want to do this so that I have a numeric field which records the document insertion order and can be sorted against. This numeric field would have to be unique and not be allowed to change over time. Unfortunately, using an insertion date would produce numerous collisions.
>
> Any feedback or ideas on an approach that would help me achieve this would be appreciated.
>
> I am thinking that this could be achieved multiple ways:
>
> * Via Remote Solr Document calls. (A Solr Singleton for remote calls + Solr calls to get the current sequence value and then a call to increment the value)
> * A Solr Plugin (extend RequestHandlerBase - http:///sequence?q=namesize=1000 and return the next sequence/counter number)
> * Using a standard RDBMS such as PostgreSQL.
> * Some special Solr/Lucene functionality that I don't know about.
>
> The closest information I could find is outlined here: http://lucene.472066.n3.nabble.com/counter-field-td3886549.html
>
> A bit more background: I am using Solr as a NoSQL solution with great text search capabilities. Currently, I am inserting beans using SolrJ, and each of these beans has an id which is comprised of a bean type string (such as CUSTOMER, BOOK, STORE) concatenated with a unique bean type identifier string (Customer - UUID.randomUUID().toString().toLowerCase(Locale.ENGLISH), Book - ISBN, Store - name). For instance, CUSTOMER-b245659b-825c-4357-aab0-6d592468889a, BOOK-978-1782161325 or STORE-TheUniquelyNamedStore.
>
> Ideally I am aiming to add a numeric field to these beans that represents insertion position, which will then be used as a sorting field.
Re: Solr 4.x auto-increment/sequence/counter functionality.
Appending a random value only reduces the chance of a collision (and I need to ensure continuous uniqueness) and could hurt how the field is later sorted.

I have not written a custom UpdateRequestProcessor before. Is there a way to incorporate a Singleton that ensures one instance across a cluster? SolrCloud? I guess the main thing is that I want the value to also be kept unique across a cluster of Solr instances.

As far as I know, the only *free* uniqueness check in Solr is the <uniqueKey>id</uniqueKey> declaration in schema.xml.

Are there other options that I should be considering?
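For what it's worth, on a single JVM a timestamp can be made collision-free without a random suffix by never handing out the same value twice. Below is a minimal sketch of that idea; the MonotonicMillis class is hypothetical, and it does nothing for uniqueness across a cluster, which would still need some form of coordination.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper: hands out strictly increasing, millisecond-based values.
final class MonotonicMillis {
    // Last value handed out; starts at zero.
    private final AtomicLong last = new AtomicLong();

    // Returns the larger of the supplied clock reading and (previous value + 1),
    // so two calls never return the same value, even within one millisecond
    // or when the clock steps backwards.
    long next(long nowMillis) {
        return last.updateAndGet(prev -> Math.max(prev + 1, nowMillis));
    }

    // Convenience overload using the system clock.
    long next() {
        return next(System.currentTimeMillis());
    }
}
```

Because every value is a plain long that only ever grows, it sorts numerically in hand-out order, which is the property the random suffix would put at risk. Under a sustained burst of more than one add per millisecond the values drift ahead of the wall clock, so they should be read as an ordering key rather than an exact time.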
Re: Solr 4.x auto-increment/sequence/counter functionality.
This sounds like a job for ZooKeeper (distributed coordination is what it does). Take a look at:

http://zookeeper-user.578899.n2.nabble.com/Sequence-Number-Generation-With-Zookeeper-td5378618.html

On Wed, Mar 6, 2013 at 10:00 AM, mark12345 marks1900-pos...@yahoo.com.au wrote:

> Appending a random value only reduces the chance of a collision (and I need to ensure continuous uniqueness) and could hurt how the field is later sorted.
>
> I have not written a custom UpdateRequestProcessor before. Is there a way to incorporate a Singleton that ensures one instance across a cluster? SolrCloud? I guess the main thing is that I want the value to also be kept unique across a cluster of Solr instances.
>
> As far as I know, the only *free* uniqueness check in Solr is the <uniqueKey>id</uniqueKey> declaration in schema.xml.
>
> Are there other options that I should be considering?
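A rough sketch of how the ZooKeeper idea could look. ZooKeeper appends a zero-padded, increasing counter to the name of any node created in a sequential mode, so the cluster-wide number can be read back from the path that the create call returns. The ZkSequence class and the SequentialNodeCreator interface below are hypothetical; the interface stands in for a real client call (for example ZooKeeper.create(...) with CreateMode.PERSISTENT_SEQUENTIAL) so that the snippet runs without a ZooKeeper server, and the /solr-seq/doc- prefix is just an example.

```java
// Hypothetical helper around ZooKeeper-style sequential node names.
final class ZkSequence {

    // Hypothetical seam: an implementation would create a sequential node
    // under the given prefix and return the full path chosen by the server.
    interface SequentialNodeCreator {
        String createSequential(String pathPrefix);
    }

    // Extracts the numeric suffix the server appended to the prefix,
    // e.g. "/solr-seq/doc-0000000042" with prefix "/solr-seq/doc-" gives 42.
    static long parseSequence(String createdPath, String pathPrefix) {
        if (!createdPath.startsWith(pathPrefix)) {
            throw new IllegalArgumentException(
                    "path " + createdPath + " does not start with " + pathPrefix);
        }
        return Long.parseLong(createdPath.substring(pathPrefix.length()));
    }

    // Asks the creator for a new sequential node and returns its number.
    static long nextSequence(SequentialNodeCreator creator, String pathPrefix) {
        return parseSequence(creator.createSequential(pathPrefix), pathPrefix);
    }
}
```

Each number costs a write to the ZooKeeper ensemble, so if this were called once per document from an update processor it would probably be worth measuring the indexing throughput before relying on it.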
Re: Solr 4.x auto-increment/sequence/counter functionality.
If you want to mess with UpdateRequestProcessors, try the ScriptUpdateProcessor, with which you can write your update logic in JavaScript. That would allow you to add your unique field. Use something like timestamp+threadno+shardno and you'd have something unique (assuming you can access those from JavaScript).

Upayavira

On Wed, Mar 6, 2013, at 03:42 PM, Timothy Potter wrote:

> This sounds like a job for ZooKeeper (distributed coordination is what it does). Take a look at:
>
> http://zookeeper-user.578899.n2.nabble.com/Sequence-Number-Generation-With-Zookeeper-td5378618.html
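A minimal sketch of the timestamp+threadno+shardno idea, written in Java rather than JavaScript to match the processor posted earlier in the thread. The CompositeKey class, the separator and the padding widths are all assumptions made for illustration. The numeric parts are zero-padded so that plain string sorting agrees with numeric order.

```java
import java.util.Locale;

// Hypothetical helper that builds a sortable string key.
final class CompositeKey {

    // Layout: 13-digit millisecond timestamp, 5-digit thread id, shard id.
    // Locale.ROOT keeps the digits ASCII regardless of the server's locale.
    // The widths are minimums: a thread id above 99999 still formats, but
    // string order and numeric order can then disagree for that part.
    static String of(long timestampMillis, long threadId, String shardId) {
        return String.format(Locale.ROOT, "%013d-%05d-%s", timestampMillis, threadId, shardId);
    }
}
```

For example, CompositeKey.of(1362580000000L, 42L, "shard1") gives "1362580000000-00042-shard1". One caveat: a single thread can process more than one add in the same millisecond, so this combination on its own may still collide unless a per-JVM counter is included as well.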