Re: deduplication of suggester results is not enough
Hi Roland, I wrote an AnalyzingInfixSuggester that deduplicates data on several levels at index time. I will publish it in a few days on GitHub. I'll write to this thread when done. m. On Thursday, 26 March 2020 at 16:01:57 CET, Szűcs Roland wrote: > Hi All, > > I have been following the suggester-related discussions for quite a while. > Everybody agrees that it is not the expected behaviour for a Suggester, > where the terms are the entities and not the documents, to return the same > string representation several times. > > One suggestion was to deduplicate on the client side of Solr. That is very > easy in most client solutions, as any set-based data structure solves it. > > *But one important problem is not solved by deduplication: suggest.count*. > > If I have 15 matches from the suggester, suggest.count=10, and the first 9 > matches are the same, I will get back only 2 after deduplication and the > remaining 5 unique terms will never be shown. > > What is the solution for this? > > Cheers, > Roland >
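A common client-side mitigation, and only a partial one, is to over-request suggestions and deduplicate while preserving rank order. A minimal Java sketch, assuming you can afford to ask for a larger suggest.count than you display (the class and method names are illustrative, not from the thread):

    import java.util.ArrayList;
    import java.util.LinkedHashSet;
    import java.util.List;

    public class SuggestDedup {
        // Deduplicate suggester results while preserving rank order.
        // Call with suggestions fetched using an inflated suggest.count
        // (e.g. 3x the number you actually want to display).
        static List<String> dedupe(List<String> suggestions, int wanted) {
            List<String> unique = new ArrayList<>(new LinkedHashSet<>(suggestions));
            return unique.size() > wanted ? unique.subList(0, wanted) : unique;
        }
    }

This only shrinks the window Roland describes: if more duplicates arrive than the over-fetch factor covers, unique terms are still silently dropped.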
Re: Deduplication
On Wed, May 20, 2015 at 12:59 PM, Bram Van Dam wrote: > >> Write a custom update processor and include it in your update chain. > >> You will then have the ability to do anything you want with the entire > >> input document before it hits the code that actually does the indexing. > > This sounded like the perfect option ... until I read Jack's comment: > > > My understanding was that the distributed update processor is near the > > end of the chain, so that running of user update processors occurs before > > the distribution step, but is that distribution to the leader, or > > distribution from leader to replicas for a shard? > > That would pose some potential problems. > > Would a custom update processor make the solution "cloud-safe"? Starting with Solr 5.1, you have the ability to specify update processors per request, and you can even control whether they are executed before any distribution happens or right before the document is actually indexed on a replica. E.g. you can specify processor=xyz,MyCustomUpdateProc in the request to have processor xyz run first, then MyCustomUpdateProc, and then the default update processor chain (which will also distribute the doc to the leader or from the leader to a replica). This also means that such processors will not be executed on the replicas at all. You can also specify post-processor=xyz,MyCustomUpdateProc to have xyz and MyCustomUpdateProc run on each replica (including the leader) right before the doc is indexed (i.e. just before RunUpdateProcessor). Unfortunately, due to an oversight, this feature hasn't been documented well, which is something I'll fix. See https://issues.apache.org/jira/browse/SOLR-6892 for more details. > > Thx, > > - Bram > > -- Regards, Shalin Shekhar Mangar.
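For concreteness, a hedged SolrJ sketch of the per-request parameters Shalin describes (written against SolrJ 5.x; xyz and MyCustomUpdateProc are placeholders for processors registered in solrconfig.xml):

    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.request.UpdateRequest;
    import org.apache.solr.common.SolrInputDocument;

    public class PerRequestProcessors {
        public static void main(String[] args) throws Exception {
            HttpSolrClient client = new HttpSolrClient("http://localhost:8983/solr/collection1");
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "1");

            UpdateRequest req = new UpdateRequest();
            // Runs once, before the document is forwarded to the leader/replicas.
            req.setParam("processor", "xyz,MyCustomUpdateProc");
            // Runs on every replica (leader included), just before RunUpdateProcessor.
            req.setParam("post-processor", "xyz,MyCustomUpdateProc");
            req.add(doc);
            req.process(client);
            client.close();
        }
    }

The same names work as plain HTTP request parameters on /update, so the curl-based setups elsewhere in this thread can use them too.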
Re: Deduplication
What Solr de-duplication offers is to calculate a hash for each input document (based on a set of fields). You can then select between two options: index everything, in which case documents with the same signature are simply equal, or avoid duplicates by overwriting them. How the similarity hash is calculated is something you can play with and customise if needed. With that clarified, do you think this can fit in some way, or are you definitely not talking about dedupe? 2015-05-20 8:37 GMT+01:00 Bram Van Dam : > On 19/05/15 14:47, Alessandro Benedetti wrote: > > Hi Bram, > > what do you mean by: > > "I would like it to provide the unique value myself, without having the > > deduplicator create a hash of field values". > > > > This is not deduplication, but simple document filtering based on a > > constraint. > > If it is de-duplication you want (which is what the first part of your > > mail suggested), here you can find a lot of info: > > Not sure whether de-duplication is the right word for what I'm after; I > essentially want a unique constraint on an arbitrary field, without > overwrite semantics, because I want Solr to tell me if a duplicate is > sent to Solr. > > I was thinking that the de-duplication feature could accomplish this > somehow. > > > - Bram > -- -- Benedetti Alessandro Visiting card : http://about.me/alessandro_benedetti "Tyger, tyger burning bright In the forests of the night, What immortal hand or eye Could frame thy fearful symmetry?" William Blake - Songs of Experience -1794 England
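For reference, a minimal solrconfig.xml sketch of the chain being described (field names are placeholders; the overwriteDupes flag switches between the two options above):

    <updateRequestProcessorChain name="dedupe">
      <processor class="solr.processor.SignatureUpdateProcessorFactory">
        <bool name="enabled">true</bool>
        <str name="signatureField">signature</str>
        <!-- true: a new document replaces older ones with the same signature;
             false: everything is indexed and duplicates share a signature value -->
        <bool name="overwriteDupes">true</bool>
        <str name="fields">name,features,cat</str>
        <str name="signatureClass">solr.processor.Lookup3Signature</str>
      </processor>
      <processor class="solr.LogUpdateProcessorFactory" />
      <processor class="solr.RunUpdateProcessorFactory" />
    </updateRequestProcessorChain>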
Re: Deduplication
On 19/05/15 14:47, Alessandro Benedetti wrote: > Hi Bram, > what do you mean by: > "I would like it to provide the unique value myself, without having the > deduplicator create a hash of field values". > > This is not deduplication, but simple document filtering based on a > constraint. > If it is de-duplication you want (which is what the first part of your > mail suggested), here you can find a lot of info: Not sure whether de-duplication is the right word for what I'm after; I essentially want a unique constraint on an arbitrary field, without overwrite semantics, because I want Solr to tell me if a duplicate is sent to Solr. I was thinking that the de-duplication feature could accomplish this somehow. - Bram
Re: Deduplication
>> Write a custom update processor and include it in your update chain. >> You will then have the ability to do anything you want with the entire >> input document before it hits the code to actually do the indexing. This sounded like the perfect option ... until I read Jack's comment: > > My understanding was that the distributed update processor is near the end > of the chain, so that running of user update processors occurs before the > distribution step, but is that distribution to the leader, or distribution > from leader to replicas for a shard? That would pose some potential problems. Would a custom update processor make the solution "cloud-safe"? Thx, - Bram
Re: Deduplication
Shawn, I was going to say the same thing, but... then I was thinking about SolrCloud and the fact that update processors are invoked before the document is sent to its target node, so there wouldn't be a reliable way to tell whether the input document field value exists on the target node rather than the current node. Or does the update processing only occur on the leader node after being forwarded from the originating node? Is the doc clear on this detail? My understanding was that the distributed update processor is near the end of the chain, so that running of user update processors occurs before the distribution step, but is that distribution to the leader, or distribution from leader to replicas for a shard? -- Jack Krupansky On Tue, May 19, 2015 at 9:01 AM, Shawn Heisey wrote: > On 5/19/2015 3:02 AM, Bram Van Dam wrote: > > I'm looking for a way to have Solr reject documents if a certain field > > value is duplicated (reject, not overwrite). There doesn't seem to be > > any kind of unique option in schema fields. > > > > The de-duplication feature seems to make this (somewhat) possible, but I > > would like it to provide the unique value myself, without having the > > deduplicator create a hash of field values. > > > > Am I missing an obvious (or less obvious) way of accomplishing this? > > Write a custom update processor and include it in your update chain. > You will then have the ability to do anything you want with the entire > input document before it hits the code that actually does the indexing. > > A script update processor included with Solr allows you to write your > processor in a language other than Java, such as JavaScript. > > https://lucene.apache.org/solr/5_1_0/solr-core/org/apache/solr/update/processor/StatelessScriptUpdateProcessorFactory.html > > Here's how to discard a document in an update processor written in Java: > > http://stackoverflow.com/questions/27108200/how-to-cancel-indexing-of-a-solr-document-using-update-request-processor > > The javadoc that I linked above describes the ability to return "false" > in other languages to discard the document. > > Thanks, > Shawn > >
Re: Deduplication
On 5/19/2015 3:02 AM, Bram Van Dam wrote: > I'm looking for a way to have Solr reject documents if a certain field > value is duplicated (reject, not overwrite). There doesn't seem to be > any kind of unique option in schema fields. > > The de-duplication feature seems to make this (somewhat) possible, but I > would like it to provide the unique value myself, without having the > deduplicator create a hash of field values. > > Am I missing an obvious (or less obvious) way of accomplishing this? Write a custom update processor and include it in your update chain. You will then have the ability to do anything you want with the entire input document before it hits the code that actually does the indexing. A script update processor included with Solr allows you to write your processor in a language other than Java, such as JavaScript. https://lucene.apache.org/solr/5_1_0/solr-core/org/apache/solr/update/processor/StatelessScriptUpdateProcessorFactory.html Here's how to discard a document in an update processor written in Java: http://stackoverflow.com/questions/27108200/how-to-cancel-indexing-of-a-solr-document-using-update-request-processor The javadoc that I linked above describes the ability to return "false" in other languages to discard the document. Thanks, Shawn
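In the spirit of that link, a hedged Java sketch of a processor that silently drops duplicates; isDuplicate() is a placeholder for whatever uniqueness check you need (e.g. a query against the index), not part of any Solr API:

    import java.io.IOException;
    import org.apache.solr.common.SolrInputDocument;
    import org.apache.solr.request.SolrQueryRequest;
    import org.apache.solr.response.SolrQueryResponse;
    import org.apache.solr.update.AddUpdateCommand;
    import org.apache.solr.update.processor.UpdateRequestProcessor;
    import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

    public class RejectDuplicateFactory extends UpdateRequestProcessorFactory {
        @Override
        public UpdateRequestProcessor getInstance(SolrQueryRequest req,
                SolrQueryResponse rsp, UpdateRequestProcessor next) {
            return new UpdateRequestProcessor(next) {
                @Override
                public void processAdd(AddUpdateCommand cmd) throws IOException {
                    SolrInputDocument doc = cmd.getSolrInputDocument();
                    if (isDuplicate(doc)) {
                        return; // not calling super.processAdd() discards the document
                    }
                    super.processAdd(cmd);
                }
            };
        }

        private boolean isDuplicate(SolrInputDocument doc) {
            // placeholder: e.g. look up an existing value of the field
            // you want to keep unique
            return false;
        }
    }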
Re: Deduplication
Hi Bram, what do you mean by: "I would like it to provide the unique value myself, without having the deduplicator create a hash of field values". This is not deduplication, but simple document filtering based on a constraint. If it is de-duplication you want (which is what the first part of your mail suggested), here you can find a lot of info: https://cwiki.apache.org/confluence/display/solr/De-Duplication Let me know for more detailed requirements! 2015-05-19 10:02 GMT+01:00 Bram Van Dam : > Hi folks, > > I'm looking for a way to have Solr reject documents if a certain field > value is duplicated (reject, not overwrite). There doesn't seem to be > any kind of unique option in schema fields. > > The de-duplication feature seems to make this (somewhat) possible, but I > would like it to provide the unique value myself, without having the > deduplicator create a hash of field values. > > Am I missing an obvious (or less obvious) way of accomplishing this? > > Thanks, > > - Bram > -- -- Benedetti Alessandro Visiting card : http://about.me/alessandro_benedetti "Tyger, tyger burning bright In the forests of the night, What immortal hand or eye Could frame thy fearful symmetry?" William Blake - Songs of Experience -1794 England
Re: Deduplication in SolrCloud
Should the old Signature code be removed? Given that the goal is to have everyone use SolrCloud, maybe this kind of landmine should be removed? On Fri, Jul 27, 2012 at 8:43 AM, Markus Jelsma wrote: > This issue doesn't really describe your problem but a more general problem of > distributed deduplication: > https://issues.apache.org/jira/browse/SOLR-3473 > > > -Original message- >> From:Daniel Brügge >> Sent: Fri 27-Jul-2012 17:38 >> To: solr-user@lucene.apache.org >> Subject: Deduplication in SolrCloud >> >> Hi, >> >> in my old Solr Setup I have used the deduplication feature in the update chain >> with a couple of fields:
>>
>> <updateRequestProcessorChain name="dedupe">
>>   <processor class="solr.processor.SignatureUpdateProcessorFactory">
>>     <bool name="enabled">true</bool>
>>     <str name="signatureField">signature</str>
>>     <bool name="overwriteDupes">false</bool>
>>     <str name="fields">uuid,type,url,content_hash</str>
>>     <str name="signatureClass">org.apache.solr.update.processor.Lookup3Signature</str>
>>   </processor>
>>   <processor class="solr.RunUpdateProcessorFactory" />
>> </updateRequestProcessorChain>
>>
>> This worked fine. When I now use this in my 2-shard SolrCloud setup and insert 150.000 documents, >> I am always getting an error: >> >> *INFO: end_commit_flush* >> *Jul 27, 2012 3:29:36 PM org.apache.solr.common.SolrException log* >> *SEVERE: null:java.lang.RuntimeException: java.lang.OutOfMemoryError: >> unable to create new native thread* >> * at >> org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:456) >> * >> * at >> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:284) >> * >> >> I am inserting the documents via CSV import and the curl command, and also split them >> into 50k chunks. >> >> Without the dedupe chain, the import finishes after 40 secs. >> >> The curl command writes to one of my shards. >> >> >> Do you have an idea why this happens? Should I reduce the fields to one? I >> have read that not using the id as >> a dedupe field could be an issue? >> >> >> I have searched for deduplication with SolrCloud and I am wondering if it >> is already working correctly? see e.g. >> http://lucene.472066.n3.nabble.com/SolrCloud-deduplication-td3984657.html >> >> Thanks & regards >> >> Daniel >> -- Lance Norskog goks...@gmail.com
RE: Deduplication in SolrCloud
This issue doesn't really describe your problem but a more general problem of distributed deduplication: https://issues.apache.org/jira/browse/SOLR-3473 -Original message- > From:Daniel Brügge > Sent: Fri 27-Jul-2012 17:38 > To: solr-user@lucene.apache.org > Subject: Deduplication in SolrCloud > > Hi, > > in my old Solr Setup I have used the deduplication feature in the update chain > with a couple of fields:
>
> <updateRequestProcessorChain name="dedupe">
>   <processor class="solr.processor.SignatureUpdateProcessorFactory">
>     <bool name="enabled">true</bool>
>     <str name="signatureField">signature</str>
>     <bool name="overwriteDupes">false</bool>
>     <str name="fields">uuid,type,url,content_hash</str>
>     <str name="signatureClass">org.apache.solr.update.processor.Lookup3Signature</str>
>   </processor>
>   <processor class="solr.RunUpdateProcessorFactory" />
> </updateRequestProcessorChain>
>
> This worked fine. When I now use this in my 2-shard SolrCloud setup and insert 150.000 documents, > I am always getting an error: > > *INFO: end_commit_flush* > *Jul 27, 2012 3:29:36 PM org.apache.solr.common.SolrException log* > *SEVERE: null:java.lang.RuntimeException: java.lang.OutOfMemoryError: > unable to create new native thread* > * at > org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:456) > * > * at > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:284) > * > > I am inserting the documents via CSV import and the curl command, and also split them > into 50k chunks. > > Without the dedupe chain, the import finishes after 40 secs. > > The curl command writes to one of my shards. > > > Do you have an idea why this happens? Should I reduce the fields to one? I > have read that not using the id as > a dedupe field could be an issue? > > > I have searched for deduplication with SolrCloud and I am wondering if it > is already working correctly? see e.g. > http://lucene.472066.n3.nabble.com/SolrCloud-deduplication-td3984657.html > > Thanks & regards > > Daniel >
Re: Deduplication questions
: Q1. Is it possible to pass *analyzed* content to the : : public abstract class Signature { No, analysis happens as the documents are being written to the Lucene index, well after the UpdateProcessors have had a chance to interact with the values. : Q2. Method calculate() is using concatenated fields from name,features,cat : Is there any mechanism by which I could build "field dependent signatures"? At the moment the Signature API is fairly minimal, but it could definitely be improved by adding more methods (with sensible defaults in the base class) that would give the impl more control over the resulting signature ... we just need people to propose good suggestions with example use cases. : Is the idea of making two UpdateProcessors and chaining them OK? (It is ugly, but : would work) I don't know whether what you describe is really intentional or not, but it should work -Hoss
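To make the discussion concrete, a sketch of a custom Signature, written against the later add()/getSignature() API (the 1.4-era API discussed above exposed a single calculate() method instead); the length-prefixing is just one way to reduce field-boundary ambiguity, and a truly field-dependent signature would need the richer API Hoss mentions:

    import java.nio.charset.StandardCharsets;
    import java.security.MessageDigest;
    import java.security.NoSuchAlgorithmException;
    import org.apache.solr.update.processor.Signature;

    public class LengthPrefixedSignature extends Signature {
        private final StringBuilder buf = new StringBuilder();

        @Override
        public void add(String content) {
            // Length-prefix each value so "ab"+"c" hashes differently from "a"+"bc".
            // The API passes only field *values*, not field names.
            buf.append(content.length()).append(':').append(content);
        }

        @Override
        public byte[] getSignature() {
            try {
                MessageDigest md5 = MessageDigest.getInstance("MD5");
                return md5.digest(buf.toString().getBytes(StandardCharsets.UTF_8));
            } catch (NoSuchAlgorithmException e) {
                throw new RuntimeException(e);
            }
        }
    }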
Re: Deduplication
> TermsComponent maybe? > > or faceting? > q=*:*&facet=true&facet.field=signatureField&defType=lucene&rows=0&start=0 > > if you append &facet.mincount=2 to the above url you can > see your duplications After re-reading your message: sometimes you want to show duplicates, sometimes you don't want them. I have never used FieldCollapsing myself but have heard about it many times. http://wiki.apache.org/solr/FieldCollapsing
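Field collapsing later landed in Solr as result grouping; a hedged sketch of a query that collapses duplicates on the signature field (assuming signatureField is indexed, and keeping one document per signature):

    q=*:*&group=true&group.field=signatureField&group.limit=1

Dropping the group parameters (or raising group.limit) shows the duplicates again, which covers both of the use cases above.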
Re: Deduplication
> Basically for some use cases I would like to show > duplicates, for others I > want them ignored. > > If I have overwriteDupes=false and I just create the dedup > hash, how can I > query for only unique hash values... i.e. something like a > SQL group by. TermsComponent maybe? or faceting? q=*:*&facet=true&facet.field=signatureField&defType=lucene&rows=0&start=0 if you append &facet.mincount=2 to the above url you can see your duplications
Re: Deduplication in 1.4
Two sites that use field-collapsing: 1) www.ilocal.nl 2) www.welke.nl I'm not sure what you mean with double-tripping? The sites mentioned do not have performance problems that are caused by field collapsing. Field-collapsing currently only supports quasi distributed field-collapsing (as I have described on the Solr wiki). Currently I don't know a distributed field-collapsing algorithm that works properly and does not influence the search time in such a way that the search becomes slow. Martijn 2009/11/26 Otis Gospodnetic : > Hi Martijn, > > > - Original Message > >> From: Martijn v Groningen >> To: solr-user@lucene.apache.org >> Sent: Thu, November 26, 2009 3:19:40 AM >> Subject: Re: Deduplication in 1.4 >> >> Field collapsing has been used by many in their production >> environment. > > Got any pointers to public sites you know use it? I know of a high traffic > site that used an early version, and it caused performance problems. Is > double-tripping still required? > >> The last few months the stability of the patch grew as >> quiet some bugs were fixed. The only big feature missing currently is >> caching of the collapsing algorithm. I'm currently working on that and > > Is it also full distributed-search-ready? > >> I will put it in a new patch in the coming next days. So yes the >> patch is very near being production ready. > > Thanks, > Otis > >> Martijn >> >> 2009/11/26 KaktuChakarabati : >> > >> > Hey Otis, >> > Yep, I realized this myself after playing some with the dedupe feature >> > yesterday. >> > So it does look like Field collapsing is what I need pretty much. >> > Any idea on how close it is to being production-ready? >> > >> > Thanks, >> > -Chak >> > >> > Otis Gospodnetic wrote: >> >> >> >> Hi, >> >> >> >> As far as I know, the point of deduplication in Solr ( >> >> http://wiki.apache.org/solr/Deduplication ) is to detect a duplicate >> >> document before indexing it in order to avoid duplicates in the index in >> >> the first place. >> >> >> >> What you are describing is closer to field collapsing patch in SOLR-236. >> >> >> >> Otis >> >> -- >> >> Sematext is hiring -- http://sematext.com/about/jobs.html?mls >> >> Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR >> >> >> >> >> >> >> >> - Original Message >> >>> From: KaktuChakarabati >> >>> To: solr-user@lucene.apache.org >> >>> Sent: Tue, November 24, 2009 5:29:00 PM >> >>> Subject: Deduplication in 1.4 >> >>> >> >>> >> >>> Hey, >> >>> I've been trying to find some documentation on using this feature in 1.4 >> >>> but >> >>> Wiki page is alittle sparse.. >> >>> In specific, here's what i'm trying to do: >> >>> >> >>> I have a field, say 'duplicate_group_id' that i'll populate based on some >> >>> offline documents deduplication process I have. >> >>> >> >>> All I want is for solr to compute a 'duplicate_signature' field based on >> >>> this one at update time, so that when i search for documents later, all >> >>> documents with same original 'duplicate_group_id' value will be rolled up >> >>> (e.g i'll just get the first one that came back according to relevancy). >> >>> >> >>> I enabled the deduplication processor and put it into updater, but i'm >> >>> not >> >>> seeing any difference in returned results (i.e results with same >> >>> duplicate_id are returned separately..) >> >>> >> >>> is there anything i need to supply in query-time for this to take effect? >> >>> what should be the behaviour? is there any working example of this? >> >>> >> >>> Anything will be helpful.. 
>> >>> >> >>> Thanks, >> >>> Chak >> >>> -- >> >>> View this message in context: >> >>> http://old.nabble.com/Deduplication-in-1.4-tp26504403p26504403.html >> >>> Sent from the Solr - User mailing list archive at Nabble.com. >> >> >> >> >> >> >> > >> > -- >> > View this message in context: >> http://old.nabble.com/Deduplication-in-1.4-tp26504403p26522386.html >> > Sent from the Solr - User mailing list archive at Nabble.com. >> > >> > > >
Re: Deduplication in 1.4
Hi Martijn, - Original Message > From: Martijn v Groningen > To: solr-user@lucene.apache.org > Sent: Thu, November 26, 2009 3:19:40 AM > Subject: Re: Deduplication in 1.4 > > Field collapsing has been used by many in their production > environment. Got any pointers to public sites you know use it? I know of a high traffic site that used an early version, and it caused performance problems. Is double-tripping still required? > The last few months the stability of the patch grew as > quiet some bugs were fixed. The only big feature missing currently is > caching of the collapsing algorithm. I'm currently working on that and Is it also full distributed-search-ready? > I will put it in a new patch in the coming next days. So yes the > patch is very near being production ready. Thanks, Otis > Martijn > > 2009/11/26 KaktuChakarabati : > > > > Hey Otis, > > Yep, I realized this myself after playing some with the dedupe feature > > yesterday. > > So it does look like Field collapsing is what I need pretty much. > > Any idea on how close it is to being production-ready? > > > > Thanks, > > -Chak > > > > Otis Gospodnetic wrote: > >> > >> Hi, > >> > >> As far as I know, the point of deduplication in Solr ( > >> http://wiki.apache.org/solr/Deduplication ) is to detect a duplicate > >> document before indexing it in order to avoid duplicates in the index in > >> the first place. > >> > >> What you are describing is closer to field collapsing patch in SOLR-236. > >> > >> Otis > >> -- > >> Sematext is hiring -- http://sematext.com/about/jobs.html?mls > >> Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR > >> > >> > >> > >> - Original Message > >>> From: KaktuChakarabati > >>> To: solr-user@lucene.apache.org > >>> Sent: Tue, November 24, 2009 5:29:00 PM > >>> Subject: Deduplication in 1.4 > >>> > >>> > >>> Hey, > >>> I've been trying to find some documentation on using this feature in 1.4 > >>> but > >>> Wiki page is alittle sparse.. > >>> In specific, here's what i'm trying to do: > >>> > >>> I have a field, say 'duplicate_group_id' that i'll populate based on some > >>> offline documents deduplication process I have. > >>> > >>> All I want is for solr to compute a 'duplicate_signature' field based on > >>> this one at update time, so that when i search for documents later, all > >>> documents with same original 'duplicate_group_id' value will be rolled up > >>> (e.g i'll just get the first one that came back according to relevancy). > >>> > >>> I enabled the deduplication processor and put it into updater, but i'm > >>> not > >>> seeing any difference in returned results (i.e results with same > >>> duplicate_id are returned separately..) > >>> > >>> is there anything i need to supply in query-time for this to take effect? > >>> what should be the behaviour? is there any working example of this? > >>> > >>> Anything will be helpful.. > >>> > >>> Thanks, > >>> Chak > >>> -- > >>> View this message in context: > >>> http://old.nabble.com/Deduplication-in-1.4-tp26504403p26504403.html > >>> Sent from the Solr - User mailing list archive at Nabble.com. > >> > >> > >> > > > > -- > > View this message in context: > http://old.nabble.com/Deduplication-in-1.4-tp26504403p26522386.html > > Sent from the Solr - User mailing list archive at Nabble.com. > > > >
Re: Deduplication in 1.4
Field collapsing has been used by many in their production environment. The last few months the stability of the patch grew as quiet some bugs were fixed. The only big feature missing currently is caching of the collapsing algorithm. I'm currently working on that and I will put it in a new patch in the coming next days. So yes the patch is very near being production ready. Martijn 2009/11/26 KaktuChakarabati : > > Hey Otis, > Yep, I realized this myself after playing some with the dedupe feature > yesterday. > So it does look like Field collapsing is what I need pretty much. > Any idea on how close it is to being production-ready? > > Thanks, > -Chak > > Otis Gospodnetic wrote: >> >> Hi, >> >> As far as I know, the point of deduplication in Solr ( >> http://wiki.apache.org/solr/Deduplication ) is to detect a duplicate >> document before indexing it in order to avoid duplicates in the index in >> the first place. >> >> What you are describing is closer to field collapsing patch in SOLR-236. >> >> Otis >> -- >> Sematext is hiring -- http://sematext.com/about/jobs.html?mls >> Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR >> >> >> >> - Original Message >>> From: KaktuChakarabati >>> To: solr-user@lucene.apache.org >>> Sent: Tue, November 24, 2009 5:29:00 PM >>> Subject: Deduplication in 1.4 >>> >>> >>> Hey, >>> I've been trying to find some documentation on using this feature in 1.4 >>> but >>> Wiki page is alittle sparse.. >>> In specific, here's what i'm trying to do: >>> >>> I have a field, say 'duplicate_group_id' that i'll populate based on some >>> offline documents deduplication process I have. >>> >>> All I want is for solr to compute a 'duplicate_signature' field based on >>> this one at update time, so that when i search for documents later, all >>> documents with same original 'duplicate_group_id' value will be rolled up >>> (e.g i'll just get the first one that came back according to relevancy). >>> >>> I enabled the deduplication processor and put it into updater, but i'm >>> not >>> seeing any difference in returned results (i.e results with same >>> duplicate_id are returned separately..) >>> >>> is there anything i need to supply in query-time for this to take effect? >>> what should be the behaviour? is there any working example of this? >>> >>> Anything will be helpful.. >>> >>> Thanks, >>> Chak >>> -- >>> View this message in context: >>> http://old.nabble.com/Deduplication-in-1.4-tp26504403p26504403.html >>> Sent from the Solr - User mailing list archive at Nabble.com. >> >> >> > > -- > View this message in context: > http://old.nabble.com/Deduplication-in-1.4-tp26504403p26522386.html > Sent from the Solr - User mailing list archive at Nabble.com. > >
Re: Deduplication in 1.4
Hey Otis, Yep, I realized this myself after playing some with the dedupe feature yesterday. So it does look like Field collapsing is what I need pretty much. Any idea on how close it is to being production-ready? Thanks, -Chak Otis Gospodnetic wrote: > > Hi, > > As far as I know, the point of deduplication in Solr ( > http://wiki.apache.org/solr/Deduplication ) is to detect a duplicate > document before indexing it in order to avoid duplicates in the index in > the first place. > > What you are describing is closer to field collapsing patch in SOLR-236. > > Otis > -- > Sematext is hiring -- http://sematext.com/about/jobs.html?mls > Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR > > > > - Original Message >> From: KaktuChakarabati >> To: solr-user@lucene.apache.org >> Sent: Tue, November 24, 2009 5:29:00 PM >> Subject: Deduplication in 1.4 >> >> >> Hey, >> I've been trying to find some documentation on using this feature in 1.4 >> but >> Wiki page is alittle sparse.. >> In specific, here's what i'm trying to do: >> >> I have a field, say 'duplicate_group_id' that i'll populate based on some >> offline documents deduplication process I have. >> >> All I want is for solr to compute a 'duplicate_signature' field based on >> this one at update time, so that when i search for documents later, all >> documents with same original 'duplicate_group_id' value will be rolled up >> (e.g i'll just get the first one that came back according to relevancy). >> >> I enabled the deduplication processor and put it into updater, but i'm >> not >> seeing any difference in returned results (i.e results with same >> duplicate_id are returned separately..) >> >> is there anything i need to supply in query-time for this to take effect? >> what should be the behaviour? is there any working example of this? >> >> Anything will be helpful.. >> >> Thanks, >> Chak >> -- >> View this message in context: >> http://old.nabble.com/Deduplication-in-1.4-tp26504403p26504403.html >> Sent from the Solr - User mailing list archive at Nabble.com. > > > -- View this message in context: http://old.nabble.com/Deduplication-in-1.4-tp26504403p26522386.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Deduplication in 1.4
Hi, As far as I know, the point of deduplication in Solr ( http://wiki.apache.org/solr/Deduplication ) is to detect a duplicate document before indexing it in order to avoid duplicates in the index in the first place. What you are describing is closer to field collapsing patch in SOLR-236. Otis -- Sematext is hiring -- http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR - Original Message > From: KaktuChakarabati > To: solr-user@lucene.apache.org > Sent: Tue, November 24, 2009 5:29:00 PM > Subject: Deduplication in 1.4 > > > Hey, > I've been trying to find some documentation on using this feature in 1.4 but > Wiki page is alittle sparse.. > In specific, here's what i'm trying to do: > > I have a field, say 'duplicate_group_id' that i'll populate based on some > offline documents deduplication process I have. > > All I want is for solr to compute a 'duplicate_signature' field based on > this one at update time, so that when i search for documents later, all > documents with same original 'duplicate_group_id' value will be rolled up > (e.g i'll just get the first one that came back according to relevancy). > > I enabled the deduplication processor and put it into updater, but i'm not > seeing any difference in returned results (i.e results with same > duplicate_id are returned separately..) > > is there anything i need to supply in query-time for this to take effect? > what should be the behaviour? is there any working example of this? > > Anything will be helpful.. > > Thanks, > Chak > -- > View this message in context: > http://old.nabble.com/Deduplication-in-1.4-tp26504403p26504403.html > Sent from the Solr - User mailing list archive at Nabble.com.
Re: Deduplication patch not working in nightly build
I've seen similar errors when large background merges happen while looping in a result set. See http://lucene.grantingersoll.com/2008/07/16/mysql-solr-and-communications-link-failure/ On Jan 9, 2009, at 12:50 PM, Mark Miller wrote: Your basically writing segments more often now, and somehow avoiding a longer merge I think. Also, likely, deduplication is probably adding enough extra data to your index to hit a sweet spot where a merge is too long. Or something to that effect - MySql is especially sensitive to timeouts when doing a select * on a huge db in my testing. I didnt understand your answer on the autocommit - I take it you are using it? Or no? All a guess, but it def points to a merge taking a bit long and causing a timeout. I think you can relax the MySql timeout settings if that is it. I'd like to get to the bottom of this as well, so any other info you can provide would be great. - Mark Marc Sturlese wrote: Hey Shalin, In the begining (when the error was appearing) i had 32 and no maxBufferedDocs set Now I have: 32 50 I think taht setting maxBufferedDocs to 50 I am forcing more disk writting than I would like... but at least it works fine (but a bit slower,opiously). I keep saying that the most weird thing is that I don't have that problem using solr1.3, just with the nightly... Even that it's good that it works well now, would be great if someone can give me an explanation why this is happening Shalin Shekhar Mangar wrote: On Fri, Jan 9, 2009 at 9:23 PM, Marc Sturlese wrote: hey there, I hadn't autoCommit set to true but I have it sorted! The error stopped appearing after setting the property maxBufferedDocs in solrconfig.xml. I can't exactly undersand why but it just worked. Anyway, maxBufferedDocs is deprecaded, would ramBufferSizeMB do the same? What I find strange is this line in the exception: "Last packet sent to the server was 202481 ms ago." Something took very very long to complete and the connection got closed by the time the next row was fetched from the opened resultset. Just curious, what was the previous value of maxBufferedDocs and what did you change it to? -- View this message in context: http://www.nabble.com/Deduplication-patch-not-working-in-nightly-build-tp21287327p21374908.html Sent from the Solr - User mailing list archive at Nabble.com. -- Regards, Shalin Shekhar Mangar. -- Grant Ingersoll Lucene Helpful Hints: http://wiki.apache.org/lucene-java/BasicsOfPerformance http://wiki.apache.org/lucene-java/LuceneFAQ
Re: Deduplication patch not working in nightly build
Hey Mark, Sorry, I was not specific enough; I meant that I have, and always have had, autoCommit=false. I will do some more traces and tests. Will post if I have anything new and important to mention. Thanks. Marc Sturlese wrote: > > Hey Shalin, > > In the beginning (when the error was appearing) I had > <ramBufferSizeMB>32</ramBufferSizeMB> > and no maxBufferedDocs set > > Now I have: > <ramBufferSizeMB>32</ramBufferSizeMB> > <maxBufferedDocs>50</maxBufferedDocs> > > I think that by setting maxBufferedDocs to 50 I am forcing more disk writing > than I would like... but at least it works fine (just a bit > slower, obviously). > > I keep saying that the weirdest thing is that I don't have this problem > using Solr 1.3, just with the nightly... > > Even though it works well now, it would be great if someone could > give me an explanation of why this is happening > > > > Shalin Shekhar Mangar wrote: >> >> On Fri, Jan 9, 2009 at 9:23 PM, Marc Sturlese >> wrote: >> >>> >>> hey there, >>> I hadn't autoCommit set to true but I have it sorted! The error >>> stopped >>> appearing after setting the property maxBufferedDocs in solrconfig.xml. >>> I >>> can't exactly understand why but it just worked. >>> Anyway, maxBufferedDocs is deprecated, would ramBufferSizeMB do the >>> same? >>> >>> >> What I find strange is this line in the exception: >> "Last packet sent to the server was 202481 ms ago." >> >> Something took very very long to complete and the connection got closed >> by >> the time the next row was fetched from the opened resultset. >> >> Just curious, what was the previous value of maxBufferedDocs and what did >> you change it to? >> >>> >>> -- >>> View this message in context: >>> http://www.nabble.com/Deduplication-patch-not-working-in-nightly-build-tp21287327p21374908.html >>> Sent from the Solr - User mailing list archive at Nabble.com. >>> >>> >> >> >> -- >> Regards, >> Shalin Shekhar Mangar. >> >> > > -- View this message in context: http://www.nabble.com/Deduplication-patch-not-working-in-nightly-build-tp21287327p21378069.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Deduplication patch not working in nightly build
You're basically writing segments more often now, and somehow avoiding a longer merge, I think. Also, deduplication is likely adding enough extra data to your index to hit a sweet spot where a merge takes too long. Or something to that effect - MySql is especially sensitive to timeouts when doing a select * on a huge db in my testing. I didn't understand your answer on the autocommit - I take it you are using it? Or no? All a guess, but it def points to a merge taking a bit long and causing a timeout. I think you can relax the MySql timeout settings if that is it. I'd like to get to the bottom of this as well, so any other info you can provide would be great. - Mark Marc Sturlese wrote: Hey Shalin, In the beginning (when the error was appearing) I had <ramBufferSizeMB>32</ramBufferSizeMB> and no maxBufferedDocs set. Now I have: <ramBufferSizeMB>32</ramBufferSizeMB> <maxBufferedDocs>50</maxBufferedDocs> I think that by setting maxBufferedDocs to 50 I am forcing more disk writing than I would like... but at least it works fine (just a bit slower, obviously). I keep saying that the weirdest thing is that I don't have this problem using Solr 1.3, just with the nightly... Even though it works well now, it would be great if someone could give me an explanation of why this is happening Shalin Shekhar Mangar wrote: On Fri, Jan 9, 2009 at 9:23 PM, Marc Sturlese wrote: hey there, I hadn't autoCommit set to true but I have it sorted! The error stopped appearing after setting the property maxBufferedDocs in solrconfig.xml. I can't exactly understand why but it just worked. Anyway, maxBufferedDocs is deprecated, would ramBufferSizeMB do the same? What I find strange is this line in the exception: "Last packet sent to the server was 202481 ms ago." Something took very very long to complete and the connection got closed by the time the next row was fetched from the opened resultset. Just curious, what was the previous value of maxBufferedDocs and what did you change it to? -- View this message in context: http://www.nabble.com/Deduplication-patch-not-working-in-nightly-build-tp21287327p21374908.html Sent from the Solr - User mailing list archive at Nabble.com. -- Regards, Shalin Shekhar Mangar.
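Along the lines of relaxing the timeouts, a hedged sketch of the DIH dataSource: netTimeoutForStreamingResults is a MySQL Connector/J property that raises the server's net_write_timeout while streaming a result set, and 3600 seconds here is an arbitrary example value:

    <dataSource driver="com.mysql.jdbc.Driver"
                url="jdbc:mysql://localhost/my_db?netTimeoutForStreamingResults=3600" />

If the stall really is a long segment merge on the Solr side, this only treats the symptom: the connection survives the pause instead of the pause going away.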
Re: Deduplication patch not working in nightly build
Hey Shalin, In the beginning (when the error was appearing) I had <ramBufferSizeMB>32</ramBufferSizeMB> and no maxBufferedDocs set. Now I have: <ramBufferSizeMB>32</ramBufferSizeMB> <maxBufferedDocs>50</maxBufferedDocs> I think that by setting maxBufferedDocs to 50 I am forcing more disk writing than I would like... but at least it works fine (just a bit slower, obviously). I keep saying that the weirdest thing is that I don't have this problem using Solr 1.3, just with the nightly... Even though it works well now, it would be great if someone could give me an explanation of why this is happening Shalin Shekhar Mangar wrote: > > On Fri, Jan 9, 2009 at 9:23 PM, Marc Sturlese > wrote: > >> >> hey there, >> I hadn't autoCommit set to true but I have it sorted! The error >> stopped >> appearing after setting the property maxBufferedDocs in solrconfig.xml. I >> can't exactly understand why but it just worked. >> Anyway, maxBufferedDocs is deprecated, would ramBufferSizeMB do the same? >> >> > What I find strange is this line in the exception: > "Last packet sent to the server was 202481 ms ago." > > Something took very very long to complete and the connection got closed by > the time the next row was fetched from the opened resultset. > > Just curious, what was the previous value of maxBufferedDocs and what did > you change it to? > > >> >> -- >> View this message in context: >> http://www.nabble.com/Deduplication-patch-not-working-in-nightly-build-tp21287327p21374908.html >> Sent from the Solr - User mailing list archive at Nabble.com. >> >> > > > -- > Regards, > Shalin Shekhar Mangar. > > -- View this message in context: http://www.nabble.com/Deduplication-patch-not-working-in-nightly-build-tp21287327p21376235.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Deduplication patch not working in nightly build
On Fri, Jan 9, 2009 at 9:23 PM, Marc Sturlese wrote: > > hey there, > I hadn't autoCommit set to true but I have it sorted! The error stopped > appearing after setting the property maxBufferedDocs in solrconfig.xml. I > can't exactly undersand why but it just worked. > Anyway, maxBufferedDocs is deprecaded, would ramBufferSizeMB do the same? > > What I find strange is this line in the exception: "Last packet sent to the server was 202481 ms ago." Something took very very long to complete and the connection got closed by the time the next row was fetched from the opened resultset. Just curious, what was the previous value of maxBufferedDocs and what did you change it to? > > -- > View this message in context: > http://www.nabble.com/Deduplication-patch-not-working-in-nightly-build-tp21287327p21374908.html > Sent from the Solr - User mailing list archive at Nabble.com. > > -- Regards, Shalin Shekhar Mangar.
Re: Deduplication patch not working in nightly build
hey there, I hadn't autoCommit set to true but I have it sorted! The error stopped appearing after setting the property maxBufferedDocs in solrconfig.xml. I can't exactly undersand why but it just worked. Anyway, maxBufferedDocs is deprecaded, would ramBufferSizeMB do the same? Thanks Marc Sturlese wrote: > > Hey there, > I was using the Deduplication patch with Solr 1.3 release and everything > was working perfectly. Now I upgraded to a nigthly build (20th december) > to be able to use new facet algorithm and other stuff and DeDuplication is > not working any more. I have followed exactly the same steps to apply the > patch to the source code. I am geting this error: > > WARNING: Error reading data > com.mysql.jdbc.CommunicationsException: Communications link failure due to > underlying exception: > > ** BEGIN NESTED EXCEPTION ** > > java.io.EOFException > > STACKTRACE: > > java.io.EOFException > at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:1905) > at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2404) > at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:2862) > at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:771) > at com.mysql.jdbc.MysqlIO.nextRow(MysqlIO.java:1289) > at > com.mysql.jdbc.RowDataDynamic.nextRecord(RowDataDynamic.java:362) > at com.mysql.jdbc.RowDataDynamic.next(RowDataDynamic.java:352) > at com.mysql.jdbc.ResultSet.next(ResultSet.java:6144) > at > org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.hasnext(JdbcDataSource.java:294) > at > org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.access$400(JdbcDataSource.java:189) > at > org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator$1.hasNext(JdbcDataSource.java:225) > at > org.apache.solr.handler.dataimport.EntityProcessorBase.getNext(EntityProcessorBase.java:229) > at > org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:76) > at > org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:351) > at > org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:193) > at > org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:144) > at > org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:334) > at > org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:407) > at > org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:388) > > > ** END NESTED EXCEPTION ** > Last packet sent to the server was 202481 ms ago. 
> at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2563) > at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:2862) > at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:771) > at com.mysql.jdbc.MysqlIO.nextRow(MysqlIO.java:1289) > at > com.mysql.jdbc.RowDataDynamic.nextRecord(RowDataDynamic.java:362) > at com.mysql.jdbc.RowDataDynamic.next(RowDataDynamic.java:352) > at com.mysql.jdbc.ResultSet.next(ResultSet.java:6144) > at > org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.hasnext(JdbcDataSource.java:294) > at > org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.access$400(JdbcDataSource.java:189) > at > org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator$1.hasNext(JdbcDataSource.java:225) > at > org.apache.solr.handler.dataimport.EntityProcessorBase.getNext(EntityProcessorBase.java:229) > at > org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:76) > at > org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:351) > at > org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:193) > at > org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:144) > at > org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:334) > at > org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:407) > at > org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:388) > Jan 5, 2009 10:06:16 AM org.apache.solr.handler.dataimport.JdbcDataSource > logError > WARNING: Exception while closing result set > com.mysql.jdbc.CommunicationsException: Communications link failure due to > underlying exception: > > ** BEGIN NESTED EXCEPTION ** > > java.io.EOFException > > STACKTRACE: > > java.io.EOFException > at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:1905) > at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2351) > at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:2862) > at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:771) > at com.mysql.jdbc.MysqlIO.nextRow(My
Re: Deduplication patch not working in nightly build
I can't imagine why dedupe would have anything to do with this, other than what was said, it perhaps is taking a bit longer to get a document to the db, and it times out (maybe a long signature calculation?). Have you tried changing your MySql settings to allow for a longer timeout? (sorry, I'm not to up to date on what you have tried). Also, are you using autocommit during the import? If so, you might try turning it off for the full import. - Mark Marc Sturlese wrote: Hey there, I am stack in this problem sine 3 days ago and no idea how to sort it. I am using the nighlty from a week ago, mysql and this driver and url: driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/my_db" I can use deduplication patch with indexs of 200.000 docs and no problem. When I try a full-import with a db of 1.500.000 it stops indexing at doc number 15.000 aprox showing me the error posted above. Once I get the exception, i restart tomcat and start a delta-import... this time everything works fine! I need to avoid this error in the full import, i have tryed: url="jdbc:mysql://localhost/my_db?autoReconnect=true to sort it in case the connection was closed due to long time until next doc was indexed, but nothing changed... I keep having this: Jan 9, 2009 1:38:18 PM org.apache.solr.handler.dataimport.JdbcDataSource logError WARNING: Error reading data com.mysql.jdbc.CommunicationsException: Communications link failure due to underlying exception: ** BEGIN NESTED EXCEPTION ** java.io.EOFException STACKTRACE: java.io.EOFException at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:1905) at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2404) at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:2862) at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:771) at com.mysql.jdbc.MysqlIO.nextRow(MysqlIO.java:1289) at com.mysql.jdbc.RowDataDynamic.nextRecord(RowDataDynamic.java:362) at com.mysql.jdbc.RowDataDynamic.next(RowDataDynamic.java:352) at com.mysql.jdbc.ResultSet.next(ResultSet.java:6144) at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.hasnext(JdbcDataSource.java:279) at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.access$500(JdbcDataSource.java:167) at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator$1.hasNext(JdbcDataSource.java:205) at org.apache.solr.handler.dataimport.EntityProcessorBase.getNext(EntityProcessorBase.java:229) at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:77) at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:387) at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:209) at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:160) at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:368) at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:437) at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:428) ** END NESTED EXCEPTION ** Last packet sent to the server was 206097 ms ago. 
at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2563) at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:2862) at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:771) at com.mysql.jdbc.MysqlIO.nextRow(MysqlIO.java:1289) at com.mysql.jdbc.RowDataDynamic.nextRecord(RowDataDynamic.java:362) at com.mysql.jdbc.RowDataDynamic.next(RowDataDynamic.java:352) at com.mysql.jdbc.ResultSet.next(ResultSet.java:6144) at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.hasnext(JdbcDataSource.java:279) at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.access$500(JdbcDataSource.java:167) at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator$1.hasNext(JdbcDataSource.java:205) at org.apache.solr.handler.dataimport.EntityProcessorBase.getNext(EntityProcessorBase.java:229) at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:77) at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:387) at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:209) at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:160) at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:368) at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:437) at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:428) Jan 9, 2009 1:38:18 PM org.apache.solr.handler.dataimport.JdbcDataSource logError WARNING: Exception while closing result set com.mysql.jdbc.CommunicationsExcepti
Re: Deduplication patch not working in nightly build
Hey there, I am stack in this problem sine 3 days ago and no idea how to sort it. I am using the nighlty from a week ago, mysql and this driver and url: driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/my_db" I can use deduplication patch with indexs of 200.000 docs and no problem. When I try a full-import with a db of 1.500.000 it stops indexing at doc number 15.000 aprox showing me the error posted above. Once I get the exception, i restart tomcat and start a delta-import... this time everything works fine! I need to avoid this error in the full import, i have tryed: url="jdbc:mysql://localhost/my_db?autoReconnect=true to sort it in case the connection was closed due to long time until next doc was indexed, but nothing changed... I keep having this: Jan 9, 2009 1:38:18 PM org.apache.solr.handler.dataimport.JdbcDataSource logError WARNING: Error reading data com.mysql.jdbc.CommunicationsException: Communications link failure due to underlying exception: ** BEGIN NESTED EXCEPTION ** java.io.EOFException STACKTRACE: java.io.EOFException at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:1905) at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2404) at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:2862) at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:771) at com.mysql.jdbc.MysqlIO.nextRow(MysqlIO.java:1289) at com.mysql.jdbc.RowDataDynamic.nextRecord(RowDataDynamic.java:362) at com.mysql.jdbc.RowDataDynamic.next(RowDataDynamic.java:352) at com.mysql.jdbc.ResultSet.next(ResultSet.java:6144) at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.hasnext(JdbcDataSource.java:279) at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.access$500(JdbcDataSource.java:167) at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator$1.hasNext(JdbcDataSource.java:205) at org.apache.solr.handler.dataimport.EntityProcessorBase.getNext(EntityProcessorBase.java:229) at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:77) at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:387) at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:209) at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:160) at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:368) at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:437) at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:428) ** END NESTED EXCEPTION ** Last packet sent to the server was 206097 ms ago. 
at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2563) at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:2862) at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:771) at com.mysql.jdbc.MysqlIO.nextRow(MysqlIO.java:1289) at com.mysql.jdbc.RowDataDynamic.nextRecord(RowDataDynamic.java:362) at com.mysql.jdbc.RowDataDynamic.next(RowDataDynamic.java:352) at com.mysql.jdbc.ResultSet.next(ResultSet.java:6144) at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.hasnext(JdbcDataSource.java:279) at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.access$500(JdbcDataSource.java:167) at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator$1.hasNext(JdbcDataSource.java:205) at org.apache.solr.handler.dataimport.EntityProcessorBase.getNext(EntityProcessorBase.java:229) at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:77) at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:387) at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:209) at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:160) at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:368) at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:437) at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:428) Jan 9, 2009 1:38:18 PM org.apache.solr.handler.dataimport.JdbcDataSource logError WARNING: Exception while closing result set com.mysql.jdbc.CommunicationsException: Communications link failure due to underlying exception: ** BEGIN NESTED EXCEPTION ** java.io.EOFException STACKTRACE: java.io.EOFException at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:1905) at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2351) at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:2862) at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:771) at com.mysql.jdbc.MysqlIO.nextRow(MysqlIO.java:1289)
Re: Deduplication patch not working in nightly build
Thanks I will have a look to my JdbcDataSource. Anyway it's weird because using the 1.3 release I don't have that problem... Shalin Shekhar Mangar wrote: > > Yes, initially I figured that we are accidentally re-using a closed data > source. But Noble has pinned it right. I guess you can try looking into > your > JDBC driver's documentation for a setting which increases the connection > alive-ness. > > On Mon, Jan 5, 2009 at 5:29 PM, Noble Paul നോബിള് नोब्ळ् < > noble.p...@gmail.com> wrote: > >> I guess the indexing of a doc is taking too long (may be because of >> the de-dup patch) and the resultset gets closed automaticallly (timed >> out) >> --Noble >> >> On Mon, Jan 5, 2009 at 5:14 PM, Marc Sturlese >> wrote: >> > >> > Donig this fix I get the same error :( >> > >> > I am going to try to set up the last nigthly build... let's see if I >> have >> > better luck. >> > >> > The thing is it stop indexing at the doc num 150.000 aprox... and give >> me >> > that mysql exception error... Without DeDuplication patch I can index 2 >> > milion docs without problems... >> > >> > I am pretty lost with this... :( >> > >> > >> > Shalin Shekhar Mangar wrote: >> >> >> >> Yes I meant the 05/01/2008 build. The fix is a one line change >> >> >> >> Add the following as the last line of DataConfig.Entity.clearCache() >> >> dataSrc = null; >> >> >> >> >> >> >> >> On Mon, Jan 5, 2009 at 4:22 PM, Marc Sturlese >> >> wrote: >> >> >> >>> >> >>> Shalin you mean I should test the 05/01/2008 nighlty? maybe with this >> one >> >>> works? If the fix you did is not really big can u tell me where in >> the >> >>> source is and what is it for? (I have been debuging and tracing a lot >> the >> >>> dataimporthandler source and I I would like to know what the >> imporovement >> >>> is >> >>> about if it is not a problem...) >> >>> >> >>> Thanks! >> >>> >> >>> >> >>> Shalin Shekhar Mangar wrote: >> >>> > >> >>> > Marc, I've just committed a fix which may have caused the bug. Can >> you >> >>> use >> >>> > svn trunk (or the next nightly build) and confirm? >> >>> > >> >>> > On Mon, Jan 5, 2009 at 3:10 PM, Noble Paul നോബിള് नोब्ळ् < >> >>> > noble.p...@gmail.com> wrote: >> >>> > >> >>> >> looks like a bug w/ DIH with the recent fixes. >> >>> >> --Noble >> >>> >> >> >>> >> On Mon, Jan 5, 2009 at 2:36 PM, Marc Sturlese >> >>> >> >>> >> wrote: >> >>> >> > >> >>> >> > Hey there, >> >>> >> > I was using the Deduplication patch with Solr 1.3 release and >> >>> >> everything >> >>> >> was >> >>> >> > working perfectly. Now I upgraded to a nigthly build (20th >> december) >> >>> to >> >>> >> be >> >>> >> > able to use new facet algorithm and other stuff and >> DeDuplication >> is >> >>> >> not >> >>> >> > working any more. I have followed exactly the same steps to >> apply >> >>> the >> >>> >> patch >> >>> >> > to the source code. 
I am geting this error: >> >>> >> > >> >>> >> > WARNING: Error reading data >> >>> >> > com.mysql.jdbc.CommunicationsException: Communications link >> failure >> >>> due >> >>> >> to >> >>> >> > underlying exception: >> >>> >> > >> >>> >> > ** BEGIN NESTED EXCEPTION ** >> >>> >> > >> >>> >> > java.io.EOFException >> >>> >> > >> >>> >> > STACKTRACE: >> >>> >> > >> >>> >> > java.io.EOFException >> >>> >> >at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:1905) >> >>> >> >at >> >>> com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2404) >> >>> >> >at >> com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:2862) >> >>> >> >at >> com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:771) >> >>> >> >at com.mysql.jdbc.MysqlIO.nextRow(MysqlIO.java:1289) >> >>> >> >at >> >>> >> com.mysql.jdbc.RowDataDynamic.nextRecord(RowDataDynamic.java:362) >> >>> >> >at >> >>> com.mysql.jdbc.RowDataDynamic.next(RowDataDynamic.java:352) >> >>> >> >at com.mysql.jdbc.ResultSet.next(ResultSet.java:6144) >> >>> >> >at >> >>> >> > >> >>> >> >> >>> >> org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.hasnext(JdbcDataSource.java:294) >> >>> >> >at >> >>> >> > >> >>> >> >> >>> >> org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.access$400(JdbcDataSource.java:189) >> >>> >> >at >> >>> >> > >> >>> >> >> >>> >> org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator$1.hasNext(JdbcDataSource.java:225) >> >>> >> >at >> >>> >> > >> >>> >> >> >>> >> org.apache.solr.handler.dataimport.EntityProcessorBase.getNext(EntityProcessorBase.java:229) >> >>> >> >at >> >>> >> > >> >>> >> >> >>> >> org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:76) >> >>> >> >at >> >>> >> > >> >>> >> >> >>> >> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:351) >> >>> >> >at >> >>> >> > >> >>> >> >> >>> >> org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:193) >> >>> >> >at >> >>> >> > >> >>> >> >> >>> >> org.apache.s
Re: Deduplication patch not working in nightly build
Yes, initially I figured that we were accidentally re-using a closed data source. But Noble has pinned it right. I guess you can try looking into your JDBC driver's documentation for a setting which increases the connection alive-ness.

On Mon, Jan 5, 2009 at 5:29 PM, Noble Paul നോബിള് नोब्ळ् <noble.p...@gmail.com> wrote:
> I guess the indexing of a doc is taking too long (maybe because of
> the de-dup patch) and the result set gets closed automatically (timed out)
> --Noble
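As a concrete starting point for the driver-setting hunt Shalin suggests, here is a sketch only, not a tested fix: MySQL Connector/J accepts connection properties on the JDBC URL, and DIH's <dataSource> element passes the URL through as-is. The property names (autoReconnect, netTimeoutForStreamingResults) are standard Connector/J options, but the host, database, credentials, and timeout value below are placeholders, and batchSize="-1" is assumed here as the usual streaming setup for large tables:

<dataSource type="JdbcDataSource"
            driver="com.mysql.jdbc.Driver"
            url="jdbc:mysql://localhost:3306/mydb?autoReconnect=true&amp;netTimeoutForStreamingResults=3600"
            batchSize="-1"
            user="solr"
            password="secret"/>

netTimeoutForStreamingResults raises the server's net_write_timeout for the duration of a streaming result set, which is relevant because MySQL drops a streaming connection that the client reads too slowly; the full trace at the bottom of the thread reports about 200 seconds of silence before the EOFException.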
Re: Deduplication patch not working in nightly build
I guess the indexing of a doc is taking too long (maybe because of the de-dup patch) and the result set gets closed automatically (timed out)
--Noble

On Mon, Jan 5, 2009 at 5:14 PM, Marc Sturlese wrote:
>
> Doing this fix I get the same error :(
>
> I am going to try to set up the last nightly build... let's see if I have
> better luck.
>
> The thing is it stops indexing at around doc number 150,000... and gives me
> that mysql exception error... Without the DeDuplication patch I can index
> 2 million docs without problems...
>
> I am pretty lost with this... :(
Re: Deduplication patch not working in nightly build
Doing this fix I get the same error :(

I am going to try to set up the last nightly build... let's see if I have better luck.

The thing is it stops indexing at around doc number 150,000... and gives me that mysql exception error... Without the DeDuplication patch I can index 2 million docs without problems...

I am pretty lost with this... :(

Shalin Shekhar Mangar wrote:
>
> Yes, I meant the 05/01/2009 build. The fix is a one-line change.
>
> Add the following as the last line of DataConfig.Entity.clearCache():
> dataSrc = null;
Re: Deduplication patch not working in nightly build
Yes, I meant the 05/01/2009 build. The fix is a one-line change.

Add the following as the last line of DataConfig.Entity.clearCache():

dataSrc = null;

On Mon, Jan 5, 2009 at 4:22 PM, Marc Sturlese wrote:
>
> Shalin, you mean I should test the 05/01/2009 nightly? Maybe with this one
> it works? If the fix you did is not really big, can you tell me where in
> the source it is and what it is for? (I have been debugging and tracing the
> dataimporthandler source a lot and I would like to know what the
> improvement is about, if it is not a problem...)
>
> Thanks!
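In context, the one-liner would look roughly like the sketch below. Only the final assignment is the fix Shalin describes; the surrounding method body and every name other than dataSrc are illustrative guesses at the DataConfig.Entity internals, not code copied from the Solr source:

// org.apache.solr.handler.dataimport.DataConfig.Entity (illustrative sketch)
public void clearCache() {
    if (entities != null) {
        for (Entity child : entities) {
            child.clearCache(); // illustrative: clear child entity caches too
        }
    }
    // The actual fix: drop the cached data source so the next run builds a
    // fresh one instead of re-using a possibly closed JDBC connection.
    dataSrc = null;
}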
Re: Deduplication patch not working in nightly build
Shalin, you mean I should test the 05/01/2009 nightly? Maybe with this one it works? If the fix you did is not really big, can you tell me where in the source it is and what it is for? (I have been debugging and tracing the dataimporthandler source a lot and I would like to know what the improvement is about, if it is not a problem...)

Thanks!

Shalin Shekhar Mangar wrote:
>
> Marc, I've just committed a fix for what may have caused the bug. Can you
> use svn trunk (or the next nightly build) and confirm?
Re: Deduplication patch not working in nightly build
Yeah, it looks like it, but... if I don't use the DeDuplication patch everything works perfectly. I can create my indexes using full-import and delta-import without problems. The JdbcDataSource of the nightly is pretty similar to the 1.3 release's... The DeDuplication patch doesn't touch the dataimporthandler classes... that's why I thought the problem was not there (but I can't say it for sure...). I was thinking that the problem has something to do with the UpdateRequestProcessorChain, but I don't know how this part of the source works... Any advice on how I could sort it out?

I am really interested in updating to the nightly build, as I think the new facet algorithm and SolrDeletionPolicy are really great stuff!

>> Marc, I've just committed a fix for what may have caused the bug. Can you
>> use svn trunk (or the next nightly build) and confirm?

You mean the last nightly build?

Thanks

Noble Paul നോബിള് नोब्ळ् wrote:
>
> looks like a bug w/ DIH with the recent fixes.
> --Noble
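For readers unfamiliar with the chain Marc suspects: the deduplication patch (SOLR-799) plugs into the update path as an UpdateRequestProcessorChain entry in solrconfig.xml, conceptually like the sketch below. The factory and parameter names follow the patch's documentation, but the chain name and the fields list are placeholders, so treat this as an illustration rather than Marc's actual config:

<updateRequestProcessorChain name="dedupe">
  <processor class="org.apache.solr.update.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <str name="signatureField">signature</str>
    <bool name="overwriteDupes">true</bool>
    <str name="fields">name,features,cat</str>
    <str name="signatureClass">org.apache.solr.update.processor.Lookup3Signature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

Because the signature is computed per document inside this chain, a slow signature step stalls the DIH loop that is still streaming rows from MySQL, which fits Noble's timed-out-result-set theory.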
Re: Deduplication patch not working in nightly build
Marc, I've just committed a fix for what may have caused the bug. Can you use svn trunk (or the next nightly build) and confirm?

On Mon, Jan 5, 2009 at 3:10 PM, Noble Paul നോബിള് नोब्ळ् <noble.p...@gmail.com> wrote:
> looks like a bug w/ DIH with the recent fixes.
> --Noble
Re: Deduplication patch not working in nightly build
looks like a bug w/ DIH with the recent fixes.
--Noble

On Mon, Jan 5, 2009 at 2:36 PM, Marc Sturlese wrote:
>
> Hey there,
> I was using the Deduplication patch with the Solr 1.3 release and
> everything was working perfectly. Now I have upgraded to a nightly build
> (20th December) to be able to use the new facet algorithm and other stuff,
> and DeDuplication is not working any more. I have followed exactly the
> same steps to apply the patch to the source code. I am getting this error:
>
> WARNING: Error reading data
> com.mysql.jdbc.CommunicationsException: Communications link failure due to
> underlying exception:
>
> ** BEGIN NESTED EXCEPTION **
>
> java.io.EOFException
>
> STACKTRACE:
>
> java.io.EOFException
>     at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:1905)
>     at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2404)
>     at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:2862)
>     at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:771)
>     at com.mysql.jdbc.MysqlIO.nextRow(MysqlIO.java:1289)
>     at com.mysql.jdbc.RowDataDynamic.nextRecord(RowDataDynamic.java:362)
>     at com.mysql.jdbc.RowDataDynamic.next(RowDataDynamic.java:352)
>     at com.mysql.jdbc.ResultSet.next(ResultSet.java:6144)
>     at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.hasnext(JdbcDataSource.java:294)
>     at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.access$400(JdbcDataSource.java:189)
>     at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator$1.hasNext(JdbcDataSource.java:225)
>     at org.apache.solr.handler.dataimport.EntityProcessorBase.getNext(EntityProcessorBase.java:229)
>     at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:76)
>     at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:351)
>     at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:193)
>     at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:144)
>     at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:334)
>     at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:407)
>     at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:388)
>
> ** END NESTED EXCEPTION **
> Last packet sent to the server was 202481 ms ago.
>     at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2563)
>     at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:2862)
>     at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:771)
>     at com.mysql.jdbc.MysqlIO.nextRow(MysqlIO.java:1289)
>     at com.mysql.jdbc.RowDataDynamic.nextRecord(RowDataDynamic.java:362)
>     at com.mysql.jdbc.RowDataDynamic.next(RowDataDynamic.java:352)
>     at com.mysql.jdbc.ResultSet.next(ResultSet.java:6144)
>     at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.hasnext(JdbcDataSource.java:294)
>     at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.access$400(JdbcDataSource.java:189)
>     at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator$1.hasNext(JdbcDataSource.java:225)
>     at org.apache.solr.handler.dataimport.EntityProcessorBase.getNext(EntityProcessorBase.java:229)
>     at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:76)
>     at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:351)
>     at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:193)
>     at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:144)
>     at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:334)
>     at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:407)
>     at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:388)
> Jan 5, 2009 10:06:16 AM org.apache.solr.handler.dataimport.JdbcDataSource logError
> WARNING: Exception while closing result set
> com.mysql.jdbc.CommunicationsException: Communications link failure due to
> underlying exception:
>
> ** BEGIN NESTED EXCEPTION **
>
> java.io.EOFException
>
> STACKTRACE:
>
> java.io.EOFException
>     at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:1905)
>     at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2351)
>     at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:2862)
>     at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:771)
>     at com.mysql.jdbc.MysqlIO.nextRow(MysqlIO.java:1289)
>     at com.mysql.jdbc.RowDataDynamic.nextRecord(RowDataDynamic.java:362)
>     at com.mysql.jdbc.RowDataDynamic.next(RowDataDynamic.java:352)
>     at com.mysql.jdbc.RowDataDynamic.close(RowDataDynamic.java:150)
>     at com.mysql.jdbc.R