DIH Full Index Issue
Good Morning List!

I have an issue where my DIH full index is committed after a minute of indexing. My counts will fall from around 400K to 85K until the import is finished, usually about four (4) minutes later. This is problematic for us as there are 315K missing items in our searches.

Versioning Info:
solr-spec - 6.3.0
solr-impl - 6.3.0 a66a44513ee8191e25b477372094bfa846450316 - shalin - 2016-11-02 19:52:42
lucene-spec - 6.3.0
lucene-impl - 6.3.0 a66a44513ee8191e25b477372094bfa846450316 - shalin - 2016-11-02 19:47:11

solrconfig.xml snippet:

<updateHandler class="solr.DirectUpdateHandler2">
  <updateLog>
    <str name="dir">${solr.ulog.dir:}</str>
  </updateLog>
  <autoCommit>
    <maxTime>-1</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
  <autoSoftCommit>
    <maxTime>-1</maxTime>
  </autoSoftCommit>
</updateHandler>

Any insights would be greatly appreciated. Let me know if more information is required.

AJ
RE: DIH Full Index Issue
2017-03-09 13:40:56.459 INFO (qtp2080166188-40636) [c:collectionXXX s:shard1 r:core_node1 x:collectionXXX_shard1_replica2] o.a.s.u.p.LogUpdateProcessorFactory [collectionXXX_shard1_replica2] webapp=/solr path=/update/json params={commit=true&wt=json}{} 0 0
2017-03-09 13:40:56.459 ERROR (qtp2080166188-40636) [c:collectionXXX s:shard1 r:core_node1 x:collectionXXX_shard1_replica2] o.a.s.h.RequestHandlerBase org.apache.solr.common.SolrException: [doc=553094612] missing required field: adtype
2017-03-09 13:41:00.053 INFO (qtp2080166188-41928) [c:collectionXXX s:shard1 r:core_node1 x:collectionXXX_shard1_replica2] o.a.s.c.S.Request [collectionXXX_shard1_replica2] webapp=/solr path=/schema params={wt=json} status=0 QTime=0

AJ

-----Original Message-----
From: Alexandre Rafalovitch [mailto:arafa...@gmail.com]
Sent: Wednesday, March 8, 2017 9:33 AM
To: solr-user
Subject: Re: DIH Full Index Issue

Are you perhaps indexing at the same time from a source other than DIH? Because the commit is global and all the changes from all the sources will become visible. Check the access logs perhaps to see the requests to the /update handler or similar.

Regards,
   Alex.
http://www.solr-start.com/ - Resources for Solr users, new and experienced

On 8 March 2017 at 09:27, AJ Lemke wrote:
> Good Morning List!
>
> I have an issue where my DIH full index is committed after a minute of
> indexing.
> My counts will fall from around 400K to 85K until the import is finished,
> usually about four (4) minutes later.
>
> This is problematic for us as there are 315K missing items in our searches.
>
> Versioning Info:
> solr-spec - 6.3.0
> solr-impl - 6.3.0 a66a44513ee8191e25b477372094bfa846450316 - shalin - 2016-11-02 19:52:42
> lucene-spec - 6.3.0
> lucene-impl - 6.3.0 a66a44513ee8191e25b477372094bfa846450316 - shalin - 2016-11-02 19:47:11
>
> solrconfig.xml snippet
> <updateHandler class="solr.DirectUpdateHandler2">
>   <updateLog>
>     <str name="dir">${solr.ulog.dir:}</str>
>   </updateLog>
>   <autoCommit>
>     <maxTime>-1</maxTime>
>     <openSearcher>false</openSearcher>
>   </autoCommit>
>   <autoSoftCommit>
>     <maxTime>-1</maxTime>
>   </autoSoftCommit>
> </updateHandler>
>
> Any insights would be greatly appreciated.
> Let me know if more information is required.
>
> AJ
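Alexandre's suggestion above — find out who else is sending commits — can be done with a quick grep over the Solr request log. A sketch: the log path and sample lines here are illustrative stand-ins (modeled on the log entries in this thread), not output from the cluster in question; point the grep at your own server/logs/solr.log.

```shell
# Hypothetical sample of a Solr request log; substitute the real log path.
cat > /tmp/solr_requests.log <<'EOF'
2017-03-09 13:40:56.459 INFO  ... path=/update/json params={commit=true&wt=json} status=0
2017-03-09 13:40:58.120 INFO  ... path=/select params={q=*:*&wt=json} status=0
2017-03-09 13:41:02.003 INFO  ... path=/dataimport params={command=status} status=0
EOF

# Any /update request carrying commit=true that DIH did not send is a suspect,
# since a commit from any client makes everyone's pending changes visible.
grep -E 'path=/update.*commit=true' /tmp/solr_requests.log
```

In the log excerpt AJ posted, the `path=/update/json params={commit=true&wt=json}` entry is exactly this kind of external committer.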
DIH Issues
Hey all,

We are using 6.3.0 and we have issues with DIH throwing errors. We are seeing an intermittent issue where on a full index a single error will be thrown. The error is always "missing required field: fieldname". Our SQL database always has data in the field that comes up with the error. Most of the errors are coming on fields that SQL has marked as required.

Would anyone have any hints or ideas where to look to remedy this situation?

As always, if you need more information let me know.

Thanks
AJ
RE: DIH Issues
Thanks for the thought Alex!

The fields that have this happen most often are numeric and boolean fields. These fields have real data (id numbers, true/false, etc.)

AJ

-----Original Message-----
From: Alexandre Rafalovitch [mailto:arafa...@gmail.com]
Sent: Tuesday, April 25, 2017 8:27 AM
To: solr-user
Subject: Re: DIH Issues

Maybe the content gets simplified away between the database and the Solr schema. For example if your field contains just spaces and you have UpdateRequestProcessors to do trim and removal of empty fields? Schemaless mode will remove empty fields, but will not trim for example.

Regards,
   Alex.
http://www.solr-start.com/ - Resources for Solr users, new and experienced

On 25 April 2017 at 09:21, AJ Lemke wrote:
> Hey all,
>
> We are using 6.3.0 and we have issues with DIH throwing errors. We are
> seeing an intermittent issue where on a full index a single error will be
> thrown. The error is always "missing required field: fieldname".
> Our SQL database always has data in the field that comes up with the error.
> Most of the errors are coming on fields that SQL has marked as required.
>
> Would anyone have any hints or ideas where to look to remedy this situation.
>
> As always if you need more information let me know.
>
> Thanks
> AJ
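Alexandre's whitespace theory is easy to check against a database export: a value of spaces satisfies a SQL NOT NULL constraint ("SQL always has data") yet becomes an empty, removable field after trimming. A sketch over a hypothetical pipe-delimited dump; the file name, column layout, and row values (including the doc id from the error log earlier in this thread) are illustrative assumptions:

```shell
# Hypothetical pipe-delimited SQL export: id|adtype|price
cat > /tmp/export.psv <<'EOF'
553094610|classified|4500
553094611|auction|1200
553094612|   |4500
553094613|classified|900
EOF

# Flag rows whose second column is empty or whitespace-only; such rows
# pass a NOT NULL check in SQL but index as "missing required field"
# once trim/remove-blanks update processors run.
awk -F'|' '$2 ~ /^[[:space:]]*$/ { print "blank adtype in row id=" $1 }' /tmp/export.psv
```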
Missing Records
Hi All,

We have a SOLR cloud instance that has been humming along nicely for months. Last week we started experiencing missing records.

Admin DIH Example:
Fetched: 903,993 (736/s), Skipped: 0, Processed: 903,993 (736/s)

A *:* search claims that there are only 903,902; this is the first full index. Subsequent full indexes give the following counts for the *:* search:
903,805
903,665
826,357

All the while the admin returns: Fetched: 903,993 (x/s), Skipped: 0, Processed: 903,993 (x/s) every time. ---records per second is variable

I found an item that should be in the index but is not found in a search. Here are the referenced lines of the log file.

DEBUG - 2014-10-30 15:10:51.160; org.apache.solr.update.processor.LogUpdateProcessor; PRE_UPDATE add{,id=750041421} {{params(debug=false&optimize=true&indent=true&commit=true&clean=true&wt=json&command=full-import&entity=ads&verbose=false),defaults(config=data-config.xml)}}
DEBUG - 2014-10-30 15:10:51.160; org.apache.solr.update.SolrCmdDistributor; sending update to http://192.168.20.57:7574/solr/inventory_shard1_replica2/ retry:0 add{,id=750041421} params:update.distrib=TOLEADER&distrib.from=http%3A%2F%2F192.168.20.57%3A8983%2Fsolr%2Finventory_shard1_replica1%2F

--- there are 746 lines of log between entries ---

DEBUG - 2014-10-30 15:10:51.340; org.apache.http.impl.conn.Wire; >> "[0x2][0xc3][0xe0]¶ms[0xa2][0xe0].update.distrib(TOLEADER[0xe0],distrib.from?[0x17]http://192.168.20.57:8983/solr/inventory_shard1_replica1/[0xe0]&delByQ[0x0][0xe0]'docsMap[0xe][0x13][0x10]8[0x8]?[0x80][0x0][0x0][0xe0]#Zip%51106[0xe0]-IsReelCentric[0x2][0xe0](HasPrice[0x1][0xe0]*Make_Lower'ski-doo[0xe0])StateName$Iowa[0xe0]-OriginalModel/Summit Highmark[0xe0]/VerticalSiteIDs!2[0xe0]-ClassBinaryIDp@[0xe0]#lat(42.48929[0xe0]-SubClassFacet01704|Snowmobiles[0xe0](FuelType%Other[0xe0]2DivisionName_Lower,recreational[0xe0]&latlon042.4893,-96.3693[0xe0]*PhotoCount!8[0xe0](HasVideo[0x2][0xe0]"ID)750041421[0xe0]&Engine
[0xe0]*ClassFacet.12|Snowmobiles[0xe0]$Make'Ski-Doo[0xe0]$City*Sioux City[0xe0]#lng*-96.369302[0xe0]-Certification!N[0xe0]0EmotionalTagline0162" Long Track [0xe0]*IsEnhanced[0x1][0xe0]*SubClassID$1704[0xe0](NetPrice$4500[0xe0]1IsInternetSpecial[0x2][0xe0](HasPhoto[0x1][0xe0]/DealerSortOrder!2[0xe0]+Description?VThis Bad boy will pull you through the deepest snow!With the 162" track and 1000cc of power you can fly up any hill!![0xe0],DealerRadius+8046.72[0xe0],Transmission [0xe0]*ModelFacet7Ski-Doo|Summit Highmark[0xe0]/DealerNameFacet9Certified Auto, Inc.|4150[0xe0])StateAbbr"IA[0xe0])ClassName+Snowmobiles[0xe0](DealerID$4150[0xe0]&AdCode$DX1Q[0xe0]*DealerName4Certified Auto, Inc.[0xe0])Condition$Used[0xe0]/Condition_Lower$used[0xe0]-ExteriorColor+Blue/Yellow[0xe0],DivisionName,Recreational[0xe0]$Trim(1000 SDI[0xe0](SourceID!1[0xe0]0HasAdEnhancement!0[0xe0]'ClassID"12[0xe0].FuelType_Lower%other[0xe0]$Year$2005[0xe0]+DealerFacet?[0x8]4150|Certified Auto, Inc.|Sioux City|IA[0xe0],SubClassName+Snowmobiles[0xe0]%Model/Summit Highmark[0xe0])EntryDate42011-11-17T10:46:00Z[0xe0]+StockNumber&000105[0xe0]+PriceRebate!0[0xe0]+Model_Lower/summit highmark[\n]"

What could be the issue and how does one fix this issue?

Thanks so much, and if more information is needed I have preserved the log files.

AJ
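One way to turn "the count is short" into actionable data is to diff the ID list from the source database against the IDs Solr actually returns (e.g. exported via a q=*:*&fl=ID query). A sketch with hypothetical ID files; only the comm invocation matters, and both inputs must be sorted:

```shell
# Hypothetical ID lists: one exported from SQL, one from a Solr fl=ID export.
printf '%s\n' 750041419 750041420 750041421 750041422 > /tmp/db_ids.txt
printf '%s\n' 750041419 750041420 750041422 > /tmp/solr_ids.txt
sort -o /tmp/db_ids.txt /tmp/db_ids.txt
sort -o /tmp/solr_ids.txt /tmp/solr_ids.txt

# comm -23 prints lines unique to the first file: IDs the database has
# but Solr does not. These are the documents to trace through the logs.
comm -23 /tmp/db_ids.txt /tmp/solr_ids.txt
```

Once the missing IDs are known, grepping the Solr log for each one (as AJ did for 750041421 above) shows how far the add got before it was lost.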
RE: Missing Records
I started this collection using this command:

http://localhost:8983/solr/admin/collections?action=CREATE&name=inventory&numShards=1&replicationFactor=2&maxShardsPerNode=4

So 1 shard and a replicationFactor of 2.

AJ

-----Original Message-----
From: S.L [mailto:simpleliving...@gmail.com]
Sent: Thursday, October 30, 2014 5:12 PM
To: solr-user@lucene.apache.org
Subject: Re: Missing Records

I am curious, how many shards do you have and what's the replication factor you are using?

On Thu, Oct 30, 2014 at 5:27 PM, AJ Lemke wrote:
> Hi All,
>
> We have a SOLR cloud instance that has been humming along nicely for
> months.
> Last week we started experiencing missing records.
>
> Admin DIH Example:
> Fetched: 903,993 (736/s), Skipped: 0, Processed: 903,993 (736/s) A *:*
> search claims that there are only 903,902 this is the first full
> index.
> Subsequent full indexes give the following counts for the *:* search
> 903,805
> 903,665
> 826,357
>
> All the while the admin returns: Fetched: 903,993 (x/s), Skipped: 0,
> Processed: 903,993 (x/s) every time. ---records per second is variable
>
>
> I found an item that should be in the index but is not found in a search.
>
> Here are the referenced lines of the log file.
>
> DEBUG - 2014-10-30 15:10:51.160;
> org.apache.solr.update.processor.LogUpdateProcessor; PRE_UPDATE
> add{,id=750041421}
> {{params(debug=false&optimize=true&indent=true&commit=true&clean=true&wt=json&command=full-import&entity=ads&verbose=false),defaults(config=data-config.xml)}}
> DEBUG - 2014-10-30 15:10:51.160;
> org.apache.solr.update.SolrCmdDistributor; sending update to
> http://192.168.20.57:7574/solr/inventory_shard1_replica2/ retry:0
> add{,id=750041421}
> params:update.distrib=TOLEADER&distrib.from=http%3A%2F%2F192.168.20.57%3A8983%2Fsolr%2Finventory_shard1_replica1%2F
>
> --- there are 746 lines of log between entries ---
>
> What could be the issue and how does one fix this issue?
>
> Thanks so much and if more information is needed I have preserved the
> log files.
>
> AJ
>
RE: Missing Records
Hi Erick:

All of the records are coming out of an auto-numbered field so the IDs will all be unique.

Here is the test I ran this morning:

Indexing completed. Added/Updated: 903,993 documents. Deleted 0 documents. (Duration: 28m)
Requests: 1 (0/s), Fetched: 903,993 (538/s), Skipped: 0, Processed: 903,993 (538/s)
Started: 33 minutes ago

Last Modified: 4 minutes ago
Num Docs: 903829
Max Doc: 903829
Heap Memory Usage: -1
Deleted Docs: 0
Version: 1517
Segment Count: 16
Optimized: checked
Current: checked

If there were duplicates, only one of the duplicates should be removed and I still should be able to search for the ID and find one, correct? As it is right now I am missing records that should be in the collection.

I also noticed this:

org.apache.solr.common.SolrException: Bad Request

request: http://192.168.20.57:7574/solr/inventory_shard1_replica1/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2F192.168.20.57%3A8983%2Fsolr%2Finventory_shard1_replica2%2F&wt=javabin&version=2
        at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer$Runner.run(ConcurrentUpdateSolrServer.java:241)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

AJ

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Thursday, October 30, 2014 7:08 PM
To: solr-user@lucene.apache.org
Subject: Re: Missing Records

First question: Is there any possibility that some of the docs have duplicate IDs (<uniqueKey>s)? If so, then some of the docs will be replaced, which will lower your returns. One way to figure this out is to go to the admin screen and if numDocs < maxDoc, then documents have been replaced. Also, if numDocs is smaller than 903,993 then you probably have some docs being replaced. One warning, however.
Even if docs are deleted, this could still be the case, because when segments are merged the deleted docs are purged.

Best,
Erick

On Thu, Oct 30, 2014 at 3:12 PM, S.L wrote:
> I am curious, how many shards do you have and what's the replication
> factor you are using?
>
> On Thu, Oct 30, 2014 at 5:27 PM, AJ Lemke wrote:
>
>> Hi All,
>>
>> We have a SOLR cloud instance that has been humming along nicely for
>> months.
>> Last week we started experiencing missing records.
>>
>> Admin DIH Example:
>> Fetched: 903,993 (736/s), Skipped: 0, Processed: 903,993 (736/s) A
>> *:* search claims that there are only 903,902 this is the first full
>> index.
>> Subsequent full indexes give the following counts for the *:* search
>> 903,805
>> 903,665
>> 826,357
>>
>> All the while the admin returns: Fetched: 903,993 (x/s), Skipped: 0,
>> Processed: 903,993 (x/s) every time. ---records per second is
>> variable
>>
>>
>> I found an item that should be in the index but is not found in a search.
>>
>> Here are the referenced lines of the log file.
>>
>> DEBUG - 2014-10-30 15:10:51.160;
>> org.apache.solr.update.processor.LogUpdateProcessor; PRE_UPDATE
>> add{,id=750041421}
>> {{params(debug=false&optimize=true&indent=true&commit=true&clean=true&wt=json&command=full-import&entity=ads&verbose=false),defaults(config=data-config.xml)}}
>> DEBUG - 2014-10-30 15:10:51.160;
>> org.apache.solr.update.SolrCmdDistributor; sending update to
>> http://192.168.20.57:7574/solr/inventory_shard1_replica2/ retry:0
>> add{,id=750041421}
>> params:update.distrib=TOLEADER&distrib.from=http%3A%2F%2F192.168.20.57%3A8983%2Fsolr%2Finventory_shard1_replica1%2F
>>
>> --- there are 746 lines of log between entries ---
RE: Missing Records
I have run some more tests so the numbers have changed a bit.

Index Results done on Node 1:
Indexing completed. Added/Updated: 903,993 documents. Deleted 0 documents. (Duration: 31m 47s)
Requests: 1 (0/s), Fetched: 903,993 (474/s), Skipped: 0, Processed: 903,993

Node 1:
Last Modified: 44 minutes ago
Num Docs: 824216
Max Doc: 824216
Heap Memory Usage: -1
Deleted Docs: 0
Version: 1051
Segment Count: 1
Optimized: checked
Current: checked

Node 2:
Last Modified: 44 minutes ago
Num Docs: 824216
Max Doc: 824216
Heap Memory Usage: -1
Deleted Docs: 0
Version: 1051
Segment Count: 1
Optimized: checked
Current: checked

Search results are the same as the doc numbers above.

Logs only have one instance of an error:

ERROR - 2014-10-31 10:47:12.867; org.apache.solr.update.StreamingSolrServers$1; error
org.apache.solr.common.SolrException: Bad Request

request: http://192.168.20.57:7574/solr/inventory_shard1_replica1/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2F192.168.20.57%3A8983%2Fsolr%2Finventory_shard1_replica2%2F&wt=javabin&version=2
        at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer$Runner.run(ConcurrentUpdateSolrServer.java:241)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

Some info that may be of help: this is on my local VM using jetty with the embedded zookeeper.
Commands to start cloud:

java -DzkRun -jar start.jar
java -Djetty.port=7574 -DzkRun -DzkHost=localhost:9983 -jar start.jar

sh zkcli.sh -zkhost localhost:9983 -cmd upconfig -confdir ~/development/configs/inventory/ -confname config_inventory
sh zkcli.sh -zkhost localhost:9983 -cmd linkconfig -collection inventory -confname config_inventory

curl "http://localhost:8983/solr/admin/collections?action=CREATE&name=inventory&numShards=1&replicationFactor=2&maxShardsPerNode=4"
curl "http://localhost:8983/solr/admin/collections?action=RELOAD&name=inventory"

AJ

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Friday, October 31, 2014 9:49 AM
To: solr-user@lucene.apache.org
Subject: Re: Missing Records

OK, that is puzzling.

bq: If there were duplicates only one of the duplicates should be removed and I still should be able to search for the ID and find one correct?

Correct.

Your bad request error is puzzling, you may be on to something there. What it looks like is that somehow some of the documents you're sending to Solr aren't getting indexed, either being dropped through the network or perhaps having invalid fields or field formats (i.e. a date in the wrong format, whatever) or some such. When you complete the run, what are the maxDoc and numDocs numbers on one of the nodes?

What else do you see in the logs? They're pretty big after that many adds, but maybe you can grep for ERROR and see something interesting like stack traces. Or even "org.apache.solr". This latter will give you some false hits, but at least it's better than paging through a huge log file.

Personally, in this kind of situation I sometimes use SolrJ to do my indexing rather than DIH; I find it easier to debug, so that's another possibility. In the worst case with SolrJ, you can send the docs one at a time.

Best,
Erick

On Fri, Oct 31, 2014 at 7:37 AM, AJ Lemke wrote:
> Hi Erick:
>
> All of the records are coming out of an auto numbered field so the ID's will
> all be unique.
>
> Here is the test I ran this morning:
>
> Indexing completed. Added/Updated: 903,993 documents. Deleted 0
> documents. (Duration: 28m)
> Requests: 1 (0/s), Fetched: 903,993 (538/s), Skipped: 0, Processed:
> 903,993 (538/s)
> Started: 33 minutes ago
>
> Last Modified: 4 minutes ago
> Num Docs: 903829
> Max Doc: 903829
> Heap Memory Usage: -1
> Deleted Docs: 0
> Version: 1517
> Segment Count: 16
> Optimized: checked
> Current: checked
>
> If there were duplicates only one of the duplicates should be removed and I
> still should be able to search for the ID and find one correct?
> As it is right now I am missing records that should be in the collection.
>
> I also noticed this:
>
> org.apache.solr.common.SolrException: Bad Request
>
> request:
> http://192.168.20.57:7574/solr/inventory_shard1_replica1/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2F192.168.20.57%3A8983%2Fsolr%2Finventory_shard1_replica2%2F&wt=javabin&version=2
> at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer$Runner.run(ConcurrentUpdateSolrServer.java:241)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
RE: Missing Records
So I jumped back on this. I have not been using the optimize option on this new set of tests.

If I run the full index on the leader I seem to get all of the items in the database minus 3 that have a missing field.

Indexing completed. Added/Updated: 903,990 documents. Deleted 0 documents. (Duration: 25m 11s)
Requests: 1 (0/s), Fetched: 903,993 (598/s), Skipped: 0, Processed: 903,990

Last Modified: 2 minutes ago
Num Docs: 903990
Max Doc: 903990
Heap Memory Usage: 2625744
Deleted Docs: 0
Version: 3249
Segment Count: 7
Optimized:
Current:

If I run it on the other node I get:

Indexing completed. Added/Updated: 903,993 documents. Deleted 0 documents. (Duration: 27m 08s)
Requests: 1 (0/s), Fetched: 903,993 (555/s), Skipped: 0, Processed: 903,993 (555/s)

Last Modified: about a minute ago
Num Docs: 897791
Max Doc: 897791
Heap Memory Usage: 2621072
Deleted Docs: 0
Version: 3285
Segment Count: 7
Optimized:
Current:

Any ideas? If there is any more info that is needed let me know.

AJ

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Friday, October 31, 2014 1:44 PM
To: solr-user@lucene.apache.org
Subject: Re: Missing Records

Sorry to say this, but I don't think the numDocs/maxDoc numbers are telling you anything, because it looks like you've optimized, which purges any data associated with deleted docs, including the internal IDs which are the numDocs/maxDocs figures. So if there were deletions, we can't see any evidence of same.

Siih.

On Fri, Oct 31, 2014 at 9:56 AM, AJ Lemke wrote:
> I have run some more tests so the numbers have changed a bit.
>
> Index Results done on Node 1:
> Indexing completed. Added/Updated: 903,993 documents. Deleted 0
> documents.
(Duration: 31m 47s)
> Requests: 1 (0/s), Fetched: 903,993 (474/s), Skipped: 0, Processed:
> 903,993
>
> Node 1:
> Last Modified: 44 minutes ago
> Num Docs: 824216
> Max Doc: 824216
> Heap Memory Usage: -1
> Deleted Docs: 0
> Version: 1051
> Segment Count: 1
> Optimized: checked
> Current: checked
>
> Node 2:
> Last Modified: 44 minutes ago
> Num Docs: 824216
> Max Doc: 824216
> Heap Memory Usage: -1
> Deleted Docs: 0
> Version: 1051
> Segment Count: 1
> Optimized: checked
> Current: checked
>
> Search results are the same as the doc numbers above.
>
> Logs only have one instance of an error:
>
> ERROR - 2014-10-31 10:47:12.867;
> org.apache.solr.update.StreamingSolrServers$1; error
> org.apache.solr.common.SolrException: Bad Request
>
> request:
> http://192.168.20.57:7574/solr/inventory_shard1_replica1/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2F192.168.20.57%3A8983%2Fsolr%2Finventory_shard1_replica2%2F&wt=javabin&version=2
> at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer$Runner.run(ConcurrentUpdateSolrServer.java:241)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
>
> Some info that may be of help
> This is on my local vm using jetty with the embedded zookeeper.
> Commands to start cloud:
>
> java -DzkRun -jar start.jar
> java -Djetty.port=7574 -DzkRun -DzkHost=localhost:9983 -jar start.jar
>
> sh zkcli.sh -zkhost localhost:9983 -cmd upconfig -confdir
> ~/development/configs/inventory/ -confname config_inventory
> sh zkcli.sh -zkhost localhost:9983 -cmd linkconfig -collection inventory
> -confname config_inventory
>
> curl "http://localhost:8983/solr/admin/collections?action=CREATE&name=inventory&numShards=1&replicationFactor=2&maxShardsPerNode=4"
> curl "http://localhost:8983/solr/admin/collections?action=RELOAD&name=inventory"
>
> AJ
>
> -----Original Message-----
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Friday, October 31, 2014 9:49 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Missing Records
>
> OK, that is puzzling.
>
> bq: If there were duplicates only one of the duplicates should be removed and
> I still should be able to search for the ID and find one correct?
>
> Correct.
>
> Your bad request error is puzzling, you may be on to something there.
> What it looks like is that somehow some of the documents you're
> sending to Solr aren't getting indexed, either being dropped through
> the network or perhaps have invalid fields, field formats (i.e. a date
> in the wrong format, whatever) or some such. When you complete the run,
> what are the maxDoc and numDocs numbers on one of the nodes?
>
> What else do you see in the logs? They're pretty big after that
RE: Missing Records
Another round of tests this morning. Ten rounds of imports, all done on the non-leader node:

902294
900089
899267
898127
901945
901055
899638
899392
899880
901812

The expected number of records is 903990.

I am getting this error:

org.apache.solr.common.SolrException: Bad Request

request: http://192.168.20.51:8983/solr/Inventory_shard1_replica1/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2F192.168.20.51%3A7574%2Fsolr%2FInventory_shard1_replica2%2F&wt=javabin&version=2
        at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer$Runner.run(ConcurrentUpdateSolrServer.java:241)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

And I am getting this warning:

org.apache.solr.common.SolrException: Bad Request

request: http://192.168.20.51:8983/solr/Inventory_shard1_replica1/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2F192.168.20.51%3A7574%2Fsolr%2FInventory_shard1_replica2%2F&wt=javabin&version=2
        at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer$Runner.run(ConcurrentUpdateSolrServer.java:241)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

These are both from the admin logging section. I have retained the log files if it would help.

AJ

-----Original Message-----
From: AJ Lemke [mailto:aj.le...@securitylabs.com]
Sent: Monday, November 3, 2014 5:31 PM
To: solr-user@lucene.apache.org
Subject: RE: Missing Records

So I jumped back on this. I have not been using the optimize option on this new set of tests.

If I run the full index on the leader I seem to get all of the items in the database minus 3 that have a missing field.

Indexing completed. Added/Updated: 903,990 documents. Deleted 0 documents.
(Duration: 25m 11s)
Requests: 1 (0/s), Fetched: 903,993 (598/s), Skipped: 0, Processed: 903,990

Last Modified: 2 minutes ago
Num Docs: 903990
Max Doc: 903990
Heap Memory Usage: 2625744
Deleted Docs: 0
Version: 3249
Segment Count: 7
Optimized:
Current:

If I run it on the other node I get:

Indexing completed. Added/Updated: 903,993 documents. Deleted 0 documents. (Duration: 27m 08s)
Requests: 1 (0/s), Fetched: 903,993 (555/s), Skipped: 0, Processed: 903,993 (555/s)

Last Modified: about a minute ago
Num Docs: 897791
Max Doc: 897791
Heap Memory Usage: 2621072
Deleted Docs: 0
Version: 3285
Segment Count: 7
Optimized:
Current:

Any ideas? If there is any more info that is needed let me know.

AJ

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Friday, October 31, 2014 1:44 PM
To: solr-user@lucene.apache.org
Subject: Re: Missing Records

Sorry to say this, but I don't think the numDocs/maxDoc numbers are telling you anything, because it looks like you've optimized, which purges any data associated with deleted docs, including the internal IDs which are the numDocs/maxDocs figures. So if there were deletions, we can't see any evidence of same.

Siiggggh.

On Fri, Oct 31, 2014 at 9:56 AM, AJ Lemke wrote:
> I have run some more tests so the numbers have changed a bit.
>
> Index Results done on Node 1:
> Indexing completed. Added/Updated: 903,993 documents. Deleted 0
> documents. (Duration: 31m 47s)
> Requests: 1 (0/s), Fetched: 903,993 (474/s), Skipped: 0, Processed:
> 903,993
>
> Node 1:
> Last Modified: 44 minutes ago
> Num Docs: 824216
> Max Doc: 824216
> Heap Memory Usage: -1
> Deleted Docs: 0
> Version: 1051
> Segment Count: 1
> Optimized: checked
> Current: checked
>
> Node 2:
> Last Modified: 44 minutes ago
> Num Docs: 824216
> Max Doc: 824216
> Heap Memory Usage: -1
> Deleted Docs: 0
> Version: 1051
> Segment Count: 1
> Optimized: checked
> Current: checked
>
> Search results are the same as the doc numbers above.
>
> Logs only have one instance of an error:
>
> ERROR - 2014-10-31 10:47:12.867;
> org.apache.solr.update.StreamingSolrServers$1; error
> org.apache.solr.common.SolrException: Bad Request
>
> request:
> http://192.168.20.57:7574/solr/inventory_shard1_replica1/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2F192.168.20.57%3A8983%2Fsolr%2Finventory_shard1_replica2%2F&wt=javabin&version=2
> at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer$Runner.run(ConcurrentUpdateSolrServer.java:241)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
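The shortfall per run can be computed from the ten counts above against the expected total of 903,990. The result is worth noting: a different 1,700-5,900 documents disappear on each run, which is consistent with whole in-flight update batches being dropped when the Bad Request occurs, rather than a fixed set of bad documents. A quick arithmetic sketch:

```shell
# Shortfall per import run versus the expected 903,990 documents.
expected=903990
for n in 902294 900089 899267 898127 901945 901055 899638 899392 899880 901812; do
  echo "$n missing $((expected - n))"
done
```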
solrj indexing problem
Hi All,

I am getting an error when using solrj to index records.

Exception in thread "main" org.apache.solr.client.solrj.impl.CloudSolrServer$RouteException: Exception writing document id 529241050 to the index; possible analysis error.
        at org.apache.solr.client.solrj.impl.CloudSolrServer.directUpdate(CloudSolrServer.java:360)
        at org.apache.solr.client.solrj.impl.CloudSolrServer.request(CloudSolrServer.java:533)
        at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:124)
        at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:68)
        at Importer.commit(Importer.java:96)
        at Importer.putData(Importer.java:81)
        at Importer.main(Importer.java:25)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:483)
        at com.intellij.rt.execution.application.AppMain.main(AppMain.java:134)
Caused by: org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Exception writing document id 529241050 to the index; possible analysis error.
        at org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:552)
        at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:210)
        at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:206)
        at org.apache.solr.client.solrj.impl.LBHttpSolrServer.doRequest(LBHttpSolrServer.java:340)
        at org.apache.solr.client.solrj.impl.LBHttpSolrServer.request(LBHttpSolrServer.java:301)
        at org.apache.solr.client.solrj.impl.CloudSolrServer$1.call(CloudSolrServer.java:341)
        at org.apache.solr.client.solrj.impl.CloudSolrServer$1.call(CloudSolrServer.java:338)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

I am connecting to a local solrCloud instance (localhost:9983) and a schemaless collection. Is this possible or do I have to use a schema?

Thanks!
AJ
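The client-side "possible analysis error" message usually hides the real cause, which lands in the solr.log of the node that rejected the document as a chain of "Caused by" exceptions; the innermost one names the actual problem. A sketch of pulling that out of a saved trace; the file path and the example root cause below are hypothetical illustrations, not taken from this thread:

```shell
# Hypothetical excerpt of a server-side solr.log stack trace saved to a file.
# The IllegalArgumentException shown here is an invented example cause.
cat > /tmp/trace.log <<'EOF'
org.apache.solr.common.SolrException: Exception writing document id 529241050 to the index; possible analysis error.
	at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:164)
Caused by: java.lang.IllegalArgumentException: Document contains at least one immense term in field="Description"
	at org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:672)
EOF

# The last "Caused by" line is the innermost, i.e. root, cause:
grep '^Caused by' /tmp/trace.log | tail -n 1
```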
DIH Commit Issue
Hi All,

I have a DIH issue where the index commits after one to two minutes and then does not commit again until the end. We would like a single commit at the end so the index does not lose 75% or more of its records mid-import. We went from 370,000+ records down to around 27,000, then back to 370,000+ when the process ended.

I have changed the update handler to the following.

${solr.ulog.dir:}

Is there something else that I should do?

Thanks All!
AJ
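If the goal is a single commit when the import finishes, one approach is to disable both time-based hard commits and soft commits and let DIH issue its own commit at the end of full-import (commit=true is the DIH default). A sketch of what that section of solrconfig.xml could look like (the archive stripped the XML from the snippet above, so this is a reconstruction, not a copy of the original config):

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <updateLog>
    <str name="dir">${solr.ulog.dir:}</str>
  </updateLog>
  <!-- maxTime of -1 disables time-based hard commits -->
  <autoCommit>
    <maxTime>-1</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
  <!-- -1 also disables soft commits, so nothing becomes visible mid-import -->
  <autoSoftCommit>
    <maxTime>-1</maxTime>
  </autoSoftCommit>
</updateHandler>
```

Note that commits are global: if anything else posts to /update with commit=true while the import is running (as the LogUpdateProcessorFactory entry in the earlier log suggests), the partial index becomes visible regardless of these settings.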
String Cast Error
Hello all!

I have a strange issue with my local Solr install. I have a search that sorts on a boolean field. This search produces the following error: "java.lang.String cannot be cast to org.apache.lucene.util.BytesRef". The search is over the dummy data that is included in the exampledocs. I am searching on a 2 node, 2 shard environment.

I have included what I think is the pertinent information below.
Thanks for any insights
AJ

Pertinent Information:

Query string:
http://localhost:8983/solr/collection1/select?q=*%3A*&sort=inStock+desc&wt=json&indent=true

Release info:
solr-spec: 4.7.0
solr-impl: 4.7.0 1570806 - simon - 2014-02-22 08:36:23
lucene-spec: 4.7.0
lucene-impl: 4.7.0 1570806 - simon - 2014-02-22 08:25:23

Environment Start Up:
java -Dbootstrap_confdir=solr/collection1/conf -Dcollection.configName=collection1 -DzkRun -DnumShards=2 -jar start.jar
java -Djetty.port=7574 -DzkRun -DzkHost=localhost:9983 -jar start.jar
java -Dtype=text/xml -jar post.jar *.xml
java -Dtype=text/csv -jar post.jar *.csv
java -Dtype=application/json -jar post.jar *.json

Stack Trace:
java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.lucene.util.BytesRef
	at org.apache.lucene.search.FieldComparator$TermOrdValComparator.compareValues(FieldComparator.java:940)
	at org.apache.solr.handler.component.ShardFieldSortedHitQueue$2.compare(ShardDoc.java:245)
	at org.apache.solr.handler.component.ShardFieldSortedHitQueue$2.compare(ShardDoc.java:237)
	at org.apache.solr.handler.component.ShardFieldSortedHitQueue.lessThan(ShardDoc.java:162)
	at org.apache.solr.handler.component.ShardFieldSortedHitQueue.lessThan(ShardDoc.java:104)
	at org.apache.lucene.util.PriorityQueue.insertWithOverflow(PriorityQueue.java:159)
	at org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:909)
	at org.apache.solr.handler.component.QueryComponent.handleRegularResponses(QueryComponent.java:661)
	at org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:640)
	at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:321)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
	at org.apache.solr.core.SolrCore.execute(SolrCore.java:1916)
	at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:780)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:427)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:217)
	at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
	at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
	at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
	at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
	at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
	at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
	at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
	at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
	at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
	at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
	at org.eclipse.jetty.server.Server.handle(Server.java:368)
	at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
	at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
	at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
	at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
	at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)
	at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
	at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
	at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
	at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
	at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
	at java.lang.Thread.run(Thread.java:744)
RE: String Cast Error
Did you change the schema at all? No
Did you upgrade Solr from a previous version with the same index? No

This was a fresh install from the website:
Ran "ant run-example"
Killed that instance
Copied Example to Node1
Copied Example to Node2
Switched into Node1
java -Dbootstrap_confdir=solr/collection1/conf -Dcollection.configName=collection1 -DzkRun -DnumShards=2 -jar start.jar
Switched into Node2
java -Djetty.port=7574 -DzkRun -DzkHost=localhost:9983 -jar start.jar
Added documents using the following:
Switched into example/exampledocs
java -Dtype=text/xml -jar post.jar *.xml
java -Dtype=text/csv -jar post.jar *.csv
java -Dtype=application/json -jar post.jar *.json

If you need more info let me know.
AJ

-Original Message-
From: Shawn Heisey [mailto:s...@elyograg.org]
Sent: Tuesday, March 18, 2014 5:14 PM
To: solr-user@lucene.apache.org
Subject: Re: String Cast Error

On 3/18/2014 3:51 PM, AJ Lemke wrote:
> I have a strange issue with my local SOLR install.
> I have a search that sorts on a boolean field. This search is pulling the following error: "java.lang.String cannot be cast to org.apache.lucene.util.BytesRef".
>
> The search is over the dummy data that is included in the exampledocs. I am searching on a 2 node, 2 shard environment.
>
> I have included what I think is the pertinent information below.
> Thanks for any insights

Info gathering questions:

Did you change the schema at all? If so, did you reindex? A reindex is required for virtually all schema changes.

Did you upgrade Solr from a previous version with the same index? Solr is supposed to handle this with no problems.

If you did upgrade Solr, whether the index was rebuilt or not, there may be leftover components from the previous war extraction. It looks like you're using the example jetty included with Solr, so this would apply: it's best to delete anything in the solr-webapp directory for each upgrade. You also want to be sure that you don't have old jars hanging around in your classpath.

You could be running into a bug that I don't know about yet, but we can't assume that's the case without ruling other problems out.

Thanks,
Shawn
RE: String Cast Error
An update to this. If I change my search and add a parameter the error seems to go away.

Error:
/solr/collection1/select?q=*:*&wt=json&sort=inStock desc

No Error:
/solr/collection1/select?q=Samsung&wt=json&sort=inStock desc

AJ

-Original Message-
From: AJ Lemke [mailto:aj.le...@securitylabs.com]
Sent: Tuesday, March 18, 2014 5:28 PM
To: solr-user@lucene.apache.org
Subject: RE: String Cast Error
[snip]