Solr 5.2: Same Document in Multiple Shard
We have recently upgraded Solr from 4.8 to 5.2. We have 2 shards and 2 replicas in SolrCloud, and this shows correctly in the Solr Admin Panel. We found that sometimes the same document is available in both shards. We confirmed this by querying each shard individually (from the Solr admin, by passing the shards parameter). Could this be due to some configuration issue? How can we fix it? -Maulin [CC Award Winners 2014]
Indexing Fixed length file
Hello, I use Solr 5.2.1 and the bin/post tool. I am trying to index some files that have fixed-length records and no whitespace to separate the fields. How can I program a template or something similar for my fields? Or can I edit the schema.xml for my problem? This is one record from one file; each file contains 40 - 100 records. AB134364312 58553521789 245678923521234130311G11222345610711MUELLER, MAX -00014680Q1-24579021-204052667980002 EEUR 0223/123835062 130445 Thanks! Tim -- View this message in context: http://lucene.472066.n3.nabble.com/Indexing-Fixed-length-file-tp4225807.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: no default request handler is registered
-----Original Message----- From: Shawn Heisey [mailto:apa...@elyograg.org] Sent: Thursday, August 27, 2015 3:51 PM To: solr-user@lucene.apache.org Subject: Re: no default request handler is registered

On 8/27/2015 1:10 PM, Scott Hollenbeck wrote: I'm doing some experimenting with Solr 5.3 and the 7.x-1.x-dev version of the Apache Solr Search module for Drupal. Things seem to be working fine, except that this warning message appears in the Solr admin logging window and in the server log: "no default request handler is registered (either '/select' or 'standard')". Looking at the solrconfig.xml file that comes with the Drupal module, I see a requestHandler named "standard":

  <requestHandler name="standard" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="df">content</str>
      <str name="echoParams">explicit</str>
      <bool name="omitHeader">true</bool>
    </lst>
  </requestHandler>

I also see a handler named "pinkPony" with a default attribute set to true: <snip> So it seems like both standard and default requestHandlers are specified. Why is the warning produced? What am I missing?

I think the warning message may be misworded, or logged in incorrect circumstances, and might need some attention. The solrconfig.xml that you are using (which I assume came from the Drupal project) is geared towards a 3.x version of Solr prior to 3.6.x (the last minor version in the 3.x line). Starting in the 3.6 version, all request handlers in examples have names that start with a forward slash, like "/select", none of them have the default attribute, and the handleSelect parameter found elsewhere in solrconfig.xml is false. You should bring this up with the Drupal folks and ask them to upgrade their config/schema and their code for modern versions of Solr. Solr 3.6.0 (which deprecated their handler naming convention and the default attribute) was released over three years ago.

Thanks for the replies. The config files I'm using came from a Drupal sandbox project that's focused on Solr 5.x compatibility.
I've added an issue to that project's queue. We'll see how it goes. Scott Hollenbeck
Re: What is the correct path for mysql jdbc connector on Solr?
On 8/28/2015 6:18 AM, Merlin Morgenstern wrote: Full Import failed: java.lang.RuntimeException: java.lang.RuntimeException: org.apache.solr.handler.dataimport.DataImportHandlerException: Could not load driver: com.mysql.jdbc.Driver Processing Document # 1 How many directories do I have to go up inside the config ../... ?

This is the way I always recommend dealing with extra required jars:

1. Remove all <lib> directives from solrconfig.xml.
2. On each server, create a "lib" directory in the solr home (the solr home is where solr.xml lives).
3. Copy all required extra jars to that lib directory.

There is currently a problem with this approach when using the Lucene ICU analysis components: you must use the full class name instead of something like solr.ICUFoldingFilterFactory. This doesn't seem to affect any classes other than the ICU analysis components. https://issues.apache.org/jira/browse/SOLR-6188 Thanks, Shawn
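Shawn's steps above can be sketched as shell commands. This is a minimal sketch: the solr home path and the connector jar version are assumptions to be matched to the actual installation.

```shell
# Sketch of the recommended jar layout; SOLR_HOME and the connector
# version are assumptions -- match them to your own install.
SOLR_HOME=/var/solr/data                  # the directory that contains solr.xml
mkdir -p "$SOLR_HOME/lib"
cp /opt/solr-5.2.1/dist/mysql-connector-java-5.1.36-bin.jar "$SOLR_HOME/lib/"
# Restart Solr afterwards so the jar is loaded; no <lib> directive is
# needed in solrconfig.xml for jars placed in this directory.
```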
What is the correct path for mysql jdbc connector on Solr?
I have a SolrCloud installation running on 3 machines where I would like to import data from MySQL. Unfortunately the import fails due to the missing JDBC connector. My guess is that I am having trouble with the right directory.

solrconfig.xml:

  <lib dir="${solr.install.dir:../../..}/dist/" regex="solr-dataimporthandler-.*\.jar" />

file location: node1:/opt/solr-5.2.1/dist/mysql-connector-java-5.1.36-bin.jar

error message: Full Import failed: java.lang.RuntimeException: java.lang.RuntimeException: org.apache.solr.handler.dataimport.DataImportHandlerException: Could not load driver: com.mysql.jdbc.Driver Processing Document # 1

How many directories do I have to go up inside the config ../... ? The config is uploaded OK within ZooKeeper and Solr has been restarted. Thank you for any help on this!
solrcloud and core swapping
Is core swapping supported in SolrCloud? If I have a 5-node SolrCloud cluster and I do a core swap on the leader, will the core be swapped on the other 4 nodes as well? Or do I need to do a core swap on each node? Bill
Re: Indexing Fixed length file
Hi Tim, I haven’t heard of people indexing this kind of input with Solr, but the format is quite similar to CSV/TSV files, with the exception that the field separators have fixed positions and are omitted. You could write a short script to insert separators (e.g. commas) at these points (but be sure to escape quotation marks and the separators) and then use Solr’s CSV update functionality: https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Index+Handlers#UploadingDatawithIndexHandlers-CSVFormattedIndexUpdates. I think dealing with fixed-width fields directly would be a nice addition to Solr’s CSV update capabilities - feel free to make an issue - see http://wiki.apache.org/solr/HowToContribute. Steve www.lucidworks.com
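The "short script" Steve describes can be sketched with awk. The column widths below (an 11-character id, a 10-character account number, and a free-text remainder) are invented for illustration and must be matched to the real record layout; note that any embedded quotation marks would also need escaping, per Steve's caveat.

```shell
# Convert one fixed-width record to CSV; the widths are assumptions,
# not Tim's real layout. The third field is quoted because it may
# contain commas ("MUELLER, MAX").
echo 'AB134364312 5855352178 MUELLER, MAX' | \
  awk '{ printf "%s,%s,\"%s\"\n", substr($0,1,11), substr($0,13,10), substr($0,24) }'
```

The resulting CSV can then be posted with bin/post using -type text/csv and a fieldnames parameter naming the columns.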
Re: solrcloud and core swapping
On 8/28/2015 8:10 AM, Bill Au wrote: Is core swapping supported in SolrCloud? If I have a 5 nodes SolrCloud cluster and I do a core swap on the leader, will the core be swapped on the other 4 nodes as well? Or do I need to do a core swap on each node? When you're running SolrCloud, swapping any of the cores might really screw things up. I think it might be a good idea for Solr to return a "not supported in cloud mode" failure on certain CoreAdmin actions. Instead, use collection aliasing. Create collections named something like foo_0 and foo_1, and update the alias "foo" to point to whichever of them is currently live. Your queries and update requests will never need to know about foo_0 and foo_1 ... only the coordinating part of your system, where you would normally do your core swapping, needs to know about those. Thanks, Shawn
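The alias flip Shawn describes uses the Collections API CREATEALIAS action. A hedged sketch follows; the host, alias, and collection names are examples only:

```shell
# Point the "foo" alias at the currently-live collection (names are examples).
curl 'http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=foo&collections=foo_0'
# After rebuilding into foo_1, re-run CREATEALIAS to repoint "foo";
# clients that query /solr/foo/select never see the swap happen.
curl 'http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=foo&collections=foo_1'
```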
Re: Indexing Fixed length file
Solr doesn't know anything about such a file. The post program expects well-defined structures; see the XML and JSON formats in example/exampledocs. So you either have to transform the data into the form expected by the bin/post tool, or perhaps you can use the CSV import, see: https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Index+Handlers#UploadingDatawithIndexHandlers-CSVFormattedIndexUpdates Best, Erick
Re: Solr 5.2: Same Document in Multiple Shard
Have you done anything special in terms of routing, or are you using the default compositeId? How are you indexing? Docs are considered identical in Solr based solely on the uniqueKey field. If that's the absolute same (possibly including extra whitespace) then this shouldn't be happening. Nobody else has reported this, so I suspect there's something about your setup that's odd. The clusterstate for the collection would be interesting to see, as well as your schema definition for your ID field. Best, Erick
Re: solrcloud and core swapping
On 8/28/2015 8:25 AM, Shawn Heisey wrote: Instead, use collection aliasing. Create collections named something like foo_0 and foo_1, and update the alias foo to point to whichever of them is currently live. ... You might also want to have a foo_build alias pointing to the *other* collection for any full rebuild functionality, so it can also use a static collection name. Thanks, Shawn
Re: Indexing Fixed length file
If you use DataImportHandler, you can combine LineEntityProcessor with RegexTransformer to split each line into a bunch of fields: https://cwiki.apache.org/confluence/display/solr/Uploading+Structured+Data+Store+Data+with+the+Data+Import+Handler#UploadingStructuredDataStoreDatawiththeDataImportHandler-TheRegexTransformer You could then trim the whitespace in an UpdateRequestProcessor chain that you can set up to run after DIH, using the TrimFieldUpdateProcessorFactory URP: http://www.solr-start.com/info/update-request-processors/#TrimFieldUpdateProcessorFactory I think this should do the job. With bin/post, you could set up a custom URP chain as well, but it does not have an equivalent of RegexTransformer that splits into multiple other fields. Not that it would be hard to write one; just nobody has yet. Regards, Alex. Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: http://www.solr-start.com/
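A sketch of the DIH configuration Alex describes, combining LineEntityProcessor with RegexTransformer. The file path, field names, and group widths are assumptions to be adapted to the real record layout:

```xml
<dataConfig>
  <dataSource type="FileDataSource" encoding="UTF-8"/>
  <document>
    <!-- LineEntityProcessor emits each line of the file as a "rawLine" field;
         RegexTransformer then splits it into named columns via groupNames.
         The group widths (11/10/rest) are illustrative only. -->
    <entity name="records"
            processor="LineEntityProcessor"
            url="/path/to/records.txt"
            transformer="RegexTransformer">
      <field column="rawLine"
             regex="^(.{11})\s(.{10})\s(.*)$"
             groupNames="id,account,description"/>
    </entity>
  </document>
</dataConfig>
```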
Re: Query timeAllowed and its behavior.
As we reported, we are having issues with timeAllowed on 5.2.1. If we set timeAllowed=1 and then run the same query with timeAllowed=3, we get the number of rows that was returned by the first query. It appears the results are cached when exceeding timeAllowed, as if the truncated results were correct. SEEMS LIKE A BUG TO ME. On Tue, Aug 25, 2015 at 5:16 AM, Jonathon Marks (BLOOMBERG/ LONDON) jmark...@bloomberg.net wrote: timeAllowed applies to the time taken by the collector in each shard (TimeLimitingCollector). Once timeAllowed is exceeded the collector terminates early, returning any partial results it has and freeing the resources it was using. From Solr 5.0 timeAllowed also applies to the query expansion phase and SolrClient request retry. From: solr-user@lucene.apache.org At: Aug 25 2015 10:18:07 Subject: Re: Query timeAllowed and its behavior. Hi, Kindly help me understand the query timeAllowed attribute. The following is set in solrconfig.xml: <int name="timeAllowed">30</int> Does this setting stop the query from running after timeAllowed is reached? If not, is there a way to stop it, as it will occupy resources in the background for no benefit. Thanks, Modassar -- Bill Bell billnb...@gmail.com cell 720-256-8076
Re: Query timeAllowed and its behavior.
On 8/28/2015 10:47 PM, William Bell wrote: As we reported, we are having issues with timeAllowed on 5.2.1. ... That sounds like a bug to me, too. Is there any indication in the results the first time that the query was aborted before it finished? If Solr can detect that it aborted the query, it should not be caching the results. Thanks, Shawn
Re: Indexing Fixed length file
How about this incantation:

  $ bin/solr create -c fw
  $ echo "Q36" | awk -v OFS=, '{ print substr($0, 1, 1), substr($0, 2, 2) }' | bin/post -c fw -params "fieldnames=id,val&header=false" -type text/csv -d
  $ curl 'http://localhost:8983/solr/fw/select?q=*:*&wt=csv'
  val,_version_,id
  36,1510767115252006912,Q

With a big bunch of data, the stdin detection of bin/post doesn’t work well, so I’d certainly recommend going to an intermediate real file (awk ... > data.csv; bin/post ... data.csv) instead. — Erik Hatcher, Senior Solutions Architect http://www.lucidworks.com
Sorting by function
Hi, I'm trying to apply the Sort By Function (https://wiki.apache.org/solr/FunctionQuery#Sort_By_Function) Solr capabilities to solve the following use case: I have a country field in my index, with values like 'US', 'FR', 'UK', etc. Then I want our users to be able to define the order of their preferred countries so that grouped results are sorted according to their preference. I need something like the map function that assigns a number to each country code and uses that for sorting, based on the users' preference. I tried to sort my groups by adding something like map(country, 'FR', 'FR', 1) to the field list, but map seems to only work for numerical values. I get errors like: Error parsing fieldname: Expected float instead of quoted string: FR Is there any other function that would allow me to map from a predefined String constant to an Integer that I can sort on? Thanks in advance.
Re: Sorting by function
: I have a country field in my index, with values like 'US', 'FR', 'UK', : etc... : : Then I want our users to be able to define the order of their preferred : countries so that grouped results are sorted according to their preference. ... : Is there any other function that would allow me to map from a predefined : String constant into an Integer that I can sort on? Because of how they evolved, and most of the common use cases for them, there aren't a lot of functions that operate on strings. Assuming your country field is a single-valued (indexed) string field, then what you want can be done fairly simply using the termfreq() function. termfreq(country,US) will return the (raw integer) term frequency for Term(country,US) for each doc -- assuming it's single valued (and not tokenized), that means for every doc it will be either a 0 or a 1. So you can either modify your earlier attempt at using map on the string values to do a map over the termfreq output, or you can simplify things to just multiply and take the max value -- where max is just a shorthand for the non-0 value ... max(mul(9,termfreq(country,US)), mul(8,termfreq(country,FR)), mul(7,termfreq(country,UK)), ...) Things get more interesting/complicated if the field isn't single valued, or is tokenized -- then individual values (like US) might have a termfreq that is greater than 1, or a doc might have more than one value, and you have to decide what kind of math operation you want to apply over those... * ignore termfreqs and only look at whether the term exists? - wrap each termfreq in map to force the value to either 0 or 1 * want to sort by sum of (weight * termfreq) for each term? - change max to sum in the above example * ignore all but the main term that has the highest freq for each doc? - not easy at query time - best to figure out the main term at index time and put it in its own field. -Hoss http://www.lucidworks.com/
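Hoss's max/mul expression goes into the sort parameter of a normal select request. A hedged sketch (the collection name and the weights are examples, not from the thread):

```shell
# Sort docs so country=US ranks first, then FR, then UK; the weights
# (9/8/7) and collection name are illustrative assumptions.
curl 'http://localhost:8983/solr/collection1/select' \
  --data-urlencode 'q=*:*' \
  --data-urlencode 'sort=max(mul(9,termfreq(country,US)),mul(8,termfreq(country,FR)),mul(7,termfreq(country,UK))) desc'
```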
Re: Sorting by function
Thanks Chris! I have the country as a single-valued field, so your solution works perfectly!
Re: Indexing Fixed length file
Erik's version might be better with tabs, though, to avoid CSV's requirements on escaping commas, quotes, etc. And maybe trim those fields a bit, either in awk or in a URP inside Solr. But it would definitely work. Regards, Alex. Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: http://www.solr-start.com/
ping handler very doubtful
So, I tested that the PingRequestHandler works in the following fashion:

  cd server/corename/data/index
  # some work with ls and awk to produce a script, and then it runs
  dd if=/dev/urandom of=`pwd`/segments_10 bs=160 count=1
  dd if=/dev/urandom of=`pwd`/_u.fdt bs=41512 count=1
  dd if=/dev/urandom of=`pwd`/_u.fdx bs=100 count=1
  dd if=/dev/urandom of=`pwd`/_u.fnm bs=1573 count=1
  dd if=/dev/urandom of=`pwd`/_u_Lucene50_0.doc bs=16824 count=1
  dd if=/dev/urandom of=`pwd`/_u_Lucene50_0.pos bs=17677 count=1
  dd if=/dev/urandom of=`pwd`/_u_Lucene50_0.tim bs=54010 count=1
  dd if=/dev/urandom of=`pwd`/_u_Lucene50_0.tip bs=1466 count=1
  dd if=/dev/urandom of=`pwd`/_u.nvd bs=966 count=1
  dd if=/dev/urandom of=`pwd`/_u.nvm bs=140 count=1
  dd if=/dev/urandom of=`pwd`/_u.si bs=409 count=1

This did not cause any immediate problems with the PingRequestHandler, because the query was cached. Worth a bug? It did of course cause problems for the health monitor following a RELOAD or a complete Solr restart, which was enough for me. Dan Davis, Systems/Applications Architect (Contractor), Office of Computer and Communications Systems, National Library of Medicine, NIH
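For a health check that actually detects this kind of on-disk corruption, Lucene's off-line CheckIndex tool can be run against the index directory. A sketch; the lucene-core jar path and version are assumptions that must match the installation:

```shell
# Run Lucene's off-line index checker; without -exorcise it is read-only.
# The jar path/version is an assumption -- match it to your Solr install.
java -cp server/solr-webapp/webapp/WEB-INF/lib/lucene-core-5.2.1.jar \
     org.apache.lucene.index.CheckIndex server/corename/data/index
```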
Re: Indexing Fixed length file
Ah yes, I should have made my example use tabs, though that currently would have required also adding "separator=%09" to the params. I definitely support the use of tabs for what they were intended: delimiting columns of data. +1, thanks for that mention Alex
Dynamic field rule plugin?
Hi, I am new to Solr and am trying to create dynamic field rules in my schema. I would like to use the field name suffix to indicate other properties besides the data type and multiValued, as provided in the default schema. It appears that specifying this via a pattern leads to duplication, as there are various combinations that need to be specified. It would help to have code where I can build parts of the rule, e.g.: if the suffix has '_s' then set stored=true; if the suffix has '_m' then set multiValued=true; and so on. From the documentation and various implementation examples (Drupal etc.) I can only see them specifying all combinations. Is there any way (a plugin?) to incrementally build the rule? Thanks, Hari
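For reference, the combination-per-suffix approach being described looks like this in schema.xml; each combination gets its own rule (the suffixes and types below are examples, not a standard convention):

```xml
<!-- One <dynamicField> rule per suffix combination; there is no built-in
     way to compose "_s means stored" and "_m means multiValued"
     incrementally from separate rules. -->
<dynamicField name="*_s"  type="string" indexed="true" stored="true"/>
<dynamicField name="*_sm" type="string" indexed="true" stored="true" multiValued="true"/>
<dynamicField name="*_i"  type="int"    indexed="true" stored="false"/>
```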
RE: Data Import Handler Stays Idle
Only a month late to respond, and the response likely won't help. I agree with Shawn that Tika can be a memory hog. I try to leave 1GB per thread, but your mileage will vary dramatically depending on your docs. I'd expect that you'd get an OOM, though, somewhere... There have been rare bugs in various parsers, including the PDFParser, in various versions of Tika that cause permanent hangs. I haven't experimented with DIH and known trigger files, but I suspect you'd get the behavior that you're seeing if this were to happen. So, short of rolling your own ETLer in lieu of DIH, or hardening DIH to run Tika in a different process (tika-server, perhaps -- https://issues.apache.org/jira/browse/SOLR-7632), or going big with Hadoop, morphlines, etc., your only hope is to upgrade Tika and hope that that was one of the bugs that we've already identified and fixed. If you do go with morphlines... I don't think this has been fixed yet: https://github.com/kite-sdk/kite/issues/397 Did you ever figure out what was going wrong? Best, Tim -----Original Message----- From: Shawn Heisey [mailto:apa...@elyograg.org] Sent: Tuesday, July 21, 2015 10:41 AM To: solr-user@lucene.apache.org Subject: Re: Data Import Handler Stays Idle On 7/21/2015 8:17 AM, Paden wrote: There are some zip files inside the directory and have been addressed to in the database. I'm thinking those are the one's it's jumping right over. They are not the issue. At least I'm 95% sure. And Shawn if you're still watching I'm sorry I'm using solr-5.1.0. Have you started Solr with a larger heap than the default 512MB in Solr 5.x? Tika can require a lot of memory. I would have expected there to be OutOfMemoryError exceptions in the log if that were the problem, though. You may need to use the -m option on the startup scripts to increase the max heap. Starting with -m 2g would be a good idea. Also, seeing the entire multi-line IOException from the log (which may be dozens of lines) could be important. Thanks, Shawn
RE: Data Import Handler Stays Idle
There are some zip files inside the directory and have been addressed to in the database. I'm thinking those are the one's it's jumping right over. With SOLR-7189, which should have kicked in for 5.1, Tika shouldn't skip over Zip files; it should process all the contents of those zips and concatenate the extracted text into one string.