Re: Indexing gets significantly slower after every batch commit
Hi Angel, a while ago I had issues with VMWare VM - somehow snapshots were created regularly which dragged down the machine. So I think it is a good idea to baseline the performance on a physical box before moving to VMs, production boxes or whatever is thrown at you. Cheers, Siegfried Goeschl > On 22 May 2015, at 11:15, Angel Todorov wrote: > > Thanks for the feedback guys. What i am going to try now is deploying my > SOLR server on a physical machine with more RAM, and checking out this > scenario there. I have some suspicion it could well be a hypervisor issue, > but let's see. Just for the record - I've noticed those issues on a Win > 2008R2 VM with 8 GB of RAM and 2 cores. > > I don't see anything strange in the logs. One thing that I need to change, > though, is the verbosity of logs in the console - looks like by default > SOLR outputs text in the log for every single document that's indexed, as > well as for every query that's executed. > > Angel > > > On Fri, May 22, 2015 at 1:03 AM, Erick Erickson > wrote: > >> bq: Which is logical as index growth and time needed to put something >> to it is log(n) >> >> Not really. Solr indexes to segments, each segment is a fully >> consistent "mini index". >> When a segment gets flushed to disk, a new one is started. Of course >> there'll be a >> _little bit_ of added overhead, but it shouldn't be all that noticeable. >> >> Furthermore, they're "append only". In the past, when I've indexed the >> Wiki example, >> my indexing speed actually goes faster. >> >> So on the surface this sounds very strange to me. Are you seeing >> anything at all in the >> Solr logs that's suspicious? >> >> Best, >> Erick >> >> On Thu, May 21, 2015 at 12:22 PM, Sergey Shvets >> wrote: >>> Hi Angel >>> >>> We also noticed that kind of performance degrade in our workloads. >>> >>> Which is logical as index growth and time needed to put something to it >> is >>> log(n) >>> >>> >>> >>> On Thursday, 21 May 2015, Angel Todorov wrote: >>> >>>> hi Shawn, >>>> >>>> Thanks a bunch for your feedback. I've played with the heap size, but I >>>> don't see any improvement. Even if i index, say , a million docs, and >> the >>>> throughput is about 300 docs per sec, and then I shut down solr >> completely >>>> - after I start indexing again, the throughput is dropping below 300. >>>> >>>> I should probably experiment with sharding those documents to multiple >> SOLR >>>> cores - that should help, I guess. I am talking about something like >> this: >>>> >>>> >>>> >> https://cwiki.apache.org/confluence/display/solr/Shards+and+Indexing+Data+in+SolrCloud >>>> >>>> Thanks, >>>> Angel >>>> >>>> >>>> On Thu, May 21, 2015 at 11:36 AM, Shawn Heisey >>> > wrote: >>>> >>>>> On 5/21/2015 2:07 AM, Angel Todorov wrote: >>>>>> I'm crawling a file system folder and indexing 10 million docs, and >> I >>>> am >>>>>> adding them in batches of 5000, committing every 50 000 docs. The >>>>> problem I >>>>>> am facing is that after each commit, the documents per sec that are >>>>> indexed >>>>>> gets less and less. >>>>>> >>>>>> If I do not commit at all, I can index those docs very quickly, and >>>> then >>>>> I >>>>>> commit once at the end, but once i start indexing docs _after_ that >>>> (for >>>>>> example new files get added to the folder), indexing is also slowing >>>>> down a >>>>>> lot. >>>>>> >>>>>> Is it normal that the SOLR indexing speed depends on the number of >>>>>> documents that are _already_ indexed?
I think it shouldn't matter >> if i >>>>>> start from scratch or I index a document in a core that already has >> a >>>>>> couple of million docs. Looks like SOLR is either doing something >> in a >>>>>> linear fashion, or there is some magic config parameter that I am >> not >>>>> aware >>>>>> of. >>>>>> >>>>>> I've read all perf docs, and I've tried changing mergeFactor, >>>>>> autowarmCounts, and the buffer sizes - to no avail. >>>>>> >>>>>> I am using SOLR 5.1 >>>>> >>>>> Have you changed the heap size? If you use the bin/solr script to >> start >>>>> it and don't change the heap size with the -m option or another >> method, >>>>> Solr 5.1 runs with a default size of 512MB, which is *very* small. >>>>> >>>>> I bet you are running into problems with frequent and then ultimately >>>>> constant garbage collection, as Java attempts to free up enough memory >>>>> to allow the program to continue running. If that is what is >> happening, >>>>> then eventually you will see an OutOfMemoryError exception. The >>>>> solution is to increase the heap size. I would probably start with at >>>>> least 4G for 10 million docs. >>>>> >>>>> Thanks, >>>>> Shawn >>>>> >>>>> >>>> >>
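A minimal SolrJ sketch of the batch-and-commit pattern discussed in this thread; the URL, core name, field names and counts are placeholders rather than Angel's actual setup, and the heap itself is set separately (e.g. something like "bin/solr start -m 4g", per Shawn's -m suggestion):

import java.util.ArrayList;
import java.util.List;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class BatchIndexer {
    public static void main(String[] args) throws Exception {
        // Assumed URL and core name; adjust to your setup.
        HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");

        List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
        for (int i = 0; i < 1000000; i++) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", Integer.toString(i));
            doc.addField("title", "document " + i);
            batch.add(doc);

            if (batch.size() == 5000) {       // send in batches of 5000
                solr.add(batch);
                batch.clear();
            }
            if (i > 0 && i % 50000 == 0) {    // commit every 50 000 docs
                solr.commit();
            }
        }
        if (!batch.isEmpty()) {
            solr.add(batch);
        }
        solr.commit();                        // single final commit
        solr.shutdown();
    }
}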
Re: New article on ZK "Poison Packet"
Cool stuff - thanks for sharing Siegfried Goeschl > On 09 May 2015, at 08:43, steve wrote: > > While very technical and unusual, a very interesting view of the world of > Linux and ZooKeeper Clusters... > http://www.pagerduty.com/blog/the-discovery-of-apache-zookeepers-poison-packet/ >
Re: Indexing PDF and MS Office files
Hi Vijay, I know this road too well :-) For PDF you can fall back to other tools for text extraction * ps2ascii.ps * XPDF's pdftotext CLI utility (more comfortable than Ghostscript) * some other tools exist as well (pdflib) If you start command line tools from your JVM please have a look at commons-exec :-) Cheers, Siegfried Goeschl PS: one more thing - please tell your management that you will never ever successfully parse all real-world PDFs and cater for that fact in your requirements :-) On 16.04.15 13:10, Vijaya Narayana Reddy Bhoomi Reddy wrote: Erick, I tried indexing both ways - SolrJ / Tika's AutoParser as well as SolrCell's ExtractRequestHandler. Majority of the PDF and Word documents are getting parsed properly and indexed into Solr. However, a minority of them keep failing with either a PDFParser or OfficeParser error. Not sure if this behaviour can be modified so that all the documents can be indexed. The business requirement we have is to index all the documents. However, if a small percentage of them fails, not sure what other ways exist to index them. Any help please? Thanks & Regards Vijay On 15 April 2015 at 15:20, Erick Erickson wrote: There's quite a discussion here: https://issues.apache.org/jira/browse/SOLR-7137 But, I personally am not a huge fan of pushing all the work on to Solr, in a production environment the Solr server is responsible for indexing, parsing the docs through Tika, perhaps searching etc. This doesn't scale all that well. So an alternative is to use SolrJ with Tika, which is totally independent of what version of Tika is on the Solr server. Here's an example. http://lucidworks.com/blog/indexing-with-solrj/ Best, Erick On Wed, Apr 15, 2015 at 4:46 AM, Vijaya Narayana Reddy Bhoomi Reddy wrote: Thanks everyone for the responses. Now I am able to index PDF documents successfully. I have implemented manual extraction using Tika's AutoParser and PDF functionality is working fine. However, the error with some MS Office Word documents still persists. The error message is "java.lang.IllegalArgumentException: This paragraph is not the first one in the table" which will eventually result in "Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser" Upon some reading, it looks like it's a bug with Tika 1.5 and seems to have been fixed with Tika 1.6 ( https://issues.apache.org/jira/browse/TIKA-1251 ). I am new to Solr / Tika and hence wondering whether I can change the Tika library alone to v1.6 without impacting any of the libraries within Solr 4.10.2? Please let me know your response and how to get around this issue. Many thanks in advance. Thanks & Regards Vijay On 15 April 2015 at 05:14, Shyam R wrote: Vijay, You could try different excel files with different formats to rule out whether the issue is with the TIKA version being used. Thanks Murthy On Wed, Apr 15, 2015 at 9:35 AM, Terry Rhodes wrote: Perhaps the PDF is protected and the content can not be extracted? i have an unverified suspicion that the tika shipped with solr 4.10.2 may not support some/all office 2013 document formats. On 4/14/2015 8:18 PM, Jack Krupansky wrote: Try doing a manual extraction request directly to Solr (not via SolrJ) and use the extractOnly option to see if the content is actually extracted. See: https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Solr+Cell+using+Apache+Tika Also, some PDF files actually have the content as a bitmap image, so no text is extracted.
-- Jack Krupansky On Tue, Apr 14, 2015 at 10:57 AM, Vijaya Narayana Reddy Bhoomi Reddy < vijaya.bhoomire...@whishworks.com> wrote: Hi, I am trying to index PDF and Microsoft Office files (.doc, .docx, .ppt, .pptx, .xlx, and .xlx) files into Solr. I am facing the following issues. Request to please let me know what is going wrong with the indexing process. I am using solr 4.10.2 and using the default example server configuration that comes with Solr distribution. PDF Files - Indexing as such works fine, but when I query using *.* in the Solr Query console, metadata information is displayed properly. However, the PDF content field is empty. This is happening for all PDF files I have tried. I have tried with some proprietary files, PDF eBooks etc. Whatever be the PDF file, content is not being displayed. MS Office files - For some office files, everything works perfect and the extracted content is visible in the query console. However, for others, I see the below error message during the indexing process. *Exception in thread "main" org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser* I am using SolrJ to index the documents and below is the code snippet related to indexing. Pleas
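For reference, a minimal sketch of the SolrJ-plus-Tika route Erick recommends, where extraction runs in the client JVM instead of inside Solr; the Solr URL and the id/title/content field names are assumptions, not taken from the thread:

import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.sax.BodyContentHandler;

public class TikaSolrJIndexer {
    public static void main(String[] args) throws Exception {
        // Assumed Solr URL and schema fields; adjust to your setup.
        HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");
        AutoDetectParser parser = new AutoDetectParser();

        File file = new File(args[0]);
        BodyContentHandler handler = new BodyContentHandler(-1); // -1 = no write limit
        Metadata metadata = new Metadata();
        InputStream in = new FileInputStream(file);
        try {
            // Extraction happens in the client JVM, so a bad document
            // cannot take the Solr server down with it.
            parser.parse(in, handler, metadata, new ParseContext());
        } finally {
            in.close();
        }

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", file.getAbsolutePath());
        doc.addField("title", metadata.get("title")); // metadata key is illustrative
        doc.addField("content", handler.toString());
        solr.add(doc);
        solr.commit();
        solr.shutdown();
    }
}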
Re: Measuring QPS
Hi Walter, sort of shameless plug - I ran into similar issues and wrote a JMeter SLA Reporting Backend - https://github.com/sgoeschl/jmeter-sla-report <https://github.com/sgoeschl/jmeter-sla-report> * It reads the CSV/XML JMeter report file and sorts the response times in logarithmic buckets * the XML processor uses a Stax parser to handle huge JTL files (exceeding 1 GB) * it also caters for merging JTL files when running multiple JMeter instances Cheers, Siegfried Goeschl > On 06 Apr 2015, at 22:57, Walter Underwood wrote: > > The load testing is the easiest part. > > We use JMeter to replay the prod logs. We start about a hundred threads and > use ConstantThroughputTimer to control the traffic level. JMeter tends to > fall over with two much data graphing, so we run it headless. Then we post > process with JMeter Plugins to get percentiles. > > The complicated part of the servlet filter was getting it configured in > Tomcat. The code itself is not too bad. > > wunder > Walter Underwood > wun...@wunderwood.org > http://observer.wunderwood.org/ (my blog) > > On Apr 6, 2015, at 1:49 PM, Siegfried Goeschl wrote: > >> The good-sounding thing - you can do that easily with JMeter running the GUI >> or the command-line >> >> Cheers, >> >> Siegfried Goeschl >> >>> On 06 Apr 2015, at 21:35, Davis, Daniel (NIH/NLM) [C] >>> wrote: >>> >>> This sounds really good: >>> >>> "For load testing, we replay production logs to test that we meet the SLA >>> at a given traffic level." >>> >>> The rest sounds complicated. Ah well, that's the job. >>> >>> -Original Message- >>> From: Walter Underwood [mailto:wun...@wunderwood.org] >>> Sent: Monday, April 06, 2015 2:48 PM >>> To: solr-user@lucene.apache.org >>> Subject: Re: Measuring QPS >>> >>> We built a servlet request filter that is configured in front of the Solr >>> servlets. It reports response times to metricsd, using the Codahale library. >>> >>> That gives us counts, rates, and response time metrics. We mostly look at >>> percentiles, because averages are thrown off by outliers. Average is just >>> the wrong metric for a one-sided distribution like response times. >>> >>> We use Graphite to display the 95th percentile response time for each >>> request handler. We use Tattle for alerting on those metrics. >>> >>> We also use New Relic for a different look at the performance. It is good >>> at tracking from the front end through to Solr. >>> >>> For load testing, we replay production logs to test that we meet the SLA at >>> a given traffic level. >>> >>> Walter Underwood >>> wun...@wunderwood.org >>> http://observer.wunderwood.org/ (my blog) >>> >>> On Apr 6, 2015, at 11:31 AM, Davis, Daniel (NIH/NLM) [C] >>> wrote: >>> >>>> OK, >>>> >>>> I have a lot of chutzpah posting that here ;)The other guys answering >>>> the questions can probably explain it better. >>>> I love showing off, however, so please forgive me. >>>> >>>> -Original Message- >>>> From: Davis, Daniel (NIH/NLM) [C] >>>> Sent: Monday, April 06, 2015 2:25 PM >>>> To: solr-user@lucene.apache.org >>>> Subject: RE: Measuring QPS >>>> >>>> Its very common to do autocomplete based on popular queries/titles over >>>> some sliding time window. Some enterprise search systems even apply age >>>> weighting so that they don't need to re-index but continuously add to the >>>> index. This way, they can do autocomplete based on what's popular these >>>> days. >>>> >>>> We use relevance/field boosts/phrase matching etc. to get the best guess >>>> about what results they want to see. 
This is similar - we use relevance, >>>> field boosting to guess what users want to search for. Zipf's law >>>> applies to searches as well as results. >>>> >>>> -Original Message- >>>> From: Siegfried Goeschl [mailto:sgoes...@gmx.at] >>>> Sent: Monday, April 06, 2015 2:17 PM >>>> To: solr-user@lucene.apache.org >>>> Subject: Re: Measuring QPS >>>> >>>> Hi Daniel, >>>> >>>> interesting - I never thought of autocompletion but for keeping track >>>> of user behaviour :-) >
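A bare-bones sketch of the kind of response-time filter Walter describes, built on the Codahale/Dropwizard Metrics Timer; the metric name is made up here, and the reporter that ships data to metricsd/Graphite is left out:

import java.io.IOException;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import com.codahale.metrics.MetricRegistry;
import com.codahale.metrics.Timer;

// Minimal sketch of a timing filter placed in front of the Solr servlets.
public class QueryTimingFilter implements Filter {

    private final MetricRegistry registry = new MetricRegistry();
    private Timer requests;

    public void init(FilterConfig config) throws ServletException {
        requests = registry.timer("solr.requests");
    }

    public void doFilter(ServletRequest req, ServletResponse resp, FilterChain chain)
            throws IOException, ServletException {
        Timer.Context ctx = requests.time();
        try {
            chain.doFilter(req, resp);   // pass through to Solr
        } finally {
            ctx.stop();                  // records counts, rates and latency percentiles
        }
    }

    public void destroy() {
    }
}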
Re: Measuring QPS
The good-sounding thing - you can do that easily with JMeter running the GUI or the command-line Cheers, Siegfried Goeschl > On 06 Apr 2015, at 21:35, Davis, Daniel (NIH/NLM) [C] > wrote: > > This sounds really good: > > "For load testing, we replay production logs to test that we meet the SLA at > a given traffic level." > > The rest sounds complicated. Ah well, that's the job. > > -Original Message- > From: Walter Underwood [mailto:wun...@wunderwood.org] > Sent: Monday, April 06, 2015 2:48 PM > To: solr-user@lucene.apache.org > Subject: Re: Measuring QPS > > We built a servlet request filter that is configured in front of the Solr > servlets. It reports response times to metricsd, using the Codahale library. > > That gives us counts, rates, and response time metrics. We mostly look at > percentiles, because averages are thrown off by outliers. Average is just the > wrong metric for a one-sided distribution like response times. > > We use Graphite to display the 95th percentile response time for each request > handler. We use Tattle for alerting on those metrics. > > We also use New Relic for a different look at the performance. It is good at > tracking from the front end through to Solr. > > For load testing, we replay production logs to test that we meet the SLA at a > given traffic level. > > Walter Underwood > wun...@wunderwood.org > http://observer.wunderwood.org/ (my blog) > > On Apr 6, 2015, at 11:31 AM, Davis, Daniel (NIH/NLM) [C] > wrote: > >> OK, >> >> I have a lot of chutzpah posting that here ;)The other guys answering >> the questions can probably explain it better. >> I love showing off, however, so please forgive me. >> >> -Original Message- >> From: Davis, Daniel (NIH/NLM) [C] >> Sent: Monday, April 06, 2015 2:25 PM >> To: solr-user@lucene.apache.org >> Subject: RE: Measuring QPS >> >> Its very common to do autocomplete based on popular queries/titles over some >> sliding time window. Some enterprise search systems even apply age >> weighting so that they don't need to re-index but continuously add to the >> index. This way, they can do autocomplete based on what's popular these >> days. >> >> We use relevance/field boosts/phrase matching etc. to get the best guess >> about what results they want to see. This is similar - we use relevance, >> field boosting to guess what users want to search for. Zipf's law applies >> to searches as well as results. >> >> -Original Message- >> From: Siegfried Goeschl [mailto:sgoes...@gmx.at] >> Sent: Monday, April 06, 2015 2:17 PM >> To: solr-user@lucene.apache.org >> Subject: Re: Measuring QPS >> >> Hi Daniel, >> >> interesting - I never thought of autocompletion but for keeping track >> of user behaviour :-) >> >> * the numbers are helpful for the online advertisement team to sell >> campaigns >> * it is used for sanity checks - sensible queries returning no results >> or returning too many results >> >> Cheers, >> >> Siegfried Goeschl >> >>> On 06 Apr 2015, at 20:04, Davis, Daniel (NIH/NLM) [C] >>> wrote: >>> >>> Siegfried, >>> >>> It is early days as yet. I don't think we need a code drop. AFAIK, none >>> of our current Solr applications autocomplete the search box based on >>> popular query/title keywords. We have other applications that do that, >>> but they don't use Solr. 
>>> >>> Thanks again, >>> >>> Dan >>> >>> -Original Message- >>> From: Siegfried Goeschl [mailto:sgoes...@gmx.at] >>> Sent: Monday, April 06, 2015 1:42 PM >>> To: solr-user@lucene.apache.org >>> Subject: Re: Measuring QPS >>> >>> Hi Dan, >>> >>> at willhaben.at (customer of mine) two SOLR components were written >>> for SOLR 3 and ported to SORL 4 >>> >>> 1) SlowQueryLog which dumps long-running search requests into a log >>> file >>> >>> 2) Most Frequent Search Terms allowing to query & filter the most >>> frequent user search terms over the browser >>> >>> Some notes along the line >>> >>> >>> * For both components I have the "GO" to open source them but I never >>> had enough time to do that (shame on me) - see >>> https://issues.apache.org/jira/browse/SOLR-4056 >>> >>> * The Most
Re: Measuring QPS
Appreciated :-) Siegfried Goeschl > On 06 Apr 2015, at 20:31, Davis, Daniel (NIH/NLM) [C] > wrote: > > OK, > > I have a lot of chutzpah posting that here ;)The other guys answering the > questions can probably explain it better. > I love showing off, however, so please forgive me. > > -Original Message- > From: Davis, Daniel (NIH/NLM) [C] > Sent: Monday, April 06, 2015 2:25 PM > To: solr-user@lucene.apache.org > Subject: RE: Measuring QPS > > Its very common to do autocomplete based on popular queries/titles over some > sliding time window. Some enterprise search systems even apply age > weighting so that they don't need to re-index but continuously add to the > index. This way, they can do autocomplete based on what's popular these > days. > > We use relevance/field boosts/phrase matching etc. to get the best guess > about what results they want to see. This is similar - we use relevance, > field boosting to guess what users want to search for. Zipf's law applies > to searches as well as results. > > -Original Message- > From: Siegfried Goeschl [mailto:sgoes...@gmx.at] > Sent: Monday, April 06, 2015 2:17 PM > To: solr-user@lucene.apache.org > Subject: Re: Measuring QPS > > Hi Daniel, > > interesting - I never thought of autocompletion but for keeping track of user > behaviour :-) > > * the numbers are helpful for the online advertisement team to sell campaigns > * it is used for sanity checks - sensible queries returning no results or > returning too many results > > Cheers, > > Siegfried Goeschl > >> On 06 Apr 2015, at 20:04, Davis, Daniel (NIH/NLM) [C] >> wrote: >> >> Siegfried, >> >> It is early days as yet. I don't think we need a code drop. AFAIK, none >> of our current Solr applications autocomplete the search box based on >> popular query/title keywords. We have other applications that do that, but >> they don't use Solr. >> >> Thanks again, >> >> Dan >> >> -Original Message- >> From: Siegfried Goeschl [mailto:sgoes...@gmx.at] >> Sent: Monday, April 06, 2015 1:42 PM >> To: solr-user@lucene.apache.org >> Subject: Re: Measuring QPS >> >> Hi Dan, >> >> at willhaben.at (customer of mine) two SOLR components were written >> for SOLR 3 and ported to SORL 4 >> >> 1) SlowQueryLog which dumps long-running search requests into a log >> file >> >> 2) Most Frequent Search Terms allowing to query & filter the most >> frequent user search terms over the browser >> >> Some notes along the line >> >> >> * For both components I have the “GO" to open source them but I never >> had enough time to do that (shame on me) - see >> https://issues.apache.org/jira/browse/SOLR-4056 >> >> * The Most Frequent Search Term component actually mimics a SOLR >> server you feed the user search terms so this might be a better >> solution in the long run. But this requires to have a separate SOLR >> core & ingest plus GUI (check out SILK or ELK) - in other words more >> moving parts in production :-) >> >> * If there is sufficient interest I can make a code drop on GitHub >> >> Cheers, >> >> Siegfried Goeschl >> >> >> >>> On 06 Apr 2015, at 16:25, Davis, Daniel (NIH/NLM) [C] >>> wrote: >>> >>> Siegfried, >>> >>> This is a wonderful find. The second presentation is a nice write-up of a >>> large number of free tools. The first presentation prompts a question - >>> did you add custom request handlers/code to automate determination of best >>> user search terms? Did any of your custom work end-up in Solr? >>> >>> Thank you so much, >>> >>> Dan >>> >>> P.S. 
- your first presentation takes me back to seeing "Angrif der >>> Klonkrieger" in Berlin after a conference - Hayden Christensen was less >>> annoying in German, because my wife and I don't speak German ;) I haven't >>> thought of that in a while. >>> >>> -Original Message- >>> From: Siegfried Goeschl [mailto:sgoes...@gmx.at] >>> Sent: Saturday, April 04, 2015 4:54 AM >>> To: solr-user@lucene.apache.org >>> Subject: Re: Measuring QPS >>> >>> Hi Dan, >>> >>> I’m using JavaMelody for my SOLR production servers - gives you the >>> relevant HTTP stats (what’s happening now & historical data)
Re: Measuring QPS
Hi Daniel, interesting - I never thought of autocompletion but for keeping track of user behaviour :-) * the numbers are helpful for the online advertisement team to sell campaigns * it is used for sanity checks - sensible queries returning no results or returning too many results Cheers, Siegfried Goeschl > On 06 Apr 2015, at 20:04, Davis, Daniel (NIH/NLM) [C] > wrote: > > Siegfried, > > It is early days as yet. I don't think we need a code drop. AFAIK, none > of our current Solr applications autocomplete the search box based on popular > query/title keywords. We have other applications that do that, but they > don't use Solr. > > Thanks again, > > Dan > > -Original Message- > From: Siegfried Goeschl [mailto:sgoes...@gmx.at] > Sent: Monday, April 06, 2015 1:42 PM > To: solr-user@lucene.apache.org > Subject: Re: Measuring QPS > > Hi Dan, > > at willhaben.at (customer of mine) two SOLR components were written for SOLR > 3 and ported to SORL 4 > > 1) SlowQueryLog which dumps long-running search requests into a log file > > 2) Most Frequent Search Terms allowing to query & filter the most frequent > user search terms over the browser > > Some notes along the line > > > * For both components I have the “GO" to open source them but I never had > enough time to do that (shame on me) - see > https://issues.apache.org/jira/browse/SOLR-4056 > > * The Most Frequent Search Term component actually mimics a SOLR server you > feed the user search terms so this might be a better solution in the long > run. But this requires to have a separate SOLR core & ingest plus GUI (check > out SILK or ELK) - in other words more moving parts in production :-) > > * If there is sufficient interest I can make a code drop on GitHub > > Cheers, > > Siegfried Goeschl > > > >> On 06 Apr 2015, at 16:25, Davis, Daniel (NIH/NLM) [C] >> wrote: >> >> Siegfried, >> >> This is a wonderful find. The second presentation is a nice write-up of a >> large number of free tools. The first presentation prompts a question - >> did you add custom request handlers/code to automate determination of best >> user search terms? Did any of your custom work end-up in Solr? >> >> Thank you so much, >> >> Dan >> >> P.S. - your first presentation takes me back to seeing "Angrif der >> Klonkrieger" in Berlin after a conference - Hayden Christensen was less >> annoying in German, because my wife and I don't speak German ;) I haven't >> thought of that in a while. >> >> -Original Message- >> From: Siegfried Goeschl [mailto:sgoes...@gmx.at] >> Sent: Saturday, April 04, 2015 4:54 AM >> To: solr-user@lucene.apache.org >> Subject: Re: Measuring QPS >> >> Hi Dan, >> >> I’m using JavaMelody for my SOLR production servers - gives you the >> relevant HTTP stats (what’s happening now & historical data) plus JVM >> monitoring as additional benefit. 
The servers are deployed on Tomcat >> so I’m of little help regarding Jetty - having said that >> >> * you need two Jars (javamelody & robin) >> * tinker with web.xml >> >> Here are two of my presentations mentioning JavaMelody (plus some >> other stuff) >> >> http://people.apache.org/~sgoeschl/presentations/solr-from-development >> -to-production-20121210.pdf >> <http://people.apache.org/~sgoeschl/presentations/solr-from-developmen >> t-to-production-20121210.pdf> >> http://people.apache.org/~sgoeschl/presentations/jsug-2015/jee-perform >> ance-monitoring.pdf >> <http://people.apache.org/~sgoeschl/presentations/jsug-2015/jee-perfor >> mance-monitoring.pdf> >> >> Cheers, >> >> Siegfried Goeschl >> >>> On 03 Apr 2015, at 17:53, Shawn Heisey wrote: >>> >>> On 4/3/2015 9:37 AM, Davis, Daniel (NIH/NLM) [C] wrote: >>>> I wanted to gather QPS for our production Solr instances, but I was >>>> surprised that the Admin UI did not contain this information. We are >>>> running a mix of versions, but mostly 4.10 at this point. We are not >>>> using SolrCloud at present; that's part of why I'm checking - I want to >>>> validate the size of our existing setup and what sort of SolrCloud setup >>>> would be needed to centralize several of them. >>>> >>>> What is the best way to gather QPS information? >>>> >>>> What is the best way to add information like this to the Admin UI, if I >>>> decide to take that step? >>> >>> As of Solr 4.1 (three years ago), request rate information is >>> available in the admin UI and via JMX. In the admin UI, choose a >>> core from the dropdown, click on Plugins/Stats, then QUERYHANDLER, >>> and open the handler you wish to examine. You have >>> avgRequestsPerSecond, which is calculated for the entire runtime of >>> the SolrCore, as well as 5minRateReqsPerSecond and >>> 15minRateReqsPerSecond, which are far more useful pieces of information. >>> >>> https://issues.apache.org/jira/browse/SOLR-1972 >>> >>> Thanks, >>> Shawn >>> >> >
Re: Measuring QPS
Hi Dan, at willhaben.at (customer of mine) two SOLR components were written for SOLR 3 and ported to SORL 4 1) SlowQueryLog which dumps long-running search requests into a log file 2) Most Frequent Search Terms allowing to query & filter the most frequent user search terms over the browser Some notes along the line * For both components I have the “GO" to open source them but I never had enough time to do that (shame on me) - see https://issues.apache.org/jira/browse/SOLR-4056 * The Most Frequent Search Term component actually mimics a SOLR server you feed the user search terms so this might be a better solution in the long run. But this requires to have a separate SOLR core & ingest plus GUI (check out SILK or ELK) - in other words more moving parts in production :-) * If there is sufficient interest I can make a code drop on GitHub Cheers, Siegfried Goeschl > On 06 Apr 2015, at 16:25, Davis, Daniel (NIH/NLM) [C] > wrote: > > Siegfried, > > This is a wonderful find. The second presentation is a nice write-up of a > large number of free tools. The first presentation prompts a question - did > you add custom request handlers/code to automate determination of best user > search terms? Did any of your custom work end-up in Solr? > > Thank you so much, > > Dan > > P.S. - your first presentation takes me back to seeing "Angrif der > Klonkrieger" in Berlin after a conference - Hayden Christensen was less > annoying in German, because my wife and I don't speak German ;) I haven't > thought of that in a while. > > -Original Message- > From: Siegfried Goeschl [mailto:sgoes...@gmx.at] > Sent: Saturday, April 04, 2015 4:54 AM > To: solr-user@lucene.apache.org > Subject: Re: Measuring QPS > > Hi Dan, > > I’m using JavaMelody for my SOLR production servers - gives you the relevant > HTTP stats (what’s happening now & historical data) plus JVM monitoring as > additional benefit. The servers are deployed on Tomcat so I’m of little help > regarding Jetty - having said that > > * you need two Jars (javamelody & robin) > * tinker with web.xml > > Here are two of my presentations mentioning JavaMelody (plus some other stuff) > > http://people.apache.org/~sgoeschl/presentations/solr-from-development-to-production-20121210.pdf > > <http://people.apache.org/~sgoeschl/presentations/solr-from-development-to-production-20121210.pdf> > http://people.apache.org/~sgoeschl/presentations/jsug-2015/jee-performance-monitoring.pdf > > <http://people.apache.org/~sgoeschl/presentations/jsug-2015/jee-performance-monitoring.pdf> > > > Cheers, > > Siegfried Goeschl > >> On 03 Apr 2015, at 17:53, Shawn Heisey wrote: >> >> On 4/3/2015 9:37 AM, Davis, Daniel (NIH/NLM) [C] wrote: >>> I wanted to gather QPS for our production Solr instances, but I was >>> surprised that the Admin UI did not contain this information. We are >>> running a mix of versions, but mostly 4.10 at this point. We are not >>> using SolrCloud at present; that's part of why I'm checking - I want to >>> validate the size of our existing setup and what sort of SolrCloud setup >>> would be needed to centralize several of them. >>> >>> What is the best way to gather QPS information? >>> >>> What is the best way to add information like this to the Admin UI, if I >>> decide to take that step? >> >> As of Solr 4.1 (three years ago), request rate information is >> available in the admin UI and via JMX. In the admin UI, choose a core >> from the dropdown, click on Plugins/Stats, then QUERYHANDLER, and open >> the handler you wish to examine. 
You have avgRequestsPerSecond, which >> is calculated for the entire runtime of the SolrCore, as well as >> 5minRateReqsPerSecond and 15minRateReqsPerSecond, which are far more >> useful pieces of information. >> >> https://issues.apache.org/jira/browse/SOLR-1972 >> >> Thanks, >> Shawn >> >
Re: Measuring QPS
Hi Dan, I’m using JavaMelody for my SOLR production servers - gives you the relevant HTTP stats (what’s happening now & historical data) plus JVM monitoring as additional benefit. The servers are deployed on Tomcat so I’m of little help regarding Jetty - having said that * you need two Jars (javamelody & robin) * tinker with web.xml Here are two of my presentations mentioning JavaMelody (plus some other stuff) http://people.apache.org/~sgoeschl/presentations/solr-from-development-to-production-20121210.pdf <http://people.apache.org/~sgoeschl/presentations/solr-from-development-to-production-20121210.pdf> http://people.apache.org/~sgoeschl/presentations/jsug-2015/jee-performance-monitoring.pdf <http://people.apache.org/~sgoeschl/presentations/jsug-2015/jee-performance-monitoring.pdf> Cheers, Siegfried Goeschl > On 03 Apr 2015, at 17:53, Shawn Heisey wrote: > > On 4/3/2015 9:37 AM, Davis, Daniel (NIH/NLM) [C] wrote: >> I wanted to gather QPS for our production Solr instances, but I was >> surprised that the Admin UI did not contain this information. We are >> running a mix of versions, but mostly 4.10 at this point. We are not using >> SolrCloud at present; that's part of why I'm checking - I want to validate >> the size of our existing setup and what sort of SolrCloud setup would be >> needed to centralize several of them. >> >> What is the best way to gather QPS information? >> >> What is the best way to add information like this to the Admin UI, if I >> decide to take that step? > > As of Solr 4.1 (three years ago), request rate information is available > in the admin UI and via JMX. In the admin UI, choose a core from the > dropdown, click on Plugins/Stats, then QUERYHANDLER, and open the > handler you wish to examine. You have avgRequestsPerSecond, which is > calculated for the entire runtime of the SolrCore, as well as > 5minRateReqsPerSecond and 15minRateReqsPerSecond, which are far more > useful pieces of information. > > https://issues.apache.org/jira/browse/SOLR-1972 > > Thanks, > Shawn >
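A small probe of the per-handler statistics Shawn mentions, assuming the usual /admin/mbeans handler of a Solr 4.x core (host and core name are placeholders); the returned JSON should include avgRequestsPerSecond, 5minRateReqsPerSecond and 15minRateReqsPerSecond per request handler:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class QpsProbe {
    public static void main(String[] args) throws Exception {
        // Assumed host and core name; adjust to your deployment.
        URL url = new URL("http://localhost:8983/solr/collection1/admin/mbeans"
                + "?stats=true&cat=QUERYHANDLER&wt=json");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream(), "UTF-8"));
        String line;
        while ((line = in.readLine()) != null) {
            System.out.println(line);   // feed this JSON into your monitoring system
        }
        in.close();
        conn.disconnect();
    }
}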
Re: Trending functionality in Solr
Hi folks, I implemented something similar but never got around to contribute it - see https://issues.apache.org/jira/browse/SOLR-4056 The code was initially for SOLR3 but was recently ported to SOLR4 * capturing the most frequent search terms per core * supports ad-hoc queries * CSV export If you are interested we could team up and make a proper SOLR contribution :-) Cheers, Siegfried Goeschl On 08.02.15 05:26, S.L wrote: Folks, Is there a way to implement the trending functionality using Solr , to give the results using a query for say the most searched terms in the past hours or so , if the most searched terms is not possible is it possible to at least the get results for the last 100 terms? Thanks
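One possible sketch of the separate-core pattern (not the SOLR-4056 component itself): log every executed search term into its own core, then facet over a recent time window to get the most frequent terms. The core and field names are assumptions:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.FacetField;
import org.apache.solr.client.solrj.response.QueryResponse;

public class TrendingTerms {
    public static void main(String[] args) throws Exception {
        // Assumed "queries" core with "term" and "timestamp" fields, fed by the search front end.
        HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/queries");

        SolrQuery query = new SolrQuery("*:*");
        query.addFilterQuery("timestamp:[NOW-1HOUR TO NOW]"); // last hour only
        query.setRows(0);                                     // we only want the facet counts
        query.setFacet(true);
        query.addFacetField("term");
        query.setFacetLimit(100);                             // top 100 searched terms

        QueryResponse response = solr.query(query);
        FacetField terms = response.getFacetField("term");
        for (FacetField.Count count : terms.getValues()) {
            System.out.println(count.getName() + " -> " + count.getCount());
        }
        solr.shutdown();
    }
}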
Re: OutOfMemoryError for PDF document upload into Solr
Hi Dan, neat idea - made a mental note :-) That brings us back to the point that in complex setups you should not do the document pre-processing directly in SOLR but have an import process which can safely crash when processing a 4GB PDF file. Cheers, Siegfried Goeschl On 16.01.15 05:02, Dan Davis wrote: Why re-write all the document conversion in Java ;) Tika is very slow. 5 GB PDF is very big. If you have a lot of PDFs like that try pdftotext in HTML and UTF-8 output mode. The HTML mode captures some meta-data that would otherwise be lost. If you need to go faster still, you can also write some stuff linked directly against the poppler library. Before you jump down my throat about Tika being slow - I wrote a PDF indexer that ran at 36 MB/s per core. Different indexer, all C, lots of setjmp/longjmp. But fast... On Thu, Jan 15, 2015 at 1:54 PM, wrote: Siegfried and Michael Thank you for your replies and help. -Original Message- From: Siegfried Goeschl [mailto:sgoes...@gmx.at] Sent: Thursday, January 15, 2015 3:45 AM To: solr-user@lucene.apache.org Subject: Re: OutOfMemoryError for PDF document upload into Solr Hi Ganesh, you can increase the heap size but parsing a 4 GB PDF document will very likely consume A LOT OF memory - I think you need to check if that large PDF can be parsed at all :-) Cheers, Siegfried Goeschl On 14.01.15 18:04, Michael Della Bitta wrote: Yep, you'll have to increase the heap size for your Tomcat container. http://stackoverflow.com/questions/6897476/tomcat-7-how-to-set-initial-heap-size-correctly Michael Della Bitta Senior Software Engineer o: +1 646 532 3062 appinions inc. “The Science of Influence Marketing” 18 East 41st Street New York, NY 10017 t: @appinions <https://twitter.com/Appinions> | g+: plus.google.com/appinions <https://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts> w: appinions.com <http://www.appinions.com/> On Wed, Jan 14, 2015 at 12:00 PM, wrote: Hello, Can someone pass on the hints to get around following error? Is there any Heap Size parameter I can set in Tomcat or in Solr webApp that gets deployed in Solr? I am running Solr webapp inside Tomcat on my local machine which has RAM of 12 GB.
I have PDF document which is 4 GB max in size that needs to be loaded into Solr Exception in thread "http-apr-8983-exec-6" java.lang.: Java heap space at java.util.AbstractCollection.toArray(Unknown Source) at java.util.ArrayList.(Unknown Source) at org.apache.pdfbox.cos.COSDocument.getObjects(COSDocument.java:518) at org.apache.pdfbox.cos.COSDocument.close(COSDocument.java:575) at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:254) at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1238) at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1203) at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:111) at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242) at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242) at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120) at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:219) at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:246) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1967) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:777) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:418) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:220) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:122) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:170) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:103) at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:950) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:116) at org.apache.catalina.connector.CoyoteAdapter.service(Coyo
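A sketch of the external import step described above, using Apache commons-exec with a watchdog so a pathological PDF only kills the extraction process, never the Solr server itself; the pdftotext options and timeout are illustrative:

import org.apache.commons.exec.CommandLine;
import org.apache.commons.exec.DefaultExecutor;
import org.apache.commons.exec.ExecuteWatchdog;

public class ExternalPdfExtractor {
    public static void main(String[] args) throws Exception {
        // Assumed pdftotext on the PATH; HTML meta output and UTF-8 as suggested by Dan.
        CommandLine cmd = new CommandLine("pdftotext");
        cmd.addArgument("-htmlmeta");
        cmd.addArgument("-enc");
        cmd.addArgument("UTF-8");
        cmd.addArgument(args[0]);        // input PDF
        cmd.addArgument(args[1]);        // output file

        DefaultExecutor executor = new DefaultExecutor();
        // Kill the external process after 5 minutes so a huge or broken PDF
        // only takes down this import step.
        executor.setWatchdog(new ExecuteWatchdog(5 * 60 * 1000L));
        int exitCode = executor.execute(cmd);
        System.out.println("pdftotext finished with exit code " + exitCode);
    }
}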
Re: OutOfMemoryError for PDF document upload into Solr
Hi Ganesh, you can increase the heap size but parsing a 4 GB PDF document will very likely consume A LOT OF memory - I think you need to check if that large PDF can be parsed at all :-) Cheers, Siegfried Goeschl On 14.01.15 18:04, Michael Della Bitta wrote: Yep, you'll have to increase the heap size for your Tomcat container. http://stackoverflow.com/questions/6897476/tomcat-7-how-to-set-initial-heap-size-correctly Michael Della Bitta Senior Software Engineer o: +1 646 532 3062 appinions inc. “The Science of Influence Marketing” 18 East 41st Street New York, NY 10017 t: @appinions <https://twitter.com/Appinions> | g+: plus.google.com/appinions <https://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts> w: appinions.com <http://www.appinions.com/> On Wed, Jan 14, 2015 at 12:00 PM, wrote: Hello, Can someone pass on the hints to get around following error? Is there any Heap Size parameter I can set in Tomcat or in Solr webApp that gets deployed in Solr? I am running Solr webapp inside Tomcat on my local machine which has RAM of 12 GB. I have PDF document which is 4 GB max in size that needs to be loaded into Solr Exception in thread "http-apr-8983-exec-6" java.lang.: Java heap space at java.util.AbstractCollection.toArray(Unknown Source) at java.util.ArrayList.(Unknown Source) at org.apache.pdfbox.cos.COSDocument.getObjects(COSDocument.java:518) at org.apache.pdfbox.cos.COSDocument.close(COSDocument.java:575) at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:254) at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1238) at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1203) at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:111) at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242) at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242) at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120) at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:219) at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:246) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1967) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:777) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:418) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:220) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:122) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:170) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:103) at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:950) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:116) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:421) at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1070) at 
org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:611) at org.apache.tomcat.util.net.AprEndpoint$SocketProcessor.doRun(AprEndpoint.java:2462) at org.apache.tomcat.util.net.AprEndpoint$SocketProcessor.run(AprEndpoint.java:2451) Thanks Ganesh
Re: Slow queries
Hi, using Jetty is the recommended approach while using Tomcat is not recommended (unless you are a Tomcat shop). But any discussion comes back to the original question - why is it slow now? Are you I/O-bound or CPU-bound, how many documents are committed/deleted over time, do you have expensive SOLR queries, what is your server code doing - many questions and even more answers to that - in other words nobody can help you when the basic work is not done. And when you know your application performance-wise you probably also know the solution :-) Cheers, Siegfried Goeschl > On 08 Dec 2014, at 11:00, melb wrote: > > Thanks for the answer > A dedicated box will be a great solution but I will wait for that solution, > I have restricted resources > Can the Optimize action improve performance? > Can using the default servlet engine Jetty be harmful for performance? > Should I use an independent Tomcat engine? > > rgds, > > > > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/Slow-queries-tp4172032p4173092.html > Sent from the Solr - User mailing list archive at Nabble.com.
Re: Slow queries
It might be a good idea to * move SOLR to a dedicated box :-) * load your SOLR server with 20.000.000 documents (the estimated number of documents after three years) and do performance testing & tuning Afterwards you have some hard facts about hardware sizing and expected performance for the next three years :-) Cheers, Siegfried Goeschl > On 02 Dec 2014, at 10:02, melb wrote: > > Yes performance degraded over the time, I can raise the memory but I can't > do it every time and the volume will keep growing > Is it better to put the solr on dedicated machine? > Is there any thing else that can be done to the solr instance for example > deviding the collection? > > rgds, > > > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/Slow-queries-tp4172032p4172039.html > Sent from the Solr - User mailing list archive at Nabble.com.
Re: Slow queries
If your performance was fine but degraded over time it might be easier to check / increase the memory to have better disk caching. Cheers, Siegfried Goeschl On 02.12.14 09:27, melb wrote: Hi, I have a solr collection with 16 million documents and growing daily with 1 documents recently it is becoming slow to answer my requests (several seconds), especially when I use multi-word queries I am running solr on a machine with 32G RAM but a heavily used one What are my options to optimize the collection and speed up querying it is it normal with this volume of data? is sharding a good solution? regards, -- View this message in context: http://lucene.472066.n3.nabble.com/Slow-queries-tp4172032.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: AW: AW: slorj -> httpclient 4, but we already have httpclient 3 in use
Lucky you :-) Siegfried Goeschl On 19.09.14 07:31, Clemens Wyss DEV wrote: I'd like to mention, that substituting the httpcore.jar with the latest (4.3) "sufficed"... -Ursprüngliche Nachricht- Von: Guido Medina [mailto:guido.med...@temetra.com] Gesendet: Donnerstag, 18. September 2014 18:20 An: solr-user@lucene.apache.org Betreff: Re: AW: slorj -> httpclient 4, but we already have httpclient 3 in use SolrJ client after 4.8 I think requires HTTP client 4.3.x so why not just start there as base version? Guido. On 18/09/14 16:49, Siegfried Goeschl wrote: AFAIK even the different minor versions are source/binary compatible so you might need to tinker with the right "version" to get your server running Cheers, Siegfried Goeschl On 18.09.14 17:45, Guido Medina wrote: Hi Clemens, If you are going thru the effort of migrating from SolrJ 3 to 4 and HTTP client 3 to 4 make sure you do it using HTTP client 4.3.x (Latest is 4.3.5) since there are deprecations and stuff from 3.x to 4.0.x, to 4.1.x, to ..., to 4.3.x It will be painful but it is better do it one time and not later needed to do it again. I was on a similar situation (well my company) and I had to suffer such migration (not my company but myself since I'm the one that keeps all those things up to date) Best regards, Guido. On 18/09/14 16:14, Clemens Wyss DEV wrote: I guess you are right ;) -Ursprüngliche Nachricht- Von: Siegfried Goeschl [mailto:sgoes...@gmx.at] Gesendet: Donnerstag, 18. September 2014 16:38 An: solr-user@lucene.apache.org Betreff: Re: slorj -> httpclient 4, but we already have httpclient 3 in use Hi Clemens, I think you need to upgrade you framework * AFAIK is httpclient 3 & 4 uses the same package names - which is slightly unfortunate * assuming that they are using the same package name it is non-deterministic which httpclient library is loaded - might work on your local box but not on the production server or might change to a change in the project Cheers, Siegfried Goeschl On 18.09.14 15:08, Clemens Wyss DEV wrote: I doing initial steps with solrj which is based on httpclient 4. Unfortunately parts of our framework are based on httpclient 3. So when I instantiate an HttpSolrServer I run into: java.lang.VerifyError: Cannot inherit from final class ... at org.apache.http.impl.client.DefaultHttpClient.createHttpParams(Defa ultHttpClient.java:157) at org.apache.http.impl.client.AbstractHttpClient.getParams(AbstractHt tpClient.java:447) at org.apache.solr.client.solrj.impl.HttpClientUtil.setFollowRedirects (Ht tpClientUtil.java:255) ... Can these be run side-by-side at all?
Re: AW: slorj -> httpclient 4, but we already have httpclient 3 in use
AFAIK even the different minor versions are source/binary compatible so you might need to tinker with the right "version" to get your server running Cheers, Siegfried Goeschl On 18.09.14 17:45, Guido Medina wrote: Hi Clemens, If you are going thru the effort of migrating from SolrJ 3 to 4 and HTTP client 3 to 4 make sure you do it using HTTP client 4.3.x (Latest is 4.3.5) since there are deprecations and stuff from 3.x to 4.0.x, to 4.1.x, to ..., to 4.3.x It will be painful but it is better do it one time and not later needed to do it again. I was on a similar situation (well my company) and I had to suffer such migration (not my company but myself since I'm the one that keeps all those things up to date) Best regards, Guido. On 18/09/14 16:14, Clemens Wyss DEV wrote: I guess you are right ;) -Ursprüngliche Nachricht- Von: Siegfried Goeschl [mailto:sgoes...@gmx.at] Gesendet: Donnerstag, 18. September 2014 16:38 An: solr-user@lucene.apache.org Betreff: Re: slorj -> httpclient 4, but we already have httpclient 3 in use Hi Clemens, I think you need to upgrade you framework * AFAIK is httpclient 3 & 4 uses the same package names - which is slightly unfortunate * assuming that they are using the same package name it is non-deterministic which httpclient library is loaded - might work on your local box but not on the production server or might change to a change in the project Cheers, Siegfried Goeschl On 18.09.14 15:08, Clemens Wyss DEV wrote: I doing initial steps with solrj which is based on httpclient 4. Unfortunately parts of our framework are based on httpclient 3. So when I instantiate an HttpSolrServer I run into: java.lang.VerifyError: Cannot inherit from final class ... at org.apache.http.impl.client.DefaultHttpClient.createHttpParams(DefaultHttpClient.java:157) at org.apache.http.impl.client.AbstractHttpClient.getParams(AbstractHttpClient.java:447) at org.apache.solr.client.solrj.impl.HttpClientUtil.setFollowRedirects(Ht tpClientUtil.java:255) ... Can these be run side-by-side at all?
Re: slorj -> httpclient 4, but we already have httpclient 3 in use
Hi Clemens, I think you need to upgrade your framework * AFAIK httpclient 3 & 4 use the same package names - which is slightly unfortunate * assuming that they are using the same package name it is non-deterministic which httpclient library is loaded - might work on your local box but not on the production server or might change due to a change in the project Cheers, Siegfried Goeschl On 18.09.14 15:08, Clemens Wyss DEV wrote: I'm doing initial steps with solrj which is based on httpclient 4. Unfortunately parts of our framework are based on httpclient 3. So when I instantiate an HttpSolrServer I run into: java.lang.VerifyError: Cannot inherit from final class ... at org.apache.http.impl.client.DefaultHttpClient.createHttpParams(DefaultHttpClient.java:157) at org.apache.http.impl.client.AbstractHttpClient.getParams(AbstractHttpClient.java:447) at org.apache.solr.client.solrj.impl.HttpClientUtil.setFollowRedirects(HttpClientUtil.java:255) ... Can these be run side-by-side at all?
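A quick way to check which jars the colliding classes are actually being loaded from when both HTTP client stacks are on the classpath (the follow-up in this thread found that swapping httpcore.jar sufficed):

import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.params.HttpParams;

public class HttpClientJarCheck {
    public static void main(String[] args) {
        // Where does the httpclient 4 implementation come from?
        System.out.println(DefaultHttpClient.class.getProtectionDomain()
                .getCodeSource().getLocation());
        // And httpcore (the jar that was substituted in this thread)?
        System.out.println(HttpParams.class.getProtectionDomain()
                .getCodeSource().getLocation());
    }
}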
Re: Mongo DB Users
remove please On 16.09.14 15:42, Karolina Dobromiła Jeleń wrote: remove please On Tue, Sep 16, 2014 at 9:35 AM, Amey Patil wrote: Remove. On Tue, Sep 16, 2014 at 12:58 PM, Joan wrote: Remove please 2014-09-16 6:59 GMT+02:00 Patti Kelroe-Cooke : Remove Kind regards Patti On Mon, Sep 15, 2014 at 5:35 PM, Aaron Susan wrote: Hi, I am here to inform you that we are having a contact list of *Mongo DB Users *would you be interested in it? Data Field’s Consist Of: Name, Job Title, Verified Phone Number, Verified Email Address, Company Name & Address Employee Size, Revenue size, SIC Code, Industry Type etc., We also provide other technology users as well depends on your requirement. For Example: *Red Hat * *Terra data * *Net-app * *NuoDB* *MongoHQ ** and many more* We also provide IT Decision Makers, Sales and Marketing Decision Makers, C-level Titles and other titles as per your requirement. Please review and let me know your interest if you are looking for above mentioned users list or other contacts list for your campaigns. Waiting for a positive response! Thanks *Aaron Susan* Data Specialist If you are not the right person, feel free to forward this email to the right person in your organization. To opt out response Remove
Re: external indexer for Solr Cloud
Hi folks, we are using Apache Camel but could use Spring Integration with the option to upgrade to Apache BatchEE or Spring Batch later on - especially Tika document extraction can kill your server due to CPU consumption, memory usage and plain memory leaks. AFAIK Doug Turnbull also improved the Camel Solr Integration http://comments.gmane.org/gmane.comp.jakarta.lucene.solr.user/99739 Cheers, Siegfried Goeschl On 01.09.14 18:05, Jack Krupansky wrote: Packaging SolrCell in the same manner, with parallel threads and able to talk to multiple SolrCloud servers in parallel would have a lot of the same benefits as well. And maybe there could be some more generic Java framework for indexing as well, that "external indexers" in general could use. -- Jack Krupansky -Original Message- From: Shawn Heisey Sent: Monday, September 1, 2014 11:42 AM To: solr-user@lucene.apache.org Subject: Re: external indexer for Solr Cloud On 9/1/2014 7:19 AM, Jack Krupansky wrote: It would be great to have a "standalone DIH" that runs as a separate server and then sends standard Solr update requests to a Solr cluster. This has been discussed, and I thought we had an issue in Jira, but I can't find it. A completely standalone DIH app would be REALLY nice. I already know that the JDBC ResultSet is not the bottleneck for indexing, at least for me. I once built a simple single-threaded SolrJ application that pulls data from JDBC and indexes it in Solr. It works in batches, typically 500 or 1000 docs at a time. When I comment out the "solr.add(docs)" line (so input object manipulation, casting, and building of the SolrInputDocument objects is still happening), it can read and manipulate our entire database (99.8 million documents) in about 20 minutes, but if I leave that in, it takes many hours. The bottleneck is that each DIH has only a single thread indexing to Solr. I've theorized that it should be *relatively* easy for me to write an application that pulls records off the JDBC ResultSet with multiple threads (say 10-20), have each thread figure out which shard its document lands on, and send it there with SolrJ. It might even be possible for the threads to collect several documents for each shard before indexing them in the same request. As with most multithreaded apps, the hard part is figuring out all the thread synchronization, making absolutely certain that thread timing is perfect without unnecessary delays. If I can figure out a generic approach (with a few configurable bells and whistles available), it might be something suitable for inclusion in the project, followed with improvements by all the smart people in our community. Thanks, Shawn
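A rough sketch of the pattern Shawn outlines - a single JDBC reader thread feeding batches to a small pool of SolrJ sender threads; the JDBC URL, table/column names and the Solr URL are placeholders, and shard routing and retry handling are left out:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class ExternalIndexer {
    public static void main(String[] args) throws Exception {
        // Assumed connection details - adjust to your environment.
        final SolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");
        ExecutorService senders = Executors.newFixedThreadPool(10);

        Connection con = DriverManager.getConnection("jdbc:mysql://dbhost/mydb", "user", "secret");
        Statement stmt = con.createStatement();
        ResultSet rs = stmt.executeQuery("SELECT id, title, body FROM documents");

        // The ResultSet is consumed by this single thread; only the Solr adds run in parallel.
        List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
        while (rs.next()) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", rs.getString("id"));
            doc.addField("title", rs.getString("title"));
            doc.addField("body", rs.getString("body"));
            batch.add(doc);

            if (batch.size() == 1000) {
                final List<SolrInputDocument> toSend = batch;
                batch = new ArrayList<SolrInputDocument>();
                senders.submit(new Runnable() {
                    public void run() {
                        try {
                            solr.add(toSend);
                        } catch (Exception e) {
                            e.printStackTrace(); // real code: retry or dead-letter the batch
                        }
                    }
                });
            }
        }
        if (!batch.isEmpty()) {
            solr.add(batch);
        }
        senders.shutdown();
        senders.awaitTermination(1, TimeUnit.HOURS);
        solr.commit();
        rs.close();
        stmt.close();
        con.close();
        solr.shutdown();
    }
}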
Re: SOLR Performance benchmarking
Hi Rashi, abnormal behaviour depends on your data, system and work load - I have seen abnormal behaviour at customers sites and it turned out to be a miracle that they the customer had no serious problems before :-) * running out of sockets - you might need to check if you have enough sockets (system quota) and that the sockets are closed properly (mostly a Windows/networking issue - CLOSED_WAIT) * understand your test setup - usually a test box is much smaller in terms of CPU/memory than you production box ** you might be forced to tweak your test configuration (e.g. production SOLR cache configuration can overwhelm a small server) * understand your work-load ** if you have long-running queries within your performance tests they tend to bring down your server under high-load and your “abnormal” condition looks very normal at hindsight ** spot your long-running queries, optimise them, re-run your tests ** check your cache warming and how fast you start your load injector threads Cheers, Siegfried Goeschl On 13 Jul 2014, at 09:53, rashi gandhi wrote: > Hi, > > I am using SolrMeter for load/stress testing solr performance. > Tomcat is configured with default "maxThreads" (i.e. 200). > > I set Intended Request per min in SolrMeter to 1500 and performed testing. > > I found that sometimes it works with this much load on solr but sometimes > it gives error "Sever Refused Connection" in solr. > On getting this error, i increased maxThreads to some higher value, and > then it works again. > > I would like to know why solr is behaving abnormally, initially when it was > working with maxThreads=200. > > Please provide me some pointers.
Re: SOLR: getting documents in the given order
Assuming that you just want to sort - have you tried using sort=id desc Cheers, Siegfried Goeschl On 04 Jun 2014, at 06:19, sachinpkale wrote: > I have a following field in SOLR schema. > > > required="false" multiValued="false"/> > > If I issue following query: > > id:(1234 OR 2345 OR 3456) > > SOLR does not return the documents in that order. It is giving document with > id 3456, then with 1234 and then with 2345. > > How do I get it in the same order as in the query? > > > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/SOLR-getting-documents-in-the-given-order-tp4139722.html > Sent from the Solr - User mailing list archive at Nabble.com.
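The same suggestion in SolrJ form (URL and query are illustrative); note that this sorts by the id field, it does not preserve the order in which the ids were listed in the query:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class SortedQuery {
    public static void main(String[] args) throws Exception {
        // Assumed Solr URL; mirrors the "sort=id desc" suggestion above.
        HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");
        SolrQuery query = new SolrQuery("id:(1234 OR 2345 OR 3456)");
        query.setSort("id", SolrQuery.ORDER.desc);
        QueryResponse response = solr.query(query);
        System.out.println(response.getResults());
        solr.shutdown();
    }
}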
iText hitting infinite loop - Was Re: pdfs
Hi folks, Brian was so kind and sent me the troublesome PDF document I gave it a try with PDFBox directly in order to extract the text (PDFBox is used by Tikka to extract the textual content of a PDF document) * hitting an infinite loop with PDFBox 1.8.3 * no problems with PDFBox 1.8.4 & 1.8.5 * PDFBox 1.8.4 is part of Apache Tika 1.5 (see http://www.apache.org/dist/tika/CHANGES-1.5.txt) * Apache SOLR 4.8 uses Tika 1.5 (see https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Solr+Cell+using+Apache+Tika) In short the problem with this particular PDF is solved by * Apache PDFBox 1.8.4 onwards * Apache Tika 1.5 * Apache SOLR 4.8 Cheers, Siegfried Goeschl On 26.05.14 18:20, Erick Erickson wrote: Brian: Yeah, if you can share the PDF that would be great. Parsing via Tika should not bring down Solr, although I supposed there could be something in Tika that is pathologically bad. You could also try using Tika itself in SolrJ and indexing from a client. That might let you 1> more gracefully handle this without shutting down Solr 2> use different versions of Tika. Personally I like offloading the document parsing to clients anyway since it lessens the load on the Solr server and scales much better, but YMMV. It's not actually very difficult, here's a skeleton (rip out the DB parts) http://searchhub.org/2012/02/14/indexing-with-solrj/ Best, Erick On Sun, May 25, 2014 at 2:07 AM, Siegfried Goeschl wrote: Sorry typo :- can you send me the PDF by email directly :-) Siegfried Goeschl On 25 May 2014, at 10:06, Siegfried Goeschl wrote: Hi Brian, can you send me the email? I would like to play around :-) Have you opened a JIRA for PdfBox? If not I willl open one if I can reproduce the issue … Thanks in advance Siegfried Goeschl On 25 May 2014, at 04:18, Brian McDowell wrote: Our feeding (indexing) tool halts because Solr becomes unresponsive after getting some really bad pdfs. There are levels of pdf "badness." Some just will not parse and that's fine, but others are more problematic in that our Operations team has to restart Solr because it just hangs and accepts no more documents. I actually have identified a pdf that will bring down Solr every time. Does anyone think that doing pre-validation using the pdfbox jar will work? Or, will trying to validate just hang as well? Any help is appreciated. On Thu, May 22, 2014 at 8:47 AM, Jack Krupansky wrote: Yeah, I recall running into infinite loop issues with PDFBox in Solr years ago. They keep fixing these issues, but they keep popping up again. Sigh. -- Jack Krupansky -Original Message- From: Siegfried Goeschl Sent: Thursday, May 22, 2014 4:35 AM To: solr-user@lucene.apache.org Subject: Re: pdfs Hi folks, for a small customer project I'm running SOLR with embedded Tikka. 
* memory consumption is an issue but can be handled * there is an issue with PDFBox hitting an infinite loop which causes excessive CPU usage - requires SOLR restart but happens only once withing 400.000 documents (PDF, Word, ect) but is seems a little bit erratic since I was never able to track the problem back to a particular PDF document Having said that we wire SOLR with Nagios to get an alarm when CPU consumption goes through the roof If you doing really serious stuff I would recommend * moving the document extraction stuff out of SOLR * provide monitoring and recovery and stuck document extractions ** killing worker threads ** using external processed and kill them when spinning out of control Cheers, Siegfried Goeschl On 22.05.14 06:46, Jack Krupansky wrote: Yeah, PDF extraction has always been at least somewhat problematic. It has improved over the years, but still not likely to be perfect. That said, I'm not aware of any specific PDF extraction issue that would bring down Solr - as opposed to causing a 500 status with an exception in PDF extraction, with the exception of memory usage. Some PDF documents, especially those which are graphic-intense can require a lot of memory. The rest of Solr could be adversely affected if all available JVM heap is consumed. The solution is to give the JVM more heap space. So, what is your specific symptom? -- Jack Krupansky -Original Message- From: Brian McDowell Sent: Thursday, May 22, 2014 12:24 AM To: solr-user@lucene.apache.org Subject: pdfs Has anyone had issues with indexing pdf files? Some pdfs are bringing down Solr completely so that it actually needs to be manually restarted. We are using Solr 4.4 and thought that upgrading to Solr 4.8 would solve the problem because the release notes associated with the new tika version and also the new pdfbox indicate fixes for pdf issues. It didn't work and now this issue is causing us to reevaluate using Solr. Any help on this matter would be greatly appreciated. Thank you!
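
Along the lines of Erick's suggestion to move parsing out of Solr, a minimal sketch that extracts the text with PDFBox 1.8.x on the client and posts a plain document via SolrJ - file name, field names and URL are placeholders, not taken from the thread:

    import java.io.File;
    import org.apache.pdfbox.pdmodel.PDDocument;
    import org.apache.pdfbox.util.PDFTextStripper;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class ClientSideExtractor {
        public static void main(String[] args) throws Exception {
            File pdf = new File("herald060214_001.pdf");

            PDDocument pd = PDDocument.load(pdf);
            String text;
            try {
                text = new PDFTextStripper().getText(pd);
            } finally {
                pd.close();   // always release the document, even if extraction fails
            }

            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", pdf.getName());
            doc.addField("content", text);

            HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");
            solr.add(doc);
            solr.commit();
            solr.shutdown();
        }
    }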
Re: ExtractingRequestHandler indexing zip files
Hi Sergio, you either do the stuff on the caller side (which is probably a good idea since you off-load the SOLR server) or extend the ExtractingRequestHandler Cheers, Siegfried Goeschl On 27 May 2014, at 10:37, marotosg wrote: > Hi, > > Thanks for your answer Alexandre. > I have zip files with only one document inside per zip file. These documents > are mainly pdf,xml,html. > > I tried to index "tini.txt.gz" file which is located in the trunk to be used > by extraction tests > \trunk\solr\contrib\extraction\src\test-files\extraction\tini.txt.gz > > I get the same issue only the name of the file inside "tini.txt.gz gets > indexed as content. That means ExtractRequesthandler can open the file > because it's getting the name inside but for some reason is not reading the > content. > > Any suggestions? > > Thanks > Sergio > > > > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/ExtractingRequestHandler-indexing-zip-files-tp4138172p4138255.html > Sent from the Solr - User mailing list archive at Nabble.com.
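
A sketch of the "do it on the caller side" option for the zip case: unpack the archive in the client, run each entry through Tika, and send only the extracted text to Solr. Assumes Tika 1.x and SolrJ 4.x on the classpath; file and field names are made up:

    import java.io.ByteArrayInputStream;
    import java.io.ByteArrayOutputStream;
    import java.io.FileInputStream;
    import java.util.zip.ZipEntry;
    import java.util.zip.ZipInputStream;
    import org.apache.tika.Tika;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class ZipIndexer {
        public static void main(String[] args) throws Exception {
            HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");
            Tika tika = new Tika();

            ZipInputStream zip = new ZipInputStream(new FileInputStream("document.zip"));
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.isDirectory()) continue;

                // buffer the entry so Tika works on an independent stream
                ByteArrayOutputStream buffer = new ByteArrayOutputStream();
                byte[] chunk = new byte[8192];
                int n;
                while ((n = zip.read(chunk)) != -1) {
                    buffer.write(chunk, 0, n);
                }
                String text = tika.parseToString(new ByteArrayInputStream(buffer.toByteArray()));

                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", entry.getName());
                doc.addField("content", text);
                solr.add(doc);
            }
            zip.close();
            solr.commit();
            solr.shutdown();
        }
    }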
Re: SolrCloud Nodes autoSoftCommit and (temporary) missing documents
Hi folks, I think that the timestamp should be rounded down to a minute (or whatever) to avoid trashing the filter query cache Cheers, Siegfried Goeschl On 25 May 2014, at 18:19, Steve McKay wrote: > Solr can add the filter for you: > > > >timestamp:[* TO NOW-30SECOND] > > > > Increasing soft commit frequency isn't a bad idea, though. I'd probably do > both. :) > > On May 23, 2014, at 6:51 PM, Michael Tracey wrote: > >> Hey all, >> >> I've got a number of nodes (Solr 4.4 Cloud) that I'm balancing with HaProxy >> for queries. I'm indexing pretty much constantly, and have autoCommit and >> autoSoftCommit on for Near Realtime Searching. All works nicely, except >> that occasionally the auto-commit cycles are far enough off that one node >> will return a document that another node doesn't. I don't want to have to >> add something like this: timestamp:[* TO NOW-30MINUTE] to every query to >> make sure that all the nodes have the record. Ideas? autoSoftCommit more >> often? >> >> >> 10 >> 720 >> false >> >> >> >> 3 >> 5000 >> >> >> Thanks, >> >> M. >
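
Solr's date math can do that rounding directly, so the filter string (and its filter cache entry) stays identical for a whole minute instead of changing on every request; a small SolrJ illustration with a placeholder field name and URL:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;

    public class RoundedTimestampFilter {
        public static void main(String[] args) throws Exception {
            HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");

            SolrQuery query = new SolrQuery("*:*");
            // NOW/MINUTE rounds down to the minute, so the fq string only changes
            // once a minute; add an offset like NOW/MINUTE-1MINUTE to keep a safety
            // window similar to the original NOW-30SECOND idea
            query.addFilterQuery("timestamp:[* TO NOW/MINUTE]");

            System.out.println(solr.query(query).getResults().getNumFound());
            solr.shutdown();
        }
    }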
Re: pdfs
Sorry typo :- can you send me the PDF by email directly :-) Siegfried Goeschl On 25 May 2014, at 10:06, Siegfried Goeschl wrote: > Hi Brian, > > can you send me the email? I would like to play around :-) > > Have you opened a JIRA for PdfBox? If not I willl open one if I can reproduce > the issue … > > Thanks in advance > > Siegfried Goeschl > > > On 25 May 2014, at 04:18, Brian McDowell wrote: > >> Our feeding (indexing) tool halts because Solr becomes unresponsive after >> getting some really bad pdfs. There are levels of pdf "badness." Some just >> will not parse and that's fine, but others are more problematic in that our >> Operations team has to restart Solr because it just hangs and accepts no >> more documents. I actually have identified a pdf that will bring down Solr >> every time. Does anyone think that doing pre-validation using the pdfbox >> jar will work? Or, will trying to validate just hang as well? Any help is >> appreciated. >> >> >> On Thu, May 22, 2014 at 8:47 AM, Jack Krupansky >> wrote: >> >>> Yeah, I recall running into infinite loop issues with PDFBox in Solr years >>> ago. They keep fixing these issues, but they keep popping up again. Sigh. >>> >>> -- Jack Krupansky >>> >>> -Original Message- From: Siegfried Goeschl >>> Sent: Thursday, May 22, 2014 4:35 AM >>> To: solr-user@lucene.apache.org >>> Subject: Re: pdfs >>> >>> >>> Hi folks, >>> >>> for a small customer project I'm running SOLR with embedded Tikka. >>> >>> * memory consumption is an issue but can be handled >>> * there is an issue with PDFBox hitting an infinite loop which causes >>> excessive CPU usage - requires SOLR restart but happens only once >>> withing 400.000 documents (PDF, Word, ect) but is seems a little bit >>> erratic since I was never able to track the problem back to a particular >>> PDF document >>> >>> Having said that we wire SOLR with Nagios to get an alarm when CPU >>> consumption goes through the roof >>> >>> If you doing really serious stuff I would recommend >>> * moving the document extraction stuff out of SOLR >>> * provide monitoring and recovery and stuck document extractions >>> ** killing worker threads >>> ** using external processed and kill them when spinning out of control >>> >>> Cheers, >>> >>> Siegfried Goeschl >>> >>> On 22.05.14 06:46, Jack Krupansky wrote: >>> >>>> Yeah, PDF extraction has always been at least somewhat problematic. It >>>> has improved over the years, but still not likely to be perfect. >>>> >>>> That said, I'm not aware of any specific PDF extraction issue that would >>>> bring down Solr - as opposed to causing a 500 status with an exception >>>> in PDF extraction, with the exception of memory usage. Some PDF >>>> documents, especially those which are graphic-intense can require a lot >>>> of memory. The rest of Solr could be adversely affected if all available >>>> JVM heap is consumed. The solution is to give the JVM more heap space. >>>> >>>> So, what is your specific symptom? >>>> >>>> -- Jack Krupansky >>>> >>>> -Original Message- From: Brian McDowell >>>> Sent: Thursday, May 22, 2014 12:24 AM >>>> To: solr-user@lucene.apache.org >>>> Subject: pdfs >>>> >>>> Has anyone had issues with indexing pdf files? Some pdfs are bringing down >>>> Solr completely so that it actually needs to be manually restarted. We are >>>> using Solr 4.4 and thought that upgrading to Solr 4.8 would solve the >>>> problem because the release notes associated with the new tika version and >>>> also the new pdfbox indicate fixes for pdf issues. 
It didn't work and now >>>> this issue is causing us to reevaluate using Solr. Any help on this matter >>>> would be greatly appreciated. Thank you! >>>> >>> >>> >
Re: pdfs
Hi Brian, can you send me the email? I would like to play around :-) Have you opened a JIRA for PdfBox? If not I willl open one if I can reproduce the issue … Thanks in advance Siegfried Goeschl On 25 May 2014, at 04:18, Brian McDowell wrote: > Our feeding (indexing) tool halts because Solr becomes unresponsive after > getting some really bad pdfs. There are levels of pdf "badness." Some just > will not parse and that's fine, but others are more problematic in that our > Operations team has to restart Solr because it just hangs and accepts no > more documents. I actually have identified a pdf that will bring down Solr > every time. Does anyone think that doing pre-validation using the pdfbox > jar will work? Or, will trying to validate just hang as well? Any help is > appreciated. > > > On Thu, May 22, 2014 at 8:47 AM, Jack Krupansky > wrote: > >> Yeah, I recall running into infinite loop issues with PDFBox in Solr years >> ago. They keep fixing these issues, but they keep popping up again. Sigh. >> >> -- Jack Krupansky >> >> -Original Message- From: Siegfried Goeschl >> Sent: Thursday, May 22, 2014 4:35 AM >> To: solr-user@lucene.apache.org >> Subject: Re: pdfs >> >> >> Hi folks, >> >> for a small customer project I'm running SOLR with embedded Tikka. >> >> * memory consumption is an issue but can be handled >> * there is an issue with PDFBox hitting an infinite loop which causes >> excessive CPU usage - requires SOLR restart but happens only once >> withing 400.000 documents (PDF, Word, ect) but is seems a little bit >> erratic since I was never able to track the problem back to a particular >> PDF document >> >> Having said that we wire SOLR with Nagios to get an alarm when CPU >> consumption goes through the roof >> >> If you doing really serious stuff I would recommend >> * moving the document extraction stuff out of SOLR >> * provide monitoring and recovery and stuck document extractions >> ** killing worker threads >> ** using external processed and kill them when spinning out of control >> >> Cheers, >> >> Siegfried Goeschl >> >> On 22.05.14 06:46, Jack Krupansky wrote: >> >>> Yeah, PDF extraction has always been at least somewhat problematic. It >>> has improved over the years, but still not likely to be perfect. >>> >>> That said, I'm not aware of any specific PDF extraction issue that would >>> bring down Solr - as opposed to causing a 500 status with an exception >>> in PDF extraction, with the exception of memory usage. Some PDF >>> documents, especially those which are graphic-intense can require a lot >>> of memory. The rest of Solr could be adversely affected if all available >>> JVM heap is consumed. The solution is to give the JVM more heap space. >>> >>> So, what is your specific symptom? >>> >>> -- Jack Krupansky >>> >>> -Original Message- From: Brian McDowell >>> Sent: Thursday, May 22, 2014 12:24 AM >>> To: solr-user@lucene.apache.org >>> Subject: pdfs >>> >>> Has anyone had issues with indexing pdf files? Some pdfs are bringing down >>> Solr completely so that it actually needs to be manually restarted. We are >>> using Solr 4.4 and thought that upgrading to Solr 4.8 would solve the >>> problem because the release notes associated with the new tika version and >>> also the new pdfbox indicate fixes for pdf issues. It didn't work and now >>> this issue is causing us to reevaluate using Solr. Any help on this matter >>> would be greatly appreciated. Thank you! >>> >> >>
Re: pdfs
Hi folks, for a small customer project I'm running SOLR with embedded Tika. * memory consumption is an issue but can be handled * there is an issue with PDFBox hitting an infinite loop which causes excessive CPU usage - requires SOLR restart but happens only once within 400,000 documents (PDF, Word, etc.) but it seems a little bit erratic since I was never able to track the problem back to a particular PDF document Having said that, we wire SOLR with Nagios to get an alarm when CPU consumption goes through the roof If you are doing really serious stuff I would recommend * moving the document extraction stuff out of SOLR * provide monitoring and recovery of stuck document extractions ** killing worker threads ** using external processes and killing them when they spin out of control Cheers, Siegfried Goeschl On 22.05.14 06:46, Jack Krupansky wrote: Yeah, PDF extraction has always been at least somewhat problematic. It has improved over the years, but still not likely to be perfect. That said, I'm not aware of any specific PDF extraction issue that would bring down Solr - as opposed to causing a 500 status with an exception in PDF extraction, with the exception of memory usage. Some PDF documents, especially those which are graphic-intense can require a lot of memory. The rest of Solr could be adversely affected if all available JVM heap is consumed. The solution is to give the JVM more heap space. So, what is your specific symptom? -- Jack Krupansky -Original Message- From: Brian McDowell Sent: Thursday, May 22, 2014 12:24 AM To: solr-user@lucene.apache.org Subject: pdfs Has anyone had issues with indexing pdf files? Some pdfs are bringing down Solr completely so that it actually needs to be manually restarted. We are using Solr 4.4 and thought that upgrading to Solr 4.8 would solve the problem because the release notes associated with the new tika version and also the new pdfbox indicate fixes for pdf issues. It didn't work and now this issue is causing us to reevaluate using Solr. Any help on this matter would be greatly appreciated. Thank you!
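
For the "recovery of stuck document extractions" point, a hedged sketch of one client-side guard: run the extraction in a worker thread and give up after a timeout, so a spinning parser costs one thread rather than the whole server. Assumes Tika on the classpath; file name and timeout are placeholders:

    import java.io.File;
    import java.util.concurrent.*;
    import org.apache.tika.Tika;

    public class GuardedExtraction {
        public static void main(String[] args) throws Exception {
            ExecutorService pool = Executors.newSingleThreadExecutor();
            final File pdf = new File("suspicious.pdf");

            Future<String> job = pool.submit(new Callable<String>() {
                public String call() throws Exception {
                    return new Tika().parseToString(pdf);
                }
            });

            try {
                String text = job.get(60, TimeUnit.SECONDS);
                System.out.println("extracted " + text.length() + " characters");
            } catch (TimeoutException e) {
                // interrupt is only best effort - a hard-spinning parser may ignore it,
                // which is why a separate, killable extraction process is even safer
                job.cancel(true);
                System.err.println("extraction of " + pdf + " timed out, skipping document");
            } finally {
                pool.shutdownNow();
            }
        }
    }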
Re: Indexing PDF in Apache Solr 4.8.0 - Problem.
Hi Vignesh, can you check your SOLR Server Log?! Not all PDF documents on this planet can be processed using Tika :-) Cheers, Siegfried Goeschl On 07 May 2014, at 09:40, vignesh wrote: > Dear Team, > > I am Vignesh using the latest version 4.8.0 Apache Solr and am > Indexing my PDF but getting an error and have posted that below for your > reference. Kindly guide me to solve this error. > > D:\IPCB\solr>java -Durl=http://localhost:8082/solr/ipcb/update/extract > -Dparams= > literal.id=herald060214_001 -Dtype=application/pdf -jar post.jar > "D:/IPCB/ipcbpd > f/herald060214_001.pdf" > SimplePostTool version 1.5 > Posting files to base url > http://localhost:8082/solr/ipcb/update/extract?literal > .id=herald060214_001 using content-type application/pdf.. > POSTing file herald060214_001.pdf > SimplePostTool: WARNING: Solr returned an error #500 Internal Server Error > SimplePostTool: WARNING: IOException while reading response: > java.io.IOException > : Server returned HTTP response code: 500 for URL: > http://localhost:8082/solr/ip > cb/update/extract?literal.id=herald060214_001 > 1 files indexed. > COMMITting Solr index changes to > http://localhost:8082/solr/ipcb/update/extract? > literal.id=herald060214_001.. > SimplePostTool: WARNING: Solr returned an error #500 Internal Server Error > for u > rl > http://localhost:8082/solr/ipcb/update/extract?literal.id=herald060214_001&co > mmit=true > Time spent: 0:00:00.062 > > > > Thanks & Regards. > Vignesh.V > > > Ninestars Information Technologies Limited., > 72, Greams Road, Thousand Lights, Chennai - 600 006. India. > Landline : +91 44 2829 4226 / 36 / 56 X: 144 > www.ninestars.in >
Re: Export big extract from Solr to [My]SQL
Hi Per, basically I see three options * use a lot of memory to cope with huge result sets * use result set paging * SOLR 4.7 supports cursors (https://issues.apache.org/jira/browse/SOLR-5463) Cheers, Siegfried Goeschl On 02.05.14 13:32, Per Steffensen wrote: Hi I want to make extracts from my Solr to MySQL. Any tools around that can help med perform such a task? I find a lot about data-import from SQL when googling, but nothing about export/extract. It is not all of the data in Solr I need to extract. It is only documents that full fill a normal Solr query, but the number of documents fulfilling it will (potentially) be huge. Regards, Per Steffensen
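
The cursor option from SOLR-5463 can be driven from SolrJ roughly like this (field names and URL are placeholders; the sort must include the uniqueKey for cursors to work). Each page could then be turned into a JDBC batch INSERT on the MySQL side:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;
    import org.apache.solr.common.SolrDocument;
    import org.apache.solr.common.params.CursorMarkParams;

    public class CursorExport {
        public static void main(String[] args) throws Exception {
            HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");

            SolrQuery query = new SolrQuery("status:online");
            query.setRows(1000);
            query.setSort("id", SolrQuery.ORDER.asc);   // cursor requires a sort on the uniqueKey

            String cursor = CursorMarkParams.CURSOR_MARK_START;
            while (true) {
                query.set(CursorMarkParams.CURSOR_MARK_PARAM, cursor);
                QueryResponse rsp = solr.query(query);
                for (SolrDocument doc : rsp.getResults()) {
                    // replace with a JDBC batch INSERT into MySQL
                    System.out.println(doc.getFieldValue("id"));
                }
                String next = rsp.getNextCursorMark();
                if (next.equals(cursor)) {
                    break;   // cursor did not advance - all documents have been read
                }
                cursor = next;
            }
            solr.shutdown();
        }
    }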
Re: Having trouble with German compound words in Solr 4.7
Hi Alistair, it seems that there are many ways to skin the cat so I describe the approach I used with SOLR 3.6 :-) * Using a patched DictionaryCompoundWordTokenFilterFactory in the "index" phase - so the german compound noun "Leinenhose" (linen trousers) would be indexed in addition to "Leinen" & "Hose". Afterwards the three tokens go trough stemming. * One hint which might be useful - I only split words which I consider proper german compound nouns. E.g. if your indexed text contains the token "schwarzkleid" I would NOT split it since it is NOT a proper noun - the proper noun would be "Schwarzkleid" - please note that even "Schwarzkleid" is not a proper german noun anyway :-) * I use a custom dictionary for splitting consisting of 7.000 entries which contains a lot of customer-specific entries I do not tinker with DictionaryCompoundWordTokenFilterFactory in the "query" phase of the field so the following queries would work with the indexed word "Leinenhose" * "leinenhosen" * "leinenhose" * "leinen hose" * "leinen hosen" Cheers, Siegfried Goeschl On 22.04.14 12:13, Alistair wrote: I've managed to solve this (in a quite hacky sort of way) by using filter queries and the edismax queryparser. I added in my solrconfig.xml the following parameters: edismax 75% Then when searching for multiple keywords (for example: schwarzkleid wenz, where wenz is a german brand name), I use the first keyword as a query and anything after that I add as a filterquery. So my final query looks something like this: fl=id&sort=popular+desc&indent=on&q=keywords:'schwarzkleide'+&wt=json&fq={!edismax}+keywords:'wenz'&fq=deleted:0 My compound splitter filter splits schwarzkleide correctly and it is parsed as edismax with mm=75%, then the filterqueries are added, for keywords they are also parsed as edismax. The returned result is all the black dresses from 'Wenz'. If anybody has a better solution to what I've posted I would be more than happy to read up on it as I'm quite new to Solr and I think my way is a bit convoluted to be honest. Thanks. -- View this message in context: http://lucene.472066.n3.nabble.com/Having-trouble-with-German-compound-words-in-Solr-4-7-tp4131964p4132478.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Having trouble with German compound words in Solr 4.7
Hi Alistair, quick email before getting my plane - I worked with similar requirements in the past and tuning SOLR can be tricky * are you hitting the same SOLR query handler (application versus manual checking)? * turn on debugging for your application SOLR queries so you see what query is actually executed * one thing I always do for prototyping is setting up the Solritas GUI using the same query handler as the application server Cheers, Siegfried Goeschl On 18 Apr 2014, at 06:06, Alistair wrote: > Hey Jack, > > thanks for the reply. I added autoGeneratePhraseQueries="true" to the > fieldType and now it's giving me even more results! I'm not sure if the > debug of my query will be helpful but I'll paste it just in case someone > might have an idea. This produces 113524 results, whereas if I manually > enter the query as keyword:schwarz AND keyword:kleid I only get 20283 > results (which is the correct one). > > > > > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/Having-trouble-with-German-compound-words-in-Solr-4-7-tp4131964p4131973.html > Sent from the Solr - User mailing list archive at Nabble.com.
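
Turning on debugging for the application's queries, as suggested above, can be done straight from SolrJ; a small sketch with placeholder handler, URL and query:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class QueryDebugging {
        public static void main(String[] args) throws Exception {
            HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");

            SolrQuery query = new SolrQuery("keywords:schwarzkleid");
            query.setRequestHandler("/select");   // make sure it is the same handler the application hits
            query.set("debugQuery", "true");

            QueryResponse rsp = solr.query(query);
            // parsedquery shows what the query parser and analyzers actually produced
            System.out.println(rsp.getDebugMap().get("parsedquery"));
            solr.shutdown();
        }
    }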
Re: No route to host
Hi folks, the URL looks wrong (misconfigured) http://:8080/solr/collection1 Cheers, Siegfried Goeschl On 09 Apr 2014, at 14:28, Rallavagu wrote: > All, > > I see the following error in the log file. The host that it is trying to find > is itself. Wondering if anybody experienced this before or any other info > would helpful. Thanks. > > 709703139 [http-bio-8080-exec-43] ERROR > org.apache.solr.update.SolrCmdDistributor – > org.apache.solr.client.solrj.SolrServerException: IOException occured when > talking to server at: http://:8080/solr/collection1 > at > org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:503) > at > org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:197) > at > org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer.request(ConcurrentUpdateSolrServer.java:293) > at > org.apache.solr.update.SolrCmdDistributor.submit(SolrCmdDistributor.java:212) > at > org.apache.solr.update.SolrCmdDistributor.distribCommit(SolrCmdDistributor.java:181) > at > org.apache.solr.update.processor.DistributedUpdateProcessor.processCommit(DistributedUpdateProcessor.java:1260) > at > org.apache.solr.update.processor.LogUpdateProcessor.processCommit(LogUpdateProcessorFactory.java:157) > at > org.apache.solr.handler.RequestHandlerUtils.handleCommit(RequestHandlerUtils.java:69) > at > org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68) > at > org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) > at org.apache.solr.core.SolrCore.execute(SolrCore.java:1859) > at > org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:710) > at > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:413) > at > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:197) > at > org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243) > at > org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210) > at > org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222) > at > org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123) > at > org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171) > at > org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99) > at > org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:953) > at > org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118) > at > org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408) > at > org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1023) > at > org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:589) > at > org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:312) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:744) > Caused by: java.net.NoRouteToHostException: No route to host > at java.net.PlainSocketImpl.socketConnect(Native Method) > at > java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339) > at > java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200) > at > 
java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182) > at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) > at java.net.Socket.connect(Socket.java:579) > at > org.apache.http.conn.scheme.PlainSocketFactory.connectSocket(PlainSocketFactory.java:127) > at > org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:180) > at > org.apache.http.impl.conn.ManagedClientConnectionImpl.open(ManagedClientConnectionImpl.java:294) > at > org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:643) > at > org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:479) > at > org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906) > at > org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805) > at > org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784) > at > org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:393)
Re: Anyone going to ApacheCon in Denver next week?
Hi folks, I’m already here and would love to join :-) Cheers, Siegfried Goeschl On 05 Apr 2014, at 20:43, Doug Turnbull wrote: > I'll be there. I'd love to meet up. Let me know! > > Sent from my Windows Phone From: William Bell > Sent: 4/5/2014 10:40 PM > To: solr-user@lucene.apache.org > Subject: Anyone going to ApacheCon in Denver next week? > Thoughts on getting together for breakfast? a little Solr meet up? > > > > -- > Bill Bell > billnb...@gmail.com > cell 720-256-8076
Re: Apache Solr.
Hi Vignesh, a few keywords for further investigation * Solr Data Import Handler * Apache Tika * Apache PDFBox Cheers, Siegfried Goeschl On 03.02.14 09:15, vignesh wrote: Hi Team, I am Vignesh, am using Apache Solr 3.6 and able to Index XML file and now trying to Index PDF file and not able to index .Can you give me the steps to carry out PDF indexing it will be very useful. Kindly guide me through this process. Thanks & Regards. Vignesh.V Ninestars Information Technologies Limited., 72, Greams Road, Thousand Lights, Chennai - 600 006. India. Landline : +91 44 2829 4226 / 36 / 56 X: 144 www.ninestars.in
Re: Why do people want to deploy to Tomcat?
Hi ALex, in my case * ignorance that Tomcat is not fully supported * Tomcat configuration and operations know-how inhouse * could migrate to Jetty but need approved change request to do so Cheers, Siegfried Goeschl On 12.11.13 04:54, Alexandre Rafalovitch wrote: Hello, I keep seeing here and on Stack Overflow people trying to deploy Solr to Tomcat. We don't usually ask why, just help when where we can. But the question happens often enough that I am curious. What is the actual business case. Is that because Tomcat is well known? Is it because other apps are running under Tomcat and it is ops' requirement? Is it because Tomcat gives something - to Solr - that Jetty does not? It might be useful to know. Especially, since Solr team is considering making the server part into a black box component. What use cases will that break? So, if somebody runs Solr under Tomcat (or needed to and gave up), let's use this thread to collect this knowledge. Regards, Alex. Personal website: http://www.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)
Re: how to debug my own analyzer in solr
Thread Dump and/or Remote Debugging?! Cheers, Siegfried Goeschl On 21.10.13 11:58, Mingzhu Gao wrote: More information about this , the custom analyzer just implement "createComponents" of Analyzer. And my configure in schema.xml is just something like : From the log I cannot see any error information , however , when I want to analysis or add document data , it always hang there . Any way to debug or narrow down the problem ? Thanks in advance . -Mingz On 10/21/13 4:35 PM, "Mingzhu Gao" wrote: Dear solr expert , I would like to write my own analyser ( Chinese analyser ) and integrate them into solr as solr plugin . From the log information , the custom analyzer can be loaded into solr successfully . I define my with this custom analyzer. Now the problem is that , when I try this analyzer from http://localhost:8983/solr/#/collection1/analysis , click the analysis , then choose my FieldType , then input some text . After I click "Analyse Value" button , the solr hang there , I cannot get any result or response in a few minutes. I also try to add some data by "curl http://localhost:8983/solr/update?commit=true -H "Content-Type: text/xml" , or by "post.sh" in exampledocs folder , The same issue , the solr hang there , no result and not response . Can anybody give me some suggestions on how to debug solr to work with my own custom analyzer ? By the way , I write a java program to call my custom analyzer , the result is okay , for example , the following code can work well . == Analyzer analyzer = new MyAnalyzer() ; TokenStream ts = analyzer.tokenStream() ; CharTermAttribute ta = ts.getAttribute(CharTermAttribute.class); ts.reset(); while (ts.incrementToken()){ System.out.println(ta.toString()); } = Thanks, -Mingz
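
One thing worth ruling out before remote debugging: the standalone test quoted above never calls end()/close(), and a custom Tokenizer whose incrementToken() never returns false will hang exactly as described. A hedged Lucene 4.x consumption loop (MyAnalyzer stands for the poster's own class):

    import java.io.StringReader;
    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

    public class AnalyzerSmokeTest {
        public static void main(String[] args) throws Exception {
            Analyzer analyzer = new MyAnalyzer();   // the custom Chinese analyzer under test
            TokenStream ts = analyzer.tokenStream("content", new StringReader("一些中文测试文本"));
            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);

            ts.reset();                              // mandatory before the first incrementToken()
            int count = 0;
            while (ts.incrementToken()) {
                System.out.println(term.toString());
                if (++count > 10000) {               // guard against a tokenizer that never terminates
                    throw new IllegalStateException("incrementToken() does not seem to return false");
                }
            }
            ts.end();
            ts.close();
        }
    }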
Re: solr 4.4 config trouble
Hi Marc, what exactly is not working - no obvious problemsin the logs as as I see Cheers, Siegfried Goeschl Am 30.09.2013 um 11:44 schrieb Marc des Garets : > Hi, > > I'm running solr in tomcat. I am trying to upgrade to solr 4.4 but I can't > get it to work. If someone can point me at what I'm doing wrong. > > tomcat context: > crossContext="true"> > value="/opt/solr4.4/solr_address" override="true" /> > > > > core.properties: > name=address > collection=address > coreNodeName=address > dataDir=/opt/indexes4.1/address > > > solr.xml: > > > > ${host:} > 8080 > solr_address > ${zkClientTimeout:15000} > false > > > class="HttpShardHandlerFactory"> > ${socketTimeout:0} > ${connTimeout:0} > > > > > In solrconfig.xml I have: > 4.1 > > /opt/indexes4.1/address > > > And the log4j logs in catalina.out: > ... > INFO: Deploying configuration descriptor solr_address.xml > 0 [main] INFO org.apache.solr.servlet.SolrDispatchFilter – > SolrDispatchFilter.init() > 24 [main] INFO org.apache.solr.core.SolrResourceLoader – Using JNDI > solr.home: /opt/solr4.4/solr_address > 26 [main] INFO org.apache.solr.core.SolrResourceLoader – new > SolrResourceLoader for directory: '/opt/solr4.4/solr_address/' > 176 [main] INFO org.apache.solr.core.ConfigSolr – Loading container > configuration from /opt/solr4.4/solr_address/solr.xml > 272 [main] INFO org.apache.solr.core.SolrCoreDiscoverer – Looking for cores > in /opt/solr4.4/solr_address > 276 [main] INFO org.apache.solr.core.SolrCoreDiscoverer – Looking for cores > in /opt/solr4.4/solr_address/conf > 276 [main] INFO org.apache.solr.core.SolrCoreDiscoverer – Looking for cores > in /opt/solr4.4/solr_address/conf/xslt > 277 [main] INFO org.apache.solr.core.SolrCoreDiscoverer – Looking for cores > in /opt/solr4.4/solr_address/conf/lang > 278 [main] INFO org.apache.solr.core.SolrCoreDiscoverer – Looking for cores > in /opt/solr4.4/solr_address/conf/velocity > 283 [main] INFO org.apache.solr.core.CoreContainer – New CoreContainer > 991552899 > 284 [main] INFO org.apache.solr.core.CoreContainer – Loading cores into > CoreContainer [instanceDir=/opt/solr4.4/solr_address/] > 301 [main] INFO org.apache.solr.handler.component.HttpShardHandlerFactory – > Setting socketTimeout to: 0 > 301 [main] INFO org.apache.solr.handler.component.HttpShardHandlerFactory – > Setting urlScheme to: http:// > 301 [main] INFO org.apache.solr.handler.component.HttpShardHandlerFactory – > Setting connTimeout to: 0 > 302 [main] INFO org.apache.solr.handler.component.HttpShardHandlerFactory – > Setting maxConnectionsPerHost to: 20 > 302 [main] INFO org.apache.solr.handler.component.HttpShardHandlerFactory – > Setting corePoolSize to: 0 > 303 [main] INFO org.apache.solr.handler.component.HttpShardHandlerFactory – > Setting maximumPoolSize to: 2147483647 > 303 [main] INFO org.apache.solr.handler.component.HttpShardHandlerFactory – > Setting maxThreadIdleTime to: 5 > 303 [main] INFO org.apache.solr.handler.component.HttpShardHandlerFactory – > Setting sizeOfQueue to: -1 > 303 [main] INFO org.apache.solr.handler.component.HttpShardHandlerFactory – > Setting fairnessPolicy to: false > 320 [main] INFO org.apache.solr.client.solrj.impl.HttpClientUtil – Creating > new http client, > config:maxConnectionsPerHost=20&maxConnections=1&socketTimeout=0&connTimeout=0&retry=false > 420 [main] INFO org.apache.solr.logging.LogWatcher – Registering Log Listener > [Log4j (org.slf4j.impl.Log4jLoggerFactory)] > 422 [main] INFO org.apache.solr.core.ZkContainer – Zookeeper > client=192.168.10.206:2181 > 429 [main] INFO 
org.apache.solr.client.solrj.impl.HttpClientUtil – Creating > new http client, > config:maxConnections=500&maxConnectionsPerHost=16&socketTimeout=0&connTimeout=0 > 487 [main] INFO org.apache.solr.common.cloud.ConnectionManager – Waiting for > client to connect to ZooKeeper > 540 [main-EventThread] INFO org.apache.solr.common.cloud.ConnectionManager – > Watcher org.apache.solr.common.cloud.ConnectionManager@7dc21ece > name:ZooKeeperConnection Watcher:192.168.10.206:2181 got event WatchedEvent > state:SyncConnected type:None path:null path:null type:None > 541 [main] INFO org.apache.solr.common.cloud.ConnectionManager – Client is > connected to ZooKeeper > 562 [main] INFO org.apache.solr.common.cloud.SolrZkClient – makePath: > /overseer/queue > 578 [main] INFO org.apache.solr.common.cloud.SolrZkClient – makePath: > /overseer/collection-queue-work > 591 [main] INFO org.apache.solr.common.cloud.SolrZkClient – makePath: > /live_nodes > 59
Re: how to suppress result
Hi Evgeniy, +) delete the documents if you really don't need them +) create a field "ignored" and build an appropriate query to exclude the documents where 'ignored' is true Cheers, Siegfried Goeschl Evgeniy Strokin wrote: Hello,.. I have odd problem. I use Solr for regular search, and it works fine for my task, but my client has a list of IDs in a flat separate file (he could have huge amount of the IDs, up to 1M) and he wants to exclude those IDs from result of the search. What is the right way to do this? Any thoughts are greatly appreciated. Thank you Gene
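
For a modest ID list the exclusion can also be expressed directly as a negative filter query built from the flat file; a hedged SolrJ sketch (for anything near the 1M mark, the indexed "ignored" flag is the more scalable route):

    import java.util.Arrays;
    import java.util.List;
    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;

    public class ExcludeIds {
        public static void main(String[] args) throws Exception {
            HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");
            List<String> excluded = Arrays.asList("1001", "1002", "1003");   // read from the flat file

            // build a filter query of the form -id:(1001 OR 1002 OR 1003)
            StringBuilder fq = new StringBuilder("-id:(");
            for (int i = 0; i < excluded.size(); i++) {
                if (i > 0) fq.append(" OR ");
                fq.append(excluded.get(i));
            }
            fq.append(")");

            SolrQuery query = new SolrQuery("name:smith");
            query.addFilterQuery(fq.toString());
            System.out.println(solr.query(query).getResults().getNumFound());
            solr.shutdown();
        }
    }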
Re: Can We append a field to the response that is not in the index but computed at runtime.
Hi folks, I had to solve a similiar problem with SOLR 1.2 and used a custom org.apache.solr.request.QueryResponseWriter - you can trigger your custom response writer using SOLR admin but it is not an elegant solution (I think the XMWriter is a final class therefore some copy&waste code) Cheers, Siegfried Goeschl Umar Shah wrote: On Mon, Mar 31, 2008 at 7:38 PM, Ryan McKinley <[EMAIL PROTECTED]> wrote: Two approaches: 1. make a map and add it to the response: rb.rsp.add( "mystuff", mymap ); I tried using both Map/ NamedList it appends to the results I have to attach each document with corresponding field. 2. Augment the documents with a field value -- this is a bit more complex and runs the risk of name collisions with fields in your documents. You can pull the docLIst out from the response and add fields to each document. this seems more appropriate, I'm okay, to resolve name collision , how do I add the field.. any specific methods to do that? If #1 works, go with that... ryan On Mar 31, 2008, at 9:51 AM, Umar Shah wrote: thanks ryan for the reply. I have looked at the prepare and process methods in SearchComponents(Query, Filter etc). I'm using all the default components to prepare and then process the reults. and then prepare a custom field after iterating through all the documents in the result set. After having created this field for each document how do I add corresponding custom field to each document in the response set. On Mon, Mar 31, 2008 at 6:25 PM, Ryan McKinley <[EMAIL PROTECTED]> wrote: Without writing any custom code, no. If you write a "SearchComponent" http://wiki.apache.org/solr/SearchComponent -- you can programatically change the response at runtime. ryan On Mar 28, 2008, at 3:38 AM, Umar Shah wrote: Hi, I wanted to know whether we can append a field (Fdyn say) to each doc in the returned set Fdyn is computed as some complex function of the fields stored in the index during the runtime in SOLR. -umar
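
Roughly what the SearchComponent route from the quoted thread looks like against Solr 4.x; a hedged skeleton only - the component still has to be registered in solrconfig.xml and added to the handler's component list, and the computed value here is a stand-in:

    import java.io.IOException;
    import org.apache.solr.common.util.SimpleOrderedMap;
    import org.apache.solr.handler.component.ResponseBuilder;
    import org.apache.solr.handler.component.SearchComponent;

    public class ComputedFieldComponent extends SearchComponent {

        @Override
        public void prepare(ResponseBuilder rb) throws IOException {
            // nothing to prepare for this example
        }

        @Override
        public void process(ResponseBuilder rb) throws IOException {
            // option 1 from the thread: attach a separate, computed section to the response
            SimpleOrderedMap<Object> computed = new SimpleOrderedMap<Object>();
            computed.add("queryLength", rb.getQueryString() == null ? 0 : rb.getQueryString().length());
            rb.rsp.add("mystuff", computed);
        }

        @Override
        public String getDescription() {
            return "adds a runtime-computed section to the response";
        }

        @Override
        public String getSource() {
            return "";
        }
    }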
Re: Solr interprets UTF-8 as ISO-8859-1
Hi Daniel, the following topic might help (at least it did the trick for me using german chararcters) http://wiki.apache.org/solr/FAQ - Why don't International Characters Work? So I wrote the following servlet (taken from Wiki/mailing list) import org.apache.solr.servlet.SolrDispatchFilter; import javax.servlet.ServletRequest; import javax.servlet.ServletResponse; import javax.servlet.FilterChain; import javax.servlet.ServletException; import java.io.IOException; /** * A work around that the URL parameters are encoded using UTF-8 but no character * encoding is defined. So enforce UTF-8 to make it work with German characters. */ public class CdpSolrDispatchFilter extends SolrDispatchFilter { public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain) throws IOException, ServletException { String encoding = request.getCharacterEncoding(); if (null == encoding) { // Set your default encoding here request.setCharacterEncoding("UTF-8"); } else { request.setCharacterEncoding(encoding); } super.doFilter(request, response, chain); } } Cheers, Siegfried Goeschl Daniel Löfquist wrote: Hello, We're building a webapplication that uses Solr for searching and I've come upon a problem that I can't seem to get my head around. We have a servlet that accepts input via XML-RPC and based on that input constructs the correct URL to perform a search with the Solr-servlet. I know that the call to Solr (the URL) from our servlet looks like this (which is what it should look like): http://myserver:8080/solrproducts/select/?q=all_SV:ljusblå+status:online&fl=id%2Cartno%2Ctitle_SV%2CtitleSort_SV%2Cdescription_SV%2C&sort=titleSort_SV+asc,id+asc&start=0&q.op=AND&rows=25 But Solr reports the input-fields (the GET-variables in the URL) as: INFO: /select/ fl=id,artno,title_SV,titleSort_SV,description_SV,&sort=titleSort_SV+asc,id+asc&start=0&q=all_SV:ljusblÃ¥+status:online&q.op=AND&rows=25 which is all fine except where it says "ljusblÃ¥". Apparently Solr is interpreting the UTF-8 string "ljusblå" as ISO-8859-1 and thus creates this garbage that makes the search return 0 when it should in reality return 3 hits. All other searches that don't use special characters work 100% fine. I'm new to Solr so I'm not sure what I'm doing wrong here. Can anybody help me out and point me in the direction of a solution? Sincerely, Daniel Löfquist
Re: Combining SOLR and JAMon to monitor query execution times from a browser
Hi Noberto, JAMon is all about aggregating statistical data and displaying the information for a web browser - the main beauty is that it is easy to define what you are monitoring such as querying domain objects per customer. Cheers, Siegfried Goeschl Norberto Meijome wrote: On Tue, 27 Nov 2007 18:18:16 +0100 Siegfried Goeschl <[EMAIL PROTECTED]> wrote: Hi folks, working on a closed source project for an IP concerned company is not always fun ... we combined SOLR with JAMon (http://jamonapi.sourceforge.net/) to keep an eye of the query times and this might be of general interest +) JAMon comes with a ready-to-use ServletFilter +) we extended this implementation to keep track for queries issued by a customer and the requested domain objects, e.g. "artist", "album", "track" +) this allows us to keep track of the execution times and their distribution to find quickly long running queries without having access to the access.log from a web browser +) a small presentation can be found at http://people.apache.org/~sgoeschl/presentations/jamon-20070717.pdf +) if it is of general I can rewrite the code as contribution Thanks Siegfried, I am further interested in plugging this information into something like Nagios , Cacti , Zenoss , bigsister , Openview or your monitoring system of choice, but I haven't had much time to look into this yet. How does JAMon compare to JMX ( http://java.sun.com/javase/technologies/core/mntr-mgmt/javamanagement/) ? cheers, B _ {Beto|Norberto|Numard} Meijome There are no stupid questions, but there are a LOT of inquisitive idiots. I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
Combining SOLR and JAMon to monitor query execution times from a browser
Hi folks, working on a closed source project for an IP concerned company is not always fun ... we combined SOLR with JAMon (http://jamonapi.sourceforge.net/) to keep an eye on the query times and this might be of general interest +) JAMon comes with a ready-to-use ServletFilter +) we extended this implementation to keep track of queries issued by a customer and the requested domain objects, e.g. "artist", "album", "track" +) this allows us to keep track of the execution times and their distribution to quickly find long-running queries from a web browser without having access to the access.log +) a small presentation can be found at http://people.apache.org/~sgoeschl/presentations/jamon-20070717.pdf +) if it is of general interest I can rewrite the code as a contribution Cheers, Siegfried Goeschl
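
The core of the JAMon wiring is just a start/stop pair around the call being tracked, with one monitor key per customer and domain object; a hedged sketch - the label scheme, URL and query are made up, not the production code:

    import com.jamonapi.Monitor;
    import com.jamonapi.MonitorFactory;
    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;

    public class MonitoredQuery {
        public static void main(String[] args) throws Exception {
            HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/music");

            String customer = "acme";
            String domainObject = "artist";
            // one JAMon key per customer/domain object gives hit count, avg/min/max times etc.
            Monitor mon = MonitorFactory.start("solr." + customer + "." + domainObject);
            try {
                solr.query(new SolrQuery("artist:\"miles davis\""));
            } finally {
                mon.stop();
            }

            System.out.println(mon);   // prints the aggregated statistics for this label
            solr.shutdown();
        }
    }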
Re: Any clever ideas to inject into solr? Without http?
Hi Kevin, I'm also a newbie but some thoughts along the lines ... +) for evaluating SOLR we used a less exotic setup for data import based on Pnuts (a JVM based scripting language) ... :-) ... but Groovy would do as well if you feel at home with Java. +) my colleague just finished a database import service running within the servlet container to avoid writing out the data to the file system and transmitting it over HTTP. +) I think there was some discussion regarding a generic database importer but nothing I'm aware of Cheers, Siegfried Goeschl Kevin Holmes wrote: I inherited an existing (working) solr indexing script that runs like this: Python script queries the mysql DB then calls bash script Bash script performs a curl POST submit to solr We're injecting about 1000 records / minute (constantly), frequently pushing the edge of our CPU / RAM limitations. I'm in the process of building a Perl script to use DBI and lwp::simple::post that will perform this all from a single script (instead of 3). Two specific questions 1: Does anyone have a clever (or better) way to perform this process efficiently? 2: Is there a way to inject into solr without using POST / curl / http? Admittedly, I'm no solr expert - I'm starting from someone else's setup, trying to reverse-engineer my way out. Any input would be greatly appreciated.
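
On question 2 - one way to skip HTTP entirely is SolrJ's EmbeddedSolrServer, which runs the core inside the import JVM; a hedged sketch assuming Solr 4.4+ (solr home path, core name and field names are placeholders):

    import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
    import org.apache.solr.common.SolrInputDocument;
    import org.apache.solr.core.CoreContainer;

    public class EmbeddedImport {
        public static void main(String[] args) throws Exception {
            // points at a normal solr home containing solr.xml and the core's conf/
            CoreContainer container = new CoreContainer("/opt/solr/example/solr");
            container.load();
            EmbeddedSolrServer solr = new EmbeddedSolrServer(container, "collection1");

            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "record-1");
            doc.addField("name", "imported without HTTP");
            solr.add(doc);
            solr.commit();

            solr.shutdown();
            container.shutdown();
        }
    }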
Re: Need question to configure Log4j for solr
Hi Ken, and we stopped using Resin's support for daily rolling log files since it blocks the server for 20 minutes when rotating a 20 GB logfile - please don't ask what we are doing with the daily 20 GB ... :-( Cheers, Siegfried Goeschl Ken Krugler wrote: : the troubles comes when you integrate third-party stuff depending on : log4j (as I currently do). Having said this you have a strong point when : looking at http://www.qos.ch/logging/classloader.jsp there have been several discussions baout changing the logger used by Solr ... the best summation i can give to these discussions is: * JDK logging is universal * using any other logging framework would add a dependency without adding functionality The one issue I ran into was with daily rolling log files - maybe I missed it, but I didn't find that functionality in the JDK logging package, however it is in log4j. I'm not advocating a change, just noting this. We worked around it by leveraging Resin's support for wrapping a logger (set up for daily rolling log files) around a webapp. -- Ken
Re: Need question to configure Log4j for solr
Hi Erik, the trouble comes when you integrate third-party stuff depending on log4j (as I currently do). Having said this, you have a strong point when looking at http://www.qos.ch/logging/classloader.jsp Cheers, Siegfried Goeschl Erik Hatcher wrote: On Jul 12, 2007, at 9:03 AM, Siegfried Goeschl wrote: would be using commons-logging an improvement? It is a common requirement to hook up different logging infrastructure .. My personal take on it is *adding* a dependency to keep functionality the same isn't an improvement. JDK logging, while not with as many bells and whistles as Commons Logging, log4j, etc, is plenty good enough and keeps us away from many of logging JARmageddon headaches. I'm not against a logging change should others have different opinions with a strong case of improvement. Erik
Re: Need question to configure Log4j for solr
Hi folks, would using commons-logging be an improvement? It is a common requirement to hook up different logging infrastructure .. Cheers, Siegfried Goeschl Erik Hatcher wrote: On Jul 11, 2007, at 9:07 PM, solruser wrote: How do I configure solr to use log4j logging. I am able to configure tomcat 5.5.23 to use log4j. But I could not get solr to use log4j. I have 3 context of solr running in tomcat which refers to war file in commons. Solr uses standard JDK logging. I'm sure it could be bridged to log4j somehow, but rather I'd recommend you just configure JDK logging how you'd like. Erik
Re: How to use bit fields to narrow a search
Hi Yonik, looks interesting - I'll give it a try Cheers, Siegfried Goeschl Yonik Seeley wrote: On 6/26/07, Siegfried Goeschl <[EMAIL PROTECTED]> wrote: Hi folks, I'm currently evaluating SOLR to implement fulltext search and within 8 hours I have my content imported and able to benchmark the queries ... :-) As a beginner with Lucence/SOLR I have a problem where to add my "special features" - little bit overloaded with "Lucene in Action" and SOLR over the weekend ... Some background ... +) I have 4 millions document indexed +) each document has 3 long variables (stored but not indexed) representing a 64 bit mask each +) I have to filter the Hits based on the bit mask using BIT AND with application supplied parameters Any suggestions/ideas where to add this processing within SOLR ... Due to the nature of an inverted index, it could actually be more efficient to store the bits separately. You could also then use Solr w/o any custom java code. Index a field called bits, which just contains the bit numbers set, separated by whitespace. At query time, use filters on the required bit numbers: q=foo&fq=bits:12&fq=bits:45 -Yonik
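
Yonik's suggestion translated into a small SolrJ sketch, assuming a multi-valued integer field called "bits" in the schema; masks, names and URL are made up. At index time every set bit becomes its own token, and at query time each required bit is one cheap, cacheable filter:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class BitMaskExample {
        public static void main(String[] args) throws Exception {
            HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");

            // index time: turn the 64-bit mask into individual bit numbers in a multi-valued field
            long mask = (1L << 12) | (1L << 45);
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "doc-1");
            for (int bit = 0; bit < 64; bit++) {
                if ((mask & (1L << bit)) != 0) {
                    doc.addField("bits", bit);
                }
            }
            solr.add(doc);
            solr.commit();

            // query time: each required bit is a filter query, equivalent to "mask AND required == required"
            SolrQuery query = new SolrQuery("*:*");
            query.addFilterQuery("bits:12", "bits:45");
            System.out.println(solr.query(query).getResults().getNumFound());
            solr.shutdown();
        }
    }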
How to use bit fields to narrow a search
Hi folks, I'm currently evaluating SOLR to implement fulltext search and within 8 hours I have my content imported and able to benchmark the queries ... :-) As a beginner with Lucence/SOLR I have a problem where to add my "special features" - little bit overloaded with "Lucene in Action" and SOLR over the weekend ... Some background ... +) I have 4 millions document indexed +) each document has 3 long variables (stored but not indexed) representing a 64 bit mask each +) I have to filter the Hits based on the bit mask using BIT AND with application supplied parameters Any suggestions/ideas where to add this processing within SOLR ... Thanks in advance Siegfried Goeschl