Re: Solr 3.3 crashes after ~18 hours?
On 19.08.2011 16:43, Yonik Seeley wrote:

On Fri, Aug 19, 2011 at 10:36 AM, alexander sulz wrote: Using lsof I think I pinned down the problem: too many open files! I already doubled the limit from 512 to 1024 once, but it seems there are many SOCKETS involved, which are listed as "can't identify protocol" instead of real files. Over time the list grows and grows with these entries until it "crashes". I've read several times that the fix for this problem is to set the limit to a ridiculously high number, but that seems like a bit of a crude fix. Why so many open sockets in the first place?

What are you using as a client to talk to Solr? You need to look at both the update side and the query side. Using persistent connections is the best all-around, but if not, be sure to close the connections in the client. -Yonik http://www.lucidimagination.com

I use PHP to talk to Solr, this one to be exact: http://code.google.com/p/solr-php-client/ version r22, I guess. I'll try updating it and see what happens.
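The descriptor leak described above can be confirmed from the shell. A minimal sketch, assuming a Linux host; point PID at the Solr/Jetty process (e.g. via pgrep -f start.jar) instead of the demo default used here:

```shell
# Sketch: count open file descriptors and leaked sockets for a process.
# PID defaults to the current shell for demonstration purposes only.
PID=${SOLR_PID:-$$}

echo "fd limit:  $(ulimit -n)"
echo "open fds:  $(ls /proc/$PID/fd 2>/dev/null | wc -l)"

# Leaked sockets show up in lsof as "can't identify protocol":
lsof -p "$PID" 2>/dev/null | grep -c "can't identify protocol" || true
```

If the last number keeps growing between runs while the index jobs are active, the client is opening connections without ever closing them.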
Re: Solr 3.3 crashes after ~18 hours?
On 19.08.2011 15:48, alexander sulz wrote:

On 10.08.2011 17:11, Yonik Seeley wrote:

On Wed, Aug 10, 2011 at 11:00 AM, alexander sulz wrote: Okay, with this command it hangs.

It doesn't look like a hang from this thread dump. It doesn't look like any Solr requests were executing at the time the dump was taken. Did you do this from the command line? curl "http://localhost:8983/solr/update?commit=true" Are you saying that the curl command just hung and never returned? -Yonik http://www.lucidimagination.com

Also: I managed to get a thread dump (attached). regards

On 05.08.2011 15:08, Yonik Seeley wrote:

On Fri, Aug 5, 2011 at 7:33 AM, alexander sulz wrote: Usually you get an XML response when doing commits or optimize; in this case I get nothing in return, but the page ( http://[...]/solr/update?optimize=true ) DOESN'T load forever or anything. It doesn't hang! I just get a blank page / empty response.

Sounds like you are doing it from a browser? Can you try it from the command line? It should give back some sort of response (or hang waiting for one). curl "http://localhost:8983/solr/update?commit=true" -Yonik http://www.lucidimagination.com

I use the stuff in the example folder; the only changes I made were enabling logging and changing the port to 8985. I'll try getting a thread dump if it happens again! So far it's looking good after allocating more memory to it.

On 04.08.2011 16:08, Yonik Seeley wrote:

On Thu, Aug 4, 2011 at 8:09 AM, alexander sulz wrote: Thank you for the many replies! Like I said, I couldn't find anything in the logs created by Solr. I just had a look at /var/log/messages and there wasn't anything there either. What I mean by crash is that the process is still there and HTTP GET pings would return 200, but when I try visiting /solr/admin, I get a blank page! The server ignores any incoming updates or commits,

"Ignores" means what? The request hangs? If so, could you get a thread dump? Do queries work (like /solr/select?q=*:*)?
thus throwing no errors, no 503s. It's like the server has a blackout and stares blankly into space.

Are you using a different servlet container than what is shipped with Solr? If you did start with the Solr "example" server, what Jetty configuration changes have you made? -Yonik http://www.lucidimagination.com

Sigh, it happened again, but I have a clue: before the crash I was deleting some entries but hadn't optimized afterwards; then, when I tried indexing something, Solr "crashed" again (responsive, but just blank/empty returns). I've just tried it again (doing the curl command while Solr is in its "zombie state") and I get the following reply from curl: "curl: (52) Empty reply from server". Also, I updated my Java, so the HotSpot version is now 20.1-b3.

Using lsof I think I pinned down the problem: too many open files! I already doubled the limit from 512 to 1024 once, but it seems there are many SOCKETS involved, which are listed as "can't identify protocol" instead of real files. Over time the list grows and grows with these entries until it "crashes". I've read several times that the fix for this problem is to set the limit to a ridiculously high number, but that seems like a bit of a crude fix. Why so many open sockets in the first place?
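Raising the limit as a stopgap, while the leak itself gets fixed in the client, can be sketched as follows. The "solr" username, the chosen values, and the limits.conf path are assumptions; the ulimit builtin only affects the current session, while the limits.conf entries make it persistent on most Linux PAM setups:

```shell
# Stopgap: raise the per-process fd limit for the current shell session.
# May silently fail if the hard limit is lower and we are not root.
ulimit -n 4096 2>/dev/null || true
ulimit -n

# Persistent variant (requires root; the "solr" user is an assumption):
#   echo 'solr  soft  nofile  4096' >> /etc/security/limits.conf
#   echo 'solr  hard  nofile  8192' >> /etc/security/limits.conf
```

This only buys time: if sockets keep leaking, any limit will eventually be hit.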
Re: Solr 3.3 crashes after ~18 hours?
On 10.08.2011 17:11, Yonik Seeley wrote:

On Wed, Aug 10, 2011 at 11:00 AM, alexander sulz wrote: Okay, with this command it hangs.

It doesn't look like a hang from this thread dump. It doesn't look like any Solr requests were executing at the time the dump was taken. Did you do this from the command line? curl "http://localhost:8983/solr/update?commit=true" Are you saying that the curl command just hung and never returned? -Yonik http://www.lucidimagination.com

Also: I managed to get a thread dump (attached). regards

On 05.08.2011 15:08, Yonik Seeley wrote:

On Fri, Aug 5, 2011 at 7:33 AM, alexander sulz wrote: Usually you get an XML response when doing commits or optimize; in this case I get nothing in return, but the page ( http://[...]/solr/update?optimize=true ) DOESN'T load forever or anything. It doesn't hang! I just get a blank page / empty response.

Sounds like you are doing it from a browser? Can you try it from the command line? It should give back some sort of response (or hang waiting for one). curl "http://localhost:8983/solr/update?commit=true" -Yonik http://www.lucidimagination.com

I use the stuff in the example folder; the only changes I made were enabling logging and changing the port to 8985. I'll try getting a thread dump if it happens again! So far it's looking good after allocating more memory to it.

On 04.08.2011 16:08, Yonik Seeley wrote:

On Thu, Aug 4, 2011 at 8:09 AM, alexander sulz wrote: Thank you for the many replies! Like I said, I couldn't find anything in the logs created by Solr. I just had a look at /var/log/messages and there wasn't anything there either. What I mean by crash is that the process is still there and HTTP GET pings would return 200, but when I try visiting /solr/admin, I get a blank page! The server ignores any incoming updates or commits,

"Ignores" means what? The request hangs? If so, could you get a thread dump? Do queries work (like /solr/select?q=*:*)? thus throwing no errors, no 503s..
It's like the server has a blackout and stares blankly into space.

Are you using a different servlet container than what is shipped with Solr? If you did start with the Solr "example" server, what Jetty configuration changes have you made? -Yonik http://www.lucidimagination.com

Sigh, it happened again, but I have a clue: before the crash I was deleting some entries but hadn't optimized afterwards; then, when I tried indexing something, Solr "crashed" again (responsive, but just blank/empty returns). I've just tried it again (doing the curl command while Solr is in its "zombie state") and I get the following reply from curl: "curl: (52) Empty reply from server". Also, I updated my Java, so the HotSpot version is now 20.1-b3.
Re: Solr 3.3 crashes after ~18 hours?
Okay, with this command it hangs. Also: I managed to get a thread dump (attached). regards

On 05.08.2011 15:08, Yonik Seeley wrote:

On Fri, Aug 5, 2011 at 7:33 AM, alexander sulz wrote: Usually you get an XML response when doing commits or optimize; in this case I get nothing in return, but the page ( http://[...]/solr/update?optimize=true ) DOESN'T load forever or anything. It doesn't hang! I just get a blank page / empty response.

Sounds like you are doing it from a browser? Can you try it from the command line? It should give back some sort of response (or hang waiting for one). curl "http://localhost:8983/solr/update?commit=true" -Yonik http://www.lucidimagination.com

I use the stuff in the example folder; the only changes I made were enabling logging and changing the port to 8985. I'll try getting a thread dump if it happens again! So far it's looking good after allocating more memory to it.

On 04.08.2011 16:08, Yonik Seeley wrote:

On Thu, Aug 4, 2011 at 8:09 AM, alexander sulz wrote: Thank you for the many replies! Like I said, I couldn't find anything in the logs created by Solr. I just had a look at /var/log/messages and there wasn't anything there either. What I mean by crash is that the process is still there and HTTP GET pings would return 200, but when I try visiting /solr/admin, I get a blank page! The server ignores any incoming updates or commits,

"Ignores" means what? The request hangs? If so, could you get a thread dump? Do queries work (like /solr/select?q=*:*)? thus throwing no errors, no 503s.. It's like the server has a blackout and stares blankly into space.

Are you using a different servlet container than what is shipped with Solr? If you did start with the Solr "example" server, what Jetty configuration changes have you made?
-Yonik http://www.lucidimagination.com

Full thread dump Java HotSpot(TM) Server VM (19.1-b02 mixed mode):

"DestroyJavaVM" prio=10 tid=0x6e32e800 nid=0x5aeb waiting on condition [0x]
   java.lang.Thread.State: RUNNABLE

"Timer-2" daemon prio=10 tid=0x6e3ff800 nid=0x5b0b in Object.wait() [0x6e6e5000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0xb0260108> (a java.util.TaskQueue)
        at java.util.TimerThread.mainLoop(Unknown Source)
        - locked <0xb0260108> (a java.util.TaskQueue)
        at java.util.TimerThread.run(Unknown Source)

"pool-1-thread-1" prio=10 tid=0x6e32dc00 nid=0x5b0a waiting on condition [0x6dae]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for <0xb02680e8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(Unknown Source)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(Unknown Source)
        at java.util.concurrent.LinkedBlockingQueue.take(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor.getTask(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)

"Timer-1" daemon prio=10 tid=0x0874e000 nid=0x5b07 in Object.wait() [0x6eb6d000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0xb02601c0> (a java.util.TaskQueue)
        at java.util.TimerThread.mainLoop(Unknown Source)
        - locked <0xb02601c0> (a java.util.TaskQueue)
        at java.util.TimerThread.run(Unknown Source)

"8106640@qtp-25094328-9 - Acceptor0 SocketConnector@0.0.0.0:8985" prio=10 tid=0x0832dc00 nid=0x5b06 runnable [0x6ecc7000]
   java.lang.Thread.State: RUNNABLE
        at java.net.PlainSocketImpl.socketAccept(Native Method)
        at java.net.PlainSocketImpl.accept(Unknown Source)
        - locked <0xb0260288> (a java.net.SocksSocketImpl)
        at java.net.ServerSocket.implAccept(Unknown Source)
        at java.net.ServerSocket.accept(Unknown Source)
        at org.mortbay.jetty.bio.SocketConnector.accept(SocketConnector.java:99)
        at org.mortbay.jetty.AbstractConnector$Acceptor.run(AbstractConnector.java:708)
        at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)

"9097070@qtp-25094328-8" prio=10 tid=0x0832c400 nid=0x5b05 in Object.wait() [0x6ed18000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0xb0264018> (a org.mortbay.thread.QueuedThreadPool$PoolThread)
        at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:626)
        - locked <0xb0264018> (a org.mortbay.thread.QueuedThreadPool$PoolThread)

"409849
Re: Solr 3.3 crashes after ~18 hours?
Usually you get an XML response when doing commits or optimize; in this case I get nothing in return, but the page ( http://[...]/solr/update?optimize=true ) DOESN'T load forever or anything. It doesn't hang! I just get a blank page / empty response.

I use the stuff in the example folder; the only changes I made were enabling logging and changing the port to 8985. I'll try getting a thread dump if it happens again! So far it's looking good after allocating more memory to it.

On 04.08.2011 16:08, Yonik Seeley wrote:

On Thu, Aug 4, 2011 at 8:09 AM, alexander sulz wrote: Thank you for the many replies! Like I said, I couldn't find anything in the logs created by Solr. I just had a look at /var/log/messages and there wasn't anything there either. What I mean by crash is that the process is still there and HTTP GET pings would return 200, but when I try visiting /solr/admin, I get a blank page! The server ignores any incoming updates or commits,

"Ignores" means what? The request hangs? If so, could you get a thread dump? Do queries work (like /solr/select?q=*:*)? thus throwing no errors, no 503s.. It's like the server has a blackout and stares blankly into space.

Are you using a different servlet container than what is shipped with Solr? If you did start with the Solr "example" server, what Jetty configuration changes have you made? -Yonik http://www.lucidimagination.com
Re: Solr 3.3 crashes after ~18 hours?
Thank you for the many replies! Like I said, I couldn't find anything in the logs created by Solr. I just had a look at /var/log/messages and there wasn't anything there either. What I mean by crash is that the process is still there and HTTP GET pings would return 200, but when I try visiting /solr/admin, I get a blank page! The server ignores any incoming updates or commits, thus throwing no errors, no 503s.. It's like the server has a blackout and stares blankly into space. I have allocated more memory as proposed and will keep an eye on whether the problem persists. Thank you guys, you are awesome.

On 02.08.2011 15:23, François Schiettecatte wrote:

Assuming you are running on Linux, you might want to check /var/log/messages too (the location might vary); I think the kernel logs forced process terminations there. I recall that the kernel usually picks the process consuming the most memory; there may be other factors involved too. François

On Aug 2, 2011, at 9:04 AM, wakemaster 39 wrote:

Monitor your memory usage. I used to encounter a problem like this before, where nothing was in the logs and the process was just gone. It turned out my system was out of memory and swap got used up because of another process, which then forced the kernel to start killing off processes. Google "OOM Linux" and you will find plenty of other programs and people with a similar problem. Cameron

On Aug 2, 2011 6:02 AM, "alexander sulz" wrote:

Hello folks, I'm using the latest stable Solr release -> 3.3 and I am encountering a strange phenomenon with it. After about 19 hours it just crashes, but I can't find anything in the logs: no exceptions, no warnings, no suspicious info entries. I have an index job running from 6am to 8pm every 10 minutes. After each job there is a commit. An optimize job is done twice a day, at 12:15pm and 9:15pm. Does anyone have an idea what could possibly be wrong, or where to look for further debug info? regards and thank you alex
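The OOM-killer check suggested above can be scripted. The sample log line below is fabricated so the snippet is self-contained; on the real machine you would grep /var/log/messages (or the dmesg output) directly:

```shell
# Sketch: detect kernel OOM kills in a syslog-style file.
# The sample line is illustrative, not taken from this server.
cat <<'EOF' > /tmp/sample_messages
Aug  2 05:14:02 host kernel: Out of memory: Killed process 5123 (java)
EOF

# On a real server: grep -i 'killed process' /var/log/messages
grep -i 'killed process' /tmp/sample_messages
```

If a java process shows up in such a line, the "crash with empty logs" is explained: the kernel killed the JVM before it could log anything.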
Re: Solr 3.3 crashes after ~18 hours?
Nope, none :/

On 02.08.2011 12:33, Bernd Fehling wrote:

Any JAVA_OPTS set? Do not use the "-XX:+OptimizeStringConcat" or "-XX:+AggressiveOpts" flags.

On 02.08.2011 12:01, alexander sulz wrote:

Hello folks, I'm using the latest stable Solr release -> 3.3 and I am encountering a strange phenomenon with it. After about 19 hours it just crashes, but I can't find anything in the logs: no exceptions, no warnings, no suspicious info entries. I have an index job running from 6am to 8pm every 10 minutes. After each job there is a commit. An optimize job is done twice a day, at 12:15pm and 9:15pm. Does anyone have an idea what could possibly be wrong, or where to look for further debug info? regards and thank you alex
Solr 3.3 crashes after ~18 hours?
Hello folks, I'm using the latest stable Solr release -> 3.3 and I am encountering a strange phenomenon with it. After about 19 hours it just crashes, but I can't find anything in the logs: no exceptions, no warnings, no suspicious info entries. I have an index job running from 6am to 8pm every 10 minutes. After each job there is a commit. An optimize job is done twice a day, at 12:15pm and 9:15pm. Does anyone have an idea what could possibly be wrong, or where to look for further debug info? regards and thank you alex
Jetty Logs - Max line size?
Hello, I enabled the Jetty logs, but my GET requests are so long that they get truncated without a line break, so in the end the log looks like the excerpt below: notice the logged ping and where it begins. How can I change this? Thank you very much.

000.000.000.000 - - [27/Jul/2011:17:38:04 +0100] "GET /solr/select?fl=id%2Cdoc_feature%2Cmandant%2Cd_id%2Cdoc_title%2Cdoc_id%2Cscore%2Ccategory%2Canrede%2Ckeyword_a%2Ckeyword_a_name%2Cmenu_id%2Cprice%2Caprice%2Cstandort_id%2Cdate_start%2Cdate_end%2Cdlpath%2Cdownloads%2Cdoc_type%2Cdoc_id%2Cmenu_path_text%2Ctitle%2Csummary%2Csubtitle%2Cdate_online%2Ctables%2Ckdnrzen%2Cplz%2Cort%2Cadresse%2Cbundesland%2Ctelefonnummer%2Cmobil%2Cfax%2Cemail%2Cnachname%2Cvorname%2Chauptfunktion%2Cbereichname%2Cfilialename&sort=date_online+desc&rows=0&version=1.2&wt=json&json.nl=map&q=text_copy%3A%28schuhe%29+%28category%3A%22Lagerhaus%22+OR+%28category%3ASortiment+AND+mandant%3A%28%22Lagerhaus%22+OR+Portal%29%29+OR+kdnrzen%3A%28%2A%29%29+AND+-%28doc_feature%3Abroschuere+AND+doc_type%3Acontent%29+AND+-%28menu_path_text%3A%2ATables%2A%29+AND+-%28id%3Acld_%2A%29+date_online%3A%5B2006-06-21T00%3A00%3A00Z+TO+2011-7-27T23%3A59%3A59Z%5D+date_offline%3A%5B2011-7-27T00%3A00%3A00Z+TO+%2A%5D+%28doc_type%3Acontent+AND+-doc_feature%3A%28ange000.000.000.000 - - [27/Jul/2011:17:38:04 +0100] "HEAD /solr/admin/ping HTTP/1.0" 200 0
Re: Average PDF index time
On 12.07.2011 10:08, alexander sulz wrote:

Hi all, are there some kind of average indexing times for PDFs in relation to their size? I have here a 10MB PDF (50 pages) which takes about 30 seconds to index! Is that normal?

Depends on your hardware. PDF parsing is a lot more tedious than XML, and besides parsing, the content is also analyzed and stored and maybe even committed. Is it a problem, or do you have many thousands of files of this size?

Luckily I don't; there are just about 500 of them all in all, and about 100 of them are bigger, 10 of them even problematically big, so that my PHP script stops working, but that's another problem. Unfortunately I don't have a clue about the server's specs, or know anyone who does. greetings alex

So I figured out I had my "bleeding-edge" version of Solr running. It was 3.3 with the latest Tika pulled from SVN (tika1.0-SNAPSHOT). I reverted back to the stable 0.9 release and now I get a 2-second index time for the same PDF! Still, why the PHP stops working correctly is beyond me, but it seems to be fixed now. regards alex
Re: Average PDF index time
Hi all, are there some kind of average indexing times for PDFs in relation to their size? I have here a 10MB PDF (50 pages) which takes about 30 seconds to index! Is that normal?

Depends on your hardware. PDF parsing is a lot more tedious than XML, and besides parsing, the content is also analyzed and stored and maybe even committed. Is it a problem, or do you have many thousands of files of this size?

Luckily I don't; there are just about 500 of them all in all, and about 100 of them are bigger, 10 of them even problematically big, so that my PHP script stops working, but that's another problem. Unfortunately I don't have a clue about the server's specs, or know anyone who does. greetings alex
Average PDF index time
Hi all, are there some kind of average indexing times for PDFs in relation to their size? I have here a 10MB PDF (50 pages) which takes about 30 seconds to index! Is that normal? greetings alex
Re: Controlling Tika's metadata
I have the same problem with discarding the metadata title. I thought the parameter "captureAttr" (which can be provided in solrconfig.xml and via GET/POST as a parameter) was responsible for that? I set it to false in the XML and as a parameter; still, I get "not multivalued field" errors due to metadata & literals both delivering content to a non-multivalued field. ;( Using 3.1, though.

On 02.02.2011 17:13, Grant Ingersoll wrote:

On Jan 28, 2011, at 5:38 PM, Andreas Kemkes wrote:

Just getting my feet wet with text extraction, using both schema and solrconfig settings from the example directory in the 1.4 distribution, so I might be missing something obvious. Providing my own title (and discarding the one received through Tika's metadata) wasn't straightforward. I had to use the following: fmap.title=tika_title (to discard the Tika title), literal.attr_title=New Title (to provide the correct one), fmap.attr_title=title (to map it back to the field, as I would like to use title in searches). Is there anything easier than the above? How can this best be generalized to other metadata provided by Tika (which in our use case will be mostly ignored, as it is provided separately)?

You can provide your own ContentHandler (see the wiki docs). I think it would be reasonable to patch the ExtractingRequestHandler to have a no-metadata option, and it wouldn't be that hard.
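The fmap/literal combination discussed above can be spelled out as a single extract request. A hedged sketch only: the host, core, field name "ignored_title", and file name are assumptions, and the command is printed rather than executed since no running server is assumed:

```shell
# Sketch: discard Tika's own title via fmap, supply ours via literal.
HOST="http://localhost:8983/solr/update/extract"
PARAMS="literal.title=New+Title&fmap.title=ignored_title&commit=true"

# Printed, not executed, because no Solr server is assumed to be running:
echo "curl '${HOST}?${PARAMS}' -F 'myfile=@document.pdf'"
```

The fmap.title target must exist in the schema (e.g. an ignored, unstored field); otherwise the remapped metadata triggers the same "not multivalued field" style errors in a different place.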
Search Cloud , store stemmed Tokens?
Hello dear Solr users, as far as I understand, I am able to process input with analyzers (and, within those, tokenizers and filters and whatnot) before indexing, but is it also possible to do that before storing the input in a field? What I want to do is store some search words from users to make a search cloud! Ideally, before storing those words, I want to stem them into their base form, so if people search for "howls", "howling", "howled", only "howl" will be stored in the field. With this I can do a facet query and easily make a cloud out of that. (Sorry for the double posting; it seems my mail ended up somewhere else.) thanks for your patience alex
Search Cloud , store stemmed Tokens?
Hello dear Solr users, as far as I understand, I am able to process input with analyzers (and, within those, tokenizers and filters and whatnot) before indexing, but is it also possible to do that before storing the input in a field? What I want to do is store some search words from users to make a search cloud! Ideally, before storing those words, I want to stem them into their base form, so if people search for "howls", "howling", "howled", only "howl" will be stored in the field. With this I can do a facet query and easily make a cloud out of that. thanks for your patience alex
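The stemming idea above can be sketched crudely in shell; a real setup would instead apply Solr's PorterStemFilterFactory in the index-time analyzer of the search-words field and facet on that field (stored values are never analyzed, only indexed terms are). The suffix rules below are a toy approximation, not a real Porter stemmer:

```shell
# Toy suffix-stripping "stemmer": just enough for the howl* examples above.
stem() { echo "$1" | sed -E 's/(ing|ed|s)$//'; }

stem howls    # -> howl
stem howling  # -> howl
stem howled   # -> howl
```

Since faceting operates on indexed terms, letting the analyzer do the stemming and faceting on that field gives the grouped counts directly, with no pre-storage processing needed.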
Umlaut in facet name attribute
Good evening and morning. I noticed that if I do a facet search on a field whose values contain umlauts (öäü), the returned facet list has converted the values into plain characters (oau). How do I prevent this from happening? I can't seem to find the configuration for faceting in the schema or config XML files. thx alex
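A likely cause is an accent-folding filter (e.g. ISOLatin1AccentFilterFactory or ASCIIFoldingFilterFactory) in the analyzer of the faceted field: faceting returns indexed terms, so the folded form comes back. A hedged schema.xml sketch, with field names as assumptions: facet on an unanalyzed string copy instead.

```xml
<!-- Sketch: keep the analyzed field for search, facet on a raw copy. -->
<field name="category"       type="text"   indexed="true" stored="true"/>
<field name="category_facet" type="string" indexed="true" stored="false"/>
<copyField source="category" dest="category_facet"/>
```

Then query with facet.field=category_facet; the umlauts survive because the string type applies no analysis.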
Re: Search the mailinglist?
Many thank-yous to all of you :)

On 17.09.2010 17:24, Walter Underwood wrote:

Or, for a fascinating multi-dimensional UI to mailing list archives: http://markmail.org/ --wunder

On Sep 17, 2010, at 7:15 AM, Markus Jelsma wrote:

http://www.lucidimagination.com/search/?q=

On Friday 17 September 2010 16:10:23 alexander sulz wrote:

I'm sorry to bother you all with this, but is there a way to search through the mailing list archive? I've found http://mail-archives.apache.org/mod_mbox/lucene-solr-user/ so far, but there isn't any convenient way to search through the archive. Thanks for your help

Markus Jelsma - Technisch Architect - Buyways BV http://www.linkedin.com/in/markus17 050-8536620 / 06-50258350
Indexing PDF - literal field already there & many "null"'s in text field
Hi everyone. I'm successfully indexing PDF files right now, but I still have some problems.

1. Tika seems to map some content to appropriate fields in my schema.xml. If I pass a literal.title=blabla parameter, Tika may already have parsed some information out of the PDF to fill in the field "title" itself. Now title is not a multiValued field, so I get an error. How can I change this behaviour, e.g. by making Tika stop filling fields itself?

2. My "text" field is successfully filled with content parsed by Tika, but it contains many "null" strings. Here is a little extract:

nullommen nullie mit diesem ausgefnuten nulleratungs-nullutschein nullu einem Lagerhaus nullaustoffnullerater in einem Lagerhaus in nullhrer Nnullhe und fragen nullie nach dem Energiesnullar-Potennullial fnull nullhr Eigenheimnull Die kostenlose Energiespar-Beratung ist gültig bis nullunull nullnullDenullenullber nullnullnullnullunnullin nullenuller Lagernullaus-Baustoffe nullbteilung einlnullsbarnullDie persnullnlinullnulle Energiespar- Beratung erfolgt aussnullnulllienulllinullnullinullLagernullausnullDieser Beratungs-nullutsnullnullein ist eine kostenlose Sernullinulleleistung für nullie Erstellung eines unnullerbinnulllinullnullen nullngebotes nullur Optinullierung nuller EnergieeffinulliennullInullres Eigennulleinulles für nullen oben nullefinierten nulleitraunullnull Quelle: Fachverband Wärmedämm-Verbundsysteme, Baden-Baden nie nulli enull er Fa ss anull en ris senull anull snull anulll null nullm anull nullinullnull spr eis einull e F enulls nuller nullanull nullnullnullnull ei null enullnull re anullnullinullnullsfenullsnullernullanullnull 1nullm nullnuller null5m nullanullimale nullualitätnull • für innen und aunullen • langlebig und nulletterfest • nullarm und pnullegeleicht nullunullenfensterbanknullnullnull,null cm 1nullnullnullnullnulllfm nullelnullpal cnullnullnullacnullminullnullnullfacnulls cnullnullnullnull fnull m anullernullrnullnullFassanulle nullFenullsnuller

Thanks for your time
Search the mailinglist?
I'm sorry to bother you all with this, but is there a way to search through the mailing list archive? I've found http://mail-archives.apache.org/mod_mbox/lucene-solr-user/ so far, but there isn't any convenient way to search through the archive. Thanks for your help