Re: Solr 3.3 crashes after ~18 hours?
On 19.08.2011 16:43, Yonik Seeley wrote:

On Fri, Aug 19, 2011 at 10:36 AM, alexander sulz wrote: Using lsof I think I pinned down the problem: too many open files! I already doubled the limit from 512 to 1024 once, but it seems there are many SOCKETS involved, which are listed as "can't identify protocol" instead of real files. Over time the list grows and grows with these entries until it "crashes". I've read several times that the fix for this problem is to set the limit to a ridiculously high number, but that seems like a bit of a crude fix. Why so many open sockets in the first place?

What are you using as a client to talk to Solr? You need to look at both the update side and the query side. Using persistent connections is the best all-around, but if not, be sure to close the connections in the client. -Yonik http://www.lucidimagination.com

I use PHP to talk to Solr, this one to be exact: http://code.google.com/p/solr-php-client/ version r22, I guess. I'll try updating it and see what happens.
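The descriptor leak described above can be confirmed from the shell. A minimal sketch, assuming a Linux host; point PID at the Solr/Jetty process (e.g. via pgrep -f start.jar) instead of the demo default used here:

```shell
# Sketch: count open file descriptors and leaked sockets for a process.
# PID defaults to the current shell for demonstration purposes only.
PID=${SOLR_PID:-$$}

echo "fd limit:  $(ulimit -n)"
echo "open fds:  $(ls /proc/$PID/fd 2>/dev/null | wc -l)"

# Leaked sockets show up in lsof as "can't identify protocol":
lsof -p "$PID" 2>/dev/null | grep -c "can't identify protocol" || true
```

If the last number keeps growing between runs while the index jobs are active, the client is opening connections without ever closing them.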
Re: Solr 3.3 crashes after ~18 hours?
On 19.08.2011 15:48, alexander sulz wrote:

On 10.08.2011 17:11, Yonik Seeley wrote:

On Wed, Aug 10, 2011 at 11:00 AM, alexander sulz wrote: Okay, with this command it hangs.

It doesn't look like a hang from this thread dump. It doesn't look like any Solr requests were executing at the time the dump was taken. Did you do this from the command line? curl "http://localhost:8983/solr/update?commit=true" Are you saying that the curl command just hung and never returned? -Yonik http://www.lucidimagination.com

Also: I managed to get a thread dump (attached). regards

On 05.08.2011 15:08, Yonik Seeley wrote:

On Fri, Aug 5, 2011 at 7:33 AM, alexander sulz wrote: Usually you get an XML response when doing commits or optimize; in this case I get nothing in return, but the page ( http://[...]/solr/update?optimize=true ) DOESN'T load forever or anything. It doesn't hang! I just get a blank page / empty response.

Sounds like you are doing it from a browser? Can you try it from the command line? It should give back some sort of response (or hang waiting for one). curl "http://localhost:8983/solr/update?commit=true" -Yonik http://www.lucidimagination.com

I use the stuff in the example folder; the only changes I made were enabling logging and changing the port to 8985. I'll try getting a thread dump if it happens again! So far it's looking good after allocating more memory to it.

On 04.08.2011 16:08, Yonik Seeley wrote:

On Thu, Aug 4, 2011 at 8:09 AM, alexander sulz wrote: Thank you for the many replies! Like I said, I couldn't find anything in the logs created by Solr. I just had a look at /var/log/messages and there wasn't anything there either. What I mean by crash is that the process is still there and HTTP GET pings would return 200, but when I try visiting /solr/admin, I get a blank page! The server ignores any incoming updates or commits,

"Ignores" means what? The request hangs? If so, could you get a thread dump? Do queries work (like /solr/select?q=*:*)?
thus throwing no errors, no 503s. It's like the server has a blackout and stares blankly into space.

Are you using a different servlet container than what is shipped with Solr? If you did start with the Solr "example" server, what Jetty configuration changes have you made? -Yonik http://www.lucidimagination.com

Sigh, it happened again, but I have a clue: before the crash I was deleting some entries but hadn't optimized afterwards; then, when I tried indexing something, Solr "crashed" again (responsive, but just blank/empty returns). I've just tried it again (doing the curl command while Solr is in its "zombie state") and I get the following reply from curl: "curl: (52) Empty reply from server". Also, I updated my Java, so the HotSpot version is now 20.1-b3.

Using lsof I think I pinned down the problem: too many open files! I already doubled the limit from 512 to 1024 once, but it seems there are many SOCKETS involved, which are listed as "can't identify protocol" instead of real files. Over time the list grows and grows with these entries until it "crashes". I've read several times that the fix for this problem is to set the limit to a ridiculously high number, but that seems like a bit of a crude fix. Why so many open sockets in the first place?
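Raising the limit as a stopgap, while the leak itself gets fixed in the client, can be sketched as follows. The "solr" username, the chosen values, and the limits.conf path are assumptions; the ulimit builtin only affects the current session, while the limits.conf entries make it persistent on most Linux PAM setups:

```shell
# Stopgap: raise the per-process fd limit for the current shell session.
# May silently fail if the hard limit is lower and we are not root.
ulimit -n 4096 2>/dev/null || true
ulimit -n

# Persistent variant (requires root; the "solr" user is an assumption):
#   echo 'solr  soft  nofile  4096' >> /etc/security/limits.conf
#   echo 'solr  hard  nofile  8192' >> /etc/security/limits.conf
```

This only buys time: if sockets keep leaking, any limit will eventually be hit.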
Re: Solr 3.3 crashes after ~18 hours?
On 10.08.2011 17:11, Yonik Seeley wrote:

On Wed, Aug 10, 2011 at 11:00 AM, alexander sulz wrote: Okay, with this command it hangs.

It doesn't look like a hang from this thread dump. It doesn't look like any Solr requests were executing at the time the dump was taken. Did you do this from the command line? curl "http://localhost:8983/solr/update?commit=true" Are you saying that the curl command just hung and never returned? -Yonik http://www.lucidimagination.com

Also: I managed to get a thread dump (attached). regards

On 05.08.2011 15:08, Yonik Seeley wrote:

On Fri, Aug 5, 2011 at 7:33 AM, alexander sulz wrote: Usually you get an XML response when doing commits or optimize; in this case I get nothing in return, but the page ( http://[...]/solr/update?optimize=true ) DOESN'T load forever or anything. It doesn't hang! I just get a blank page / empty response.

Sounds like you are doing it from a browser? Can you try it from the command line? It should give back some sort of response (or hang waiting for one). curl "http://localhost:8983/solr/update?commit=true" -Yonik http://www.lucidimagination.com

I use the stuff in the example folder; the only changes I made were enabling logging and changing the port to 8985. I'll try getting a thread dump if it happens again! So far it's looking good after allocating more memory to it.

On 04.08.2011 16:08, Yonik Seeley wrote:

On Thu, Aug 4, 2011 at 8:09 AM, alexander sulz wrote: Thank you for the many replies! Like I said, I couldn't find anything in the logs created by Solr. I just had a look at /var/log/messages and there wasn't anything there either. What I mean by crash is that the process is still there and HTTP GET pings would return 200, but when I try visiting /solr/admin, I get a blank page! The server ignores any incoming updates or commits,

"Ignores" means what? The request hangs? If so, could you get a thread dump? Do queries work (like /solr/select?q=*:*)? thus throwing no errors, no 503s..
It's like the server has a blackout and stares blankly into space.

Are you using a different servlet container than what is shipped with Solr? If you did start with the Solr "example" server, what Jetty configuration changes have you made? -Yonik http://www.lucidimagination.com

Sigh, it happened again, but I have a clue: before the crash I was deleting some entries but hadn't optimized afterwards; then, when I tried indexing something, Solr "crashed" again (responsive, but just blank/empty returns). I've just tried it again (doing the curl command while Solr is in its "zombie state") and I get the following reply from curl: "curl: (52) Empty reply from server". Also, I updated my Java, so the HotSpot version is now 20.1-b3.
Re: Solr 3.3 crashes after ~18 hours?
Okay, with this command it hangs. Also: I managed to get a thread dump (attached). regards

On 05.08.2011 15:08, Yonik Seeley wrote:

On Fri, Aug 5, 2011 at 7:33 AM, alexander sulz wrote: Usually you get an XML response when doing commits or optimize; in this case I get nothing in return, but the page ( http://[...]/solr/update?optimize=true ) DOESN'T load forever or anything. It doesn't hang! I just get a blank page / empty response.

Sounds like you are doing it from a browser? Can you try it from the command line? It should give back some sort of response (or hang waiting for one). curl "http://localhost:8983/solr/update?commit=true" -Yonik http://www.lucidimagination.com

I use the stuff in the example folder; the only changes I made were enabling logging and changing the port to 8985. I'll try getting a thread dump if it happens again! So far it's looking good after allocating more memory to it.

On 04.08.2011 16:08, Yonik Seeley wrote:

On Thu, Aug 4, 2011 at 8:09 AM, alexander sulz wrote: Thank you for the many replies! Like I said, I couldn't find anything in the logs created by Solr. I just had a look at /var/log/messages and there wasn't anything there either. What I mean by crash is that the process is still there and HTTP GET pings would return 200, but when I try visiting /solr/admin, I get a blank page! The server ignores any incoming updates or commits,

"Ignores" means what? The request hangs? If so, could you get a thread dump? Do queries work (like /solr/select?q=*:*)? thus throwing no errors, no 503s.. It's like the server has a blackout and stares blankly into space.

Are you using a different servlet container than what is shipped with Solr? If you did start with the Solr "example" server, what Jetty configuration changes have you made?
-Yonik http://www.lucidimagination.com

Full thread dump Java HotSpot(TM) Server VM (19.1-b02 mixed mode):

"DestroyJavaVM" prio=10 tid=0x6e32e800 nid=0x5aeb waiting on condition [0x]
   java.lang.Thread.State: RUNNABLE

"Timer-2" daemon prio=10 tid=0x6e3ff800 nid=0x5b0b in Object.wait() [0x6e6e5000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0xb0260108> (a java.util.TaskQueue)
        at java.util.TimerThread.mainLoop(Unknown Source)
        - locked <0xb0260108> (a java.util.TaskQueue)
        at java.util.TimerThread.run(Unknown Source)

"pool-1-thread-1" prio=10 tid=0x6e32dc00 nid=0x5b0a waiting on condition [0x6dae]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for <0xb02680e8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(Unknown Source)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(Unknown Source)
        at java.util.concurrent.LinkedBlockingQueue.take(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor.getTask(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)

"Timer-1" daemon prio=10 tid=0x0874e000 nid=0x5b07 in Object.wait() [0x6eb6d000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0xb02601c0> (a java.util.TaskQueue)
        at java.util.TimerThread.mainLoop(Unknown Source)
        - locked <0xb02601c0> (a java.util.TaskQueue)
        at java.util.TimerThread.run(Unknown Source)

"8106640@qtp-25094328-9 - Acceptor0 SocketConnector@0.0.0.0:8985" prio=10 tid=0x0832dc00 nid=0x5b06 runnable [0x6ecc7000]
   java.lang.Thread.State: RUNNABLE
        at java.net.PlainSocketImpl.socketAccept(Native Method)
        at java.net.PlainSocketImpl.accept(Unknown Source)
        - locked <0xb0260288> (a java.net.SocksSocketImpl)
        at java.net.ServerSocket.implAccept(Unknown Source)
        at java.net.ServerSocket.accept(Unknown Source)
        at org.mortbay.jetty.bio.SocketConnector.accept(SocketConnector.java:99)
        at org.mortbay.jetty.AbstractConnector$Acceptor.run(AbstractConnector.java:708)
        at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)

"9097070@qtp-25094328-8" prio=10 tid=0x0832c400 nid=0x5b05 in Object.wait() [0x6ed18000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0xb0264018> (a org.mortbay.thread.QueuedThreadPool$PoolThread)
        at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:626)
        - locked <0xb0264018> (a org.mortbay.thread.QueuedThreadPool$PoolThread)

"409849
Re: Solr 3.3 crashes after ~18 hours?
Usually you get an XML response when doing commits or optimize; in this case I get nothing in return, but the page ( http://[...]/solr/update?optimize=true ) DOESN'T load forever or anything. It doesn't hang! I just get a blank page / empty response.

I use the stuff in the example folder; the only changes I made were enabling logging and changing the port to 8985. I'll try getting a thread dump if it happens again! So far it's looking good after allocating more memory to it.

On 04.08.2011 16:08, Yonik Seeley wrote:

On Thu, Aug 4, 2011 at 8:09 AM, alexander sulz wrote: Thank you for the many replies! Like I said, I couldn't find anything in the logs created by Solr. I just had a look at /var/log/messages and there wasn't anything there either. What I mean by crash is that the process is still there and HTTP GET pings would return 200, but when I try visiting /solr/admin, I get a blank page! The server ignores any incoming updates or commits,

"Ignores" means what? The request hangs? If so, could you get a thread dump? Do queries work (like /solr/select?q=*:*)? thus throwing no errors, no 503s.. It's like the server has a blackout and stares blankly into space.

Are you using a different servlet container than what is shipped with Solr? If you did start with the Solr "example" server, what Jetty configuration changes have you made? -Yonik http://www.lucidimagination.com
Re: Solr 3.3 crashes after ~18 hours?
Thank you for the many replies! Like I said, I couldn't find anything in the logs created by Solr. I just had a look at /var/log/messages and there wasn't anything there either. What I mean by crash is that the process is still there and HTTP GET pings would return 200, but when I try visiting /solr/admin, I get a blank page! The server ignores any incoming updates or commits, thus throwing no errors, no 503s.. It's like the server has a blackout and stares blankly into space. I have allocated more memory as proposed and will keep an eye on whether the problem persists. Thank you guys, you are awesome.

On 02.08.2011 15:23, François Schiettecatte wrote:

Assuming you are running on Linux, you might want to check /var/log/messages too (the location might vary); I think the kernel logs forced process terminations there. I recall that the kernel usually picks the process consuming the most memory; there may be other factors involved too. François

On Aug 2, 2011, at 9:04 AM, wakemaster 39 wrote:

Monitor your memory usage. I used to encounter a problem like this before, where nothing was in the logs and the process was just gone. It turned out my system was out of memory and swap got used up because of another process, which then forced the kernel to start killing off processes. Google "OOM Linux" and you will find plenty of other programs and people with a similar problem. Cameron

On Aug 2, 2011 6:02 AM, "alexander sulz" wrote:

Hello folks, I'm using the latest stable Solr release -> 3.3 and I am encountering a strange phenomenon with it. After about 19 hours it just crashes, but I can't find anything in the logs: no exceptions, no warnings, no suspicious info entries. I have an index job running from 6am to 8pm every 10 minutes. After each job there is a commit. An optimize job is done twice a day, at 12:15pm and 9:15pm. Does anyone have an idea what could possibly be wrong, or where to look for further debug info? regards and thank you alex
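The OOM-killer check suggested above can be scripted. The sample log line below is fabricated so the snippet is self-contained; on the real machine you would grep /var/log/messages (or the dmesg output) directly:

```shell
# Sketch: detect kernel OOM kills in a syslog-style file.
# The sample line is illustrative, not taken from this server.
cat <<'EOF' > /tmp/sample_messages
Aug  2 05:14:02 host kernel: Out of memory: Killed process 5123 (java)
EOF

# On a real server: grep -i 'killed process' /var/log/messages
grep -i 'killed process' /tmp/sample_messages
```

If a java process shows up in such a line, the "crash with empty logs" is explained: the kernel killed the JVM before it could log anything.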
Re: Solr 3.3 crashes after ~18 hours?
Nope, none :/

On 02.08.2011 12:33, Bernd Fehling wrote:

Any JAVA_OPTS set? Do not use the "-XX:+OptimizeStringConcat" or "-XX:+AggressiveOpts" flags.

On 02.08.2011 12:01, alexander sulz wrote:

Hello folks, I'm using the latest stable Solr release -> 3.3 and I am encountering a strange phenomenon with it. After about 19 hours it just crashes, but I can't find anything in the logs: no exceptions, no warnings, no suspicious info entries. I have an index job running from 6am to 8pm every 10 minutes. After each job there is a commit. An optimize job is done twice a day, at 12:15pm and 9:15pm. Does anyone have an idea what could possibly be wrong, or where to look for further debug info? regards and thank you alex
Solr 3.3 crashes after ~18 hours?
Hello folks, I'm using the latest stable Solr release -> 3.3 and I am encountering a strange phenomenon with it. After about 19 hours it just crashes, but I can't find anything in the logs: no exceptions, no warnings, no suspicious info entries. I have an index job running from 6am to 8pm every 10 minutes. After each job there is a commit. An optimize job is done twice a day, at 12:15pm and 9:15pm. Does anyone have an idea what could possibly be wrong, or where to look for further debug info? regards and thank you alex
Jetty Logs - Max line size?
Hello, I enabled the Jetty logs, but my GET requests are so long that they get truncated without a line break, so in the end the log looks like the excerpt below: notice the logged ping and where it begins. How can I change this? Thank you very much.

000.000.000.000 - - [27/Jul/2011:17:38:04 +0100] "GET /solr/select?fl=id%2Cdoc_feature%2Cmandant%2Cd_id%2Cdoc_title%2Cdoc_id%2Cscore%2Ccategory%2Canrede%2Ckeyword_a%2Ckeyword_a_name%2Cmenu_id%2Cprice%2Caprice%2Cstandort_id%2Cdate_start%2Cdate_end%2Cdlpath%2Cdownloads%2Cdoc_type%2Cdoc_id%2Cmenu_path_text%2Ctitle%2Csummary%2Csubtitle%2Cdate_online%2Ctables%2Ckdnrzen%2Cplz%2Cort%2Cadresse%2Cbundesland%2Ctelefonnummer%2Cmobil%2Cfax%2Cemail%2Cnachname%2Cvorname%2Chauptfunktion%2Cbereichname%2Cfilialename&sort=date_online+desc&rows=0&version=1.2&wt=json&json.nl=map&q=text_copy%3A%28schuhe%29+%28category%3A%22Lagerhaus%22+OR+%28category%3ASortiment+AND+mandant%3A%28%22Lagerhaus%22+OR+Portal%29%29+OR+kdnrzen%3A%28%2A%29%29+AND+-%28doc_feature%3Abroschuere+AND+doc_type%3Acontent%29+AND+-%28menu_path_text%3A%2ATables%2A%29+AND+-%28id%3Acld_%2A%29+date_online%3A%5B2006-06-21T00%3A00%3A00Z+TO+2011-7-27T23%3A59%3A59Z%5D+date_offline%3A%5B2011-7-27T00%3A00%3A00Z+TO+%2A%5D+%28doc_type%3Acontent+AND+-doc_feature%3A%28ange000.000.000.000 - - [27/Jul/2011:17:38:04 +0100] "HEAD /solr/admin/ping HTTP/1.0" 200 0
Re: Average PDF index time
On 12.07.2011 10:08, alexander sulz wrote:

Hi all, are there some kind of average indexing times for PDFs in relation to their size? I have here a 10MB PDF (50 pages) which takes about 30 seconds to index! Is that normal?

Depends on your hardware. PDF parsing is a lot more tedious than XML, and besides parsing, the content is also analyzed and stored and maybe even committed. Is it a problem, or do you have many thousands of files of this size?

Luckily I don't; there are just about 500 of them all in all, and about 100 of them are bigger, 10 of them even problematically big, so that my PHP script stops working, but that's another problem. Unfortunately I don't have a clue about the server's specs, or know anyone who does. greetings alex

So I figured out I had my "bleeding-edge" version of Solr running. It was 3.3 with the latest Tika pulled from SVN (tika1.0-SNAPSHOT). I reverted back to the stable 0.9 release and now I get a 2-second index time for the same PDF! Still, why the PHP stops working correctly is beyond me, but it seems to be fixed now. regards alex
Re: Average PDF index time
Hi all, are there some kind of average indexing times for PDFs in relation to their size? I have here a 10MB PDF (50 pages) which takes about 30 seconds to index! Is that normal?

Depends on your hardware. PDF parsing is a lot more tedious than XML, and besides parsing, the content is also analyzed and stored and maybe even committed. Is it a problem, or do you have many thousands of files of this size?

Luckily I don't; there are just about 500 of them all in all, and about 100 of them are bigger, 10 of them even problematically big, so that my PHP script stops working, but that's another problem. Unfortunately I don't have a clue about the server's specs, or know anyone who does. greetings alex
Average PDF index time
Hi all, are there some kind of average indexing times for PDFs in relation to their size? I have here a 10MB PDF (50 pages) which takes about 30 seconds to index! Is that normal? greetings alex
Re: Controlling Tika's metadata
I have the same problem with discarding the metadata title. I thought the parameter "captureAttr" (which can be provided in solrconfig.xml and via GET/POST as a parameter) was responsible for that? I set it to false in the XML and as a parameter; still, I get "not multivalued field" errors due to metadata & literals both delivering content to a non-multivalued field. ;( Using 3.1, though.

On 02.02.2011 17:13, Grant Ingersoll wrote:

On Jan 28, 2011, at 5:38 PM, Andreas Kemkes wrote:

Just getting my feet wet with text extraction, using both schema and solrconfig settings from the example directory in the 1.4 distribution, so I might be missing something obvious. Providing my own title (and discarding the one received through Tika's metadata) wasn't straightforward. I had to use the following: fmap.title=tika_title (to discard the Tika title), literal.attr_title=New Title (to provide the correct one), fmap.attr_title=title (to map it back to the field, as I would like to use title in searches). Is there anything easier than the above? How can this best be generalized to other metadata provided by Tika (which in our use case will be mostly ignored, as it is provided separately)?

You can provide your own ContentHandler (see the wiki docs). I think it would be reasonable to patch the ExtractingRequestHandler to have a no-metadata option, and it wouldn't be that hard.
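The fmap/literal combination discussed above can be spelled out as a single extract request. A hedged sketch only: the host, core, field name "ignored_title", and file name are assumptions, and the command is printed rather than executed since no running server is assumed:

```shell
# Sketch: discard Tika's own title via fmap, supply ours via literal.
HOST="http://localhost:8983/solr/update/extract"
PARAMS="literal.title=New+Title&fmap.title=ignored_title&commit=true"

# Printed, not executed, because no Solr server is assumed to be running:
echo "curl '${HOST}?${PARAMS}' -F 'myfile=@document.pdf'"
```

The fmap.title target must exist in the schema (e.g. an ignored, unstored field); otherwise the remapped metadata triggers the same "not multivalued field" style errors in a different place.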
Search Cloud , store stemmed Tokens?
Hello dear Solr users, as far as I understand, I am able to process input with analyzers (and, within those, tokenizers and filters and whatnot) before indexing, but is it also possible to do that before storing the input in a field? What I want to do is store some search words from users to make a search cloud! Ideally, before storing those words, I want to stem them into their base form, so if people search for "howls", "howling", "howled", only "howl" will be stored in the field. With this I can do a facet query and easily make a cloud out of that. (Sorry for the double posting; it seems my mail ended up somewhere else.) thanks for your patience alex
Search Cloud , store stemmed Tokens?
Hello dear Solr users, as far as I understand, I am able to process input with analyzers (and, within those, tokenizers and filters and whatnot) before indexing, but is it also possible to do that before storing the input in a field? What I want to do is store some search words from users to make a search cloud! Ideally, before storing those words, I want to stem them into their base form, so if people search for "howls", "howling", "howled", only "howl" will be stored in the field. With this I can do a facet query and easily make a cloud out of that. thanks for your patience alex
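The stemming idea above can be sketched crudely in shell; a real setup would instead apply Solr's PorterStemFilterFactory in the index-time analyzer of the search-words field and facet on that field (stored values are never analyzed, only indexed terms are). The suffix rules below are a toy approximation, not a real Porter stemmer:

```shell
# Toy suffix-stripping "stemmer": just enough for the howl* examples above.
stem() { echo "$1" | sed -E 's/(ing|ed|s)$//'; }

stem howls    # -> howl
stem howling  # -> howl
stem howled   # -> howl
```

Since faceting operates on indexed terms, letting the analyzer do the stemming and faceting on that field gives the grouped counts directly, with no pre-storage processing needed.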
Umlaut in facet name attribute
Good evening and morning. I noticed that if I do a facet search on a field whose values contain umlauts (öäü), the returned facet list has converted the values into plain characters (oau). How do I prevent this from happening? I can't seem to find the configuration for faceting in the schema or config XML files. thx alex
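A likely cause is an accent-folding filter (e.g. ISOLatin1AccentFilterFactory or ASCIIFoldingFilterFactory) in the analyzer of the faceted field: faceting returns indexed terms, so the folded form comes back. A hedged schema.xml sketch, with field names as assumptions: facet on an unanalyzed string copy instead.

```xml
<!-- Sketch: keep the analyzed field for search, facet on a raw copy. -->
<field name="category"       type="text"   indexed="true" stored="true"/>
<field name="category_facet" type="string" indexed="true" stored="false"/>
<copyField source="category" dest="category_facet"/>
```

Then query with facet.field=category_facet; the umlauts survive because the string type applies no analysis.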
Re: Search the mailinglist?
Many thank-yous to all of you :)

On 17.09.2010 17:24, Walter Underwood wrote:

Or, for a fascinating multi-dimensional UI to mailing list archives: http://markmail.org/ --wunder

On Sep 17, 2010, at 7:15 AM, Markus Jelsma wrote:

http://www.lucidimagination.com/search/?q=

On Friday 17 September 2010 16:10:23 alexander sulz wrote:

I'm sorry to bother you all with this, but is there a way to search through the mailing list archive? I've found http://mail-archives.apache.org/mod_mbox/lucene-solr-user/ so far, but there isn't any convenient way to search through the archive. Thanks for your help

Markus Jelsma - Technisch Architect - Buyways BV http://www.linkedin.com/in/markus17 050-8536620 / 06-50258350
Indexing PDF - literal field already there & many "null"'s in text field
Hi everyone. I'm successfully indexing PDF files right now, but I still have some problems.

1. Tika seems to map some content to appropriate fields in my schema.xml. If I pass a literal.title=blabla parameter, Tika may already have parsed some information out of the PDF to fill in the field "title" itself. Now title is not a multiValued field, so I get an error. How can I change this behaviour, e.g. by making Tika stop filling fields itself?

2. My "text" field is successfully filled with content parsed by Tika, but it contains many "null" strings. Here is a little extract:

nullommen nullie mit diesem ausgefnuten nulleratungs-nullutschein nullu einem Lagerhaus nullaustoffnullerater in einem Lagerhaus in nullhrer Nnullhe und fragen nullie nach dem Energiesnullar-Potennullial fnull nullhr Eigenheimnull Die kostenlose Energiespar-Beratung ist gültig bis nullunull nullnullDenullenullber nullnullnullnullunnullin nullenuller Lagernullaus-Baustoffe nullbteilung einlnullsbarnullDie persnullnlinullnulle Energiespar- Beratung erfolgt aussnullnulllienulllinullnullinullLagernullausnullDieser Beratungs-nullutsnullnullein ist eine kostenlose Sernullinulleleistung für nullie Erstellung eines unnullerbinnulllinullnullen nullngebotes nullur Optinullierung nuller EnergieeffinulliennullInullres Eigennulleinulles für nullen oben nullefinierten nulleitraunullnull Quelle: Fachverband Wärmedämm-Verbundsysteme, Baden-Baden nie nulli enull er Fa ss anull en ris senull anull snull anulll null nullm anull nullinullnull spr eis einull e F enulls nuller nullanull nullnullnullnull ei null enullnull re anullnullinullnullsfenullsnullernullanullnull 1nullm nullnuller null5m nullanullimale nullualitätnull • für innen und aunullen • langlebig und nulletterfest • nullarm und pnullegeleicht nullunullenfensterbanknullnullnull,null cm 1nullnullnullnullnulllfm nullelnullpal cnullnullnullacnullminullnullnullfacnulls cnullnullnullnull fnull m anullernullrnullnullFassanulle nullFenullsnuller

Thanks for your time
Search the mailinglist?
I'm sorry to bother you all with this, but is there a way to search through the mailing list archive? I've found http://mail-archives.apache.org/mod_mbox/lucene-solr-user/ so far, but there isn't any convenient way to search through the archive. Thanks for your help