date:20081121

[jira] Commented: (SOLR-842) Better error handling for DIH

2008-11-21 Thread Shalin Shekhar Mangar (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12649904#action_12649904
 ] 

Shalin Shekhar Mangar commented on SOLR-842:


Lance, Transformers can add two special fields to a row, "$hasMore" and 
"$nextUrl", which tell the XPathEntityProcessor whether to stop now and, if 
not, which URL to fetch next. You can write a transformer that adds these 
special fields based on whether you have more results or not. Maybe that can be 
used here?
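
A minimal sketch of that idea, written in Python rather than the Java a real 
DIH Transformer would use. The function name, its parameters, the "true"/"false" 
string values and the example URL are assumptions for illustration; only the 
field names "$hasMore" and "$nextUrl" come from the comment above.

```python
def add_paging_fields(row, results_in_page, page_size, next_start, base_url):
    """Mark a row with paging hints (hypothetical helper, not DIH code).

    Assumption: a full page means more results may follow, while a short
    or empty page means the search is exhausted.
    """
    has_more = results_in_page == page_size
    row["$hasMore"] = "true" if has_more else "false"
    if has_more:
        # Assumed URL layout: the search API takes a "start" offset.
        row["$nextUrl"] = f"{base_url}?start={next_start}"
    return row


# A full page of 40 results: ask for the next page.
full_page_row = add_paging_fields(
    {"id": 1}, 40, 40, 40, "http://example.invalid/search")

# A short page of 20 results: stop here.
short_page_row = add_paging_fields(
    {"id": 2}, 20, 40, 40, "http://example.invalid/search")
```

The stopping rule (a short page ends the loop) is only one possible choice; a 
feed that reports its total hit count would allow a more precise test.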

> Better error handling for DIH
> -
>
> Key: SOLR-842
> URL: https://issues.apache.org/jira/browse/SOLR-842
> Project: Solr
> Issue Type: Improvement
> Components: contrib - DataImportHandler
> Affects Versions: 1.3
> Reporter: Noble Paul
> Assignee: Shalin Shekhar Mangar
> Fix For: 1.4
>
> Attachments: SOLR-842.patch, SOLR-842.patch, SOLR-842.patch, 
> SOLR-842.patch
>
>
> Currently DIH fails completely on any error. We must have better control over 
> error behavior.
> Mail thread: http://markmail.org/message/xvfbfaskfmlj2pnm
> An entity can have an attribute {{onError}} whose value can be one of 
> {{abort, continue, skip}}.
> {{abort}} is the default; it aborts the import. {{continue}} and {{skip}} do 
> not fail the import; it continues from there. {{skip}} skips all rows in an 
> XML file (only if stream != true) if there is an error in that XML, but 
> continues with the next XML file.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (SOLR-84) Logo Contests

2008-11-21 Thread solprovider (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-84?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

solprovider updated SOLR-84:


Attachment: solr.s7.jpg

Attached solr.s7.jpg.  Very resizable. Focus on search rather than "solar".

> Logo Contests
> -
>
> Key: SOLR-84
> URL: https://issues.apache.org/jira/browse/SOLR-84
> Project: Solr
> Issue Type: Improvement
> Reporter: Bertrand Delacretaz
> Priority: Minor
> Attachments: apache-solr-004.png, apache_soir_001.ai.zip, 
> apache_soir_001.jpg, apache_solr_a_blue.jpg, apache_solr_a_red.jpg, 
> apache_solr_b_blue.jpg, apache_solr_b_red.jpg, apache_solr_burning.png, 
> apache_solr_c_blue.jpg, apache_solr_c_blue_shirt.jpg, apache_solr_c_red.jpg, 
> apache_solr_contour.png, apache_solr_d_blue.jpg, apache_solr_d_red.jpg, 
> apache_solr_sun.png, logo-grid.jpg, logo-solr-d.jpg, logo-solr-e.jpg, 
> logo-solr-source-files-take2.zip, logo_remake.jpg, logo_remake.svg, 
> solr-84-source-files.zip, solr-circle-grad.png, solr-f.jpg, 
> solr-greyscale.png, solr-logo-20061214.jpg, solr-logo-20061218.JPG, 
> solr-logo-20070124.JPG, solr-logo.jpg, solr-logo.png, solr-nick.gif, 
> solr-solid-R.png, solr-solid.png, solr.jpg, solr.png, solr.s1.jpg, 
> solr.s2.jpg, solr.s3.jpg, solr.s4.jpg, solr.s5.jpg, solr.s7.jpg, solr.svg, 
> solr2_maho-vote.png, solr2_maho.png, solr2_maho_impression.png, 
> solr3_maho.png, solr_attempt.jpg, solr_attempt2.jpg, solr_sp.png, 
> solrlogo.jpg, solrlogo2.jpg, sslogo-solr-70s.png, sslogo-solr-classic.png, 
> sslogo-solr-dance.png, sslogo-solr-fiesta.png, sslogo-solr-finder2.0.png
>
>
> This issue was originally a scratch pad for various ideas for new Logos.  It is 
> now being used as a repository for submissions for the Solr Logo Contest...
> http://wiki.apache.org/solr/LogoContest
> Note that many of the images currently attached are not eligible for the 
> contest since they do not meet the official guidelines for new Apache project 
> logos (in particular that the full project name "Apache Solr" must be 
> included in the Logo).  Only eligible attachments will be included in the 
> official voting.


Re: LogoContest Process & Timeline ... was: Re: [Solr Wiki] Update of "LogoContest" by HossMan

2008-11-21 Thread Ryan McKinley


I updated: http://wiki.apache.org/solr/LogoContest
with the list I think are voteable.

I took the liberty to drop:
 https://issues.apache.org//jira/secure/attachment/12394316/solr-solid-R.png
since I asked for it and the original is nicer (i think)

I also swapped:
 https://issues.apache.org//jira/secure/attachment/12394367/solr2_maho.png
into:
 https://issues.apache.org/jira/secure/attachment/12394475/solr2_maho-vote.png

if you see anything missing from the voting list, or something that  
should be removed/modified -- fix it soon!


ryan


On Nov 21, 2008, at 8:07 PM, Chris Hostetter wrote:



Ryan, are you still volunteering for this? :)

: Date: Wed, 1 Oct 2008 17:10:28 -0400
: From: Ryan McKinley
: Subject: Fwd: LogoContest Process & Timeline ... was: Re: [Solr Wiki] Update
: of "LogoContest" by HossMan

: Rather than voting directly off the JIRA page -- When the submissions
: are "closed", I suggest we build a special page for the contest and
: only put 'final drafts' on that.   This page will be passed by the
: solr-user@ list before official voting begins.  I'll volunteer to
: build this page.


-Hoss

[jira] Updated: (SOLR-84) Logo Contests

2008-11-21 Thread Ryan McKinley (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-84?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan McKinley updated SOLR-84:
--

Attachment: solr2_maho-vote.png

This is derived from 'solr2_maho.png'.  I'm posting it so we have something to 
vote on.

> Logo Contests
> -
>
> Key: SOLR-84
> URL: https://issues.apache.org/jira/browse/SOLR-84
> Project: Solr
> Issue Type: Improvement
> Reporter: Bertrand Delacretaz
> Priority: Minor
> Attachments: apache-solr-004.png, apache_soir_001.ai.zip, 
> apache_soir_001.jpg, apache_solr_a_blue.jpg, apache_solr_a_red.jpg, 
> apache_solr_b_blue.jpg, apache_solr_b_red.jpg, apache_solr_burning.png, 
> apache_solr_c_blue.jpg, apache_solr_c_blue_shirt.jpg, apache_solr_c_red.jpg, 
> apache_solr_contour.png, apache_solr_d_blue.jpg, apache_solr_d_red.jpg, 
> apache_solr_sun.png, logo-grid.jpg, logo-solr-d.jpg, logo-solr-e.jpg, 
> logo-solr-source-files-take2.zip, logo_remake.jpg, logo_remake.svg, 
> solr-84-source-files.zip, solr-circle-grad.png, solr-f.jpg, 
> solr-greyscale.png, solr-logo-20061214.jpg, solr-logo-20061218.JPG, 
> solr-logo-20070124.JPG, solr-logo.jpg, solr-logo.png, solr-nick.gif, 
> solr-solid-R.png, solr-solid.png, solr.jpg, solr.png, solr.s1.jpg, 
> solr.s2.jpg, solr.s3.jpg, solr.s4.jpg, solr.s5.jpg, solr.svg, 
> solr2_maho-vote.png, solr2_maho.png, solr2_maho_impression.png, 
> solr3_maho.png, solr_attempt.jpg, solr_attempt2.jpg, solr_sp.png, 
> solrlogo.jpg, solrlogo2.jpg, sslogo-solr-70s.png, sslogo-solr-classic.png, 
> sslogo-solr-dance.png, sslogo-solr-fiesta.png, sslogo-solr-finder2.0.png
>
>
> This issue was originally a scratch pad for various ideas for new Logos.  It is 
> now being used as a repository for submissions for the Solr Logo Contest...
> http://wiki.apache.org/solr/LogoContest
> Note that many of the images currently attached are not eligible for the 
> contest since they do not meet the official guidelines for new Apache project 
> logos (in particular that the full project name "Apache Solr" must be 
> included in the Logo).  Only eligible attachments will be included in the 
> official voting.


[jira] Commented: (SOLR-842) Better error handling for DIH

2008-11-21 Thread Noble Paul (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12649895#action_12649895
 ] 

Noble Paul commented on SOLR-842:
-

Lance, could you paste a sample data-config and explain the use case?


Re: LogoContest Process & Timeline ... was: Re: [Solr Wiki] Update of "LogoContest" by HossMan

2008-11-21 Thread Ryan McKinley


actually looking over this now...
The bit I was worried about was having to click on each icon to see  
the non-scaled version.


I think we can use the wiki page just fine.  We should update the list  
of 'valid' entries and make sure everyone agrees on what is there  
before starting a vote.

I'll take a shot at that now.

On Nov 21, 2008, at 8:07 PM, Chris Hostetter wrote:



Ryan, are you still volunteering for this? :)

: Date: Wed, 1 Oct 2008 17:10:28 -0400
: From: Ryan McKinley
: Subject: Fwd: LogoContest Process & Timeline ... was: Re: [Solr Wiki] Update
: of "LogoContest" by HossMan

: Rather than voting directly off the JIRA page -- When the submissions
: are "closed", I suggest we build a special page for the contest and
: only put 'final drafts' on that.   This page will be passed by the
: solr-user@ list before official voting begins.  I'll volunteer to
: build this page.


-Hoss

Re: Motivation for white space after entities in HTMLStripReader

2008-11-21 Thread Grant Ingersoll

It is an attempt at making things work properly with the highlighter  
(such that offsets are correct).  I believe it works most of the time,  
but there might still be a few issues; check JIRA.
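
To illustrate the offset argument, here is a small sketch in Python. It is not 
the HTMLStripReader code; it only assumes that each entity is replaced by its 
decoded character plus enough spaces to keep the text the same length, so that 
character offsets in the stripped text still line up with the original markup. 
The function name and the regular expression are made up for this example.

```python
import html
import re

# Matches named, decimal and hexadecimal character references.
ENTITY = re.compile(r"&(?:#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);")


def decode_entities_padded(text):
    """Decode entities, padding with spaces to preserve the text length."""
    def replace(match):
        entity = match.group(0)
        decoded = html.unescape(entity)
        # Pad so the replacement occupies as many characters as the entity.
        return decoded + " " * (len(entity) - len(decoded))
    return ENTITY.sub(replace, text)


source = "l&oacute;d"
padded = decode_entities_padded(source)

# Offsets are preserved: the final "d" sits at the same index in both strings.
same_offset = padded.index("d") == source.index("d")

# The cost: a whitespace-based tokenizer now sees two tokens, not one word.
tokens = padded.split()
```

The padding keeps offsets stable for highlighting, but the extra spaces fall in 
the middle of the word, which is the term-splitting effect described in the 
original question.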


-Grant

On Nov 21, 2008, at 5:29 PM, Dawid Weiss wrote:



Hi folks. What's the motivation to add exactly the number of white  
spaces after an entity declaration in HTMLStripReader? It basically  
looks like this:


"lód"

(UTF: lód, "ice" in Polish) is translated into:

"ló   d"

This happens both with numeric entities and named entities. Needless  
to say, these added spaces in the character stream do no good as  
they effectively split a single term "lód" into two meaningless  
terms "l" and "d".


I can fix this in the code easily, but it looks like it was  
intentional, so before I write test cases and commit a JIRA issue I  
would like to understand what the original reasons might have been  
(I really don't see anything this would be useful for). Apologies if  
I'm being dim here.


Dawid


--
Grant Ingersoll

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ

Re: LogoContest Process & Timeline ... was: Re: [Solr Wiki] Update of "LogoContest" by HossMan

2008-11-21 Thread Ryan McKinley


sure.  I'll get something up by tomorrow...


On Nov 21, 2008, at 8:07 PM, Chris Hostetter wrote:



Ryan, are you still volunteering for this? :)

: Date: Wed, 1 Oct 2008 17:10:28 -0400
: From: Ryan McKinley
: Subject: Fwd: LogoContest Process & Timeline ... was: Re: [Solr Wiki] Update
: of "LogoContest" by HossMan

: Rather than voting directly off the JIRA page -- When the submissions
: are "closed", I suggest we build a special page for the contest and
: only put 'final drafts' on that.   This page will be passed by the
: solr-user@ list before official voting begins.  I'll volunteer to
: build this page.


-Hoss

[jira] Commented: (SOLR-842) Better error handling for DIH

2008-11-21 Thread Lance Norskog (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12649888#action_12649888
 ] 

Lance Norskog commented on SOLR-842:


Wow!

I just found another case for loop control: receiving no documents in a loop.

My test case is that to fetch subsequent pages of results (first 40, next 40, 
etc.) from a search API I could not use any value returned in the last request. 
I had to make an XML file giving the "start 0, start 40, start 80" sequence. I 
drove an RSS feed input with this as an outer loop.

Now, suppose I have 100 requests in the file but this particular search only 
has 20 results. The second time I do the search I get no documents: now I want 
to break out of my driving XML file loop. With the current DIH I will send 
another 98 search requests that will all fail.

So, two features here:
1) to skip when there are no documents.
2) to end the next outer loop.

"break to entity X" would be the most flexible - you could break out of three 
loops if you want. This is the same as "break to label" in Java or C.
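
The scenario can be sketched in Python as follows. This is not DIH code: 
fetch_page is a hypothetical stand-in for the search API, and the loop shows 
only the requested behavior, which is to stop the outer "start 0, start 40, 
start 80" loop as soon as a page comes back empty.

```python
def fetch_page(start, page_size, total_results):
    """Hypothetical search API: returns document ids for one page."""
    return list(range(start, min(start + page_size, total_results)))


def import_all(starts, page_size, total_results):
    """Walk the list of start offsets, stopping at the first empty page."""
    imported = []
    requests_sent = 0
    for start in starts:
        requests_sent += 1
        docs = fetch_page(start, page_size, total_results)
        if not docs:
            # No documents: leave the driving loop instead of sending
            # the remaining requests.
            break
        imported.extend(docs)
    return imported, requests_sent


# 100 prepared requests (start 0, 40, 80, ...) but only 20 results exist.
docs, requests_sent = import_all(range(0, 4000, 40), 40, 20)
```

With the early exit, the second request is the last one sent; without it, the 
remaining 98 requests would all be issued and return nothing.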

Thanks for your time,

Lance (the instigator)


[jira] Commented: (SOLR-856) Suport for "Accept-Encoding : gzip" in SolrDispatchFilter

2008-11-21 Thread Hoss Man (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12649874#action_12649874
 ] 

Hoss Man commented on SOLR-856:
---

bq. Now I am more or less convinced that we must not add a filter. We should 
have enough documentation on how these things are efficiently handled in each 
of the containers.

If that's the consensus then i'm okay with that, but my personal preference 
would be to do it in a container agnostic manner (by explicitly using a Filter 
that does this in the stock web.xml) so that it works well for all users, 
regardless of container, out of the box.  users who are knowledgeable about 
java, servlet containers, load balancers, etc... can then comment that filter 
out of the web.xml (or replace it with something else)

punting on this to the servlet container puts a burden on novice users that 
seems easily avoidable.

But it's not really that big of a deal to me -- it's an optimization that's 
only really useful in low bandwidth situations anyway, so if it's something 
that's going to make a difference to people, it's probably fine to make them 
figure out the best way to turn it on in their specific situation.

So  is this a "Won't Fix" situation?

> Suport for "Accept-Encoding : gzip" in SolrDispatchFilter
> -
>
> Key: SOLR-856
> URL: https://issues.apache.org/jira/browse/SOLR-856
> Project: Solr
> Issue Type: Improvement
> Reporter: Noble Paul
> Attachments: SOLR-856.patch
>
>
> If the client sends an Accept-Encoding : gzip header then SolrDispatchFilter 
> should respect that and send back data as zipped
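
As a rough illustration of the requested behavior (not the attached patch and 
not servlet code), the Python sketch below compresses a response body only 
when the client's Accept-Encoding header lists gzip. The function name is 
hypothetical, and quality values such as "gzip;q=0" are deliberately ignored 
to keep the example short.

```python
import gzip


def encode_response(body, accept_encoding):
    """Return (payload, headers), gzipping when the client accepts it.

    Simplification: any mention of "gzip" counts as acceptance; q-values
    are not interpreted.
    """
    codings = [part.split(";")[0].strip().lower()
               for part in accept_encoding.split(",")]
    if "gzip" in codings:
        return gzip.compress(body), {"Content-Encoding": "gzip"}
    return body, {}


body = b"<response>" + b"x" * 1000 + b"</response>"
zipped, zipped_headers = encode_response(body, "gzip, deflate")
plain, plain_headers = encode_response(body, "")
```

Whether this logic lives in a Filter in the stock web.xml or is left to the 
servlet container is exactly the question under discussion above; the sketch 
takes no side on that.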


Re: Fwd: LogoContest Process & Timeline ... was: Re: [Solr Wiki] Update of "LogoContest" by HossMan

2008-11-21 Thread Chris Hostetter


Ryan, are you still volunteering for this? :)

: Date: Wed, 1 Oct 2008 17:10:28 -0400
: From: Ryan McKinley
: Subject: Fwd: LogoContest Process & Timeline ... was: Re: [Solr Wiki] Update
: of "LogoContest" by HossMan

: Rather than voting directly off the JIRA page -- When the submissions
: are "closed", I suggest we build a special page for the contest and
: only put 'final drafts' on that.   This page will be passed by the
: solr-user@ list before official voting begins.  I'll volunteer to
: build this page.


-Hoss

Motivation for white space after entities in HTMLStripReader

2008-11-21 Thread Dawid Weiss



Hi folks. What's the motivation to add exactly the number of white spaces after 
an entity declaration in HTMLStripReader? It basically looks like this:


"lód"

(UTF: lód, "ice" in Polish) is translated into:

"ló   d"

This happens both with numeric entities and named entities. Needless to say, 
these added spaces in the character stream do no good as they effectively split 
a single term "lód" into two meaningless terms "l" and "d".


I can fix this in the code easily, but it looks like it was intentional, so 
before I write test cases and commit a JIRA issue I would like to understand 
what the original reasons might have been (I really don't see anything this 
would be useful for). Apologies if I'm being dim here.


Dawid

[jira] Updated: (SOLR-538) CopyField maxLength property

2008-11-21 Thread Chris Harris (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Harris updated SOLR-538:
--

Attachment: SOLR-538.patch

Small change to bring Lars' 2008-11-08 version of SOLR-538.patch in sync with 
trunk r719187

> CopyField maxLength property
> 
>
> Key: SOLR-538
> URL: https://issues.apache.org/jira/browse/SOLR-538
> Project: Solr
> Issue Type: Improvement
> Components: update
> Reporter: Nicolas Dessaigne
> Priority: Minor
> Attachments: CopyFieldMaxLength.patch, CopyFieldMaxLength.patch, 
> SOLR-538-for-1.3.patch, SOLR-538.patch, SOLR-538.patch, SOLR-538.patch, 
> SOLR-538.patch, SOLR-538.patch, SOLR-538.patch, SOLR-538.patch
>
>
> As discussed briefly on the mailing list (http://www.mail-archive.com/[EMAIL 
> PROTECTED]/msg09807.html), the objective of this task is to add a maxLength 
> property to the CopyField "command". This property simply limits the number 
> of characters that are copied.
> This is particularly useful to avoid very slow highlighting when the index 
> contains big documents.
> Example :
> 
> This approach also has the advantage of limiting the index size for large 
> documents (the original text field does not need to be stored or to have 
> term vectors). However, the index is bigger for small documents...
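
The intended semantics can be sketched in a few lines of Python. This is an 
illustration only, not the patch: the function and parameter names are 
invented, and the document is modeled as a plain dictionary.

```python
def copy_field(doc, source, dest, max_length=None):
    """Copy doc[source] to doc[dest], keeping at most max_length characters.

    max_length=None means "no limit", i.e. a plain copy.
    """
    value = doc[source]
    doc[dest] = value if max_length is None else value[:max_length]
    return doc


doc = {"text": "a very long document body " * 1000}
copy_field(doc, "text", "text_highlight", max_length=100)
```

Here only the short "text_highlight" copy would need to be stored for 
highlighting, which is the trade-off the description mentions: less work on 
big documents, a slightly bigger index for small ones.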


[jira] Commented: (SOLR-799) Add support for hash based exact/near duplicate document handling

2008-11-21 Thread Mark Miller (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12649826#action_12649826
 ] 

Mark Miller commented on SOLR-799:
--

bq. There's probably no need for a separate test solrconfig-deduplicate.xml if 
all it adds is an update processor. Tests could just explicitly specify the 
update handler on updates.

Now that I look to fix this, I am not understanding - I don't need to change 
the update handler, I need to change the update chain...I am not seeing how 
that can be done dynamically...is it possible? If not I think I need the config 
xml.

> Add support for hash based exact/near duplicate document handling
> -
>
> Key: SOLR-799
> URL: https://issues.apache.org/jira/browse/SOLR-799
> Project: Solr
> Issue Type: New Feature
> Components: update
> Reporter: Mark Miller
> Priority: Minor
> Attachments: SOLR-799.patch, SOLR-799.patch, SOLR-799.patch
>
>
> Hash-based duplicate document detection is efficient and allows for blocking 
> as well as field collapsing. Let's put it into Solr. 
> http://wiki.apache.org/solr/Deduplication
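
A minimal sketch of exact-duplicate handling by hash, in Python. It is not the 
code from the patch: the function names are hypothetical, MD5 is just one 
possible hash, and the "index" is a dictionary keyed by signature so that a 
later document with the same signature overwrites the earlier one.

```python
import hashlib


def signature(doc, fields):
    """Hash the chosen fields of a document into a hex signature."""
    digest = hashlib.md5()
    for name in fields:
        # Separate names and values with NUL bytes so that field
        # boundaries cannot be confused.
        digest.update(name.encode("utf-8"))
        digest.update(b"\0")
        digest.update(str(doc.get(name, "")).encode("utf-8"))
        digest.update(b"\0")
    return digest.hexdigest()


def add_documents(index, docs, fields):
    """Add documents, letting equal signatures overwrite each other."""
    for doc in docs:
        index[signature(doc, fields)] = doc
    return index


docs = [
    {"id": "1", "name": "Solr", "body": "search server"},
    {"id": "2", "name": "Solr", "body": "search server"},   # duplicate content
    {"id": "3", "name": "Lucene", "body": "search library"},
]
index = add_documents({}, docs, fields=["name", "body"])
```

Near-duplicate detection would need a fuzzier signature than a plain hash of 
the field values; that part is not shown here.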


Re: Classloader and SolrResourceLoader fun

2008-11-21 Thread Chris Hostetter


: Although, it still does seem weird that two SolrResourceLoader classes get
: created.

I think that's because the CoreContainer always has one SolrResourceLoader 
for the main instanceDir (ie: where it finds solr.xml, which might have a 
sharedLib) and then each SolrCore has one.

But that's just a guess ... CoreContainer constructor semantics are kind 
of a mess, it's hard to tell what's going on in there (three public 
constructors, none of which delegate to each other or a common 
initialization method)


-Hoss

Re: Case sensitive search problem in Solr 1.2

2008-11-21 Thread Ryan McKinley



Whenever I am searching with the words "EarthWatch" or "earthwatch" or
"EarthWatch" or "eArthWatch". I am getting the different results for each
search respectively. I am not able to understand why this is happening?


I think you want to apply the LowerCaseFilterFactory:
 
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#head-2b63862c67c6c3a776b465bc8dec45db74ee05d2

Also, mess around with the /admin/analysis.jsp page to get a sense of how  
your fields are getting tokenized.


This should work in 1.2 or 1.3
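
To show why lowercasing on both sides fixes this, here is a toy Python sketch. 
It is not Solr's analysis chain; the tokenizer is a plain whitespace split and 
all names are invented. The point is only that when the same lowercase step 
runs at index time and at query time, every spelling of the word hits the same 
term.

```python
from collections import defaultdict


def analyze(text):
    """Toy analyzer: split on whitespace, then lowercase each token."""
    return [token.lower() for token in text.split()]


def build_index(docs):
    """Map each analyzed term to the ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in analyze(text):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every analyzed query term."""
    results = None
    for token in analyze(query):
        matches = index.get(token, set())
        results = matches if results is None else results & matches
    return sorted(results or [])


docs = {1: "EarthWatch expedition", 2: "earthwatch newsletter", 3: "Other news"}
index = build_index(docs)
hits = [search(index, q) for q in ("EarthWatch", "earthwatch", "eArthWatch")]
```

All three queries return the same two documents because the index and the 
queries pass through the same analyze function.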

ryan

Logging in SOLR 1.3

2008-11-21 Thread PravinDabhade


Is there any way to log the content that gets indexed to a log file?

-- 
View this message in context: 
http://www.nabble.com/Logging-in-SOLR-1.3-tp20621689p20621689.html
Sent from the Solr - Dev mailing list archive at Nabble.com.

Case sensitive search problem in Solr 1.2

2008-11-21 Thread Tushar_Gandhi


Hi,
 I am using Solr 1.2 and am facing a problem with its case-sensitive
behavior.
Whenever I search with the words "EarthWatch", "earthwatch" or
"eArthWatch", I get different results for each search. I do not
understand why this is happening.
   I want to solve this problem in such a way that search becomes case
insensitive and I get the same results for any combination of capital and
small letters.
 Will upgrading from Solr 1.2 to Solr 1.3 solve my problem?

Help me.
Thanks & Regards,
Tushar Gandhi.
  

-- 
View this message in context: 
http://www.nabble.com/Case-sensitive-search-problem-in-Solr-1.2-tp20620776p20620776.html
Sent from the Solr - Dev mailing list archive at Nabble.com.

Re: [jira] Updated: (SOLR-857) Memory Leak during the indexing of large xml files

2008-11-21 Thread Mark Miller


How many unique fields do all of the xml files contain (even approx)?


Ruben Jimenez (JIRA) wrote:

 [ 
https://issues.apache.org/jira/browse/SOLR-857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruben Jimenez updated SOLR-857:
---

Attachment: solr.zip

I did some poking around and think I pinpointed the source of the memory 
problems.  I took Bill's advice to look for a large HashMap so I walked through 
each of them in the heapdump and found that there were 16 rather large 
HashMaps.  Each of these had a size of over 120,000.  Upon further inspection I 
also found that these 16 Maps seem to be three distinct Maps.  I came to this 
conclusion by looking at the first item found in each to group them initially 
and then confirmed this by choosing two additional random locations per group 
to verify that each map location contained the same object.  See 
FieldInfoSample.PNG to see a sample.

This may not be related but I did a Google Search for lucene fieldinfos leak 
2008 and the following came up:  
http://mail-archives.apache.org/mod_mbox/lucene-java-commits/200809.mbox/[EMAIL 
PROTECTED]

Assuming I'm unable to find a way to reproduce this error without a rather 
large number of these files should I just start zipping them and uploading one 
at a time?

  

Memory Leak during the indexing of large xml files
--

Key: SOLR-857
URL: https://issues.apache.org/jira/browse/SOLR-857
Project: Solr
 Issue Type: Bug
   Affects Versions: 1.3
Environment: Verified on Ubuntu 8.0.4 (1.7GB RAM, 2.4GHz dual core) and 
Windows XP (2GB RAM, 2GHz pentium) both with a Java5 SDK
   Reporter: Ruben Jimenez
Attachments: OQ_SOLR_1.xml.zip, schema.xml, solr.zip, 
solr256MBHeap.jpg


While indexing a set of SOLR xml files that contain 5000 document adds within 
them and are about 30MB each, SOLR 1.3 seems to continually use more and more 
memory until the heap is exhausted, while the same files are indexed without 
issue with SOLR 1.2.
Steps used to reproduce:
1 - Download SOLR 1.3
2 - Modify example schema.xml to match fields required
3 - Start the example server with the following command:
    java -Xms512m -Xmx1024m -XX:MaxPermSize=128m -jar start.jar
4 - Index files as follows:
    java -Xmx128m -jar .../examples/exampledocs/post.jar *.xml
The directory with xml files contains about 100 xml files of about 30MB each.  
While indexing, after about the 25th file SOLR 1.3 runs out of memory, while 
SOLR 1.2 is able to index the entire set of files without any problems.

Solr nightly build failure

2008-11-21 Thread solr-dev


init-forrest-entities:
[mkdir] Created dir: /tmp/apache-solr-nightly/build
[mkdir] Created dir: /tmp/apache-solr-nightly/build/web

compile-common:
[mkdir] Created dir: /tmp/apache-solr-nightly/build/common
[javac] Compiling 39 source files to /tmp/apache-solr-nightly/build/common
[javac] Note: /tmp/apache-solr-nightly/src/java/org/apache/solr/common/util/FastInputStream.java uses or overrides a deprecated API.
[javac] Note: Recompile with -Xlint:deprecation for details.
[javac] Note: Some input files use unchecked or unsafe operations.
[javac] Note: Recompile with -Xlint:unchecked for details.

compile:
[mkdir] Created dir: /tmp/apache-solr-nightly/build/core
[javac] Compiling 370 source files to /tmp/apache-solr-nightly/build/core
[javac] Note: Some input files use or override a deprecated API.
[javac] Note: Recompile with -Xlint:deprecation for details.
[javac] Note: Some input files use unchecked or unsafe operations.
[javac] Note: Recompile with -Xlint:unchecked for details.

compile-solrj-core:
[mkdir] Created dir: /tmp/apache-solr-nightly/build/client/solrj
[javac] Compiling 27 source files to /tmp/apache-solr-nightly/build/client/solrj
[javac] Note: /tmp/apache-solr-nightly/client/java/solrj/src/org/apache/solr/client/solrj/impl/CommonsHttpSolrServer.java uses or overrides a deprecated API.
[javac] Note: Recompile with -Xlint:deprecation for details.
[javac] Note: Some input files use unchecked or unsafe operations.
[javac] Note: Recompile with -Xlint:unchecked for details.

compile-solrj:
[javac] Compiling 2 source files to /tmp/apache-solr-nightly/build/client/solrj

compileTests:
[mkdir] Created dir: /tmp/apache-solr-nightly/build/tests
[javac] Compiling 129 source files to /tmp/apache-solr-nightly/build/tests
[javac] Note: Some input files use or override a deprecated API.
[javac] Note: Recompile with -Xlint:deprecation for details.
[javac] Note: Some input files use unchecked or unsafe operations.
[javac] Note: Recompile with -Xlint:unchecked for details.

junit:
[mkdir] Created dir: /tmp/apache-solr-nightly/build/test-results
[junit] Running org.apache.solr.BasicFunctionalityTest
[junit] Tests run: 19, Failures: 0, Errors: 0, Time elapsed: 16.504 sec
[junit] Running org.apache.solr.ConvertedLegacyTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 7.554 sec
[junit] Running org.apache.solr.DisMaxRequestHandlerTest
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 4.87 sec
[junit] Running org.apache.solr.EchoParamsTest
[junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 2.124 sec
[junit] Running org.apache.solr.OutputWriterTest
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 2.169 sec
[junit] Running org.apache.solr.SampleTest
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 1.866 sec
[junit] Running org.apache.solr.SolrInfoMBeanTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.671 sec
[junit] Running org.apache.solr.TestDistributedSearch
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 21.673 sec
[junit] Running org.apache.solr.analysis.DoubleMetaphoneFilterFactoryTest
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.386 sec
[junit] Running org.apache.solr.analysis.DoubleMetaphoneFilterTest
[junit] Tests run: 6, Failures: 0, Errors: 0, Time elapsed: 0.369 sec
[junit] Running org.apache.solr.analysis.EnglishPorterFilterFactoryTest
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 1.017 sec
[junit] Running org.apache.solr.analysis.HTMLStripReaderTest
[junit] Tests run: 9, Failures: 0, Errors: 0, Time elapsed: 0.703 sec
[junit] Running org.apache.solr.analysis.LengthFilterTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.809 sec
[junit] Running org.apache.solr.analysis.TestBufferedTokenStream
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 1.044 sec
[junit] Running org.apache.solr.analysis.TestCapitalizationFilter
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.931 sec
[junit] Running org.apache.solr.analysis.TestCharFilter
[junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 0.33 sec
[junit] Running org.apache.solr.analysis.TestHyphenatedWordsFilter
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.868 sec
[junit] Running org.apache.solr.analysis.TestKeepWordFilter
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.843 sec
[junit] Running org.apache.solr.analysis.TestMappingCharFilter
[junit] Tests run: 11, Failures: 0, Errors: 0, Time elapsed: 0.36 sec
[junit] Running org.apache.solr.analysis.TestMappingCharFilterFactory
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.361 sec
[junit] Running org.apache.solr.analysis.TestPatternR

Build failed in Hudson: Solr-trunk #631

2008-11-21 Thread Apache Hudson Server

See http://hudson.zones.apache.org/hudson/job/Solr-trunk/631/changes

Changes:

[shalin] SOLR-465 -- Upgraded to Lucene 2.9-dev (r719351)

[ryan] SOLR-793: fix possible deadlock

[billa] SOLR-830: Use perl regex to improve accuracy of finding latest snapshot 
in snappuller

[billa] SOLR-346: Use perl regex to improve accuracy of finding latest snapshot 
in snapinstaller

--
[...truncated 1817 lines...]
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 1.062 sec
[junit] Running org.apache.solr.analysis.TestKeepWordFilter
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.781 sec
[junit] Running org.apache.solr.analysis.TestMappingCharFilter
[junit] Tests run: 11, Failures: 0, Errors: 0, Time elapsed: 0.392 sec
[junit] Running org.apache.solr.analysis.TestMappingCharFilterFactory
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.409 sec
[junit] Running org.apache.solr.analysis.TestPatternReplaceFilter
[junit] Tests run: 5, Failures: 0, Errors: 0, Time elapsed: 1.439 sec
[junit] Running org.apache.solr.analysis.TestPatternTokenizerFactory
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.8 sec
[junit] Running org.apache.solr.analysis.TestPhoneticFilter
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.92 sec
[junit] Running org.apache.solr.analysis.TestRemoveDuplicatesTokenFilter
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 2.149 sec
[junit] Running org.apache.solr.analysis.TestSynonymFilter
[junit] Tests run: 7, Failures: 0, Errors: 0, Time elapsed: 1.89 sec
[junit] Running org.apache.solr.analysis.TestSynonymMap
[junit] Tests run: 5, Failures: 0, Errors: 0, Time elapsed: 2.206 sec
[junit] Running org.apache.solr.analysis.TestTrimFilter
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 1.211 sec
[junit] Running org.apache.solr.analysis.TestWordDelimiterFilter
[junit] Tests run: 9, Failures: 0, Errors: 0, Time elapsed: 12.345 sec
[junit] Running org.apache.solr.common.SolrDocumentTest
[junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 0.352 sec
[junit] Running org.apache.solr.common.params.SolrParamTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.383 sec
[junit] Running org.apache.solr.common.util.ContentStreamTest
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 0.418 sec
[junit] Running org.apache.solr.common.util.IteratorChainTest
[junit] Tests run: 7, Failures: 0, Errors: 0, Time elapsed: 0.337 sec
[junit] Running org.apache.solr.common.util.NamedListTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.333 sec
[junit] Running org.apache.solr.common.util.TestNamedListCodec
[junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 1.357 sec
[junit] Running org.apache.solr.common.util.TestXMLEscaping
[junit] Tests run: 7, Failures: 0, Errors: 0, Time elapsed: 0.349 sec
[junit] Running org.apache.solr.core.AlternateDirectoryTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 2.27 sec
[junit] Running org.apache.solr.core.RequestHandlersTest
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 5.575 sec
[junit] Running org.apache.solr.core.ResourceLoaderTest
[junit] Tests run: 2, Failures: 0, Errors: 1, Time elapsed: 0.425 sec
[junit] Test org.apache.solr.core.ResourceLoaderTest FAILED
[junit] Running org.apache.solr.core.SOLR749Test
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 2.749 sec
[junit] Running org.apache.solr.core.SolrCoreTest
[junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 11.676 sec
[junit] Running org.apache.solr.core.TestArbitraryIndexDir
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 3.982 sec
[junit] Running org.apache.solr.core.TestBadConfig
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.428 sec
[junit] Running org.apache.solr.core.TestConfig
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 5.723 sec
[junit] Running org.apache.solr.core.TestJmxIntegration
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 4.386 sec
[junit] Running org.apache.solr.core.TestJmxMonitoredMap
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 1.005 sec
[junit] Running org.apache.solr.core.TestQuerySenderListener
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 2.139 sec
[junit] Running org.apache.solr.core.TestSolrDeletionPolicy1
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 11.636 sec
[junit] Running org.apache.solr.core.TestSolrDeletionPolicy2
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 3.082 sec
[junit] Running org.apache.solr.handler.AnalysisRequestHandlerTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 2.611 sec
[juni

[jira] Commented: (SOLR-842) Better error handling for DIH

[jira] Updated: (SOLR-84) Logo Contests

Re: LogoContest Process & Timeline ... was: Re: [Solr Wiki] Update of "LogoContest" by HossMan

[jira] Updated: (SOLR-84) Logo Contests

[jira] Commented: (SOLR-842) Better error handling for DIH

Re: LogoContest Process & Timeline ... was: Re: [Solr Wiki] Update of "LogoContest" by HossMan

Re: Motivation for white space after entities in HTMLStripReader

Re: LogoContest Process & Timeline ... was: Re: [Solr Wiki] Update of "LogoContest" by HossMan

[jira] Commented: (SOLR-842) Better error handling for DIH

[jira] Commented: (SOLR-856) Suport for "Accept-Encoding : gzip" in SolrDispatchFilter

Re: Fwd: LogoContest Process & Timeline ... was: Re: [Solr Wiki] Update of "LogoContest" by HossMan

Motivation for white space after entities in HTMLStripReader

[jira] Updated: (SOLR-538) CopyField maxLength property

[jira] Commented: (SOLR-799) Add support for hash based exact/near duplicate document handling

Re: Classloader and SolrResourceLoader fun

Re: Case sensitive search problem in Solr 1.2

Logging in SOLR 1.3

Case sensitive search problem in Solr 1.2

Re: [jira] Updated: (SOLR-857) Memory Leak during the indexing of large xml files

Solr nightly build failure

Build failed in Hudson: Solr-trunk #631
