[jira] Resolved: (SOLR-891) A Clobtransformer to read strings from Clob

2008-12-11 Thread Shalin Shekhar Mangar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shalin Shekhar Mangar resolved SOLR-891.


Resolution: Fixed
  Assignee: Shalin Shekhar Mangar

Committed revision 725926.

Thanks Noble!

> A Clobtransformer to read strings from Clob
> ---
>
> Key: SOLR-891
> URL: https://issues.apache.org/jira/browse/SOLR-891
> Project: Solr
>  Issue Type: Improvement
>  Components: contrib - DataImportHandler
>Reporter: Noble Paul
>Assignee: Shalin Shekhar Mangar
>Priority: Minor
> Fix For: 1.4
>
> Attachments: SOLR-891.patch, SOLR-891.patch, SOLR-891.patch
>
>
> Clob cannot be directly consumed by Solr, so JdbcDataSource can translate 
> it to Strings before processing

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: [VOTE] LOGO

2008-12-11 Thread Shalin Shekhar Mangar
https://issues.apache.org/jira/secure/attachment/12394266/apache_solr_b_red.jpg
https://issues.apache.org/jira/secure/attachment/12394475/solr2_maho-vote.png
https://issues.apache.org/jira/secure/attachment/12394264/apache_solr_a_red.jpg
https://issues.apache.org/jira/secure/attachment/12394268/apache_solr_c_red.jpg
https://issues.apache.org/jira/secure/attachment/12394350/solr.s4.jpg
https://issues.apache.org/jira/secure/attachment/12394376/solr_sp.png

Rays > sun
Red > blue
Diversity > similar logos

:-)

On Thu, Dec 11, 2008 at 7:21 AM, Ryan McKinley  wrote:

> This thread is for solr committers to list the top 4 logos preferences from
> the community logo contest.  As a guide, we should look at:
> http://people.apache.org/~ryan/solr-logo-results.html
>
> The winner will be tabulated using "instant runoff voting" -- if this
> happens to result in a tie, the winner will be picked by the 'Single
> transferable vote'
> http://en.wikipedia.org/wiki/Instant-runoff_voting
> http://en.wikipedia.org/wiki/Single_transferable_vote
>
> To cast a valid vote, you *must* include 4 options.
>
> ryan




-- 
Regards,
Shalin Shekhar Mangar.
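The "instant runoff" tabulation referenced in the thread above can be sketched as follows. This is a hypothetical illustration of the counting rule, not the script actually used to tally this vote: repeatedly count each ballot toward its highest-ranked surviving option, and eliminate the option with the fewest first-choice votes until one option holds a majority.

```python
from collections import Counter

def instant_runoff(ballots):
    """Each ballot is a list of options in preference order.
    Count each ballot toward its highest-ranked surviving option;
    eliminate the weakest option until one has a majority."""
    candidates = {c for b in ballots for c in b}
    while True:
        # Tally each ballot's top surviving choice.
        tally = Counter()
        for b in ballots:
            for choice in b:
                if choice in candidates:
                    tally[choice] += 1
                    break
        if not tally:
            return None  # every ballot exhausted, no winner
        leader, votes = tally.most_common(1)[0]
        if votes * 2 > sum(tally.values()):
            return leader
        # Eliminate the option with the fewest first-choice votes;
        # its ballots transfer to their next surviving preference.
        loser = min(tally, key=tally.get)
        candidates.discard(loser)
```

This also shows why Ryan asks for at least 4 preferences: a ballot whose every listed option has been eliminated drops out of later rounds entirely.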


[jira] Updated: (SOLR-821) replication must allow copying conf file in a different name to slave

2008-12-11 Thread Noble Paul (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul updated SOLR-821:


Attachment: SOLR-821.patch

It had an obvious bug. But it is still untested.

> replication must allow copying conf file in a different name to slave
> -
>
> Key: SOLR-821
> URL: https://issues.apache.org/jira/browse/SOLR-821
> Project: Solr
>  Issue Type: Improvement
>Reporter: Noble Paul
>Priority: Minor
> Fix For: 1.4
>
> Attachments: SOLR-821.patch, SOLR-821.patch
>
>
> It is likely that a file is different on master and slave. For instance, 
> replicating solrconfig.xml is not possible with the current config if master 
> and slave have different solrconfig.xml files (which is always true)
> We can add an alias feature in the confFiles as
> {code}
> <str name="confFiles">slave_solrconfig.xml:solrconfig.xml,slave_schema.xml:schema.xml</str>
> {code}
> This means that the file slave_solrconfig.xml should be copied to the slave 
> as solrconfig.xml and slave_schema.xml must be saved to slave as schema.xml
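The alias syntax proposed above is just a comma-separated list of "masterName:slaveName" pairs. A parser for it could look like this sketch (the function name and the bare-entry fallback are illustrative, not taken from the patch):

```python
def parse_conf_files(conf_files):
    """Parse a replication confFiles value such as
    'slave_solrconfig.xml:solrconfig.xml,schema.xml' into a dict
    mapping the file name on the master to the name it should get
    on the slave. A bare entry with no ':' keeps the same name."""
    mapping = {}
    for entry in conf_files.split(','):
        entry = entry.strip()
        if not entry:
            continue
        if ':' in entry:
            master, slave = entry.split(':', 1)
        else:
            master = slave = entry
        mapping[master] = slave
    return mapping
```

With the value from the example above, slave_solrconfig.xml maps to solrconfig.xml and slave_schema.xml maps to schema.xml.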

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: ant example, tika

2008-12-11 Thread Chris Hostetter

: Ignoring the JSP dilemma... DIH's JAR doesn't need to be in the WAR, but can
: ship in a lib/ directory outside the WAR and come in as a plugin.  And Solr
: can ship with all of the contribs wired in to a kitchen-sink example
: configuration.
: 
: There is merit to keeping Solr's WAR and core to the most minimal size
: possible and leveraging the plugin capability to let users reduce the
: footprint and un-used parts.

+1 ... there really shouldn't be any contribs in the war.  If we're 
worried that asking people to put the DIH jar in the plugin directory is 
too complicated for new users to understand (and I really can't believe 
that: if someone can understand how to write a data-config.xml then copying 
a jar file should be trivial) we can make a "solr-kitchen-sink.war" that 
contains *every* contrib and *every* dependency in addition to the regular 
one.

But even that seems less useful in general than having a more robust set 
of examples -- where each one gets a lib directory populated with just the 
plugins it's demonstrating (and possibly a "kitchen-sink" example showing 
off all of them)

Honestly: I didn't even realize DIH was adding itself to the war until 
recently, but then again I've been a little out of touch.


As for the JSP issue: personally, I think ideally we'd eliminate the JSPs 
completely (regardless of whether they are in a contrib or not).  Almost 
every admin JSP we have now could be done as a request handler with an XSLT 
directive for the browser to apply (which gives us the added bonus of 
making the data on all those pages easily machine parsable), but if 
"contribA" has need of some particularly complex server side presentation 
processing then making contribA depend on the velocity contrib seems like 
the best way to go.


-Hoss



Re: [VOTE] LOGO

2008-12-11 Thread Yonik Seeley
To try and avoid double-counting (the same person voting for a red
version, then a blue version, etc), I went back to the original vote:

21 people had one of the apache_solr_[abc]_[red/blue].jpg variants
ahead of sslogo-solr-finder2.0.png
3 people had sslogo-solr-finder2.0.png ahead of apache_solr_c_red.jpg variants.

Due to that, I'm revising my vote to remove sslogo-solr-finder2.0.png
(if that makes a difference).
If one of the apache_solr_c_red.jpg variants win, we can still declare
a winner, but go back to the community for more guidance on the exact
version they prefer.

-Yonik

On Thu, Dec 11, 2008 at 8:32 PM, Yonik Seeley  wrote:
https://issues.apache.org/jira/secure/attachment/12394268/apache_solr_c_red.jpg
https://issues.apache.org/jira/secure/attachment/12394350/solr.s4.jpg
https://issues.apache.org/jira/secure/attachment/12394267/apache_solr_c_blue.jpg
https://issues.apache.org/jira/secure/attachment/12394264/apache_solr_a_red.jpg
https://issues.apache.org/jira/secure/attachment/12394263/apache_solr_a_blue.jpg
https://issues.apache.org/jira/secure/attachment/12394266/apache_solr_b_red.jpg
https://issues.apache.org/jira/secure/attachment/12394265/apache_solr_b_blue.jpg


Re: [VOTE] LOGO

2008-12-11 Thread Koji Sekiguchi

https://issues.apache.org/jira/secure/attachment/12392306/apache_solr_sun.png
https://issues.apache.org/jira/secure/attachment/12394165/solr-logo.png
https://issues.apache.org/jira/secure/attachment/12394475/solr2_maho-vote.png
https://issues.apache.org/jira/secure/attachment/12394376/solr_sp.png

koji

Ryan McKinley wrote:
This thread is for solr committers to list the top 4 logos preferences 
from the community logo contest.  As a guide, we should look at:

http://people.apache.org/~ryan/solr-logo-results.html

The winner will be tabulated using "instant runoff voting" -- if this 
happens to result in a tie, the winner will be picked by the 'Single 
transferable vote'

http://en.wikipedia.org/wiki/Instant-runoff_voting
http://en.wikipedia.org/wiki/Single_transferable_vote

To cast a valid vote, you *must* include 4 options.

ryan




Re: [VOTE] LOGO

2008-12-11 Thread Yonik Seeley
https://issues.apache.org/jira/secure/attachment/12394268/apache_solr_c_red.jpg
https://issues.apache.org/jira/secure/attachment/12394350/solr.s4.jpg
https://issues.apache.org/jira/secure/attachment/12394267/apache_solr_c_blue.jpg
https://issues.apache.org/jira/secure/attachment/12394264/apache_solr_a_red.jpg
https://issues.apache.org/jira/secure/attachment/12394263/apache_solr_a_blue.jpg
https://issues.apache.org/jira/secure/attachment/12394266/apache_solr_b_red.jpg
https://issues.apache.org/jira/secure/attachment/12394070/sslogo-solr-finder2.0.png
https://issues.apache.org/jira/secure/attachment/12394265/apache_solr_b_blue.jpg


Rationale:
- There was a strong preference for capital letters over small in "Apache Solr"
- There was a strong preference for "rays" over "sun"
- There was a strong preference for red over blue
- There seemed to be a preference for "rays or sun" over "finder"

-Yonik


Re: [VOTE] LOGO

2008-12-11 Thread Chris Hostetter

: To cast a valid vote, you *must* include 4 options.

As discussed, I believe ryan meant "at least 4"

https://issues.apache.org/jira/secure/attachment/12394264/apache_solr_a_red.jpg
https://issues.apache.org/jira/secure/attachment/12394070/sslogo-solr-finder2.0.png
https://issues.apache.org/jira/secure/attachment/12394165/solr-logo.png
https://issues.apache.org/jira/secure/attachment/12394266/apache_solr_b_red.jpg
https://issues.apache.org/jira/secure/attachment/12394376/solr_sp.png
https://issues.apache.org/jira/secure/attachment/12394475/solr2_maho-vote.png

-Hoss



Re: [VOTE] LOGO

2008-12-11 Thread Ryan McKinley
Agreed -- assuming there is a clear winner none of this is an issue.   
My rationale for saying we need to have 4 votes is to be sure that we  
can go through 4 rounds and have everyone have a vote in each round.   
If someone only put one preference and that is eliminated, there are  
fewer votes to tally for the next round.


But I really don't care -- go ahead and vote for as many or as few as  
you like...  with instant runoff it should not matter (however if you  
vote for too few and they are eliminated early you no longer have a  
vote)




On Dec 11, 2008, at 4:28 PM, Mike Klaas wrote:

I agree.  I don't see why there needs to be  a minimum or maximum  
number of logos to rank per vote.


-Mike

On 10-Dec-08, at 7:52 PM, Yonik Seeley wrote:


Doesn't limiting to top 4 defeat the purpose of using STV to overcome
splitting-the-vote?
Seems like we should rank the whole list (or all that an individual
finds acceptable)

-Yonik

On Wed, Dec 10, 2008 at 8:51 PM, Ryan McKinley   
wrote:
This thread is for solr committers to list the top 4 logos  
preferences from

the community logo contest.  As a guide, we should look at:
http://people.apache.org/~ryan/solr-logo-results.html

The winner will be tabulated using "instant runoff voting" -- if  
this

happens to result in a tie, the winner will be picked by the 'Single
transferable vote'
http://en.wikipedia.org/wiki/Instant-runoff_voting
http://en.wikipedia.org/wiki/Single_transferable_vote

To cast a valid vote, you *must* include 4 options.

ryan






Re: [VOTE] LOGO

2008-12-11 Thread Mike Klaas
I agree.  I don't see why there needs to be  a minimum or maximum  
number of logos to rank per vote.


-Mike

On 10-Dec-08, at 7:52 PM, Yonik Seeley wrote:


Doesn't limiting to top 4 defeat the purpose of using STV to overcome
splitting-the-vote?
Seems like we should rank the whole list (or all that an individual
finds acceptable)

-Yonik

On Wed, Dec 10, 2008 at 8:51 PM, Ryan McKinley   
wrote:
This thread is for solr committers to list the top 4 logos  
preferences from

the community logo contest.  As a guide, we should look at:
http://people.apache.org/~ryan/solr-logo-results.html

The winner will be tabulated using "instant runoff voting" -- if this
happens to result in a tie, the winner will be picked by the 'Single
transferable vote'
http://en.wikipedia.org/wiki/Instant-runoff_voting
http://en.wikipedia.org/wiki/Single_transferable_vote

To cast a valid vote, you *must* include 4 options.

ryan




[jira] Created: (SOLR-908) Port of Nutch CommonGrams filter to Solr

2008-12-11 Thread Tom Burton-West (JIRA)
Port of Nutch  CommonGrams filter to Solr
-

 Key: SOLR-908
 URL: https://issues.apache.org/jira/browse/SOLR-908
 Project: Solr
  Issue Type: Wish
  Components: Analysis
Reporter: Tom Burton-West
Priority: Minor


Phrase queries containing common words are extremely slow.  We are reluctant to 
just use stop words due to various problems with false hits and some things 
becoming impossible to search with stop words turned on. (For example "to be or 
not to be", "the who", "man in the moon" vs "man on the moon" etc.)  

Several postings regarding slow phrase queries have suggested using the 
approach used by Nutch.  Perhaps someone with more Java/Solr experience might 
take this on.

It should be possible to port the Nutch CommonGrams code to Solr  and create a 
suitable Solr FilterFactory so that it could be used in Solr by listing it in 
the Solr schema.xml.

"Construct n-grams for frequently occurring terms and phrases while indexing. 
Optimize phrase queries to use the n-grams. Single terms are still indexed too, 
with n-grams overlaid."
http://lucene.apache.org/nutch/apidocs-0.8.x/org/apache/nutch/analysis/CommonGrams.html
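The common-grams idea described above can be sketched at the token level: whenever a common word is adjacent to another token, emit a fused bigram token in addition to the single terms, so a phrase query like "the who" can match one indexed bigram instead of two extremely frequent terms. This is a simplified illustration, not a port of the Nutch code, and the COMMON word set here is an arbitrary example:

```python
# Example set of "common" words; in practice this would be
# configured or derived from corpus term frequencies.
COMMON = {"the", "to", "be", "or", "not", "in", "on"}

def common_grams(tokens):
    """Emit each single term, plus a fused bigram whenever either
    member of an adjacent pair is a common word. Single terms are
    still emitted, with the bigrams overlaid as extra tokens."""
    out = []
    for i, tok in enumerate(tokens):
        out.append(tok)
        if i + 1 < len(tokens) and (tok in COMMON or tokens[i + 1] in COMMON):
            out.append(tok + "_" + tokens[i + 1])
    return out
```

A phrase query would then be rewritten to prefer the bigram tokens, which have far shorter postings lists than the common single terms.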



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: support for multi-select facets

2008-12-11 Thread Yonik Seeley
On Thu, Dec 11, 2008 at 2:54 PM, Shalin Shekhar Mangar
 wrote:
> On Thu, Dec 11, 2008 at 11:54 PM, Yonik Seeley  wrote:
>
>>
>> Option #3: tag parts of a request using "local params"
>> q=foo&fq=date:[1 TO 2]&fq=securityfilter:42&fq={!tag=type}type:(pdf OR
>> html)&facet.field=type
>>&facet.field={!exclude=type}author
>>
>> So here, one fq is tagged with "type" {!tag=type}
>> and then excluded when faceting on author.
>> Upsides:
>>  - don't necessarily need to repeat and re-parse params since they
>> are referenced by name/tag.
>>  - tagging is a generic mechanism that can be used for other functionality.
>>
>> Thoughts?
>
>
> I like this idea. A few questions:
>
> The tag is only used for the current request, right?

Right.

> How will this look when we want to exclude more than one filter? Will it be
> like fq={!exclude=filter1,filter2} ?

Yeah, exactly what I was thinking.

> Is this local param a syntax we are inventing or is it something which
> already exists?

Already exists, and is utilized in the QParser framework:
http://lucene.apache.org/solr/api/org/apache/solr/search/DisMaxQParserPlugin.html

-Yonik
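The local-params syntax referenced here prefixes a parameter value with {!name=value ...} metadata. A rough sketch of the parsing idea (not Solr's actual implementation, which also handles quoting, default values, and escaped braces):

```python
def parse_local_params(value):
    """Split a Solr-style parameter value such as
    '{!tag=type}type:(pdf OR html)' into a dict of local params
    and the remaining query string. Returns ({}, value) when no
    local-params prefix is present. Assumes a well-formed prefix
    with no '}' or whitespace inside the braces."""
    if not value.startswith('{!'):
        return {}, value
    end = value.index('}')
    params = {}
    for pair in value[2:end].split():
        key, _, val = pair.partition('=')
        params[key] = val
    return params, value[end + 1:]
```

Under this reading, fq={!tag=type}type:(pdf OR html) attaches the metadata tag=type to the filter without changing the query itself, and facet.field={!exclude=type}author carries exclude=type as metadata on the facet field.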


Re: support for multi-select facets

2008-12-11 Thread Shalin Shekhar Mangar
On Thu, Dec 11, 2008 at 11:54 PM, Yonik Seeley  wrote:

>
> Option #3: tag parts of a request using "local params"
> q=foo&fq=date:[1 TO 2]&fq=securityfilter:42&fq={!tag=type}type:(pdf OR
> html)&facet.field=type
>&facet.field={!exclude=type}author
>
> So here, one fq is tagged with "type" {!tag=type}
> and then excluded when faceting on author.
> Upsides:
>  - don't necessarily need to repeat and re-parse params since they
> are referenced by name/tag.
>  - tagging is a generic mechanism that can be used for other functionality.
>
> Thoughts?


I like this idea. A few questions:

The tag is only used for the current request, right?

How will this look when we want to exclude more than one filter? Will it be
like fq={!exclude=filter1,filter2} ?

Is this local param a syntax we are inventing or is it something which
already exists?

-- 
Regards,
Shalin Shekhar Mangar.


[jira] Commented: (SOLR-236) Field collapsing

2008-12-11 Thread Stephen Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12655750#action_12655750
 ] 

Stephen Weiss commented on SOLR-236:


I'm using Ivan's patch and running into some trouble with faceting...

Basically, I can tell that faceting is happening after the collapse - because 
the facet counts are definitely lower than they would be otherwise.  For 
example, with one search, I'd have 196 results with no collapsing, I get 120 
results with collapsing - but the facet count is 119???  In other searches the 
difference is more drastic - In another search, I get 61 results without 
collapsing, 61 with collapsing, but the facet count is 39.

Looking at it for a while now, I think I can guess what the problem might be...

The incorrect counts seem to only happen when the term in question does not 
occur evenly across all duplicates of a document.  That is, multiple document 
records may exist for the same image (it's an image search engine), but each 
document will have different terms in different fields depending on the 
audience it's targeting.  So, when you collapse, the counts are lower than they 
should be because when you actually execute a search with that facet's term 
included in the query, *all* the documents after collapsing will be ones that 
have that term.

Here's an illustration:

Collapse field is "link_id", facet field is "keyword":


Doc 1:
id: 123456,
link_id: 2,
keyword: Black, Printed, Dress

Doc 2:
id: 123457,
link_id: 2,
keyword: Black, Shoes, Patent

Doc 3:
id: 123458,
link_id: 2,
keyword: Red, Hat, Felt

Doc 4:
id: 123459,
link_id: 1,
keyword: Felt, Hat, Black

So, when you collapse, only two of these documents are in the result set 
(123456, 123459), and only the keywords Black, Printed, Dress, Felt, and Hat 
are counted.  The facet count for Black is 2, the facet count for Felt is 1.  
If you choose Black and add it to your query, you get 2 results (great).  
However, if you add *Felt* to your query, you get 2 results (because a 
different document for link_id 2 is chosen in that query than is in the more 
general query from which the facets are produced).

I think what needs to happen here is that all the terms for all the documents 
that are collapsed together need to be included (just once) with the document 
that gets counted for faceting.  In this example, when the document for link_id 
2 is counted, it would need to appear to the facet counter to have keywords 
Black, Printed, Dress, Shoes, Patent, Red, Hat, and Felt, as opposed to just 
Black, Printed, and Dress.
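The fix suggested here -- counting, for each collapsed group, the union of the terms across all its members -- can be sketched like this (a toy in-memory model using the example documents from this comment, not the actual patch):

```python
from collections import Counter, defaultdict

def collapsed_facet_counts(docs, collapse_field, facet_field):
    """Group docs by collapse_field, take the union of each group's
    facet values, and count each value once per group, so a term is
    counted if *any* member of the collapsed group carries it."""
    groups = defaultdict(set)
    for doc in docs:
        groups[doc[collapse_field]].update(doc[facet_field])
    counts = Counter()
    for values in groups.values():
        counts.update(values)
    return counts

# The four example documents from the illustration above.
docs = [
    {"id": 123456, "link_id": 2, "keyword": ["Black", "Printed", "Dress"]},
    {"id": 123457, "link_id": 2, "keyword": ["Black", "Shoes", "Patent"]},
    {"id": 123458, "link_id": 2, "keyword": ["Red", "Hat", "Felt"]},
    {"id": 123459, "link_id": 1, "keyword": ["Felt", "Hat", "Black"]},
]
```

With this approach Felt is counted for both collapsed groups, matching the 2 results actually returned when Felt is added to the query.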


> Field collapsing
> 
>
> Key: SOLR-236
> URL: https://issues.apache.org/jira/browse/SOLR-236
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Affects Versions: 1.3
>Reporter: Emmanuel Keller
> Fix For: 1.4
>
> Attachments: collapsing-patch-to-1.3.0-ivan.patch, 
> collapsing-patch-to-1.3.0-ivan_2.patch, 
> field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, 
> field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, 
> field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, 
> SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, 
> SOLR-236-FieldCollapsing.patch, solr-236.patch
>
>
> This patch include a new feature called "Field collapsing".
> "Used in order to collapse a group of results with similar value for a given 
> field to a single entry in the result set. Site collapsing is a special case 
> of this, where all results for a given web site is collapsed into one or two 
> entries in the result set, typically with an associated "more documents from 
> this site" link. See also Duplicate detection."
> http://www.fastsearch.com/glossary.aspx?m=48&amid=299
> The implementation adds 3 new query parameters (SolrParams):
> "collapse.field" to choose the field used to group results
> "collapse.type" normal (default value) or adjacent
> "collapse.max" to select how many continuous results are allowed before 
> collapsing
> TODO (in progress):
> - More documentation (on source code)
> - Test cases
> Two patches:
> - "field_collapsing.patch" for current development version
> - "field_collapsing_1.1.0.patch" for Solr-1.1.0
> P.S.: Feedback and misspelling correction are welcome ;-)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



support for multi-select facets

2008-12-11 Thread Yonik Seeley
There are some categories of facets where it makes sense to allow the
selection of multiple values, and still show the counts (and the
ability to select) currently unselected values.

Here's a simple example with a multi-select facet "type", and a
traditional facet "author".

-- type --
 x pdf (32)
word (17)
 x html(46)
excel(11)

 author 
 erik (31)
 grant (27)
 yonik (14)

Currently, Solr doesn't support this well - all facets generated use
the same base doc set.

Here is what a request currently looks like:
q=foo&fq=date:[1 TO 2]&fq=securityfilter:42&fq=type:(pdf OR
html)&facet.field=author&facet.field=type

The problem of course is that the counts for "word" and "excel" would
come back as "0".  What is needed is to ignore any constraints on
"type" when faceting on that field.

Option #1: ability to specify the query/filters per-facet:
   f.type.facet.base=+foo +date:[1 TO 2] +securityfilter:42
 OR, specify all the parts as field-specific fqs for better caching
   f.type.facet.base=foo&f.type.facet.fq=date:[1 TO
2]&f.type.facet.fq=securityfilter:42
Downsides:
  - field-specific parameters don't work for facet queries, which may
also want this feature.
  - complex filters are repeated and re-parsed.

Option #2: ability to specify as a "local param" (meta-data on a parameter)
  facet.field={!base='f.type.facet.base=+foo +date:[1 TO 2]
+securityfilter:42'}type
Upsides:
  - can work for filter.query params
Downsides:
  - client needs to escape big query string
  - single "base" parameter not good for caching
  - complex filters are repeated and re-parsed.

Option #3: tag parts of a request using "local params"
q=foo&fq=date:[1 TO 2]&fq=securityfilter:42&fq={!tag=type}type:(pdf OR
html)&facet.field=type
&facet.field={!exclude=type}author

So here, one fq is tagged with "type" {!tag=type}
and then excluded when faceting on author.
Upsides:
  - don't necessarily need to repeat and re-parse params since they
are referenced by name/tag.
  - tagging is a generic mechanism that can be used for other functionality.

Thoughts?

-Yonik
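Option #3's semantics can be sketched as: apply every fq when building the result set, but drop any fq whose tag is excluded when counting that particular facet. This is a toy in-memory model of the idea, not Solr code; the function name and predicate-based filters are illustrative:

```python
from collections import Counter

def facet_counts(docs, q, fqs, facet_field, exclude_tags=()):
    """docs: list of dicts; q and each fq are predicates over a doc.
    fqs is a list of (tag, predicate) pairs; filters whose tag is in
    exclude_tags are ignored for this facet, mirroring
    fq={!tag=type}... combined with facet.field={!exclude=type}author."""
    counts = Counter()
    for doc in docs:
        if not q(doc):
            continue
        if any(not pred(doc) for tag, pred in fqs if tag not in exclude_tags):
            continue
        counts[doc.get(facet_field)] += 1
    return counts

# Example data mirroring the type/author scenario above.
example_docs = [
    {"type": "pdf", "author": "erik"},
    {"type": "word", "author": "erik"},
    {"type": "html", "author": "grant"},
]
type_fq = [("type", lambda d: d["type"] in ("pdf", "html"))]
```

Excluding the "type" tag while faceting on type restores the counts for unselected values like "word", which would otherwise come back as 0.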


RE: 1.4 Planning

2008-12-11 Thread Feak, Todd
Done.

-Original Message-
From: Grant Ingersoll [mailto:gsing...@apache.org] 
Sent: Thursday, December 11, 2008 6:30 AM
To: solr-dev@lucene.apache.org
Subject: Re: 1.4 Planning

Hey Todd,

Yeah, I hear you on the Lucene thing, but am not quite sure how to  
handle it.  As Hoss has said, by us voting to release Solr, we are  
also voting to release that particular version of Lucene.  It means we  
have confidence that it will do the job for Solr.   If it's just a  
source issue, we can package up the sources into the Maven repo.  I'm  
hesitant to suggest that we package the Lucene source into the main  
distribution.  Maybe we just point people to the repo?  Can you open a  
JIRA issue against 1.4 to track this so that we don't forget to do it?

Thanks,
Grant

On Dec 10, 2008, at 5:28 PM, Feak, Todd wrote:

> One of the things about the 1.3 release that I didn't care for, was  
> the
> inclusion of a non-release version of Lucene. It was very difficult to
> track down the source code for the "dev" version that was used. Once
> Lucene had release its official version, I upgraded to that (which  
> went
> well) so that I had access to Lucene source.
>
> If Lucene is going to release relatively soon, a release after that
> might be nice.
>
> For what it's worth, I understand the difficulty of having 2 projects
> running in parallel with the dependency between them. While trying to
> improve both at the same time. I'm not trying to throw stones, just
> sharing an opinion.
>
> -Todd Feak
>
> -Original Message-
> From: Ryan McKinley [mailto:ryan...@gmail.com]
> Sent: Wednesday, December 10, 2008 1:20 PM
> To: solr-dev@lucene.apache.org
> Subject: Re: 1.4 Planning
>
> sounds good.
>
> is lucene planning a release anytime soon?  If so, is it worth
> *trying* to coordinate?
>
>
>
> On Dec 10, 2008, at 3:49 PM, Grant Ingersoll wrote:
>
>> I'd like to suggest we start thinking about 1.4 being released in
>> early January.  Here's my reasoning:
>>
>> 1. I think we all agree that 1.2 -> 1.3 was way too long
>> 2. Quarterly releases seem to be a pretty nice timeframe for people
>> such that you aren't constantly upgrading, yet you don't have to
>> wait for eternity for new features
>> 3. And here's where the rubber meets the road:  We've actually put
>> in some significant features and bug fixes, namely Java-based
>> Replication, Tika Integration, new more scalable faceting
>> implementation and on and on, not to mention Lucene improvements and
>> other bug fixes.  Read about it at
> https://svn.apache.org/repos/asf/lucene/solr/trunk/CHANGES.txt
>>
>> Here's what's currently targeted to 1.4:
>
>> https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&mode=hide&sorter/order=DESC&sorter/field=priority&resolution=-1&pid=12310230&fixfor=12313351
>>
>> Since I so enjoyed doing the release last time, I volunteer to do it
>> again this time.
>>
>> Thoughts?
>>
>> -Grant
>
>

--
Grant Ingersoll

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ













[jira] Created: (SOLR-907) Include access to Lucene Source with 1.4 release

2008-12-11 Thread Todd Feak (JIRA)
Include access to Lucene Source with 1.4 release


 Key: SOLR-907
 URL: https://issues.apache.org/jira/browse/SOLR-907
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 1.4
Reporter: Todd Feak
Priority: Minor


If Solr 1.4 releases with a non-release version of Lucene, please include some 
way to access the exact source code for the Lucene libraries that are included 
with Solr. This could take the form of Maven2 Repo source files, a subversion 
location and revision number, including the source with the distribution, etc.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: [VOTE] LOGO

2008-12-11 Thread Bill Au
https://issues.apache.org/jira/secure/attachment/12394070/sslogo-solr-finder2.0.png
https://issues.apache.org/jira/secure/attachment/12394264/apache_solr_a_red.jpg
https://issues.apache.org/jira/secure/attachment/12394266/apache_solr_b_red.jpg
https://issues.apache.org/jira/secure/attachment/12394268/apache_solr_c_red.jpg

Bill


On Thu, Dec 11, 2008 at 12:01 AM, Ryan McKinley <[EMAIL PROTECTED]> wrote:

>
> On Dec 10, 2008, at 10:12 PM, Otis Gospodnetic wrote:
>
>>
>>
>> https://issues.apache.org/jira/secure/attachment/12394475/solr2_maho-vote.png
>>   is this one ok?
>>
>
> Seems totally fine by me.
>
>


Re: 1.4 Planning

2008-12-11 Thread Mark Miller

bq. TrieRange

We def want this. This makes range search so much more powerful. Imagine 
something like searching for time ranges in logs indexed a line per doc 
or something - this makes large scale range processing so much more doable.


We need some new field types to do the encoding and then a way to work 
the new TrieRangeQuery into the query parser(s), right?


- Mark
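The trie idea behind TrieRange, roughly: each numeric value is indexed at several precisions by shifting away low-order bits, so a range query can be covered by a few coarse terms in the middle plus fine terms only at the range's edges, instead of thousands of individual values. A simplified sketch of the indexing side only; the real TrieRange encoding differs in detail (sortable string encoding, per-level term prefixes):

```python
def trie_index_terms(value, precision_step=8, bits=32):
    """Return (shift, prefix) terms indexed for one numeric value:
    the full-precision value plus progressively coarser prefixes
    with low bits shifted away. A range query then matches coarse
    prefixes for the bulk of the range and fine terms at the ends."""
    return [(shift, value >> shift) for shift in range(0, bits, precision_step)]
```

With precision_step=8 and 32-bit values, each value contributes four terms, and a wide range touches only a handful of 24-bit-prefix terms rather than every distinct value inside it.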

Grant Ingersoll wrote:
Seems like there are some things in Lucene that we'll want to leverage 
soon, too, like the new filters coming down (TrieRange, others) but 
I'm not sure how they fit yet into Solr




On Dec 11, 2008, at 12:50 AM, Shalin Shekhar Mangar wrote:


+1

This is great! Thanks for taking the initiative, Grant.

On Thu, Dec 11, 2008 at 2:19 AM, Grant Ingersoll 
<[EMAIL PROTECTED]>wrote:



I'd like to suggest we start thinking about 1.4 being released in early
January.  Here's my reasoning:

1. I think we all agree that 1.2 -> 1.3 was way too long
2. Quarterly releases seem to be a pretty nice timeframe for people 
such

that you aren't constantly upgrading, yet you don't have to wait for
eternity for new features
3. And here's where the rubber meets the road:  We've actually put 
in some

significant features and bug fixes, namely Java-based Replication, Tika
Integration, new more scalable faceting implementation and on and 
on, not to

mention Lucene improvements and other bug fixes.  Read about it at
https://svn.apache.org/repos/asf/lucene/solr/trunk/CHANGES.txt

Here's what's currently targeted to 1.4:
https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&mode=hide&sorter/order=DESC&sorter/field=priority&resolution=-1&pid=12310230&fixfor=12313351 



Since I so enjoyed doing the release last time, I volunteer to do it 
again

this time.

Thoughts?

-Grant









Re: 1.4 Planning

2008-12-11 Thread Yonik Seeley
On Thu, Dec 11, 2008 at 9:25 AM, Grant Ingersoll <[EMAIL PROTECTED]> wrote:
> Seems like there are some things in Lucene that we'll want to leverage soon,
> too, like the new filters coming down (TrieRange, others) but I'm not sure
> how they fit yet into Solr

+1

I'm currently thinking about FieldCacheRangeFilter type stuff for Solr.
We definitely need TrieRange in 1.4 too.

-Yonik


Re: 1.4 Planning

2008-12-11 Thread Grant Ingersoll
Seems like there are some things in Lucene that we'll want to leverage  
soon, too, like the new filters coming down (TrieRange, others) but  
I'm not sure how they fit yet into Solr




On Dec 11, 2008, at 12:50 AM, Shalin Shekhar Mangar wrote:


+1

This is great! Thanks for taking the initiative, Grant.

On Thu, Dec 11, 2008 at 2:19 AM, Grant Ingersoll  
<[EMAIL PROTECTED]>wrote:


I'd like to suggest we start thinking about 1.4 being released in  
early

January.  Here's my reasoning:

1. I think we all agree that 1.2 -> 1.3 was way too long
2. Quarterly releases seem to be a pretty nice timeframe for people  
such

that you aren't constantly upgrading, yet you don't have to wait for
eternity for new features
3. And here's where the rubber meets the road:  We've actually put  
in some
significant features and bug fixes, namely Java-based Replication,  
Tika
Integration, new more scalable faceting implementation and on and  
on, not to

mention Lucene improvements and other bug fixes.  Read about it at
https://svn.apache.org/repos/asf/lucene/solr/trunk/CHANGES.txt

Here's what's currently targeted to 1.4:
https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&mode=hide&sorter/order=DESC&sorter/field=priority&resolution=-1&pid=12310230&fixfor=12313351

Since I so enjoyed doing the release last time, I volunteer to do  
it again

this time.

Thoughts?

-Grant







Re: 1.4 Planning

2008-12-11 Thread Grant Ingersoll

Hey Todd,

Yeah, I hear you on the Lucene thing, but am not quite sure how to  
handle it.  As Hoss has said, by us voting to release Solr, we are  
also voting to release that particular version of Lucene.  It means we  
have confidence that it will do the job for Solr.   If it's just a  
source issue, we can package up the sources into the Maven repo.  I'm  
hesitant to suggest that we package the Lucene source into the main  
distribution.  Maybe we just point people to the repo?  Can you open a  
JIRA issue against 1.4 to track this so that we don't forget to do it?


Thanks,
Grant

On Dec 10, 2008, at 5:28 PM, Feak, Todd wrote:

One of the things about the 1.3 release that I didn't care for, was  
the

inclusion of a non-release version of Lucene. It was very difficult to
track down the source code for the "dev" version that was used. Once
Lucene had release its official version, I upgraded to that (which  
went

well) so that I had access to Lucene source.

If Lucene is going to release relatively soon, a release after that
might be nice.

For what it's worth, I understand the difficulty of having 2 projects
running in parallel with the dependency between them. While trying to
improve both at the same time. I'm not trying to throw stones, just
sharing an opinion.

-Todd Feak

-Original Message-
From: Ryan McKinley [mailto:[EMAIL PROTECTED]
Sent: Wednesday, December 10, 2008 1:20 PM
To: solr-dev@lucene.apache.org
Subject: Re: 1.4 Planning

sounds good.

is lucene planning a release anytime soon?  If so, is it worth
*trying* to coordinate?



On Dec 10, 2008, at 3:49 PM, Grant Ingersoll wrote:


I'd like to suggest we start thinking about 1.4 being released in
early January.  Here's my reasoning:

1. I think we all agree that 1.2 -> 1.3 was way too long
2. Quarterly releases seem to be a pretty nice timeframe for people
such that you aren't constantly upgrading, yet you don't have to
wait for eternity for new features
3. And here's where the rubber meets the road:  We've actually put
in some significant features and bug fixes, namely Java-based
Replication, Tika Integration, new more scalable faceting
implementation and on and on, not to mention Lucene improvements and
other bug fixes.  Read about it at

https://svn.apache.org/repos/asf/lucene/solr/trunk/CHANGES.txt


Here's what's currently targeted to 1.4:

https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&mode=hide&sorter/order=DESC&sorter/field=priority&resolution=-1&pid=12310230&fixfor=12313351


Since I so enjoyed doing the release last time, I volunteer to do it
again this time.

Thoughts?

-Grant





--
Grant Ingersoll

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ












[jira] Updated: (SOLR-891) A Clobtransformer to read strings from Clob

2008-12-11 Thread Shalin Shekhar Mangar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shalin Shekhar Mangar updated SOLR-891:
---

Attachment: SOLR-891.patch

# Adding Apache license header to ClobTransformer.java
# Wrap and throw IOException
# Added rudimentary javadocs

> A Clobtransformer to read strings from Clob
> ---
>
> Key: SOLR-891
> URL: https://issues.apache.org/jira/browse/SOLR-891
> Project: Solr
>  Issue Type: Improvement
>  Components: contrib - DataImportHandler
>Reporter: Noble Paul
>Priority: Minor
> Fix For: 1.4
>
> Attachments: SOLR-891.patch, SOLR-891.patch, SOLR-891.patch
>
>
> Clob cannot be directly consumed by Solr, so JdbcDataSource can translate 
> it to Strings before processing.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-884) Better error checking in CachedSqlEntityProcessor

2008-12-11 Thread Shalin Shekhar Mangar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shalin Shekhar Mangar updated SOLR-884:
---

  Assignee: Shalin Shekhar Mangar
  Priority: Minor  (was: Major)
Issue Type: Bug  (was: Improvement)

> Better error checking in CachedSqlEntityProcessor
> -
>
> Key: SOLR-884
> URL: https://issues.apache.org/jira/browse/SOLR-884
> Project: Solr
>  Issue Type: Bug
>  Components: contrib - DataImportHandler
>Reporter: Noble Paul
>Assignee: Shalin Shekhar Mangar
>Priority: Minor
> Fix For: 1.4
>
> Attachments: SOLR-884.patch
>
>
> CachedSqlEntityProcessor does not check if the cache key is present in the 
> query results.
> Another potential check could be that both the key and the lookup value must 
> be of the same type.
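The two checks suggested in the issue description can be sketched as follows. This is a hypothetical standalone validator, not the actual DIH code; the class and method names are illustrative:

```java
import java.util.Map;

// Sketch of the two proposed checks: (1) the cache key must be present
// in the query results, (2) the key and the lookup value should be of
// the same type.
public class CacheKeyChecks {
    public static void validate(Map<String, Object> row, String cacheKey, Object lookupValue) {
        if (!row.containsKey(cacheKey)) {
            throw new RuntimeException(
                "Cache key '" + cacheKey + "' is not present in the query results");
        }
        Object keyValue = row.get(cacheKey);
        if (keyValue != null && lookupValue != null
                && !keyValue.getClass().equals(lookupValue.getClass())) {
            throw new RuntimeException(
                "Cache key and lookup value have different types: "
                + keyValue.getClass().getName() + " vs " + lookupValue.getClass().getName());
        }
    }
}
```

Failing fast with a clear message here is much easier to debug than silently producing no cache hits.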

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (SOLR-884) Better error checking in CachedSqlEntityProcessor

2008-12-11 Thread Shalin Shekhar Mangar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shalin Shekhar Mangar resolved SOLR-884.


Resolution: Fixed

Committed revision 725684.

Thanks Noble!

> Better error checking in CachedSqlEntityProcessor
> -
>
> Key: SOLR-884
> URL: https://issues.apache.org/jira/browse/SOLR-884
> Project: Solr
>  Issue Type: Bug
>  Components: contrib - DataImportHandler
>Reporter: Noble Paul
>Assignee: Shalin Shekhar Mangar
>Priority: Minor
> Fix For: 1.4
>
> Attachments: SOLR-884.patch
>
>
> CachedSqlEntityProcessor does not check if the cache key is present in the 
> query results.
> Another potential check could be that both the key and the lookup value must 
> be of the same type.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-812) JDBC optimizations: setReadOnly, setMaxRows

2008-12-11 Thread Shalin Shekhar Mangar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shalin Shekhar Mangar updated SOLR-812:
---

Attachment: SOLR-812.patch

Changes
# Supports the following through configuration (extra attributes in 
 element or solrconfig.xml):
## readOnly
## autoCommit
## transactionIsolation
## holdability
## maxRows
# If readOnly is specified the following are added by default (but they will be 
overridden if specified explicitly):
{code}
setAutoCommit(true);
setTransactionIsolation(Connection.TRANSACTION_READ_UNCOMMITTED);
setHoldability(ResultSet.CLOSE_CURSORS_AT_COMMIT);
{code}
# If 'start' and 'rows' are specified as request parameters, then we call 
setMaxRows(start + rows) automatically, overriding the value specified in the 
configuration
# No changes are made unless configuration is specified, so it is backwards 
compatible.

I'd like to commit this in a day or two. We also need to add this documentation 
to the wiki page.
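The start/rows override rule in point 4 can be sketched as follows. The class and method names are illustrative, not taken from the patch; this just shows the selection logic that decides what gets passed to Statement.setMaxRows():

```java
// Hypothetical sketch of the maxRows selection rule: when the request
// supplies 'start' and 'rows', they override the configured maxRows
// with setMaxRows(start + rows); otherwise the configured value is used.
public class DihJdbcDefaults {
    /** Returns the value to pass to Statement.setMaxRows(). */
    public static int effectiveMaxRows(Integer start, Integer rows, int configuredMaxRows) {
        if (start != null && rows != null) {
            return start + rows; // request parameters win over configuration
        }
        return configuredMaxRows;
    }

    public static void main(String[] args) {
        System.out.println(effectiveMaxRows(10, 20, 500)); // prints 30
        System.out.println(effectiveMaxRows(null, null, 500)); // prints 500
    }
}
```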



> JDBC optimizations: setReadOnly, setMaxRows
> ---
>
> Key: SOLR-812
> URL: https://issues.apache.org/jira/browse/SOLR-812
> Project: Solr
>  Issue Type: Improvement
>  Components: contrib - DataImportHandler
>Affects Versions: 1.3
>Reporter: David Smiley
> Attachments: SOLR-812.patch
>
>
> I'm looking at the DataImport code as of Solr v1.3 and using it with Postgres 
> and very large data sets, and there are some improvement suggestions I have.
> 1. call setReadOnly(true) on the connection.  DIH doesn't change the data so 
> this is obvious.
> 2. call setAutoCommit(false) on the connection.   (this is needed by Postgres 
> to ensure that the fetchSize hint actually works)
> 3. call setMaxRows(X) on the statement which is to be used when the 
> dataimport.jsp debugger is only grabbing X rows.  fetchSize is just a hint 
> and alone it isn't sufficient.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-884) Better error checking in CachedSqlEntityProcessor

2008-12-11 Thread Shalin Shekhar Mangar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shalin Shekhar Mangar updated SOLR-884:
---

  Component/s: contrib - DataImportHandler
Fix Version/s: 1.4

> Better error checking in CachedSqlEntityProcessor
> -
>
> Key: SOLR-884
> URL: https://issues.apache.org/jira/browse/SOLR-884
> Project: Solr
>  Issue Type: Improvement
>  Components: contrib - DataImportHandler
>Reporter: Noble Paul
> Fix For: 1.4
>
> Attachments: SOLR-884.patch
>
>
> CachedSqlEntityProcessor does not check if the cache key is present in the 
> query results.
> Another potential check could be that both the key and the lookup value must 
> be of the same type.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Issue Comment Edited: (SOLR-891) A Clobtransformer to read strings from Clob

2008-12-11 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12655624#action_12655624
 ] 

noble.paul edited comment on SOLR-891 at 12/11/08 4:27 AM:
---

A new Transformer ClobTransformer


the field must have an attribute clob="true"
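The core of a Clob-to-String conversion can be shown with a self-contained JDK-only snippet. The actual ClobTransformer lives in the SOLR-891 patch; this sketch only illustrates the idea, using SerialClob as a stand-in for a driver-supplied Clob:

```java
import java.sql.Clob;
import java.sql.SQLException;
import javax.sql.rowset.serial.SerialClob;

// Sketch: read an entire Clob into a String. Note that getSubString's
// position argument is 1-based per the JDBC spec.
public class ClobToString {
    public static String read(Clob clob) throws SQLException {
        return clob.getSubString(1, (int) clob.length());
    }

    public static void main(String[] args) throws SQLException {
        Clob clob = new SerialClob("hello clob".toCharArray());
        System.out.println(read(clob)); // prints "hello clob"
    }
}
```

For very large Clobs, streaming via getCharacterStream() would avoid holding the whole value in one String, but for indexing into a Solr field the full String is what's needed anyway.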

  was (Author: noble.paul):
A new Transformer ClobTransformer



  
> A Clobtransformer to read strings from Clob
> ---
>
> Key: SOLR-891
> URL: https://issues.apache.org/jira/browse/SOLR-891
> Project: Solr
>  Issue Type: Improvement
>  Components: contrib - DataImportHandler
>Reporter: Noble Paul
>Priority: Minor
> Fix For: 1.4
>
> Attachments: SOLR-891.patch, SOLR-891.patch
>
>
> Clob cannot be directly consumed by Solr, so JdbcDataSource can translate 
> it to Strings before processing.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-891) A Clobtransformer to read strings from Clob

2008-12-11 Thread Noble Paul (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul updated SOLR-891:


Attachment: SOLR-891.patch

A new Transformer ClobTransformer




> A Clobtransformer to read strings from Clob
> ---
>
> Key: SOLR-891
> URL: https://issues.apache.org/jira/browse/SOLR-891
> Project: Solr
>  Issue Type: Improvement
>  Components: contrib - DataImportHandler
>Reporter: Noble Paul
>Priority: Minor
> Fix For: 1.4
>
> Attachments: SOLR-891.patch, SOLR-891.patch
>
>
> Clob cannot be directly consumed by Solr, so JdbcDataSource can translate 
> it to Strings before processing.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-891) A Clobtransformer to read strings from Clob

2008-12-11 Thread Noble Paul (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul updated SOLR-891:


Comment: was deleted

> A Clobtransformer to read strings from Clob
> ---
>
> Key: SOLR-891
> URL: https://issues.apache.org/jira/browse/SOLR-891
> Project: Solr
>  Issue Type: Improvement
>  Components: contrib - DataImportHandler
>Reporter: Noble Paul
>Priority: Minor
> Fix For: 1.4
>
> Attachments: SOLR-891.patch, SOLR-891.patch
>
>
> Clob cannot be directly consumed by Solr, so JdbcDataSource can translate 
> it to Strings before processing.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-891) A Clobtransformer to read strings from Clob

2008-12-11 Thread Noble Paul (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul updated SOLR-891:


Fix Version/s: 1.4
  Summary: A Clobtransformer to read strings from Clob  (was: 
JdbcDataSource can translate Clobs to String)

> A Clobtransformer to read strings from Clob
> ---
>
> Key: SOLR-891
> URL: https://issues.apache.org/jira/browse/SOLR-891
> Project: Solr
>  Issue Type: Improvement
>  Components: contrib - DataImportHandler
>Reporter: Noble Paul
>Priority: Minor
> Fix For: 1.4
>
> Attachments: SOLR-891.patch
>
>
> Clob cannot be directly consumed by Solr, so JdbcDataSource can translate 
> it to Strings before processing.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-812) JDBC optimizations: setReadOnly, setMaxRows

2008-12-11 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12655615#action_12655615
 ] 

Shalin Shekhar Mangar commented on SOLR-812:


bq. Certainly exposing these and the above in JdbcDataSource via properties 
would be more flexible to users. But sensible defaults should be set for 
read-only.

Any extra attributes that you specify to the  element are passed on 
to the DriverManager in a Properties object. So if your driver supports url 
parameters for these optimizations, you can use them right now. However, each 
driver has a different way of specifying this configuration, so we should 
support a way of making them configurable.
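The pass-through described above can be sketched with a small helper. The attribute names below ("user", "password", "readOnly") are purely illustrative; which property keys actually have an effect depends entirely on the JDBC driver in use:

```java
import java.util.Map;
import java.util.Properties;

// Sketch: extra attributes from the dataSource element are copied into
// a Properties object, which would then be handed to
// DriverManager.getConnection(url, props).
public class DataSourceProps {
    public static Properties toProperties(Map<String, String> attributes) {
        Properties props = new Properties();
        for (Map.Entry<String, String> e : attributes.entrySet()) {
            props.setProperty(e.getKey(), e.getValue());
        }
        // e.g. Connection c = DriverManager.getConnection(url, props);
        return props;
    }
}
```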

For the sake of backwards-compatibility, I'm not very comfortable with making 
changes to the default configuration. But we should make these optimizations 
possible and document their usage.

> JDBC optimizations: setReadOnly, setMaxRows
> ---
>
> Key: SOLR-812
> URL: https://issues.apache.org/jira/browse/SOLR-812
> Project: Solr
>  Issue Type: Improvement
>  Components: contrib - DataImportHandler
>Affects Versions: 1.3
>Reporter: David Smiley
>
> I'm looking at the DataImport code as of Solr v1.3 and using it with Postgres 
> and very large data sets, and there are some improvement suggestions I have.
> 1. call setReadOnly(true) on the connection.  DIH doesn't change the data so 
> this is obvious.
> 2. call setAutoCommit(false) on the connection.   (this is needed by Postgres 
> to ensure that the fetchSize hint actually works)
> 3. call setMaxRows(X) on the statement which is to be used when the 
> dataimport.jsp debugger is only grabbing X rows.  fetchSize is just a hint 
> and alone it isn't sufficient.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (SOLR-841) DataImportHandler uses configuration without checking for existence of required attributes

2008-12-11 Thread Shalin Shekhar Mangar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shalin Shekhar Mangar resolved SOLR-841.


   Resolution: Fixed
Fix Version/s: 1.4
 Assignee: Shalin Shekhar Mangar

Committed revision 725635.

Thanks for reporting this Michael!

> DataImportHandler uses configuration without checking for existence of 
> required attributes
> --
>
> Key: SOLR-841
> URL: https://issues.apache.org/jira/browse/SOLR-841
> Project: Solr
>  Issue Type: Bug
>  Components: contrib - DataImportHandler
>Affects Versions: 1.4
> Environment: Tomcat 6, jdk 6u10
>Reporter: Michael Henson
>Assignee: Shalin Shekhar Mangar
>Priority: Minor
> Fix For: 1.4
>
> Attachments: SOLR-841.patch
>
>
> The DataImportHandler's  XPathEntityProcessor throws a NullPointerException 
> when it tries to process a row from an XML data source during a 
> "full-import", both in the dataimport.jsp debugger and when started as a 
> normal request. The null pointer is thrown when a  tag in the 
>  ...  section of data-config.xml has no 
> "column" attribute.
> Example:
>  which should have been  column="entity_id" .../>

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-841) DataImportHandler uses configuration without checking for existence of required attributes

2008-12-11 Thread Shalin Shekhar Mangar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shalin Shekhar Mangar updated SOLR-841:
---

Attachment: SOLR-841.patch

With this patch, DataImportHandler throws an exception if a field does not have 
a column attribute.
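The check can be sketched as follows (a hypothetical helper, not the actual patch code), replacing a late NullPointerException with an immediate, descriptive error:

```java
import java.util.Map;

// Sketch: fail fast with a clear message when a field definition has no
// "column" attribute, instead of hitting a NullPointerException later
// during row processing.
public class FieldConfigCheck {
    public static String requireColumn(Map<String, String> fieldAttributes) {
        String column = fieldAttributes.get("column");
        if (column == null) {
            throw new RuntimeException("Field must have a 'column' attribute");
        }
        return column;
    }
}
```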

> DataImportHandler uses configuration without checking for existence of 
> required attributes
> --
>
> Key: SOLR-841
> URL: https://issues.apache.org/jira/browse/SOLR-841
> Project: Solr
>  Issue Type: Bug
>  Components: contrib - DataImportHandler
>Affects Versions: 1.4
> Environment: Tomcat 6, jdk 6u10
>Reporter: Michael Henson
>Priority: Minor
> Attachments: SOLR-841.patch
>
>
> The DataImportHandler's  XPathEntityProcessor throws a NullPointerException 
> when it tries to process a row from an XML data source during a 
> "full-import", both in the dataimport.jsp debugger and when started as a 
> normal request. The null pointer is thrown when a  tag in the 
>  ...  section of data-config.xml has no 
> "column" attribute.
> Example:
>  which should have been  column="entity_id" .../>

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-846) Out Of memory doing delta import with fetch size set to -1

2008-12-11 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12655574#action_12655574
 ] 

Shalin Shekhar Mangar commented on SOLR-846:


Committed revision 725627.

I've committed Noble's patch; however, as he noted, it is only a partial 
solution. I'm in favor of streaming, but that would be an invasive change. 
Let's keep this issue open until we can implement a better solution.

> Out Of memory doing delta import with fetch size set to -1
> --
>
> Key: SOLR-846
> URL: https://issues.apache.org/jira/browse/SOLR-846
> Project: Solr
>  Issue Type: Bug
>  Components: contrib - DataImportHandler
>Affects Versions: 1.3
> Environment: Linux 2.6.18-92.1.13.el5xen, mysql 5.0
>Reporter: Ricky Leung
> Attachments: SOLR-846.patch
>
>
> Database has about 3 million records. Doing a full-import, there is no 
> problem. However, when a large number of changes occurred (2558057), 
> delta-import throws an OutOfMemory error after 1288338 documents are 
> processed. The stack trace is below:
> Exception in thread "Thread-3" java.lang.OutOfMemoryError: Java heap space
>   at org.tartarus.snowball.ext.EnglishStemmer.(EnglishStemmer.java:4
> 9)
>   at org.apache.solr.analysis.EnglishPorterFilter.(EnglishPorterFilt
> erFactory.java:83)
>   at org.apache.solr.analysis.EnglishPorterFilterFactory.create(EnglishPor
> terFilterFactory.java:66)
>   at org.apache.solr.analysis.EnglishPorterFilterFactory.create(EnglishPor
> terFilterFactory.java:35)
>   at org.apache.solr.analysis.TokenizerChain.tokenStream(TokenizerChain.ja
> va:48)
>   at org.apache.solr.schema.IndexSchema$SolrIndexAnalyzer.tokenStream(Inde
> xSchema.java:348)
>   at org.apache.lucene.analysis.Analyzer.reusableTokenStream(Analyzer.java
> :44)
>   at org.apache.lucene.index.DocInverterPerField.processFields(DocInverter
> PerField.java:117)
>   at org.apache.lucene.index.DocFieldConsumersPerField.processFields(DocFi
> eldConsumersPerField.java:36)
>   at org.apache.lucene.index.DocFieldProcessorPerThread.processDocument(Do
> cFieldProcessorPerThread.java:234)
>   at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWrite
> r.java:765)
>   at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWrite
> r.java:748)
>   at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2
> 118)
>   at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2
> 095)
>   at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandle
> r2.java:232)
>   at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpd
> ateProcessorFactory.java:59)
>   at org.apache.solr.handler.dataimport.SolrWriter.upload(SolrWriter.java:
> 69)
>   at org.apache.solr.handler.dataimport.DataImportHandler$1.upload(DataImp
> ortHandler.java:288)
>   at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilde
> r.java:319)
>   at org.apache.solr.handler.dataimport.DocBuilder.doDelta(DocBuilder.java
> :211)
>   at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java
> :133)
>   at org.apache.solr.handler.dataimport.DataImporter.doDeltaImport(DataImp
> orter.java:359)
>   at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.j
> ava:388)
>   at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.ja
> va:377)
> dataSource in data-config.xml has been configured with a batchSize of "-1".
>  user="*" password="*" batchSize="-1"/>

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-895) DataImportHandler does not import multiple documents specified in db-data-config.xml

2008-12-11 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12655570#action_12655570
 ] 

Shalin Shekhar Mangar commented on SOLR-895:


Support for multiple documents is a legacy design issue that we are trying to 
remove altogether. Initially, DataImportHandler was an external stand-alone 
server writing documents to Solr. Therefore it had multiple  elements to 
write to multiple cores. Once it was integrated inside Solr itself, it made 
sense to move away from that design.

Always use a single document, and use multiple root entities if needed. If 
you have multiple cores, each core should have its own DataImportHandler 
configuration.

> DataImportHandler does not import multiple documents specified in 
> db-data-config.xml
> 
>
> Key: SOLR-895
> URL: https://issues.apache.org/jira/browse/SOLR-895
> Project: Solr
>  Issue Type: Bug
>  Components: contrib - DataImportHandler
>Affects Versions: 1.3, 1.3.1, 1.4
>Reporter: Cameron Pope
> Attachments: import-multiple-documents.patch
>
>
> In our system we have multiple kinds of items that need to be indexed. In the 
> database, they are represented as 'one table per concrete class'. We are 
> using the DataImportHandler to automatically create an index from our 
> database. The db-data-config.xml file that we are using contains two 
> 'Document' elements: one for each class of item that we are indexing.
> Expected behavior: the DataImportHandler imports items for each 'Document' 
> tag defined in the configuration file
> Actual behavior: the DataImportHandler stops importing once it completes 
> indexing of the first document
> I am attaching a patch with a unit test that verifies the correct behavior; 
> it should apply against the trunk without problems. I can also supply a 
> patch against the 1.3 branch if you would like.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-853) Make DIH API friendly

2008-12-11 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12655568#action_12655568
 ] 

Shalin Shekhar Mangar commented on SOLR-853:


It can surely be used to write to HBase and CouchDB :)

> Make DIH API friendly
> -
>
> Key: SOLR-853
> URL: https://issues.apache.org/jira/browse/SOLR-853
> Project: Solr
>  Issue Type: Improvement
>  Components: contrib - DataImportHandler
>Reporter: Noble Paul
>Assignee: Shalin Shekhar Mangar
> Attachments: SOLR-853.patch
>
>
> DIH currently can only be run inside Solr. But the core of DIH is quite 
> independent of Solr. There are only a few points where it requires Solr core 
> classes. They can be isolated out, and we have an API in hand. If we limit 
> the dependency down to common util, then DIH can be used by 
>  * Lucene users directly
>  * Run DIH remotely with SolrJ
>  * By any other tools using Lucene as their underlying  datastore

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (SOLR-888) DateFormatTransformer cannot convert non-string type

2008-12-11 Thread Shalin Shekhar Mangar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shalin Shekhar Mangar resolved SOLR-888.


Resolution: Fixed

Committed revision 725618.

Thanks Amit!

> DateFormatTransformer cannot convert non-string type
> 
>
> Key: SOLR-888
> URL: https://issues.apache.org/jira/browse/SOLR-888
> Project: Solr
>  Issue Type: Bug
>  Components: contrib - DataImportHandler
>Affects Versions: 1.3
> Environment: any
>Reporter: Amit Nithian
>Assignee: Shalin Shekhar Mangar
>Priority: Minor
> Fix For: 1.4
>
> Attachments: DateFormatTransformer.java
>
>   Original Estimate: 0.02h
>  Remaining Estimate: 0.02h
>
> When using the DateFormatTransformer, if the source column is of type 
> java.sql.TimeStamp, an exception is thrown converting this to a String. 
> Solution is to not typecast to a String but rather invoke the .toString() 
> method of the object to retrieve the string representation of the object.
> (About line 68)
> } else {
>   String value = (String) o;
>   aRow.put(column, process(value, fmt));
> }
> should be
> } else {
>   String value = o.toString();
>   aRow.put(column, process(value, fmt));
> }
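The difference between the cast and toString() can be demonstrated with a self-contained, JDK-only snippet (class and variable names are illustrative):

```java
import java.sql.Timestamp;

// Sketch: java.sql.Timestamp is not a String, so the (String) cast in
// the old code throws ClassCastException; o.toString() works for any
// object, which is what the proposed fix relies on.
public class ToStringVsCast {
    public static String safeValue(Object o) {
        return o.toString();
    }

    public static void main(String[] args) {
        Object o = new Timestamp(0L);
        boolean castFailed = false;
        try {
            String s = (String) o; // this is the bug: ClassCastException
        } catch (ClassCastException e) {
            castFailed = true;
        }
        System.out.println(castFailed);           // prints true
        System.out.println(safeValue(o) != null); // prints true
    }
}
```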

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-888) DateFormatTransformer cannot convert non-string type

2008-12-11 Thread Shalin Shekhar Mangar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shalin Shekhar Mangar updated SOLR-888:
---

Fix Version/s: 1.4
 Assignee: Shalin Shekhar Mangar
 Priority: Minor  (was: Major)
  Summary: DateFormatTransformer cannot convert non-string type  (was: 
DateFormatTransformer can't handle objects of type java.sql.TimeStamp)

> DateFormatTransformer cannot convert non-string type
> 
>
> Key: SOLR-888
> URL: https://issues.apache.org/jira/browse/SOLR-888
> Project: Solr
>  Issue Type: Bug
>  Components: contrib - DataImportHandler
>Affects Versions: 1.3
> Environment: any
>Reporter: Amit Nithian
>Assignee: Shalin Shekhar Mangar
>Priority: Minor
> Fix For: 1.4
>
> Attachments: DateFormatTransformer.java
>
>   Original Estimate: 0.02h
>  Remaining Estimate: 0.02h
>
> When using the DateFormatTransformer, if the source column is of type 
> java.sql.TimeStamp, an exception is thrown converting this to a String. 
> Solution is to not typecast to a String but rather invoke the .toString() 
> method of the object to retrieve the string representation of the object.
> (About line 68)
> } else {
>   String value = (String) o;
>   aRow.put(column, process(value, fmt));
> }
> should be
> } else {
>   String value = o.toString();
>   aRow.put(column, process(value, fmt));
> }

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (SOLR-886) DataImportHandler should rollback when an import fails or is aborted

2008-12-11 Thread Shalin Shekhar Mangar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shalin Shekhar Mangar resolved SOLR-886.


Resolution: Fixed

Committed revision 725616.

> DataImportHandler should rollback when an import fails or is aborted
> 
>
> Key: SOLR-886
> URL: https://issues.apache.org/jira/browse/SOLR-886
> Project: Solr
>  Issue Type: Improvement
>  Components: contrib - DataImportHandler
>Affects Versions: 1.4
>Reporter: Shalin Shekhar Mangar
>Assignee: Shalin Shekhar Mangar
> Fix For: 1.4
>
> Attachments: SOLR-886.patch
>
>
> DataImportHandler should call rollback when an import fails or is aborted. 
> This will make sure that uncommitted changes are not committed when the 
> IndexWriter is closed.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-886) DataImportHandler should rollback when an import fails or is aborted

2008-12-11 Thread Shalin Shekhar Mangar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shalin Shekhar Mangar updated SOLR-886:
---

Attachment: SOLR-886.patch

Rollback if an import is aborted or if an exception is thrown during full or 
delta imports.

I'll commit this shortly.
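The rollback-on-failure pattern described above can be sketched as follows, with a hypothetical Writer interface standing in for Solr's update handler (names are illustrative, not the patch's actual API):

```java
// Sketch: commit on success, rollback on any failure or abort, so that
// uncommitted changes are not flushed when the IndexWriter is closed.
public class ImportRunner {
    interface Writer {
        void commit();
        void rollback();
    }

    public static void runImport(Runnable importJob, Writer writer) {
        try {
            importJob.run();
            writer.commit();
        } catch (Exception e) {
            // a failed or aborted import must not leave partial changes
            writer.rollback();
        }
    }
}
```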

> DataImportHandler should rollback when an import fails or is aborted
> 
>
> Key: SOLR-886
> URL: https://issues.apache.org/jira/browse/SOLR-886
> Project: Solr
>  Issue Type: Improvement
>  Components: contrib - DataImportHandler
>Affects Versions: 1.4
>Reporter: Shalin Shekhar Mangar
>Assignee: Shalin Shekhar Mangar
> Fix For: 1.4
>
> Attachments: SOLR-886.patch
>
>
> DataImportHandler should call rollback when an import fails or is aborted. 
> This will make sure that uncommitted changes are not committed when the 
> IndexWriter is closed.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.