Re: SOLR automatic failover

2009-07-13 Thread Jason Rutherglen
Basic failover, we can build from there?

2009/7/13 Noble Paul നോബിള്‍ नोब्ळ् 

> nope .
>
> what do you have in mind?
>
> On Tue, Jul 14, 2009 at 4:56 AM, Jason
> Rutherglen wrote:
> > Has anyone looked at implementing automatic failover in SOLR using a
> naming
> > service (like Zookeeper)?
> >
>
>
>
> --
> -
> Noble Paul | Principal Engineer| AOL | http://aol.com
>


Re: SOLR automatic failover

2009-07-13 Thread Noble Paul നോബിള്‍ नोब्ळ्
nope .

what do you have in mind?

On Tue, Jul 14, 2009 at 4:56 AM, Jason
Rutherglen wrote:
> Has anyone looked at implementing automatic failover in SOLR using a naming
> service (like Zookeeper)?
>



-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


SOLR automatic failover

2009-07-13 Thread Jason Rutherglen
Has anyone looked at implementing automatic failover in SOLR using a naming
service (like Zookeeper)?
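For illustration, the pattern a naming service like ZooKeeper enables is: each Solr node registers an ephemeral, sequentially numbered entry, and clients treat the live node with the lowest sequence number as the active master, re-resolving when an entry disappears. The toy sketch below models that with an in-memory map as a stand-in for real ephemeral znodes; it is not ZooKeeper client code.

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Toy model of ephemeral-sequential registration for failover.
// The map stands in for ZooKeeper's ephemeral znodes; entries vanish
// when the owning node's "session" expires.
public class FailoverRegistry {
    private final SortedMap<Integer, String> liveNodes = new TreeMap<>();
    private int nextSeq = 0;

    // Models creating an EPHEMERAL_SEQUENTIAL znode for this node.
    public synchronized int register(String hostPort) {
        int seq = nextSeq++;
        liveNodes.put(seq, hostPort);
        return seq;
    }

    // Models ZooKeeper deleting the ephemeral node when the session dies.
    public synchronized void sessionExpired(int seq) {
        liveNodes.remove(seq);
    }

    // Clients always resolve to the lowest-numbered live node.
    public synchronized String activeMaster() {
        return liveNodes.isEmpty() ? null : liveNodes.get(liveNodes.firstKey());
    }
}
```

When the active node's entry disappears, the next registration in sequence takes over automatically, which is the "basic failover" being proposed.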


[jira] Created: (SOLR-1275) Add expungeDeletes to DirectUpdateHandler2

2009-07-13 Thread Jason Rutherglen (JIRA)
Add expungeDeletes to DirectUpdateHandler2
--

 Key: SOLR-1275
 URL: https://issues.apache.org/jira/browse/SOLR-1275
 Project: Solr
  Issue Type: Improvement
  Components: update
Affects Versions: 1.3
Reporter: Jason Rutherglen
Priority: Trivial
 Fix For: 1.4


expungeDeletes is a useful method, somewhat like optimize, that is offered by 
IndexWriter and can be implemented in DirectUpdateHandler2.
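For context, Lucene's IndexWriter.expungeDeletes() merges away deleted documents without the full cost of an optimize. If exposed through DirectUpdateHandler2, the update message might look like the following; the attribute name is an assumption here, not settled API:

```xml
<!-- hypothetical update message: a commit that also expunges deletes -->
<commit expungeDeletes="true"/>
```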

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: cleaning up example

2009-07-13 Thread Yonik Seeley
On Mon, Jul 13, 2009 at 5:23 PM, Grant Ingersoll wrote:
>
> On Jul 4, 2009, at 11:54 AM, Yonik Seeley wrote:
>
>> One concrete step to both clean up "example" and lower the size of our
>> downloads is
>> - move example/clustering to contrib/example
>> - move clustering/lib to clustering/solr/lib
>> - download jars directly to clustering/solr/lib instead of
>> clustering/lib/downloads
>
> Remember that packaging/release also uses these directories.  The release
> mechanism now explicitly excludes the downloads directory.  Collapsing these
> will require you to explicitly exclude individual libraries.

Yet another reason it would be nice to allow subdirectories (or
multiple lib directories)... that way we could keep the "download"
dir.

-Yonik
http://www.lucidimagination.com


Re: cleaning up example

2009-07-13 Thread Grant Ingersoll


On Jul 4, 2009, at 11:54 AM, Yonik Seeley wrote:


One concrete step to both clean up "example" and lower the size of our
downloads is
- move example/clustering to contrib/example
- move clustering/lib to clustering/solr/lib
- download jars directly to clustering/solr/lib instead of
clustering/lib/downloads


Remember that packaging/release also uses these directories.  The  
release mechanism now explicitly excludes the downloads directory.   
Collapsing these will require you to explicitly exclude individual  
libraries.




- run the clustering example from example with
-Dsolr.solr.home=../contrib/clustering/solr

This will avoid a copy of all the clustering libs to a different
directory, and remove an entry from the "example" directory... which
was originally meant to contain a single server.

-Yonik
http://www.lucidimagination.com




[jira] Updated: (SOLR-284) Parsing Rich Document Types

2009-07-13 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley updated SOLR-284:
--

Attachment: SOLR-284.patch

OK, here's my first crack at cleaning things up a little before release.  
Changes:
- there were no tests for XML attribute indexing.
- capture had no unit tests
- boost had no unit tests
- ignoring unknown fields had no unit test
- metadata prefix had no unit test
- logging ignored fields at the INFO level for each document loaded is too 
verbose
- removed handling of undeclared fields and let downstream components
  handle this.
- avoid the String concatenation code for single valued fields when Tika only
  produces a single value (for performance)
- remove multiple literal detection handling for single valued fields - let a 
downstream component handle it
- map literal values just as one would with generated metadata, since the user 
may be just supplying the extra metadata.  also apply transforms (date 
formatting currently)
- fixed a bug where null field values were being added (and later dropped by 
Solr... hence it was never caught).
- avoid catching previously thrown SolrExceptions... let them fly through
- removed some unused code (id generation, etc)
- added lowernames option to map field names to lowercase/underscores
- switched builderStack from synchronized Stack to LinkedList 
- fixed a bug that caused content to be appended with no whitespace in between
- made extracting request handler lazy loading in example config
- added ignored_ and attr_ dynamic fields in example schema

Interface:
{code}
The default field is always "content" - use map to change it to something else
lowernames=true/false  // if true, map names like Content-Type to content_type
map.=
boost.=
literal.=
xpath=  - only generate content for the matching xpath expr
extractOnly=true/false - if true, just return the extracted content
capture=  // separate out these elements 
captureAttr=   // separate out the attributes for these 
elements
uprefix=  // unknown field prefix - any unknown fields will be 
prepended with this value
stream.type
resource.name
{code}

To make things more uniform, all fields, whether "content", metadata, 
attributes, or literals, go through the same process:
1) map to lowercase if lowernames=true
2) apply map.field rules
3) if the resulting field is unknown, prefix it with uprefix
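A sketch of those three steps in isolation (the class name and signature are illustrative, not the actual patch code):

```java
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Illustrative sketch of the three-step field-name mapping described
// above; not the actual patch code, just the rules in isolation.
public class FieldNameMapper {
    public static String mapName(String name, boolean lowernames,
                                 Map<String, String> mapRules,
                                 Set<String> knownFields, String uprefix) {
        String f = name;
        // 1) map to lowercase/underscores if lowernames=true
        if (lowernames) {
            f = f.toLowerCase(Locale.ROOT).replace('-', '_');
        }
        // 2) apply map.field rules
        if (mapRules.containsKey(f)) {
            f = mapRules.get(f);
        }
        // 3) if the resulting field is unknown, prefix it with uprefix
        if (!knownFields.contains(f) && uprefix != null) {
            f = uprefix + f;
        }
        return f;
    }
}
```

So with lowernames=true, "Content-Type" becomes "content_type", and if that field is not in the schema it becomes e.g. "attr_content_type" via uprefix.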

Hopefully people will agree that this is an improvement in general.  I think in 
the future we'll need more advanced options, esp around dealing with links in 
HTML and more powerful xpath constructs, but that's for after 1.4 IMO.

> Parsing Rich Document Types
> ---
>
> Key: SOLR-284
> URL: https://issues.apache.org/jira/browse/SOLR-284
> Project: Solr
>  Issue Type: New Feature
>  Components: update
>Reporter: Eric Pugh
>Assignee: Grant Ingersoll
> Fix For: 1.4
>
> Attachments: libs.zip, rich.patch, rich.patch, rich.patch, 
> rich.patch, rich.patch, rich.patch, rich.patch, SOLR-284-no-key-gen.patch, 
> SOLR-284.patch, SOLR-284.patch, SOLR-284.patch, SOLR-284.patch, 
> SOLR-284.patch, SOLR-284.patch, SOLR-284.patch, SOLR-284.patch, 
> solr-word.pdf, source.zip, test-files.zip, test-files.zip, test.zip, 
> un-hardcode-id.diff
>
>
> I have developed a RichDocumentRequestHandler based on the CSVRequestHandler 
> that supports streaming a PDF, Word, Powerpoint, Excel, or PDF document into 
> Solr.
> There is a wiki page with information here: 
> http://wiki.apache.org/solr/UpdateRichDocuments
>  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-874) Dismax parser exceptions on trailing OPERATOR

2009-07-13 Thread Peter Wolanin (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12730513#action_12730513
 ] 

Peter Wolanin commented on SOLR-874:


possibly a fix could be rolled into this existing method in 
SolrPluginUtils.java ?

{code}
  /**
   * Strips operators that are used illegally, otherwise returns its
   * input.  Some examples of illegal user queries are: "chocolate +-
   * chip", "chocolate - - chip", and "chocolate chip -".
   */
  public static CharSequence stripIllegalOperators(CharSequence s) {
String temp = CONSECUTIVE_OP_PATTERN.matcher( s ).replaceAll( " " );
return DANGLING_OP_PATTERN.matcher( temp ).replaceAll( "" );
  }
{code}

This seems only to be called from:

org/apache/solr/search/DisMaxQParser.java:156:  userQuery = 
SolrPluginUtils.stripIllegalOperators(userQuery).toString();
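A sketch of extending the stripping to cover dangling operators at both ends of the query; the regexes below are illustrative assumptions, not the actual CONSECUTIVE_OP_PATTERN / DANGLING_OP_PATTERN from SolrPluginUtils:

```java
import java.util.regex.Pattern;

// Illustrative sketch: strip a leading operator as well as a trailing
// one. These patterns are assumptions, not the SolrPluginUtils ones.
public class OperatorStripper {
    private static final Pattern LEADING_OP =
            Pattern.compile("^\\s*(?:AND|OR|&&|\\|\\|)\\s+");
    private static final Pattern TRAILING_OP =
            Pattern.compile("\\s+(?:AND|OR|&&|\\|\\|)\\s*$");

    public static String strip(CharSequence s) {
        String t = LEADING_OP.matcher(s).replaceAll("");
        return TRAILING_OP.matcher(t).replaceAll("");
    }
}
```

This would turn "ipod AND" into "ipod" and "OR vti OR bin" into "vti OR bin" while leaving legal queries untouched.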

> Dismax parser exceptions on trailing OPERATOR
> -
>
> Key: SOLR-874
> URL: https://issues.apache.org/jira/browse/SOLR-874
> Project: Solr
>  Issue Type: Bug
>  Components: search
>Affects Versions: 1.3
>Reporter: Erik Hatcher
>
> Dismax is supposed to be immune to parse exceptions, but alas it's not:
> http://localhost:8983/solr/select?defType=dismax&qf=name&q=ipod+AND
> kaboom!
> Caused by: org.apache.lucene.queryParser.ParseException: Cannot parse 'ipod 
> AND': Encountered "" at line 1, column 8.
> Was expecting one of:
>  ...
> "+" ...
> "-" ...
> "(" ...
> "*" ...
>  ...
>  ...
>  ...
>  ...
> "[" ...
> "{" ...
>  ...
>  ...
> "*" ...
> 
>   at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:175)
>   at 
> org.apache.solr.search.DismaxQParser.parse(DisMaxQParserPlugin.java:138)
>   at org.apache.solr.search.QParser.getQuery(QParser.java:88)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-874) Dismax parser exceptions on trailing OPERATOR

2009-07-13 Thread Peter Wolanin (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12730492#action_12730492
 ] 

Peter Wolanin commented on SOLR-874:


I get the same sort of exception with a *leading* operator and the dismax 
handler.


Jul 13, 2009 1:47:06 PM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException:
org.apache.lucene.queryParser.ParseException: Cannot parse 'OR vti OR bin OR 
vti OR aut OR author OR dll': Encountered "  "OR "" at line
1, column 0.
Was expecting one of:
...
   "+" ...
   "-" ...
   "(" ...
   "*" ...
...
...
...
...
   "[" ...
   "{" ...
...
...
   "*" ...

   at 
org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:110)
   at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:174)
   at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)

> Dismax parser exceptions on trailing OPERATOR
> -
>
> Key: SOLR-874
> URL: https://issues.apache.org/jira/browse/SOLR-874
> Project: Solr
>  Issue Type: Bug
>  Components: search
>Affects Versions: 1.3
>Reporter: Erik Hatcher
>
> Dismax is supposed to be immune to parse exceptions, but alas it's not:
> http://localhost:8983/solr/select?defType=dismax&qf=name&q=ipod+AND
> kaboom!
> Caused by: org.apache.lucene.queryParser.ParseException: Cannot parse 'ipod 
> AND': Encountered "" at line 1, column 8.
> Was expecting one of:
>  ...
> "+" ...
> "-" ...
> "(" ...
> "*" ...
>  ...
>  ...
>  ...
>  ...
> "[" ...
> "{" ...
>  ...
>  ...
> "*" ...
> 
>   at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:175)
>   at 
> org.apache.solr.search.DismaxQParser.parse(DisMaxQParserPlugin.java:138)
>   at org.apache.solr.search.QParser.getQuery(QParser.java:88)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-940) TrieRange support

2009-07-13 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated SOLR-940:
---

Attachment: SOLR-940-LUCENE-1701-addition.patch

Hi Shalin,

here is an additional patch (but only for the trie parts) that is more 
intelligent and also uses NumericTokenStream for the query-time factory. Your 
previous patch must be applied first; then revert the changes in 
analysis.TrieXxxxTokenizerFactory and TrieField. Then apply this patch, which 
removes the old factories and creates a new TrieTokenizerFactory. It should 
compile, but it is not really tested (it was hard to apply all your changes). 
If there are compile errors, they can be easily fixed :-)

The idea is to use the same token stream for query-time analysis. To produce 
only the highest-precision token needed there, it simply uses a precisionStep 
of 32 for int/float and 64 for long/double/date in place of the former 
TrieIndexTokenizerFactory. No magic with KeywordTokenizer is needed, and 
NumericUtils, which is an expert Lucene class (not really public), is no 
longer needed.
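The effect of the large precisionStep can be seen with a toy version of the trie encoding: one token is produced per shift of precisionStep bits, so a step of 64 yields exactly one full-precision token for a long. This is a simplified illustration of the counting logic, not NumericTokenStream itself:

```java
import java.util.ArrayList;
import java.util.List;

// Simplified illustration of trie encoding: one token per shift of
// precisionStep bits. With precisionStep=64 a long value produces a
// single full-precision token, which is all query-time analysis needs.
public class TrieTokens {
    public static List<Long> tokens(long value, int precisionStep) {
        List<Long> out = new ArrayList<>();
        for (int shift = 0; shift < 64; shift += precisionStep) {
            out.add(value >>> shift); // drop the low 'shift' bits
        }
        return out;
    }
}
```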

> TrieRange support
> -
>
> Key: SOLR-940
> URL: https://issues.apache.org/jira/browse/SOLR-940
> Project: Solr
>  Issue Type: New Feature
>Reporter: Yonik Seeley
>Assignee: Shalin Shekhar Mangar
> Fix For: 1.4
>
> Attachments: SOLR-940-LUCENE-1602.patch, SOLR-940-LUCENE-1602.patch, 
> SOLR-940-LUCENE-1701-addition.patch, SOLR-940-LUCENE-1701.patch, 
> SOLR-940-LUCENE-1701.patch, SOLR-940-newTrieAPI.patch, 
> SOLR-940-newTrieAPI.patch, SOLR-940-rangequery.patch, 
> SOLR-940-rangequery.patch, SOLR-940-test.patch, SOLR-940.patch, 
> SOLR-940.patch, SOLR-940.patch, SOLR-940.patch, SOLR-940.patch, 
> SOLR-940.patch, SOLR-940.patch, SOLR-940.patch, SOLR-940.patch, SOLR-940.patch
>
>
> We need support in Solr for the new TrieRange Lucene functionality.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (SOLR-1274) Provide multiple output formats in extract-only mode for tika handler

2009-07-13 Thread Peter Wolanin (JIRA)
Provide multiple output formats in extract-only mode for tika handler
-

 Key: SOLR-1274
 URL: https://issues.apache.org/jira/browse/SOLR-1274
 Project: Solr
  Issue Type: New Feature
Affects Versions: 1.4
Reporter: Peter Wolanin
Priority: Minor
 Fix For: 1.4


The proposed feature is to accept a URL parameter when using extract-only mode 
to specify an output format.  This parameter might just overload the existing 
"ext.extract.only" so that one can optionally specify a format, e.g. 
false|true|xml|text, where true and xml give the same response (i.e. xml 
remains the default).

I had been assuming that I could choose among possible tika output
formats when using the extracting request handler in extract-only mode
as if from the CLI with the tika jar:

   -x or --xmlOutput XHTML content (default)
   -h or --html   Output HTML content
   -t or --text   Output plain text content
   -m or --metadata   Output only metadata

However, looking at the docs and source, it seems that only the xml
option is available (hard-coded) in ExtractingDocumentLoader.java
{code}
serializer = new XMLSerializer(writer, new OutputFormat("XML", "UTF-8", true));
{code}

Providing at least a plain-text response seems to work if you change the 
serializer to a TextSerializer (org.apache.xml.serialize.TextSerializer).





-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-1129) SolrJ cannot bind dynamic fields to beans

2009-07-13 Thread Noble Paul (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul updated SOLR-1129:
-

Attachment: SOLR-1129.patch

Updated to the trunk. I plan to commit this shortly.

> SolrJ cannot bind dynamic fields to beans
> -
>
> Key: SOLR-1129
> URL: https://issues.apache.org/jira/browse/SOLR-1129
> Project: Solr
>  Issue Type: Improvement
>  Components: clients - java
>Affects Versions: 1.3
>Reporter: Noble Paul
>Assignee: Noble Paul
> Fix For: 1.4
>
> Attachments: SOLR-1129.patch, SOLR-1129.patch, SOLR-1129.patch, 
> SOLR-1129.patch, SOLR-1129.patch, SOLR-1129.patch
>
>
> SolrJ does not support binding of dynamic fields to bean fields
> The field declaration could be as follows
> {code:java}
> @Field("*_s")
> public String anyString;
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-914) Presence of finalize() in the codebase

2009-07-13 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12730273#action_12730273
 ] 

Noble Paul commented on SOLR-914:
-

bq. Code to release resources should be avoided, as a finalize is no equivalent 
to a C++ dtor.

Yes. But if the user has forgotten to do so, it is not a good idea to punish 
him by blowing up. A warning should be enough.
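A sketch of that warn-only approach, assuming a close()-style resource (an illustrative class, not any specific Solr one):

```java
// Sketch of the warn-only finalize approach: if the caller forgot to
// close(), log a warning instead of failing. Illustrative class, not
// actual Solr code.
public class ManagedResource implements AutoCloseable {
    private volatile boolean closed = false;

    @Override
    public void close() {
        closed = true; // release the real resources here
    }

    public boolean isClosed() {
        return closed;
    }

    @Override
    protected void finalize() throws Throwable {
        try {
            if (!closed) {
                // warn, don't blow up: the VM may never call this anyway
                System.err.println("WARNING: ManagedResource was never closed");
                close();
            }
        } finally {
            super.finalize();
        }
    }
}
```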


> Presence of finalize() in the codebase 
> ---
>
> Key: SOLR-914
> URL: https://issues.apache.org/jira/browse/SOLR-914
> Project: Solr
>  Issue Type: Improvement
>  Components: clients - java
>Affects Versions: 1.3
> Environment: Tomcat 6, JRE 6
>Reporter: Kay Kay
>Assignee: Noble Paul
>Priority: Minor
> Fix For: 1.4
>
> Attachments: SOLR-914.patch
>
>   Original Estimate: 480h
>  Remaining Estimate: 480h
>
> There seem to be a number of classes that implement the finalize() method. 
> Given that it is perfectly OK for a Java VM never to call it, there should 
> be some other way (e.g. try..finally where the resources are created) to 
> guarantee they are destroyed. The presence of a finalize() method, depending 
> on implementation, might not serve what we want and in some cases can end up 
> delaying the gc process, depending on the algorithms. 
> $ find . -name *.java | xargs grep finalize
> ./contrib/dataimporthandler/src/main/java/org/apache/solr/handler/dataimport/JdbcDataSource.java:
>   protected void finalize() {
> ./src/java/org/apache/solr/update/SolrIndexWriter.java:  protected void 
> finalize() {
> ./src/java/org/apache/solr/core/CoreContainer.java:  protected void 
> finalize() {
> ./src/java/org/apache/solr/core/SolrCore.java:  protected void finalize() {
> ./src/common/org/apache/solr/common/util/ConcurrentLRUCache.java:  protected 
> void finalize() throws Throwable {
> May be we need to revisit these occurences from a design perspective to see 
> if they are necessary / if there is an alternate way of managing guaranteed 
> destruction of resources. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1272) Java Replication does not log actions

2009-07-13 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12730267#action_12730267
 ] 

Noble Paul commented on SOLR-1272:
--

Java replication currently logs its actions quite extensively. It goes into 
the main log now, but the logging properties can be configured to make this 
log go into a separate file.
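For example, with java.util.logging one could route the replication logger to its own file along these lines; the logger name and file paths here are assumptions, so adjust them to the actual class emitting the messages:

```properties
# hypothetical logging.properties fragment: send replication messages
# to their own rolling file instead of the main log
org.apache.solr.handler.ReplicationHandler.level = INFO
org.apache.solr.handler.ReplicationHandler.handlers = java.util.logging.FileHandler
org.apache.solr.handler.ReplicationHandler.useParentHandlers = false
java.util.logging.FileHandler.pattern = logs/replication.%g.log
java.util.logging.FileHandler.limit = 10000000
java.util.logging.FileHandler.count = 5
```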

> Java Replication does not log actions
> -
>
> Key: SOLR-1272
> URL: https://issues.apache.org/jira/browse/SOLR-1272
> Project: Solr
>  Issue Type: Improvement
>  Components: replication (java)
>Affects Versions: 1.4
>Reporter: Lance Norskog
> Fix For: 1.4
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> Java Replication actions are not logged. There is no trail of full and 
> partial replications.
> All full and partial replications, failed replications, and communication 
> failures should be logged in solr/logs/ the way that the script replication 
> system logs activity.
> This is a basic requirement for production use. If such a log does exist, 
> please document it on the wiki.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (SOLR-1272) Java Replication does not log actions

2009-07-13 Thread Noble Paul (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul reassigned SOLR-1272:


Assignee: Noble Paul

> Java Replication does not log actions
> -
>
> Key: SOLR-1272
> URL: https://issues.apache.org/jira/browse/SOLR-1272
> Project: Solr
>  Issue Type: Improvement
>  Components: replication (java)
>Affects Versions: 1.4
>Reporter: Lance Norskog
>Assignee: Noble Paul
> Fix For: 1.4
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> Java Replication actions are not logged. There is no trail of full and 
> partial replications.
> All full and partial replications, failed replications, and communication 
> failures should be logged in solr/logs/ the way that the script replication 
> system logs activity.
> This is a basic requirement for production use. If such a log does exist, 
> please document it on the wiki.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-1229) deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value

2009-07-13 Thread Noble Paul (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul updated SOLR-1229:
-

Attachment: SOLR-1229.patch

Ideally, for your use cases, the pk attribute is not required, so I have 
removed the requirement. Now it uses the user-provided pk; if it is not 
present, it falls back to the Solr schema uniqueKey.

> deletedPkQuery feature does not work when pk and uniqueKey field do not have 
> the same value
> ---
>
> Key: SOLR-1229
> URL: https://issues.apache.org/jira/browse/SOLR-1229
> Project: Solr
>  Issue Type: Bug
>  Components: contrib - DataImportHandler
>Affects Versions: 1.4
>Reporter: Erik Hatcher
>Assignee: Erik Hatcher
> Fix For: 1.4
>
> Attachments: SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch, 
> SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch, tests.patch
>
>
> Problem doing a delta-import such that records marked as "deleted" in the 
> database are removed from Solr using deletedPkQuery.
> Here's a config I'm using against a mocked test database:
> {code:xml}
> 
>  
>  
>pk="board_id"
>transformer="TemplateTransformer"
>deletedPkQuery="select board_id from boards where deleted = 'Y'"
>query="select * from boards where deleted = 'N'"
>deltaImportQuery="select * from boards where deleted = 'N'"
>deltaQuery="select * from boards where deleted = 'N'"
>preImportDeleteQuery="datasource:board">
>  
>  
>  
>
>  
> 
> {code}
> Note that the uniqueKey in Solr is the "id" field.  And its value is a 
> template board-.
> I noticed that the javadoc comment on DocBuilder#collectDelta says "Note: In 
> our definition, unique key of Solr document is the primary key of the top 
> level entity".  This of course isn't really an appropriate assumption.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.