[jira] Commented: (SOLR-561) Solr replication by Solr (for windows also)

2009-09-16 Thread Bill Bell (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12755894#action_12755894
 ] 

Bill Bell commented on SOLR-561:


I am not a huge fan of PollInterval. It would be great to add an option to get 
the index at exact clock times, e.g. PollTime=*/15 * * * *, which would run 
every 15 minutes by the clock: 1:00pm, 1:15pm, 1:30pm, 1:45pm, etc. All my 
slaves are synced using NTP, so this would work better. Since each slave starts 
at a different time, we cannot just set PollInterval=00:15:00: the slaves would 
get different indexes depending on when they started. The other option would be 
to suspend and restart polling, which would be very manual. Setting 
PollInterval to 10 seconds would mean fetching a new index while the old one is 
still warming up. Even a 10-second interval would not be good: we get so many 
updates that each server would end up with a different index. With Snap we 
don't have this issue.

We get SOLR updates frequently and since they are large we cannot wait to do a 
commit at the 15 minute mark using cron. Optimize just takes too long.

On our system we need to limit how often the slaves get the new index. We would 
like all slaves to get the index at the same time.

Bill


 Solr replication by Solr (for windows also)
 ---

 Key: SOLR-561
 URL: https://issues.apache.org/jira/browse/SOLR-561
 Project: Solr
  Issue Type: New Feature
  Components: replication (scripts)
Affects Versions: 1.4
 Environment: All
Reporter: Noble Paul
Assignee: Shalin Shekhar Mangar
 Fix For: 1.4

 Attachments: deletion_policy.patch, SOLR-561-core.patch, 
 SOLR-561-fixes.patch, SOLR-561-fixes.patch, SOLR-561-fixes.patch, 
 SOLR-561-full.patch, SOLR-561-full.patch, SOLR-561-full.patch, 
 SOLR-561-full.patch, SOLR-561.patch, SOLR-561.patch, SOLR-561.patch, 
 SOLR-561.patch, SOLR-561.patch, SOLR-561.patch, SOLR-561.patch, 
 SOLR-561.patch, SOLR-561.patch, SOLR-561.patch, SOLR-561.patch, 
 SOLR-561.patch, SOLR-561.patch, SOLR-561.patch


 The current replication strategy in solr involves shell scripts . The 
 following are the drawbacks with the approach
 *  It does not work with windows
 * Replication works as a separate piece not integrated with solr.
 * Cannot control replication from solr admin/JMX
 * Each operation requires manual telnet to the host
 Doing the replication in java has the following advantages
 * Platform independence
 * Manual steps can be completely eliminated. Everything can be driven from 
 solrconfig.xml .
 ** Adding the url of the master in the slaves should be good enough to enable 
 replication. Other things like frequency of
 snapshoot/snappull can also be configured . All other information can be 
 automatically obtained.
 * Start/stop can be triggered from solr/admin or JMX
 * Can get the status/progress while replication is going on. It can also 
 abort an ongoing replication
 * No need to have a login into the machine 
 * From a development perspective, we can unit test it
 This issue can track the implementation of solr replication in java

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-561) Solr replication by Solr (for windows also)

2009-09-16 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12755908#action_12755908
 ] 

Noble Paul commented on SOLR-561:
-

The default pollInterval can behave the way you want (so that the fetches are 
synchronized in time by the clock). Raise a separate issue and we can fix it.




[jira] Created: (SOLR-1435) ensure that all slaves with same pollInteval fetches index at same time

2009-09-16 Thread Noble Paul (JIRA)
ensure that all slaves with same pollInteval fetches index at same time
---

 Key: SOLR-1435
 URL: https://issues.apache.org/jira/browse/SOLR-1435
 Project: Solr
  Issue Type: Improvement
  Components: replication (java)
Reporter: Noble Paul
Assignee: Noble Paul


When pollInterval is set to some value, ensure that all slaves fetch the index 
at the same time (if their clocks are synchronized).
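One way to get this behaviour, sketched in Python rather than taken from the attached patch: delay the first fetch so that every poll lands on a wall-clock multiple of the interval. The function name and the use of epoch seconds are assumptions made for illustration only.

```python
import math


def seconds_until_next_poll(now_epoch_s: float, poll_interval_s: int) -> float:
    """Delay until the next wall-clock boundary that is a multiple of the interval.

    Hypothetical helper: slaves that start at different moments but share
    a clock and an interval all compute the same next boundary.
    """
    if poll_interval_s <= 0:
        raise ValueError("poll interval must be positive")
    # Index of the interval we are currently in, plus one, gives the next boundary.
    next_boundary = (math.floor(now_epoch_s / poll_interval_s) + 1) * poll_interval_s
    return next_boundary - now_epoch_s


# Two slaves started at different times, both with a 15-minute (900 s) interval.
slave_a_delay = seconds_until_next_poll(1000.0, 900)
slave_b_delay = seconds_until_next_poll(1500.5, 900)
print(1000.0 + slave_a_delay, 1500.5 + slave_b_delay)
```

Both slaves end up scheduling their first fetch for the same instant, after which a fixed-rate timer keeps them aligned as long as their clocks stay synchronized.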




[jira] Updated: (SOLR-1435) ensure that all slaves with same pollInteval fetches index at same time

2009-09-16 Thread Noble Paul (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul updated SOLR-1435:
-

Attachment: SOLR-1435.patch

Should we fix this in Solr 1.4?




[jira] Updated: (SOLR-1435) ensure that all slaves with same pollInteval fetches index at same time

2009-09-16 Thread Noble Paul (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul updated SOLR-1435:
-

Attachment: (was: SOLR-1435.patch)




[jira] Updated: (SOLR-1435) ensure that all slaves with same pollInteval fetches index at same time

2009-09-16 Thread Noble Paul (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul updated SOLR-1435:
-

Attachment: SOLR-1435.patch




[jira] Updated: (SOLR-1435) ensure that all slaves with same pollInteval fetches index at same time

2009-09-16 Thread Noble Paul (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul updated SOLR-1435:
-

Fix Version/s: 1.4




[jira] Updated: (SOLR-1407) SpellingQueryConverter now disallows underscores and digits in field names (but allows all UTF-8 letters)

2009-09-16 Thread Shalin Shekhar Mangar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shalin Shekhar Mangar updated SOLR-1407:


Attachment: SOLR-1407.patch

# Uses Michael's NMTOKEN regex.
# Added tests for Chinese characters and special characters in field names/values.

I added the same NMTOKEN for values also. Otherwise values which have an 
underscore or digit or hyphen are split into multiple tokens at these 
characters. I don't think that should happen. Grant, any thoughts?

 SpellingQueryConverter now disallows underscores and digits in field names 
 (but allows all UTF-8 letters)
 -

 Key: SOLR-1407
 URL: https://issues.apache.org/jira/browse/SOLR-1407
 Project: Solr
  Issue Type: Improvement
  Components: spellchecker
Affects Versions: 1.3
Reporter: David Bowen
Assignee: Shalin Shekhar Mangar
Priority: Trivial
 Fix For: 1.4

 Attachments: SOLR-1407.patch, SpellingQueryConverter.java, 
 SpellingQueryConverter.java


 SpellingQueryConverter was extended to cover the full UTF-8 range instead of 
 handling US-ASCII only, but in the process it was broken for field names that 
 contain underscores or digits.




Build failed in Hudson: Solr-trunk #926

2009-09-16 Thread Apache Hudson Server
See http://hudson.zones.apache.org/hudson/job/Solr-trunk/926/

--
[...truncated 443 lines...]
A src/test/test-files
A src/test/test-files/solr
A src/test/test-files/solr/crazy-path-to-schema.xml
A src/test/test-files/solr/crazy-path-to-config.xml
A src/test/test-files/solr/conf
AU src/test/test-files/solr/conf/solrconfig-duh-optimize.xml
AU src/test/test-files/solr/conf/solrconfig_perf.xml
AU src/test/test-files/solr/conf/schema-required-fields.xml
AU src/test/test-files/solr/conf/elevate.xml
AU src/test/test-files/solr/conf/schema-replication1.xml
AU src/test/test-files/solr/conf/solrconfig-transformers.xml
AU src/test/test-files/solr/conf/schema-replication2.xml
A src/test/test-files/solr/conf/xslt
A src/test/test-files/solr/conf/xslt/dummy.xsl
AU src/test/test-files/solr/conf/solrconfig-master.xml
AU src/test/test-files/solr/conf/solrconfig-slave1.xml
A src/test/test-files/solr/conf/schema.xml
AU src/test/test-files/solr/conf/schema11.xml
AU src/test/test-files/solr/conf/schema-spellchecker.xml
A src/test/test-files/solr/conf/stop-1.txt
A src/test/test-files/solr/conf/stop-2.txt
AU src/test/test-files/solr/conf/schema12.xml
AU src/test/test-files/solr/conf/solrconfig-nocache.xml
AU src/test/test-files/solr/conf/schema-not-required-unique-key.xml
AU src/test/test-files/solr/conf/solrconfig-spellchecker.xml
AU src/test/test-files/solr/conf/solrconfig-altdirectory.xml
A src/test/test-files/solr/conf/solrconfig-enableplugin.xml
AU src/test/test-files/solr/conf/solrconfig-querysender.xml
AU src/test/test-files/solr/conf/solrconfig-facet-sort.xml
AU src/test/test-files/solr/conf/solrconfig-slave.xml
AU src/test/test-files/solr/conf/schema-reversed.xml
A src/test/test-files/solr/conf/synonyms.txt
AU src/test/test-files/solr/conf/solrconfig-functionquery.xml
AU src/test/test-files/solr/conf/solrconfig-master1.xml
AU src/test/test-files/solr/conf/solrconfig-master2.xml
A src/test/test-files/solr/conf/protwords.txt
A src/test/test-files/solr/conf/stopwords.txt
AU src/test/test-files/solr/conf/bad-schema.xml
AU src/test/test-files/solr/conf/schema-minimal.xml
A src/test/test-files/solr/conf/schema-binaryfield.xml
A src/test/test-files/solr/conf/solrconfig-solcoreproperties.xml
AU src/test/test-files/solr/conf/solrconfig-elevate.xml
AU src/test/test-files/solr/conf/mapping-ISOLatin1Accent.txt
AU src/test/test-files/solr/conf/schema-copyfield-test.xml
AU src/test/test-files/solr/conf/schema-trie.xml
A src/test/test-files/solr/conf/keep-1.txt
A src/test/test-files/solr/conf/keep-2.txt
AU src/test/test-files/solr/conf/solrconfig-termindex.xml
AU src/test/test-files/solr/conf/solrconfig-SOLR-749.xml
A src/test/test-files/solr/conf/schema-stop-keep.xml
AU src/test/test-files/solr/conf/solrconfig.xml
AU src/test/test-files/solr/conf/solrconfig-delpolicy1.xml
AU src/test/test-files/solr/conf/solrconfig-delpolicy2.xml
AU src/test/test-files/solr/conf/solrconfig-highlight.xml
A src/test/test-files/solr/conf/bad_solrconfig.xml
A src/test/test-files/solr/shared
A src/test/test-files/solr/shared/conf
AU src/test/test-files/solr/shared/conf/schema.xml
AU src/test/test-files/solr/shared/conf/stopwords-en.txt
AU src/test/test-files/solr/shared/conf/solrconfig.xml
AU src/test/test-files/solr/shared/conf/stopwords-fr.txt
AU src/test/test-files/solr/shared/solr.xml
AU src/test/test-files/sampleDateFacetResponse.xml
AU src/test/test-files/mailing_lists.pdf
AU src/test/test-files/books.csv
AU src/test/test-files/htmlStripReaderTest.html
A src/test/test-files/README
AU src/test/test-files/spellings.txt
A src/test/org
A src/test/org/apache
A src/test/org/apache/solr
A src/test/org/apache/solr/update
A src/test/org/apache/solr/update/processor
AU src/test/org/apache/solr/update/processor/UpdateRequestProcessorFactoryTest.java
AU src/test/org/apache/solr/update/processor/SignatureUpdateProcessorFactoryTest.java
AU src/test/org/apache/solr/update/processor/CustomUpdateRequestProcessorFactory.java
A src/test/org/apache/solr/update/AutoCommitTest.java
AU src/test/org/apache/solr/update/DocumentBuilderTest.java
AU src/test/org/apache/solr/update/TestIndexingPerformance.java
A src/test/org/apache/solr/update/DirectUpdateHandlerTest.java
AU src/test/org/apache/solr/update/DirectUpdateHandlerOptimizeTest.java
AU src/test/org/apache/solr/TestTrie.java
A src/test/org/apache/solr/analysis
AU

[jira] Commented: (SOLR-1407) SpellingQueryConverter now disallows underscores and digits in field names (but allows all UTF-8 letters)

2009-09-16 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12755976#action_12755976
 ] 

Grant Ingersoll commented on SOLR-1407:
---

Looks good, the only thing I can see doing is moving to incrementToken() 
instead of next(), but that isn't required just yet.




[jira] Created: (SOLR-1436) Consider changing multi-term queries to use CONSTANT_SCORE_AUTO_REWRITE_DEFAULT

2009-09-16 Thread Mark Miller (JIRA)
Consider changing multi-term queries to use CONSTANT_SCORE_AUTO_REWRITE_DEFAULT 


 Key: SOLR-1436
 URL: https://issues.apache.org/jira/browse/SOLR-1436
 Project: Solr
  Issue Type: Improvement
Reporter: Mark Miller
Priority: Minor


They are CONSTANT_SCORE_AUTO_REWRITE_DEFAULT now, but 
ConstantScoreBooleanQueryRewrite can be faster and 
CONSTANT_SCORE_AUTO_REWRITE_DEFAULT is likely the best setting.
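The trade-off behind the auto setting can be sketched as follows. As I understand the Lucene 2.9 documentation, the auto rewrite uses a constant-score boolean query when both the number of matching terms and the number of matching documents are small, and falls back to a filter otherwise. The function below is a hypothetical Python illustration of that decision, not Lucene code, and the default thresholds shown are illustrative values that should be checked against the Lucene javadoc.

```python
def choose_rewrite(term_count: int, doc_count: int, max_doc: int,
                   term_count_cutoff: int = 350,
                   doc_count_percent: float = 0.1) -> str:
    """Pick a constant-score rewrite strategy from term and document counts.

    Hypothetical helper; the cutoff defaults are assumptions for illustration.
    """
    # The document cutoff is expressed as a percentage of the index size.
    doc_count_cutoff = (doc_count_percent / 100.0) * max_doc
    if term_count <= term_count_cutoff and doc_count <= doc_count_cutoff:
        # Few terms and few documents: a boolean query over the terms is cheap.
        return "CONSTANT_SCORE_BOOLEAN_QUERY_REWRITE"
    # Many terms or many documents: build a filter over the matching documents.
    return "CONSTANT_SCORE_FILTER_REWRITE"
```

A narrow prefix query on a large index would take the boolean route, while a broad wildcard that expands to thousands of terms would take the filter route.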




[jira] Commented: (SOLR-1407) SpellingQueryConverter now disallows underscores and digits in field names (but allows all UTF-8 letters)

2009-09-16 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12755994#action_12755994
 ] 

Shalin Shekhar Mangar commented on SOLR-1407:
-

bq. Looks good, the only thing I can see doing is moving to incrementToken() 
instead of next(), but that isn't required just yet. 

Thanks, I'll commit this then.




[jira] Updated: (SOLR-1407) SpellingQueryConverter now disallows underscores and digits in field names (but allows all UTF-8 letters)

2009-09-16 Thread Shalin Shekhar Mangar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shalin Shekhar Mangar updated SOLR-1407:


Attachment: SOLR-1407.patch

# Removed NMTOKEN from value because it will disallow special characters such 
as comma, period etc.
# Any and all characters are permitted in a value except a space character 
(which is the delimiter)
# Added test for the above

The SpellingQueryConverter still breaks for phrase queries which have a space 
in them like field_s:foo bar. But this issue existed in 1.3 too.

I'll commit this soon.
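
The behaviour described above can be illustrated with a small Python sketch. The pattern is only loosely modelled on the idea of an NMTOKEN-style field name followed by a value that runs to the next space; it is not the regex from the patch, and the names TOKEN and split_query are made up for this example.

```python
import re

# Optional "field:" prefix (word characters, dots, hyphens), then a value
# that runs up to the next whitespace character. In Python 3, \w covers
# Unicode letters, digits and the underscore.
TOKEN = re.compile(r"(?:([\w.\-]+):)?(\S+)")


def split_query(query: str):
    """Return (field, value) pairs; field is None when there is no prefix."""
    return [(m.group(1), m.group(2)) for m in TOKEN.finditer(query)]


print(split_query("field_s1:foo-bar_2"))
print(split_query("field_s:foo bar"))
```

The second call shows the remaining limitation: because a space is the delimiter, "bar" comes back as a separate token with no field attached.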




[jira] Created: (SOLR-1437) DIH: Enhance XPathRecordReader to deal with //tagname and other improvments.

2009-09-16 Thread Fergus McMenemie (JIRA)
DIH: Enhance XPathRecordReader to deal with //tagname and other improvments.


 Key: SOLR-1437
 URL: https://issues.apache.org/jira/browse/SOLR-1437
 Project: Solr
  Issue Type: Improvement
  Components: contrib - DataImportHandler
Affects Versions: 1.4
Reporter: Fergus McMenemie
Priority: Minor
 Fix For: 1.4


As per 
http://www.nabble.com/Re%3A-Extract-info-from-parent-node-during-data-import-%28redirect%3A%29-td25471162.html
 it would be nice to be able to use expressions such as //tagname when parsing 
XML documents.
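
To make the request concrete, here is a minimal Python sketch of what a //tagname expression means for a streaming parser: collect every element with that name regardless of how deep it sits. It uses the standard library rather than XPathRecordReader, and texts_for_tag is a hypothetical helper name.

```python
import io
import xml.etree.ElementTree as ET


def texts_for_tag(xml_text: str, tagname: str):
    """Stream the document and collect the text of every <tagname>, at any depth."""
    found = []
    source = io.BytesIO(xml_text.encode("utf-8"))
    # "end" events fire when an element is complete, so its text is available.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tagname:
            found.append((elem.text or "").strip())
    return found


doc = "<root><a><title>One</title></a><b><c><title>Two</title></c></b></root>"
print(texts_for_tag(doc, "title"))
```

Both title elements are found even though they sit at different depths, which is what a fixed path such as /root/a/title cannot express.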




[jira] Updated: (SOLR-1437) DIH: Enhance XPathRecordReader to deal with //tagname and other improvments.

2009-09-16 Thread Noble Paul (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul updated SOLR-1437:
-

Fix Version/s: (was: 1.4)
   1.5

It may not be viable to target this for 1.4.

 DIH: Enhance XPathRecordReader to deal with //tagname and other improvments.
 

 Key: SOLR-1437
 URL: https://issues.apache.org/jira/browse/SOLR-1437
 Project: Solr
  Issue Type: Improvement
  Components: contrib - DataImportHandler
Affects Versions: 1.4
Reporter: Fergus McMenemie
Priority: Minor
 Fix For: 1.5

   Original Estimate: 672h
  Remaining Estimate: 672h

 As per 
 http://www.nabble.com/Re%3A-Extract-info-from-parent-node-during-data-import-%28redirect%3A%29-td25471162.html
  it would be nice to be able to use expressions such as //tagname when 
 parsing XML documents.




Re: CSV Update - Need help mapping csv field to schema's ID

2009-09-16 Thread Grant Ingersoll


On Sep 15, 2009, at 8:25 PM, Yonik Seeley wrote:


: .map={sku.field}:{id}

the map param is for replacing a *value* with a different value ... it's
useful for things like numeric codes in CSV files that you want to replace
with strings in your index.


Darn... I shouldn't trust my memory.
From http://issues.apache.org/jira/browse/SOLR-284
'''drop ext. from parameter names, and revisit naming to try and
unify with other update handlers like CSV'''

So now map.a=b in CSV is for values but map.a=b in SolrCell is for fields

perhaps we should change map in SolrCell to fmap?


That's fine by me.  Just update the docs when you're done.



My longer range idea was to pull out some generally useful things like
field mapping, etc, such that they could be shared across update
handlers.


See also:
 SOLR-1032, SOLR-1069 for related things.  We should be able to refactor
the field mapping code easily enough.




-Yonik
http://www.lucidimagination.com


-- Forwarded message --
From: Chris Hostetter hossman_luc...@fucit.org
Date: Tue, Sep 15, 2009 at 8:12 PM
Subject: Re: CSV Update - Need help mapping csv field to schema's ID
To: solr-u...@lucene.apache.org



: I would like to add an additional name:value pair for every line, mapping the
: sku field to my schema's id field:
:
: .map={sku.field}:{id}

the map param is for replacing a *value* with a different value ... it's
useful for things like numeric codes in CSV files that you want to replace
with strings in your index.

: I would prefer NOT to change the schema by adding a
: <copyField source="sku" dest="id"/>.

that's the only solution i can think of unless you want to write an
UpdateProcessor.


-Hoss
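
The distinction discussed in this thread, value mapping versus field mapping, can be sketched in a few lines of Python. These helpers are hypothetical and only illustrate the two semantics; they are not the Solr CSV or SolrCell parameter handling.

```python
def map_values(row: dict, field: str, mapping: dict) -> dict:
    """CSV-style 'map': replace a field's value, leave field names alone."""
    out = dict(row)
    if field in out:
        out[field] = mapping.get(out[field], out[field])
    return out


def map_fields(row: dict, renames: dict) -> dict:
    """'fmap'-style: rename fields, leave values alone."""
    return {renames.get(name, name): value for name, value in row.items()}


row = {"sku": "A-100", "status": "1"}
print(map_values(row, "status", {"1": "in_stock"}))
print(map_fields(row, {"sku": "id"}))
```

The original question (feeding the sku column into the schema's id field) needs the second kind of mapping, which is why the value-oriented map parameter did not help.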


--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:

http://www.lucidimagination.com/search



Re: distinct example for Solr Cell?

2009-09-16 Thread Yonik Seeley
On Tue, Sep 15, 2009 at 7:31 PM, Chris Hostetter
hossman_luc...@fucit.org wrote:
 I remember a discussion about removing the /update/extract handler from
 ./example/solr/conf/solrconfig.xml so that we could stop copying all the
 jars into ./example/solr/lib/ and have a smaller, simpler, example.

I don't recall that...  I did make the extract handler lazy such that
one could easily remove all of the tika jars from the example w/o
triggering an exception.  We should look into updating the example
README at a minimum.

We should certainly strive for simplicity, but that can go either
way... I like the batteries-included mentality of Python too.

Future idea:
 - make example slightly more formal by naming it server
 - make server/solr/lib the home for some of these jars (preferably
separated by sub-directory) and make compilation and tests go against
these jars

That would keep the server dir self contained (no outside references
- copy it somewhere else to deploy), make our download smaller, and
eliminate the copying around of libs.

 The
 idea being that then there would be a separate distinct set of configs
 providing an example of the extraction handler (with all of its jars)

If it's an example with all of its jars, it seems like it's still a
copy of all those jars, right?
Or, we could put the example in contrib/extracting and make it such
that the code and example server shared the libraries?


-Yonik
http://www.lucidimagination.com


[jira] Commented: (SOLR-1314) Upgrade Carrot2 to version 3.1.0

2009-09-16 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756049#action_12756049
 ] 

Grant Ingersoll commented on SOLR-1314:
---

bq. As a follow-up of the discussion on legal-discuss

OK, I think that leaves only the patent wording.  My takeaway from the 
legal-discuss thread is that particular line doesn't hold water, so you 
probably could just drop it.  At a minimum, it needs to make explicit it 
pertains to Carrot2 and not be ambiguous as it is now.

Thanks!

 Upgrade Carrot2 to version 3.1.0
 

 Key: SOLR-1314
 URL: https://issues.apache.org/jira/browse/SOLR-1314
 Project: Solr
  Issue Type: Task
Reporter: Stanislaw Osinski
Assignee: Grant Ingersoll
 Fix For: 1.4


 As soon as Lucene 2.9 is released, Carrot2 3.1.0 will come out with bug fixes 
 in clustering algorithms and improved clustering in Chinese. The upgrade 
 should be a matter of upgrading {{carrot2-mini.jar}} and 
 {{google-collections.jar}}.




[jira] Commented: (SOLR-1316) Create autosuggest component

2009-09-16 Thread Andrzej Bialecki (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756050#action_12756050
 ] 

Andrzej Bialecki  commented on SOLR-1316:
-

bq. These enable suffix compression and create much smaller word graphs.

DAWGs are problematic, because they are essentially immutable once created (the 
cost of insert / delete is very high). So I propose to stick to TSTs for now.

Also, I think that populating the TST from the index would have to be 
discriminative, perhaps based on a threshold (so that it only adds terms with 
large enough docFreq), and it would be good to adjust the content of the tree 
based on actual queries that return some results (poor man's auto-learning), 
gradually removing the least frequent strings to save space. We could also use 
as a source a field with 1-3 word shingles (no tf, unstored, to save space in 
the source index, with a similar thresholding mechanism).

Ankul, I'm not sure what's the behavior of your implementation when dynamically 
adding / removing keys? Does it still remain balanced?

I also found a MIT-licensed  impl. of radix tree here: 
http://code.google.com/p/radixtree, which looks good too, one spelling mistake 
in the API notwithstanding ;)
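
A minimal Python sketch of the thresholded approach proposed above: a ternary search tree filled only with terms whose docFreq reaches a cutoff, plus a prefix lookup. This is not the attached implementation nor the Lucene contrib class; the class and function names, and the term/docFreq dictionary used as input, are assumptions for illustration.

```python
class Node:
    __slots__ = ("ch", "lo", "eq", "hi", "end")

    def __init__(self, ch):
        self.ch = ch
        self.lo = self.eq = self.hi = None
        self.end = False  # True when a stored key ends at this node


class TernarySearchTree:
    def __init__(self):
        self.root = None

    def insert(self, key: str) -> None:
        if key:
            self.root = self._insert(self.root, key, 0)

    def _insert(self, node, key, i):
        ch = key[i]
        if node is None:
            node = Node(ch)
        if ch < node.ch:
            node.lo = self._insert(node.lo, key, i)
        elif ch > node.ch:
            node.hi = self._insert(node.hi, key, i)
        elif i + 1 < len(key):
            node.eq = self._insert(node.eq, key, i + 1)
        else:
            node.end = True
        return node

    def suggest(self, prefix: str):
        """Return all stored keys that start with prefix, in sorted order."""
        node, i = self.root, 0
        while node is not None and i < len(prefix):
            if prefix[i] < node.ch:
                node = node.lo
            elif prefix[i] > node.ch:
                node = node.hi
            else:
                i += 1
                if i == len(prefix):
                    break
                node = node.eq
        if node is None or not prefix:
            return []
        out = [prefix] if node.end else []
        self._collect(node.eq, prefix, out)
        return out

    def _collect(self, node, path, out):
        if node is None:
            return
        self._collect(node.lo, path, out)
        if node.end:
            out.append(path + node.ch)
        self._collect(node.eq, path + node.ch, out)
        self._collect(node.hi, path, out)


def build_from_terms(term_doc_freqs: dict, min_doc_freq: int) -> TernarySearchTree:
    """Add only terms whose document frequency reaches the threshold."""
    tree = TernarySearchTree()
    for term, doc_freq in term_doc_freqs.items():
        if doc_freq >= min_doc_freq:
            tree.insert(term)
    return tree


tree = build_from_terms({"solr": 120, "sort": 45, "search": 80, "sloar": 1}, 10)
print(tree.suggest("so"))
```

The rare misspelling "sloar" never enters the tree, so it can never be suggested, which is the point of the docFreq threshold.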


 Create autosuggest component
 

 Key: SOLR-1316
 URL: https://issues.apache.org/jira/browse/SOLR-1316
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 1.4
Reporter: Jason Rutherglen
Priority: Minor
 Fix For: 1.5

 Attachments: TernarySearchTree.tar.gz

   Original Estimate: 96h
  Remaining Estimate: 96h

 Autosuggest is a common search function that can be integrated
 into Solr as a SearchComponent. Our first implementation will
 use the TernaryTree found in Lucene contrib. 
 * Enable creation of the dictionary from the index or via Solr's
 RPC mechanism
 * What types of parameters and settings are desirable?
 * Hopefully in the future we can include user click through
 rates to boost those terms/phrases higher




[jira] Commented: (SOLR-1437) DIH: Enhance XPathRecordReader to deal with //tagname and other improvments.

2009-09-16 Thread Fergus McMenemie (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756051#action_12756051
 ] 

Fergus McMenemie commented on SOLR-1437:


A pity we may not make the 1.4 release, but I guess there is no harm in trying!

Looking through the code for XPathRecordReader I see a variable skipNextEvent 
inside the parse method. Can anybody explain why we need to skip an event at 
the end of a text block?




[jira] Assigned: (SOLR-1433) files included in release that shouldn't be

2009-09-16 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll reassigned SOLR-1433:
-

Assignee: Grant Ingersoll

 files included in release that shouldn't be
 ---

 Key: SOLR-1433
 URL: https://issues.apache.org/jira/browse/SOLR-1433
 Project: Solr
  Issue Type: Bug
Reporter: Hoss Man
Assignee: Grant Ingersoll
 Fix For: 1.4


 some files are making it into the release artifacts that shouldn't be ... 
 need to take care of this in the build file prior to releasing 1.4.  details 
 to follow in comments.




[jira] Commented: (SOLR-1316) Create autosuggest component

2009-09-16 Thread Ankul Garg (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756072#action_12756072
 ] 

Ankul Garg commented on SOLR-1316:
--

Removing keys will not affect the balance of the tree, since removal can be
done simply by setting the boolean end flag at the leaf to false. Adding keys
dynamically won't keep the tree balanced in my implementation, because the
tree is balanced by ordered insertion of keys. So when more keys are added,
the TST will have to be rebuilt to rebalance it. Will that be problematic?
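
To make the two points above concrete, here is a minimal sketch. It assumes a
plain ternary search tree and is not the code from the attached
TernarySearchTree.tar.gz; the class and method names are hypothetical. Keys
are inserted median-first from a sorted array, which is one way to get a
balanced tree from ordered input, and removal only clears the end-of-key flag.

```java
import java.util.Arrays;

/** Hypothetical ternary search tree; not the implementation from the attachment. */
final class TstSketch {
    private static final class Node {
        final char c;
        Node lo, eq, hi;
        boolean end;                 // true if a key ends at this node
        Node(char c) { this.c = c; }
    }

    private Node root;

    /** Builds a tree by inserting the median of each sorted range first. */
    static TstSketch balanced(String[] keys) {
        String[] sorted = keys.clone();
        Arrays.sort(sorted);
        TstSketch t = new TstSketch();
        t.insertRange(sorted, 0, sorted.length - 1);
        return t;
    }

    private void insertRange(String[] sorted, int from, int to) {
        if (from > to) return;
        int mid = (from + to) >>> 1;
        insert(sorted[mid]);
        insertRange(sorted, from, mid - 1);
        insertRange(sorted, mid + 1, to);
    }

    /** Plain insert; repeated use without a rebuild can unbalance the tree. */
    void insert(String key) {
        if (!key.isEmpty()) root = insert(root, key, 0);
    }

    private Node insert(Node n, String key, int i) {
        char c = key.charAt(i);
        if (n == null) n = new Node(c);
        if (c < n.c) n.lo = insert(n.lo, key, i);
        else if (c > n.c) n.hi = insert(n.hi, key, i);
        else if (i < key.length() - 1) n.eq = insert(n.eq, key, i + 1);
        else n.end = true;
        return n;
    }

    /** Returns the node holding the last character of key, or null. */
    private Node find(String key) {
        Node n = root;
        int i = 0;
        while (n != null && !key.isEmpty()) {
            char c = key.charAt(i);
            if (c < n.c) n = n.lo;
            else if (c > n.c) n = n.hi;
            else if (i == key.length() - 1) return n;
            else { n = n.eq; i++; }
        }
        return null;
    }

    boolean contains(String key) {
        Node n = find(key);
        return n != null && n.end;
    }

    /** Lazy removal: the nodes stay in place, only the end flag is cleared. */
    boolean remove(String key) {
        Node n = find(key);
        if (n == null || !n.end) return false;
        n.end = false;
        return true;
    }
}
```

With this layout a removal never changes the shape of the tree, while a long
run of insert() calls after the initial build can, which is why a periodic
rebuild through balanced() would be needed.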




 Create autosuggest component
 

 Key: SOLR-1316
 URL: https://issues.apache.org/jira/browse/SOLR-1316
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 1.4
Reporter: Jason Rutherglen
Priority: Minor
 Fix For: 1.5

 Attachments: TernarySearchTree.tar.gz

   Original Estimate: 96h
  Remaining Estimate: 96h

 Autosuggest is a common search function that can be integrated
 into Solr as a SearchComponent. Our first implementation will
 use the TernaryTree found in Lucene contrib. 
 * Enable creation of the dictionary from the index or via Solr's
 RPC mechanism
 * What types of parameters and settings are desirable?
 * Hopefully in the future we can include user click through
 rates to boost those terms/phrases higher




Re: Solr 1.4 Open Issues Status

2009-09-16 Thread Grant Ingersoll


On Sep 15, 2009, at 6:23 PM, Chris Hostetter wrote:



:   Hoss' patch is a reasonable start. I think this can be committed. We
:  can iterate in 1.5. Mark or Hoss?
: I think this is a great start for 1.4 and the rest can wait till 1.5,
: but I'll defer to Hoss. I had started working on something more
: complicated, but I prefer Hoss' route.

Mark: my patch was just something i cranked out really quick and dirty to
sanity test that various solr components weren't already causing insanity
... feel free to run with it, i'm a little burnt out right now just
trying to keep up with email.



I vote we go with it for now and then improve in 1.5


Re: CSV Update - Need help mapping csv field to schema's ID

2009-09-16 Thread Insight 49, LLC

Darn. I hate when I create work for people.

My need is to take a csv file, use the CSV update handler, but then add 
an additional copyfield (sku from csv to id from schema) to create a 
unique id for each record.


Thanks guys. Terrific work on SOLR.

Dan


Grant Ingersoll wrote:


On Sep 15, 2009, at 8:25 PM, Yonik Seeley wrote:


Darn... I shouldn't trust my memory.
From http://issues.apache.org/jira/browse/SOLR-284
'''drop ext. from parameter names, and revisit naming to try and
unify with other update handlers like CSV'''

So now map.a=b in CSV is for values but map.a=b in SolrCell is for 
fields

perhaps we should change map in SolrCell to fmap?


That's fine by me.  Just update the docs when you're done.



My longer range idea was to pull out some generally useful things like
field mapping, etc, such that they could be shared across update
handlers.


See also:
 SOLR-1032, SOLR-1069 for related things.  We should be able to refactor 
the field mapping code easy enough.




Re: Solr 1.4 Open Issues Status

2009-09-16 Thread Grant Ingersoll


On Sep 15, 2009, at 12:01 PM, Andrzej Bialecki wrote:


Grant Ingersoll wrote:

Here's where we are at for 1.4.  My comments are marked by .
I think we are in pretty good shape, people just need to make some  
final commits. If things are still unassigned tomorrow morning, I'm  
going to push them to 1.5.

 Key: SOLR-1427
 Summary: SearchComponents aren't listed on registry.jsp
 Assignee: Grant Ingersoll

 I just put up a patch that I believe is ready to commit.

 Key: SOLR-1423
 Summary: Lucene 2.9 RC4 may need some changes in Solr Analyzers using CharStream & others
 Assignee: Koji Sekiguchi

 Koji?

 Key: SOLR-1407
 Summary: SpellingQueryConverter now disallows underscores and digits in field names (but allows all UTF-8 letters)
 Assignee: Shalin Shekhar Mangar

 Needs a patch and a unit test.  Push to 1.5?

 Key: SOLR-1396
 Summary: standardize the updateprocessorchain syntax
 Assignee: Unassigned

 No patch exists and no work has been done on it.  Seems like we
 should get this right.  Volunteers?

 Key: SOLR-1366
 Summary: UnsupportedOperationException may be thrown when using custom IndexReader
 Assignee: Mark Miller

 Patch exists.  Mark?


That patch doesn't solve the issue - it can't be solved without  
serious changes in the replication handler. For now we can only  
clarify the breakage in the documentation.


Care to take up that documentation, Andrzej?


Re: CSV Update - Need help mapping csv field to schema's ID

2009-09-16 Thread Grant Ingersoll


On Sep 16, 2009, at 9:41 AM, Grant Ingersoll wrote:



On Sep 15, 2009, at 8:25 PM, Yonik Seeley wrote:


: .map={sku.field}:{id}

the map param is for replacing a *value* with a different value ... it's
useful for things like numeric codes in CSV files that you want to replace
with strings in your index.


Darn... I shouldn't trust my memory.
From http://issues.apache.org/jira/browse/SOLR-284
'''drop ext. from parameter names, and revisit naming to try and
unify with other update handlers like CSV'''

So now map.a=b in CSV is for values but map.a=b in SolrCell is for  
fields

perhaps we should change map in SolrCell to fmap?


That's fine by me.  Just update the docs when you're done.


Actually, I can do this now.





My longer range idea was to pull out some generally useful things like
field mapping, etc, such that they could be shared across update
handlers.


See also:
SOLR-1032, SOLR-1069 for related things.  We should be able to
refactor the field mapping code easily enough.




-Yonik
http://www.lucidimagination.com


-- Forwarded message --
From: Chris Hostetter hossman_luc...@fucit.org
Date: Tue, Sep 15, 2009 at 8:12 PM
Subject: Re: CSV Update - Need help mapping csv field to schema's ID
To: solr-u...@lucene.apache.org



: I would like to add an additional name:value pair for every line, mapping the
: sku field to my schema's id field:
:
: .map={sku.field}:{id}

the map param is for replacing a *value* with a different value ... it's
useful for things like numeric codes in CSV files that you want to replace
with strings in your index.

: I would prefer NOT to change the schema by adding a <copyField source="sku"
: dest="id"/>.

that's the only solution i can think of unless you want to write an
UpdateProcessor.


-Hoss


--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:

http://www.lucidimagination.com/search






[jira] Resolved: (SOLR-1407) SpellingQueryConverter now disallows underscores and digits in field names (but allows all UTF-8 letters)

2009-09-16 Thread Shalin Shekhar Mangar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shalin Shekhar Mangar resolved SOLR-1407.
-

Resolution: Fixed

Committed revision 815801.

Thanks David & Michael!

 SpellingQueryConverter now disallows underscores and digits in field names 
 (but allows all UTF-8 letters)
 -

 Key: SOLR-1407
 URL: https://issues.apache.org/jira/browse/SOLR-1407
 Project: Solr
  Issue Type: Improvement
  Components: spellchecker
Affects Versions: 1.3
Reporter: David Bowen
Assignee: Shalin Shekhar Mangar
Priority: Trivial
 Fix For: 1.4

 Attachments: SOLR-1407.patch, SOLR-1407.patch, 
 SpellingQueryConverter.java, SpellingQueryConverter.java


 SpellingQueryConverter was extended to cover the full UTF-8 range instead of 
 handling US-ASCII only, but in the process it was broken for field names that 
 contain underscores or digits.




[jira] Commented: (SOLR-1316) Create autosuggest component

2009-09-16 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12756094#action_12756094
 ] 

Shalin Shekhar Mangar commented on SOLR-1316:
-

bq. DAWGs are problematic, because they are essentially immutable once created 
(the cost of insert / delete is very high)

Andrzej, why would immutability be a problem? Wouldn't we have to re-build the 
TST if the source index changes?

bq. Also, I think that populating TST from the index would have to be 
discriminative, perhaps based on a threshold

I think the building of the data structure can be done in a way similar to what 
SpellCheckComponent does. We can re-use the HighFrequencyDictionary which can 
give tokens above a certain threshold frequency. The field names to use for 
building the data structure and the analysis can also be done like SCC. The 
response format for this component can also be similar to SCC.
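
As a rough illustration of the threshold idea, the sketch below keeps only
terms whose document frequency is at least a given fraction of the index. It
does not call HighFrequencyDictionary or any Lucene API: the term-to-docFreq
map stands in for the index, and the class name, method name and the
"fraction of documents" interpretation of the threshold are assumptions made
for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical stand-in for threshold-based dictionary building. */
final class ThresholdSketch {
    /**
     * Returns, in sorted order, the terms whose docFreq / numDocs
     * is at least the given threshold.
     */
    static List<String> frequentTerms(Map<String, Integer> docFreq,
                                      int numDocs, double threshold) {
        List<String> out = new ArrayList<>();
        if (numDocs <= 0) return out;
        // TreeMap gives a deterministic (sorted) iteration order.
        for (Map.Entry<String, Integer> e : new TreeMap<>(docFreq).entrySet()) {
            if ((double) e.getValue() / numDocs >= threshold) {
                out.add(e.getKey());
            }
        }
        return out;
    }
}
```

Sorted output is convenient here because a tree that is balanced by ordered
insertion could be built directly from the returned list.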

 Create autosuggest component
 

 Key: SOLR-1316
 URL: https://issues.apache.org/jira/browse/SOLR-1316
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 1.4
Reporter: Jason Rutherglen
Priority: Minor
 Fix For: 1.5

 Attachments: TernarySearchTree.tar.gz

   Original Estimate: 96h
  Remaining Estimate: 96h

 Autosuggest is a common search function that can be integrated
 into Solr as a SearchComponent. Our first implementation will
 use the TernaryTree found in Lucene contrib. 
 * Enable creation of the dictionary from the index or via Solr's
 RPC mechanism
 * What types of parameters and settings are desirable?
 * Hopefully in the future we can include user click through
 rates to boost those terms/phrases higher




[jira] Commented: (SOLR-1314) Upgrade Carrot2 to version 3.1.0

2009-09-16 Thread Stanislaw Osinski (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12756110#action_12756110
 ] 

Stanislaw Osinski commented on SOLR-1314:
-

Hi Grant,

I've just dropped the patenting clause entirely. The updated license is in the 
repo and at: http://www.carrot2.org/carrot2.LICENSE.

S.

 Upgrade Carrot2 to version 3.1.0
 

 Key: SOLR-1314
 URL: https://issues.apache.org/jira/browse/SOLR-1314
 Project: Solr
  Issue Type: Task
Reporter: Stanislaw Osinski
Assignee: Grant Ingersoll
 Fix For: 1.4


 As soon as Lucene 2.9 is released, Carrot2 3.1.0 will come out with bug fixes 
 in clustering algorithms and improved clustering in Chinese. The upgrade 
 should be a matter of upgrading {{carrot2-mini.jar}} and 
 {{google-collections.jar}}.




[jira] Commented: (SOLR-1316) Create autosuggest component

2009-09-16 Thread Andrzej Bialecki (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12756149#action_12756149
 ] 

Andrzej Bialecki  commented on SOLR-1316:
-

bq. Andrzej, why would immutability be a problem? Wouldn't we have to re-build 
the TST if the source index changes?

Well, the use case I have in mind is a TST that improves itself over time based 
on the observed query log. I.e. you would bootstrap a TST from the index (and 
here indeed you can do this on every searcher refresh), but it's often claimed 
that real query logs provide a far better source of autocomplete than the index 
terms. My idea was to start with what you have - in the absence of query logs - 
and then improve upon it by adding successful queries (and removing least-used 
terms to keep the tree at a more or less constant size).
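
A minimal sketch of that idea, using a plain map of usage counts in place of
a real TST (all names are hypothetical): terms are seeded with zero uses,
each successful query bumps a count, and when the store is full the
least-used term is evicted to keep the size constant.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical bounded suggestion store that adapts to observed queries. */
final class AdaptiveSuggestSketch {
    private final int capacity;
    private final Map<String, Integer> counts = new HashMap<>();

    AdaptiveSuggestSketch(int capacity) {
        if (capacity < 1) throw new IllegalArgumentException("capacity must be >= 1");
        this.capacity = capacity;
    }

    /** Seeds a term (for example from the index) with zero recorded uses. */
    void bootstrap(String term) { add(term, 0); }

    /** Records one successful query. */
    void recordQuery(String query) { add(query, 1); }

    private void add(String term, int uses) {
        Integer old = counts.get(term);
        // Only a brand-new term can push the store over capacity.
        if (old == null && counts.size() >= capacity) evictLeastUsed();
        counts.put(term, (old == null ? 0 : old) + uses);
    }

    /** Removes the term with the lowest count; ties go to the smallest term. */
    private void evictLeastUsed() {
        String victim = null;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (victim == null) { victim = e.getKey(); continue; }
            int cmp = Integer.compare(e.getValue(), counts.get(victim));
            if (cmp < 0 || (cmp == 0 && e.getKey().compareTo(victim) < 0)) {
                victim = e.getKey();
            }
        }
        counts.remove(victim);
    }

    boolean contains(String term) { return counts.containsKey(term); }

    int size() { return counts.size(); }
}
```

The linear scan in evictLeastUsed() is only there to keep the example short;
the point is that both insertion and removal have to be cheap, which is the
mutability requirement.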

Alternatively we could provide an option to bootstrap it from real query log 
data.

This use case requires mutability, hence my negative opinion about DAWGs 
(besides, we are lacking an implementation, aren't we, whereas we already have a 
few suitable TST implementations). Perhaps this doesn't have to be an 
either/or, if we come up with a pluggable interface for this type of component?

bq. I think the building of the data structure can be done in a way similar to 
what SpellCheckComponent does. [..]

+1


 Create autosuggest component
 

 Key: SOLR-1316
 URL: https://issues.apache.org/jira/browse/SOLR-1316
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 1.4
Reporter: Jason Rutherglen
Priority: Minor
 Fix For: 1.5

 Attachments: TernarySearchTree.tar.gz

   Original Estimate: 96h
  Remaining Estimate: 96h

 Autosuggest is a common search function that can be integrated
 into Solr as a SearchComponent. Our first implementation will
 use the TernaryTree found in Lucene contrib. 
 * Enable creation of the dictionary from the index or via Solr's
 RPC mechanism
 * What types of parameters and settings are desirable?
 * Hopefully in the future we can include user click through
 rates to boost those terms/phrases higher




[jira] Commented: (SOLR-284) Parsing Rich Document Types

2009-09-16 Thread Chris Harris (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12756154#action_12756154
 ] 

Chris Harris commented on SOLR-284:
---

This caught me by surprise, so I'm noting it here in case it helps anyone else:

In SVN r815830 (September 16, 2009), Grant renamed the field name mapping 
argument "map" to "fmap". The reason was to make naming more consistent with 
the CSV handler. For more info on this see the following thread:

http://www.nabble.com/Fwd%3A-CSV-Update---Need-help-mapping-csv-field-to-schema%27s-ID-td25463942.html



 Parsing Rich Document Types
 ---

 Key: SOLR-284
 URL: https://issues.apache.org/jira/browse/SOLR-284
 Project: Solr
  Issue Type: New Feature
  Components: update
Reporter: Eric Pugh
Assignee: Grant Ingersoll
 Fix For: 1.4

 Attachments: libs.zip, rich.patch, rich.patch, rich.patch, 
 rich.patch, rich.patch, rich.patch, rich.patch, schema_update.patch, 
 SOLR-284-no-key-gen.patch, SOLR-284.patch, SOLR-284.patch, SOLR-284.patch, 
 SOLR-284.patch, SOLR-284.patch, SOLR-284.patch, SOLR-284.patch, 
 SOLR-284.patch, SOLR-284.patch, solr-word.pdf, source.zip, test-files.zip, 
 test-files.zip, test.zip, un-hardcode-id.diff


 I have developed a RichDocumentRequestHandler based on the CSVRequestHandler 
 that supports streaming a PDF, Word, Powerpoint, Excel, or PDF document into 
 Solr.
 There is a wiki page with information here: 
 http://wiki.apache.org/solr/UpdateRichDocuments
  




[jira] Commented: (SOLR-1336) Add support for lucene's SmartChineseAnalyzer

2009-09-16 Thread Stanislaw Osinski (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12756177#action_12756177
 ] 

Stanislaw Osinski commented on SOLR-1336:
-

Keeping the Chinese analyzer JAR optional sounds good. As Carrot2 also uses it, 
I'd need to make sure the clustering contrib doesn't fail when the JAR is not 
there and clustering in Chinese is requested (I think I'd simply log a WARN 
saying that the Chinese analyzer JAR is required for best clustering results).

 Add support for lucene's SmartChineseAnalyzer
 -

 Key: SOLR-1336
 URL: https://issues.apache.org/jira/browse/SOLR-1336
 Project: Solr
  Issue Type: New Feature
  Components: Analysis
Reporter: Robert Muir
 Attachments: SOLR-1336.patch, SOLR-1336.patch, SOLR-1336.patch


 SmartChineseAnalyzer was contributed to lucene, it indexes simplified chinese 
 text as words.
 if the factories for the tokenizer and word token filter are added to solr it 
 can be used, although there should be a sample config or wiki entry showing 
 how to apply the built-in stopwords list.
 this is because it doesn't contain actual stopwords, but must be used to 
 prevent indexing punctuation... 
 note: we did some refactoring/cleanup on this analyzer recently, so it would 
 be much easier to do this after the next lucene update.
 it has also been moved out of -analyzers.jar due to size, and now builds in 
 its own smartcn jar file, so that would need to be added if this feature is 
 desired.




[jira] Created: (SOLR-1438) Timeout distributed query stage get fields

2009-09-16 Thread Jason Rutherglen (JIRA)
Timeout distributed query stage get fields
--

 Key: SOLR-1438
 URL: https://issues.apache.org/jira/browse/SOLR-1438
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 1.4
Reporter: Jason Rutherglen
Priority: Minor
 Fix For: 1.5


In a distributed query, timeouts work for PURPOSE_GET_TOP_IDS
but we need them for PURPOSE_GET_FIELDS (obtaining the document
data). We'll reuse the timeAllowed parameter and pass it to the
shards during the get fields distributed request.




[jira] Commented: (SOLR-284) Parsing Rich Document Types

2009-09-16 Thread Chris Harris (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12756259#action_12756259
 ] 

Chris Harris commented on SOLR-284:
---

Grant and company: I just noticed that the example solrconfig.xml at the head 
of SVN trunk still uses map, not fmap. (In particular, there's map.content, 
map.a, and map.div.) I assume this should be fixed for the 1.4 release. 
Interestingly, this doesn't seem to make any unit tests fail.

 Parsing Rich Document Types
 ---

 Key: SOLR-284
 URL: https://issues.apache.org/jira/browse/SOLR-284
 Project: Solr
  Issue Type: New Feature
  Components: update
Reporter: Eric Pugh
Assignee: Grant Ingersoll
 Fix For: 1.4

 Attachments: libs.zip, rich.patch, rich.patch, rich.patch, 
 rich.patch, rich.patch, rich.patch, rich.patch, schema_update.patch, 
 SOLR-284-no-key-gen.patch, SOLR-284.patch, SOLR-284.patch, SOLR-284.patch, 
 SOLR-284.patch, SOLR-284.patch, SOLR-284.patch, SOLR-284.patch, 
 SOLR-284.patch, SOLR-284.patch, solr-word.pdf, source.zip, test-files.zip, 
 test-files.zip, test.zip, un-hardcode-id.diff


 I have developed a RichDocumentRequestHandler based on the CSVRequestHandler 
 that supports streaming a PDF, Word, Powerpoint, Excel, or PDF document into 
 Solr.
 There is a wiki page with information here: 
 http://wiki.apache.org/solr/UpdateRichDocuments
  




[jira] Commented: (SOLR-284) Parsing Rich Document Types

2009-09-16 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12756266#action_12756266
 ] 

Yonik Seeley commented on SOLR-284:
---

bq. example solrconfig.xml at the head of SVN trunk still uses map, not fmap.

Thanks, I just fixed this.

 Parsing Rich Document Types
 ---

 Key: SOLR-284
 URL: https://issues.apache.org/jira/browse/SOLR-284
 Project: Solr
  Issue Type: New Feature
  Components: update
Reporter: Eric Pugh
Assignee: Grant Ingersoll
 Fix For: 1.4

 Attachments: libs.zip, rich.patch, rich.patch, rich.patch, 
 rich.patch, rich.patch, rich.patch, rich.patch, schema_update.patch, 
 SOLR-284-no-key-gen.patch, SOLR-284.patch, SOLR-284.patch, SOLR-284.patch, 
 SOLR-284.patch, SOLR-284.patch, SOLR-284.patch, SOLR-284.patch, 
 SOLR-284.patch, SOLR-284.patch, solr-word.pdf, source.zip, test-files.zip, 
 test-files.zip, test.zip, un-hardcode-id.diff


 I have developed a RichDocumentRequestHandler based on the CSVRequestHandler 
 that supports streaming a PDF, Word, Powerpoint, Excel, or PDF document into 
 Solr.
 There is a wiki page with information here: 
 http://wiki.apache.org/solr/UpdateRichDocuments
  




[jira] Updated: (SOLR-1432) FunctionQueries aren't correctly weighted

2009-09-16 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley updated SOLR-1432:
---

Attachment: SOLR-1432.patch

Updated patch with tests that fail w/o correct weighting behavior.

 FunctionQueries aren't correctly weighted
 -

 Key: SOLR-1432
 URL: https://issues.apache.org/jira/browse/SOLR-1432
 Project: Solr
  Issue Type: Bug
Affects Versions: 1.4
Reporter: Yonik Seeley
Assignee: Yonik Seeley
 Fix For: 1.4

 Attachments: SOLR-1432.patch, SOLR-1432.patch


 Nested queries in function queries aren't weighted correctly with the proper 
 Searcher, and this is now even more serious with per-segment searching in 
 Lucene/Solr.




[jira] Created: (SOLR-1439) Enhance PollInterval for Java Replication

2009-09-16 Thread Bill Bell (JIRA)
Enhance PollInterval for Java Replication
-

 Key: SOLR-1439
 URL: https://issues.apache.org/jira/browse/SOLR-1439
 Project: Solr
  Issue Type: New Feature
  Components: replication (java)
 Environment: ALL
Reporter: Bill Bell
 Fix For: 1.4


I am not a huge fan of PollInterval. It would be great to add an option to get 
the Index based on exact time: PollTime=*/15 * * * * That would run at every 
15 minutes based on the clock. i.e. 1:00pm, 1:15pm, 1:30pm, 1:45pm, etc. All my 
slaves are sync'd using NTP, so this would work better. Since each slave starts 
differently, we cannot set the PollInterval=00:15:00 since they would get 
different indexes based on when they start. The other option would be to 
suspend polling - and start - which would be very manual I guess. Setting the 
PollInterval to 10 seconds would be getting a new index when the old one is 
still warming up. Even 10 seconds interval would not be good, since we get so 
many updates, each server would have different indexes. With Snap we don't have 
this issue.

We get SOLR updates frequently and since they are large we cannot wait to do a 
commit at the 15 minute mark using cron. Optimize just takes too long.

On our system we need to limit how often the slaves get the new index. We would 
like all slaves to get the index at the same time.

From Noble Paul:
The default pollInterval can behave the way you want (so that the fetches are 
synchronized in time by the clock). Raise a separate issue and we can fix it
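
One way clock-aligned polling could work is sketched below. This is not how
the ReplicationHandler schedules its fetches; the class and method names are
hypothetical. Every slave computes the delay to the next multiple of the
interval on its own (NTP-synced) clock, so all slaves fire at 1:00, 1:15,
1:30 and so on regardless of when they were started.

```java
/** Hypothetical helper for aligning a fixed poll interval to the wall clock. */
final class AlignedPollSketch {
    /**
     * Milliseconds to wait from nowMillis until the next multiple of
     * intervalMillis (0 if nowMillis is exactly on a boundary).
     * Boundaries are counted from the Unix epoch in UTC.
     */
    static long delayToNextSlot(long nowMillis, long intervalMillis) {
        if (intervalMillis <= 0) {
            throw new IllegalArgumentException("interval must be positive");
        }
        long intoSlot = Math.floorMod(nowMillis, intervalMillis);
        return intoSlot == 0 ? 0 : intervalMillis - intoSlot;
    }
}
```

The result could be used as the initial delay of
ScheduledExecutorService.scheduleAtFixedRate, with the interval itself as the
period. Because the boundaries are counted in UTC, a 15 minute interval lands
on :00, :15, :30 and :45 only in time zones whose offset is a multiple of 15
minutes.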





[jira] Issue Comment Edited: (SOLR-1439) Enhance PollInterval for Java Replication

2009-09-16 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12756352#action_12756352
 ] 

Noble Paul edited comment on SOLR-1439 at 9/16/09 8:41 PM:
---

isn't it same as SOLR-1435?

  was (Author: noble.paul):
isn't it same as SOLR-1431?
  
 Enhance PollInterval for Java Replication
 -

 Key: SOLR-1439
 URL: https://issues.apache.org/jira/browse/SOLR-1439
 Project: Solr
  Issue Type: New Feature
  Components: replication (java)
 Environment: ALL
Reporter: Bill Bell
 Fix For: 1.4


 I am not a huge fan of PollInterval. It would be great to add an option to 
 get the Index based on exact time: PollTime=*/15 * * * * That would run at 
 every 15 minutes based on the clock. i.e. 1:00pm, 1:15pm, 1:30pm, 1:45pm, 
 etc. All my slaves are sync'd using NTP, so this would work better. Since 
 each slave starts differently, we cannot set the PollInterval=00:15:00 
 since they would get different indexes based on when they start. The other 
 option would be to suspend polling - and start - which would be very manual I 
 guess. Setting the PollInterval to 10 seconds would be getting a new index 
 when the old one is still warming up. Even 10 seconds interval would not be 
 good, since we get so many updates, each server would have different indexes. 
 With Snap we don't have this issue.
 We get SOLR updates frequently and since they are large we cannot wait to do 
 a commit at the 15 minute mark using cron. Optimize just takes too long.
 On our system we need to limit how often the slaves get the new index. We 
 would like all slaves to get the index at the same time.
 From Noble Paul:
 The default pollInterval can behave the way you want (so that the fetches are 
 synchronized in time by the clock). Raise a separate issue and we can fix it




[jira] Commented: (SOLR-1439) Enhance PollInterval for Java Replication

2009-09-16 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12756352#action_12756352
 ] 

Noble Paul commented on SOLR-1439:
--

isn't it same as SOLR-1431?

 Enhance PollInterval for Java Replication
 -

 Key: SOLR-1439
 URL: https://issues.apache.org/jira/browse/SOLR-1439
 Project: Solr
  Issue Type: New Feature
  Components: replication (java)
 Environment: ALL
Reporter: Bill Bell
 Fix For: 1.4


 I am not a huge fan of PollInterval. It would be great to add an option to 
 get the Index based on exact time: PollTime=*/15 * * * * That would run at 
 every 15 minutes based on the clock. i.e. 1:00pm, 1:15pm, 1:30pm, 1:45pm, 
 etc. All my slaves are sync'd using NTP, so this would work better. Since 
 each slave starts differently, we cannot set the PollInterval=00:15:00 
 since they would get different indexes based on when they start. The other 
 option would be to suspend polling - and start - which would be very manual I 
 guess. Setting the PollInterval to 10 seconds would be getting a new index 
 when the old one is still warming up. Even 10 seconds interval would not be 
 good, since we get so many updates, each server would have different indexes. 
 With Snap we don't have this issue.
 We get SOLR updates frequently and since they are large we cannot wait to do 
 a commit at the 15 minute mark using cron. Optimize just takes too long.
 On our system we need to limit how often the slaves get the new index. We 
 would like all slaves to get the index at the same time.
 From Noble Paul:
 The default pollInterval can behave the way you want (so that the fetches are 
 synchronized in time by the clock). Raise a separate issue and we can fix it




[jira] Created: (SOLR-1440) DIH:LineEntityprocessor does not reinitialize the reader after init

2009-09-16 Thread Noble Paul (JIRA)
DIH:LineEntityprocessor does not reinitialize the reader after init
---

 Key: SOLR-1440
 URL: https://issues.apache.org/jira/browse/SOLR-1440
 Project: Solr
  Issue Type: Bug
  Components: contrib - DataImportHandler
Affects Versions: 1.3
Reporter: Noble Paul
Assignee: Noble Paul
Priority: Minor
 Fix For: 1.4


instead of just closing the reader it should also be set to null;

see the mail thread 
http://www.nabble.com/FileListEntityProcessor-and-LineEntityProcessor-to25476443.html#a25476443
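
The pattern behind this fix can be sketched as follows. This is not the
LineEntityProcessor code or the attached patch; the class is hypothetical and
reads from in-memory strings instead of files. The reader field doubles as
the "needs init" flag, so it has to be nulled after close() or the next input
would never be opened.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

/** Hypothetical line source that walks over several inputs in turn. */
final class LineSourceSketch {
    private final String[] inputs;   // one text blob per "file"
    private int next = 0;
    private BufferedReader reader;   // null means "open the next input"

    LineSourceSketch(String... inputs) { this.inputs = inputs; }

    /** Returns the next line across all inputs, or null when all are consumed. */
    String nextLine() {
        try {
            while (true) {
                if (reader == null) {
                    if (next == inputs.length) return null;
                    reader = new BufferedReader(new StringReader(inputs[next++]));
                }
                String line = reader.readLine();
                if (line != null) return line;
                reader.close();
                reader = null;       // without this, the closed reader is reused
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```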




[jira] Updated: (SOLR-1440) DIH:LineEntityprocessor does not reinitialize the reader after init

2009-09-16 Thread Noble Paul (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul updated SOLR-1440:
-

Attachment: SOLR-1440.patch

 DIH:LineEntityprocessor does not reinitialize the reader after init
 ---

 Key: SOLR-1440
 URL: https://issues.apache.org/jira/browse/SOLR-1440
 Project: Solr
  Issue Type: Bug
  Components: contrib - DataImportHandler
Affects Versions: 1.3
Reporter: Noble Paul
Assignee: Noble Paul
Priority: Minor
 Fix For: 1.4

 Attachments: SOLR-1440.patch


 instead of just closing the reader it should also be set to null;
 see the mail thread 
 http://www.nabble.com/FileListEntityProcessor-and-LineEntityProcessor-to25476443.html#a25476443




[jira] Commented: (SOLR-1437) DIH: Enhance XPathRecordReader to deal with //tagname and other improvments.

2009-09-16 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12756358#action_12756358
 ] 

Noble Paul commented on SOLR-1437:
--

For any normal event, parser.next() should be called in each iteration. For 
CDATA, however, it should not be called, because the CDATA handling itself 
will already have consumed the next event.
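
The same read-ahead pattern, shown with a plain StAX loop. This is only an
illustration and not the XPathRecordReader code; the class and method names
are hypothetical. Collecting a run of text has to look at the event after the
text, so the outer loop must not advance again for that iteration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical example of skipping next() after a read-ahead. */
final class TextRunSketch {
    private static boolean isText(int event) {
        return event == XMLStreamConstants.CHARACTERS
            || event == XMLStreamConstants.CDATA
            || event == XMLStreamConstants.SPACE;
    }

    /** Returns each contiguous run of character data as one string. */
    static List<String> textRuns(String xml) {
        List<String> runs = new ArrayList<>();
        try {
            XMLStreamReader parser = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            int event = parser.next();
            while (event != XMLStreamConstants.END_DOCUMENT) {
                if (isText(event)) {
                    StringBuilder sb = new StringBuilder();
                    while (isText(event)) {
                        sb.append(parser.getText());
                        event = parser.next();   // read ahead
                    }
                    runs.add(sb.toString());
                    continue;                    // event is already the next one
                }
                event = parser.next();           // normal case: advance once
            }
            parser.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
        return runs;
    }
}
```

Calling parser.next() unconditionally at the bottom of the loop would drop
the event that follows every text block, typically the closing tag.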

 DIH: Enhance XPathRecordReader to deal with //tagname and other improvments.
 

 Key: SOLR-1437
 URL: https://issues.apache.org/jira/browse/SOLR-1437
 Project: Solr
  Issue Type: Improvement
  Components: contrib - DataImportHandler
Affects Versions: 1.4
Reporter: Fergus McMenemie
Priority: Minor
 Fix For: 1.5

   Original Estimate: 672h
  Remaining Estimate: 672h

 As per 
 http://www.nabble.com/Re%3A-Extract-info-from-parent-node-during-data-import-%28redirect%3A%29-td25471162.html
  it would be nice to be able to use expressions such as //tagname when 
 parsing XML documents.




Re: Solr API

2009-09-16 Thread Asish Kumar Mohanty
Hi,

I am writing this code but still not getting the command properly.

SolrServer solr = new CommonsHttpSolrServer("http://localhost:8080/solr/db");

SolrQuery q = new SolrQuery()
    .setParam("qt", "/dataimport")
    .setParam("command", "full-import");

QueryResponse response = solr.query(q);

System.out.println("*response is** " + response);

}

- Original Message - 
From: Noble Paul നോബിള്‍ नोब्ळ् noble.p...@corp.aol.com
To: solr-dev@lucene.apache.org
Sent: Tuesday, September 15, 2009 12:01 PM
Subject: Re: Solr API


SolrQuery q = new SolrQuery()
    .setParam("qt", "/dataimport")
    .setParam("command", "full-import");
solrServer.query(q);

On Tue, Sep 15, 2009 at 11:32 AM, Asish Kumar Mohanty
amoha...@del.aithent.com wrote:
 Hi Sir,
 still facing problem..

 i cannot understand how to provide the command
 http://localhost:8983/solr/db/dataimport?command=full-import..

 can anybody plz help me out???
 - Original Message -
 From: Noble Paul നോബിള്‍ नोब्ळ् noble.p...@corp.aol.com
 To: solr-dev@lucene.apache.org
 Sent: Monday, September 14, 2009 5:26 PM
 Subject: Re: Solr API


 SolrJ can be used to make any name value request to Solr.
 use the SolrQuery#set(name,val)



 On Mon, Sep 14, 2009 at 4:47 PM, Asish Kumar Mohanty
 amoha...@del.aithent.com wrote:
 Yes Sir..

 SolrJ API...



 Regards
 Asish

 - Original Message -
 From: Noble Paul നോബിള്‍ नोब्ळ् noble.p...@corp.aol.com
 To: solr-dev@lucene.apache.org
 Sent: Monday, September 14, 2009 4:40 PM
 Subject: Re: Solr API


 did you mean SolrJ API?

 On Mon, Sep 14, 2009 at 4:15 PM, Asish Kumar Mohanty
 amoha...@del.aithent.com wrote:
  Hi,
 
  I just want to write a Solr API for full-import. Can anybody please
 help
 me
  out???
 
  It's very urgent.
 
  Regards
  Asish
 
 
 



 --
 -
 Noble Paul | Principal Engineer| AOL | http://aol.com







 --
 -
 Noble Paul | Principal Engineer| AOL | http://aol.com






-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com