[jira] Commented: (SOLR-896) Solr Query Parser Plugin for Mark Miller's Qsol Parser

2010-03-27 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12850508#action_12850508
 ] 

Otis Gospodnetic commented on SOLR-896:
---

This looks super straightforward.  The only problem is that Qsol itself seems 
to be gone.

Mark, any way you can put Qsol somewhere?  Maybe just attach the Jar to this 
issue?

 Solr Query Parser Plugin for Mark Miller's Qsol Parser
 --

 Key: SOLR-896
 URL: https://issues.apache.org/jira/browse/SOLR-896
 Project: Solr
  Issue Type: New Feature
  Components: search
Reporter: Chris Harris
 Attachments: SOLR-896.patch, SOLR-896.patch


 An extremely basic plugin to get the Qsol query parser 
 (http://www.myhardshadow.com/qsol.php) working in Solr.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: lucene and solr trunk

2010-03-17 Thread Otis Gospodnetic
+1 for this structure and this set of steps.
 Otis




- Original Message 
 From: Chris Hostetter hossman_luc...@fucit.org
 To: solr-dev@lucene.apache.org
 Sent: Tue, March 16, 2010 6:46:19 PM
 Subject: Re: lucene and solr trunk
 
 : Otis, yes, I think so, eventually.  But that's gonna take much more discussion.
: 
: I don't think this initial cutover should try to solve how modules
: will be organized, yet... we'll get there, eventually.

But we should at least consider it, and not move in a direction that's
distinct from the ultimate goal of better refactoring (especially since
that was one of the main goals of unifying development efforts).

Here's my concrete suggestion that could be done today (for simplicity:
$svn = https://svn.apache.org/repos/asf/lucene)...

  svn mv $svn/java/trunk $svn/java/tmp-migration
  svn mkdir $svn/java/trunk
  svn mv $svn/solr/trunk $svn/java/trunk/solr
  svn mv $svn/java/tmp-migration $svn/java/trunk/core

At which point:

0. People who want to work only on Lucene-Java can start checking out
$svn/java/trunk/core (I'm pretty sure existing checkouts will continue to
work w/o any changes; the svn info should just update itself).

1. Build files can be added to (the new) $svn/java/trunk to build ./core
followed by ./solr.

2. The build files in $svn/java/trunk/solr can be modified to look at
../core/ to find the Lucene jars.

3. People who care about Solr (including all committers) should start
checking out and building all of $svn/java/trunk.

4. Long term, we could choose to branch all of $svn/java/trunk
for releases ... AND/OR we could choose to branch specific modules
(ie: solr) independently (with modifications to the build files on those
branches to pull in their dependencies from alternate locations).

5. Long term, we can start refactoring additional modules out of
$svn/java/trunk/solr and $svn/java/trunk/core (like
$svn/java/trunk/core/contrib) into their own directory in $svn/java/trunk.

6. Long term, people who want to work on more than just core but don't
care about certain modules (like solr) can do a simple non-recursive
checkout of $svn/java/trunk and then do full checkouts of whatever modules
they care about.

(Please note: I'm just trying to list things we *could* do if we go this
route; I'm not advocating that we *should* do any of these things.)

I can't think of any objections people have raised to any of the previous
suggestions which apply to this suggestion.  Is there anything people can
think of that would be useful, but not possible, if we go this route?


-Hoss


[jira] Commented: (SOLR-1822) SEVERE: Unable to move index file from: tempfile to: indexfile

2010-03-16 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12846099#action_12846099
 ] 

Otis Gospodnetic commented on SOLR-1822:


When Solr starts, doesn't it create the index directory?  If so, this patch is 
not needed, unless we want to make sure replication succeeds even if 
someone/something removes the whole index directory on a slave after the slave 
had already started.

Is this really needed?

 SEVERE: Unable to move index file from: tempfile to: indexfile
 --

 Key: SOLR-1822
 URL: https://issues.apache.org/jira/browse/SOLR-1822
 Project: Solr
  Issue Type: Bug
  Components: replication (java)
Affects Versions: 1.4
 Environment: Linux, JDK6,SOLR 1.4
Reporter: wyhw whon
Priority: Critical
 Fix For: 1.5

 Attachments: SnapPuller.patch


 The Solr index directory was removed, but I do not know the reason why.
 Adding some code at line 577 of SnapPuller.java can resolve this bug.
 line 576:
 File indexFileInIndex = new File(indexDir, fname);
 + if (!indexDir.exists()) indexDir.mkdir();
 boolean success = indexFileInTmpDir.renameTo(indexFileInIndex);




[jira] Commented: (SOLR-1375) BloomFilter on a field

2010-03-16 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12846139#action_12846139
 ] 

Otis Gospodnetic commented on SOLR-1375:


Heh, with the Lucene/Solr merge taking place now, my previous comment above 
makes even more sense.  What do you think?

 BloomFilter on a field
 --

 Key: SOLR-1375
 URL: https://issues.apache.org/jira/browse/SOLR-1375
 Project: Solr
  Issue Type: New Feature
  Components: update
Affects Versions: 1.4
Reporter: Jason Rutherglen
Priority: Minor
 Fix For: 1.5

 Attachments: SOLR-1375.patch, SOLR-1375.patch, SOLR-1375.patch, 
 SOLR-1375.patch, SOLR-1375.patch

   Original Estimate: 120h
  Remaining Estimate: 120h

 * A bloom filter is a read-only probabilistic set. It's useful
 for verifying that a key exists in a set, though it can return false
 positives. http://en.wikipedia.org/wiki/Bloom_filter 
 * The use case is indexing in Hadoop and checking for duplicates
 against a Solr cluster (which when using term dictionary or a
 query) is too slow and exceeds the time consumed for indexing.
 When a match is found, the host, segment, and term are returned.
 If the same term is found on multiple servers, multiple results
 are returned by the distributed process. (We'll need to add in
 the core name I just realized). 
 * When new segments are created, and commit is called, a new
 bloom filter is generated from a given field (default:id) by
 iterating over the term dictionary values. There's a bloom
 filter file per segment, which is managed on each Solr shard.
 When segments are merged away, their corresponding .blm files are
 also removed. In a future version we'll have a central server
 for the bloom filters so we're not abusing the thread pool of
 the Solr proxy and the networking of the Solr cluster (this will
 be done sooner than later after testing this version). I held
 off because the central server requires syncing the Solr
 servers' files (which is like reverse replication). 
 * The patch uses the BloomFilter from Hadoop 0.20. I want to jar
 up only the necessary classes so we don't have a giant Hadoop
 jar in lib.
 http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/util/bloom/BloomFilter.html
 * Distributed code is added and seems to work, I extended
 TestDistributedSearch to test over multiple HTTP servers. I
 chose this approach rather than the manual method used by (for
 example) TermVectorComponent.testDistributed because I'm new to
 Solr's distributed search and wanted to learn how it works (the
 stages are confusing). Using this method, I didn't need to setup
 multiple tomcat servers and manually execute tests.
 * We need more of the bloom filter options passable via
 solrconfig
 * I'll add more test cases
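The per-segment duplicate check described above can be sketched in Python. This is a conceptual stand-in for Hadoop's BloomFilter, not Solr's actual API; all names (the class, `possible_duplicate`, the segment/id data) are illustrative assumptions.

```python
import hashlib

class BloomFilter:
    """Minimal bloom filter: k hash positions over an m-bit integer."""
    def __init__(self, m=1024, k=3):
        self.m, self.k, self.bits = m, k, 0

    def _positions(self, key):
        # Derive k independent bit positions from SHA-256 of "<i>:<key>".
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{key}".encode()).hexdigest()
            yield int(digest, 16) % self.m

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        # False means definitely absent; True may be a false positive.
        return all(self.bits >> pos & 1 for pos in self._positions(key))

# One filter per segment, built by iterating the segment's "id" term values.
segment_ids = {"segment_0": ["doc1", "doc2"], "segment_1": ["doc3"]}
filters = {}
for segment, ids in segment_ids.items():
    bf = BloomFilter()
    for doc_id in ids:
        bf.add(doc_id)
    filters[segment] = bf

def possible_duplicate(doc_id):
    """Segments that may already contain doc_id (no false negatives)."""
    return [seg for seg, bf in filters.items() if bf.might_contain(doc_id)]
```

Because bloom filters never produce false negatives, an indexer checking `possible_duplicate` before adding a document will always catch a true duplicate, at the cost of occasionally re-checking a non-duplicate.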




Re: lucene and solr trunk

2010-03-16 Thread Otis Gospodnetic
Hi,

Check out the dir structure mentioned here: 
http://markmail.org/message/gwpmaevw7tavqqge

Isn't that what we want?

 Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Hadoop ecosystem search :: http://search-hadoop.com/



- Original Message 
 From: Mark Miller markrmil...@gmail.com
 To: solr-dev@lucene.apache.org
 Sent: Mon, March 15, 2010 11:43:48 PM
 Subject: Re: lucene and solr trunk
 
 On 03/15/2010 11:28 PM, Yonik Seeley wrote:
  So, we have a few options on where to put Solr's new trunk:
 
  Solr moves to Lucene's trunk:
  /java/trunk, /java/trunk/solr
 
 +1. With the goal of merged dev, merged tests, this looks the best to
 me. Simple to do patches that span both, simple to setup Solr to use
 Lucene trunk rather than jars. Short paths. Simple. I like it.
 
 -- 
 - Mark
 
 http://www.lucidimagination.com


[jira] Commented: (SOLR-1553) extended dismax query parser

2010-03-01 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12839907#action_12839907
 ] 

Otis Gospodnetic commented on SOLR-1553:


What does the u in uf stand for?

 extended dismax query parser
 

 Key: SOLR-1553
 URL: https://issues.apache.org/jira/browse/SOLR-1553
 Project: Solr
  Issue Type: New Feature
Reporter: Yonik Seeley
 Fix For: 1.5

 Attachments: edismax.unescapedcolon.bug.test.patch, 
 edismax.userFields.patch, SOLR-1553.patch, SOLR-1553.pf-refactor.patch


 An improved user-facing query parser based on dismax




[jira] Commented: (SOLR-1375) BloomFilter on a field

2010-02-25 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838446#action_12838446
 ] 

Otis Gospodnetic commented on SOLR-1375:


{quote}
When new segments are created, and commit is called, a new
bloom filter is generated from a given field (default:id) by
iterating over the term dictionary values. There's a bloom
filter file per segment, which is managed on each Solr shard.
When segments are merged away, their corresponding .blm files are
also removed. 
{quote}

Doesn't this hint at some of this stuff (haven't looked at the patch) really 
needing to live in Lucene index segment files merging land?


 BloomFilter on a field
 --

 Key: SOLR-1375
 URL: https://issues.apache.org/jira/browse/SOLR-1375
 Project: Solr
  Issue Type: New Feature
  Components: update
Affects Versions: 1.4
Reporter: Jason Rutherglen
Priority: Minor
 Fix For: 1.5

 Attachments: SOLR-1375.patch, SOLR-1375.patch, SOLR-1375.patch, 
 SOLR-1375.patch, SOLR-1375.patch

   Original Estimate: 120h
  Remaining Estimate: 120h





[jira] Resolved: (SOLR-1788) Remove duplicate field in schema.xml

2010-02-25 Thread Otis Gospodnetic (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Otis Gospodnetic resolved SOLR-1788.


Resolution: Won't Fix

Please email questions to solr-user list.

 Remove duplicate field in schema.xml
 

 Key: SOLR-1788
 URL: https://issues.apache.org/jira/browse/SOLR-1788
 Project: Solr
  Issue Type: New Feature
Reporter: Bill Bell

 Is there a way to remove duplicates in a multiValued field? For example, if I 
 add the following, is there a way to remove the duplicates? If not directly 
 in schema.xml, how about in DIH?
 <arr name="options">
   <str>Full Bathrooms = 2</str>
   <str>Bedrooms = 2</str>
   <str>Bedrooms = 2</str>
   <str>Full Bathrooms = 2</str>
   <str>Property Address = Orange,92805</str>
   <str>Property Type = Apartments</str>
 </arr>
 This would be changed to:
 <arr name="options">
   <str>Bedrooms = 2</str>
   <str>Full Bathrooms = 2</str>
   <str>Property Address = Orange,92805</str>
   <str>Property Type = Apartments</str>
 </arr>
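The requested de-duplication could live in a custom UpdateRequestProcessor or a DIH transformer; here is a minimal Python sketch of the logic only (the function name and the `sort` flag are illustrative assumptions, not any Solr API). Note that the desired output in the example above is also sorted, hence the option.

```python
def dedupe(values, sort=False):
    """Remove duplicates from a multiValued field's values.

    Keeps first occurrences in order; optionally sorts the result,
    which is what the example output in the issue shows.
    """
    unique = list(dict.fromkeys(values))  # dict preserves insertion order
    return sorted(unique) if sort else unique

options = [
    "Full Bathrooms = 2",
    "Bedrooms = 2",
    "Bedrooms = 2",
    "Full Bathrooms = 2",
    "Property Address = Orange,92805",
    "Property Type = Apartments",
]
print(dedupe(options, sort=True))
```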




[jira] Commented: (SOLR-1719) stock TokenFilterFactory for flattening positions

2010-01-19 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12802342#action_12802342
 ] 

Otis Gospodnetic commented on SOLR-1719:


Does PositionFilterFactory fix the problem?

 stock TokenFilterFactory for flattening positions
 -

 Key: SOLR-1719
 URL: https://issues.apache.org/jira/browse/SOLR-1719
 Project: Solr
  Issue Type: Wish
Reporter: Hoss Man

 People seem to occasionally be confused by why certain inputs result in 
 PhraseQueries instead of BooleanQueries...
 http://old.nabble.com/Understanding-the-query-parser-to27071483.html
 http://old.nabble.com/Tokenizer-question-to27099119.html
 ...it would probably be handy if there was a TokenFilterFactory provided out 
 of the box that just set the positionIncrement of every token to 0 to deal 
 with situations where people don't care about term positions at query time, 
 and are just using tokenization/analysis as a way to split up some input 
 string into multiple SHOULD clauses for a BooleanQuery
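The flattening described above can be sketched with tokens modeled as (text, positionIncrement) pairs. This is a conceptual illustration in Python, not Lucene's TokenStream API; the function names are assumptions.

```python
def flatten_positions(tokens):
    """Set every token's position increment to 0, so all tokens stack at
    the same position (what the proposed TokenFilterFactory would do)."""
    return [(text, 0) for text, _ in tokens]

def assign_positions(tokens):
    """Resolve (text, increment) pairs into absolute positions, the way a
    consumer of the token stream would."""
    pos, out = 0, []
    for text, incr in tokens:
        pos += incr
        out.append((text, pos))
    return out

# "wi-fi" split into two tokens: normally they get distinct positions,
# which leads a query parser to build a PhraseQuery ("wi fi").
analyzed = [("wi", 1), ("fi", 1)]
print(assign_positions(analyzed))                      # distinct positions
print(assign_positions(flatten_positions(analyzed)))   # one shared position
```

With all tokens at one position, a parser that looks at positions has no phrase to build and can instead emit the tokens as SHOULD clauses of a BooleanQuery.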




[jira] Resolved: (SOLR-577) added support for boosting fields and documents to python solr interface

2010-01-15 Thread Otis Gospodnetic (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Otis Gospodnetic resolved SOLR-577.
---

Resolution: Won't Fix

Closing per comment.

 added support for boosting fields and documents to python solr interface
 

 Key: SOLR-577
 URL: https://issues.apache.org/jira/browse/SOLR-577
 Project: Solr
  Issue Type: Improvement
  Components: clients - python
 Environment: linux, python
Reporter: Rob Young
 Attachments: solr.py


 Added the ability to set boosts on fields and documents when indexing. This 
 is done through two new classes, solr.Document and solr.Field:
 c = solr.SolrConnection(host='localhost:8081')
 c.add(id='123', name=solr.Field('this is a field', boost=1.5))
 doc = solr.Document(boost=1.5)
 doc.add(solr.Field(name='title', value='a value for my field', boost=1.1))
 c.addDoc(doc)




[jira] Resolved: (SOLR-216) Improvements to solr.py

2010-01-15 Thread Otis Gospodnetic (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Otis Gospodnetic resolved SOLR-216.
---

Resolution: Won't Fix

Closing per comment.

 Improvements to solr.py
 ---

 Key: SOLR-216
 URL: https://issues.apache.org/jira/browse/SOLR-216
 Project: Solr
  Issue Type: Improvement
  Components: clients - python
Affects Versions: 1.2
Reporter: Jason Cater
Assignee: Mike Klaas
Priority: Trivial
 Attachments: solr-solrpy-r5.patch, solr.py, solr.py, solr.py, 
 solr.py, test_all.py


 I've taken the original solr.py code and extended it to include higher-level 
 functions.
   * Requires python 2.3+
   * Supports the SSL (https://) scheme
   * Conforms (mostly) to PEP 8 -- the Python Style Guide
   * Provides a high-level results object with implicit data type conversion
   * Supports batching of update commands




[jira] Commented: (SOLR-758) Enhance DisMaxQParserPlugin to support full-Solr syntax and to support alternate escaping strategies.

2010-01-15 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12800816#action_12800816
 ] 

Otis Gospodnetic commented on SOLR-758:
---

Is this still needed with enhanced dismax now available?


 Enhance DisMaxQParserPlugin to support full-Solr syntax and to support 
 alternate escaping strategies.
 -

 Key: SOLR-758
 URL: https://issues.apache.org/jira/browse/SOLR-758
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 1.3
Reporter: David Smiley
 Fix For: 1.5

 Attachments: AdvancedQParserPlugin.java, AdvancedQParserPlugin.java, 
 DisMaxQParserPlugin.java, DisMaxQParserPlugin.java, UserQParser.java, 
 UserQParser.java, UserQParser.java-umlauts.patch


 The DisMaxQParserPlugin has a variety of nice features; chief among them is 
 that it uses the DisjunctionMaxQueryParser.  However, it imposes limitations 
 on the syntax.  
 I've enhanced the DisMax QParser plugin to use a pluggable query string 
 re-writer (via subclass extension) instead of hard-coding the logic currently 
 embedded within it (i.e. the escape nearly everything logic). Additionally, 
 I've made this QParser have a notion of a simple syntax (the default) or 
 non-simple in which case some of the logic in this QParser doesn't occur 
 because it's irrelevant (phrase boosting and min-should-max in particular). 
 As part of my work I significantly moved the code around to make it clearer 
 and more extensible.  I also chose to rename it to suggest it's role as a 
 parser for user queries.
 Attachment to follow...




[jira] Commented: (SOLR-773) Incorporate Local Lucene/Solr

2009-12-21 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12793300#action_12793300
 ] 

Otis Gospodnetic commented on SOLR-773:
---

Dave - useful, thanks!
Do you think creating/editing a Wiki page with this information would be good?
See: http://wiki.apache.org/solr/LocalSolr


 Incorporate Local Lucene/Solr
 -

 Key: SOLR-773
 URL: https://issues.apache.org/jira/browse/SOLR-773
 Project: Solr
  Issue Type: New Feature
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 1.5

 Attachments: exampleSpatial.zip, lucene-spatial-2.9-dev.jar, 
 lucene.tar.gz, screenshot-1.jpg, SOLR-773-local-lucene.patch, 
 SOLR-773-local-lucene.patch, SOLR-773-local-lucene.patch, 
 SOLR-773-local-lucene.patch, SOLR-773-local-lucene.patch, 
 SOLR-773-spatial_solr.patch, SOLR-773.patch, SOLR-773.patch, 
 solrGeoQuery.tar, spatial-solr.tar.gz


 Local Lucene has been donated to the Lucene project.  It has some Solr 
 components, but we should evaluate how best to incorporate it into Solr.
 See http://lucene.markmail.org/message/orzro22sqdj3wows?q=LocalLucene




[jira] Commented: (SOLR-1632) Distributed IDF

2009-12-11 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789379#action_12789379
 ] 

Otis Gospodnetic commented on SOLR-1632:


I didn't look at the patch, but from your comments it looks like you already 
have that one merged big IDF map, which is really what I was aiming at, so 
that's good!

I was just thinking that this map (file) would be periodically updated and 
pushed to slaves, so that slaves can compute the global IDF *locally* instead 
of making any kind of extra requests.
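The merged IDF map idea can be sketched as follows. This is illustrative Python, not Solr code; the idf formula shown is the classic Lucene-style one and is an assumption here, as are all names.

```python
import math
from collections import Counter

def merge_df_maps(shard_dfs):
    """Merge per-shard document-frequency maps into one global map.

    Each shard contributes {term: docFreq}; summing them gives the
    global docFreq a slave needs to compute IDF locally.
    """
    merged = Counter()
    for df in shard_dfs:
        merged.update(df)
    return merged

def global_idf(term, merged_df, total_docs):
    # Lucene-style idf: rarer terms score higher.
    return 1.0 + math.log(total_docs / (merged_df.get(term, 0) + 1))

shard_dfs = [{"solr": 10, "lucene": 5}, {"solr": 20, "hadoop": 7}]
merged = merge_df_maps(shard_dfs)
print(merged["solr"])  # 30
```

Pushing `merged` to every slave (like reverse replication of one small file) lets each slave call `global_idf` with no per-query network round trips.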


 Distributed IDF
 ---

 Key: SOLR-1632
 URL: https://issues.apache.org/jira/browse/SOLR-1632
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 1.5
Reporter: Andrzej Bialecki 
 Attachments: distrib.patch


 Distributed IDF is a valuable enhancement for distributed search across 
 non-uniform shards. This issue tracks the proposed implementation of an API 
 to support this functionality in Solr.




[jira] Commented: (SOLR-1632) Distributed IDF

2009-12-10 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789120#action_12789120
 ] 

Otis Gospodnetic commented on SOLR-1632:


What about this approach: http://markmail.org/message/mjfmpzfspguepixx ?

 Distributed IDF
 ---

 Key: SOLR-1632
 URL: https://issues.apache.org/jira/browse/SOLR-1632
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 1.5
Reporter: Andrzej Bialecki 
 Attachments: distrib.patch


 Distributed IDF is a valuable enhancement for distributed search across 
 non-uniform shards. This issue tracks the proposed implementation of an API 
 to support this functionality in Solr.




[jira] Commented: (SOLR-1277) Implement a Solr specific naming service (using Zookeeper)

2009-12-03 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12785694#action_12785694
 ] 

Otis Gospodnetic commented on SOLR-1277:


How about this idea for what to do with the default core name:
What if the default/empty-named core always pointed to the Solr admin/dashboard 
page, something that shows all the info about the system (pulled from ZK)?


 Implement a Solr specific naming service (using Zookeeper)
 --

 Key: SOLR-1277
 URL: https://issues.apache.org/jira/browse/SOLR-1277
 Project: Solr
  Issue Type: New Feature
Affects Versions: 1.4
Reporter: Jason Rutherglen
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 1.5

 Attachments: log4j-1.2.15.jar, SOLR-1277.patch, SOLR-1277.patch, 
 SOLR-1277.patch, zookeeper-3.2.1.jar

   Original Estimate: 672h
  Remaining Estimate: 672h

 The goal is to give Solr server clusters self-healing attributes
 where if a server fails, indexing and searching don't stop and
 all of the partitions remain searchable. For configuration, the
 ability to centrally deploy a new configuration without servers
 going offline.
 We can start with basic failover and start from there?
 Features:
 * Automatic failover (i.e. when a server fails, clients stop
 trying to index to or search it)
 * Centralized configuration management (i.e. new solrconfig.xml
 or schema.xml propagates to a live Solr cluster)
 * Optionally allow shards of a partition to be moved to another
 server (i.e. if a server gets hot, move the hot segments out to
 cooler servers). Ideally we'd have a way to detect hot segments
 and move them seamlessly. With NRT this becomes somewhat more
 difficult but not impossible?




[jira] Commented: (SOLR-1553) extended dismax query parser

2009-11-11 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12776661#action_12776661
 ] 

Otis Gospodnetic commented on SOLR-1553:


I think you need to click on Issue Links link, delete, and re-link.

I have a feeling once this is in, people won't need the original dismax.

Yonik, did you mean to attach a patch, but forgot?

 extended dismax query parser
 

 Key: SOLR-1553
 URL: https://issues.apache.org/jira/browse/SOLR-1553
 Project: Solr
  Issue Type: New Feature
Reporter: Yonik Seeley
 Fix For: 1.5


 An improved user-facing query parser based on dismax




[jira] Commented: (SOLR-1550) statistics for request handlers should report std dev

2009-11-09 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12775163#action_12775163
 ] 

Otis Gospodnetic commented on SOLR-1550:


Haven't tried the patch yet, just had a quick look at it in a browser.
It looks like it has tabs? (They should be replaced by 2 spaces.)
Thanks!


 statistics for request handlers should report std dev
 -

 Key: SOLR-1550
 URL: https://issues.apache.org/jira/browse/SOLR-1550
 Project: Solr
  Issue Type: Improvement
Reporter: Mike Anderson
Priority: Trivial
 Attachments: SOLR-1550.patch







[jira] Commented: (SOLR-1537) Dedupe Sharded Search Results by Shard Order or Score

2009-11-05 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12774053#action_12774053
 ] 

Otis Gospodnetic commented on SOLR-1537:


The ID here being the uniqueKey?  I.e., the use case is the removal of dupes 
when the same document is indexed in multiple shards and more than one shard 
returns that document in the result set?


 Dedupe Sharded Search Results by Shard Order or Score
 -

 Key: SOLR-1537
 URL: https://issues.apache.org/jira/browse/SOLR-1537
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 1.4, 1.5
 Environment: All
Reporter: Dennis Kubes
 Fix For: 1.5

 Attachments: solr-dedupe-20091031-2.patch, solr-dedupe-20091031.patch


 Allows sharded search results to dedupe results by ID based on either the 
 order of the shards in the shards param or by score.  Allows the result 
 returned to be deterministic.  If by shards then shards that appear first in 
 the shards param have a higher precedence than shards that appear later.  If 
 by score then higher scores beat out lower scores.  This doesn't allow 
 multiple duplicates because currently Solr only permits a single result by ID 
 to be returned.
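The precedence rules described above can be sketched in Python. This is only an illustration of the dedupe logic, not the patch's actual code; the function and data shapes are assumptions.

```python
def dedupe_results(shard_results, by="shard"):
    """Dedupe documents by uniqueKey across shard result lists.

    shard_results is ordered like the shards request param.
    by="shard": the copy from the earliest shard wins.
    by="score": the highest-scoring copy wins.
    """
    best = {}
    for docs in shard_results:  # earlier shards come first
        for doc in docs:
            kept = best.get(doc["id"])
            if kept is None or (by == "score" and doc["score"] > kept["score"]):
                best[doc["id"]] = doc
            # by == "shard": the first copy seen already wins
    return sorted(best.values(), key=lambda d: -d["score"])

shard1 = [{"id": "a", "score": 0.5}]
shard2 = [{"id": "a", "score": 0.9}, {"id": "b", "score": 0.4}]
print(dedupe_results([shard1, shard2], by="shard"))  # keeps shard1's "a"
print(dedupe_results([shard1, shard2], by="score"))  # keeps shard2's "a"
```

Either way the merge is deterministic: the same shards param and scores always keep the same single copy of each ID.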




[jira] Commented: (SOLR-1536) Support for TokenFilters that may modify input documents

2009-11-05 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12774057#action_12774057
 ] 

Otis Gospodnetic commented on SOLR-1536:


Is this better than writing a custom UpdateRequestProcessor that takes the 
value of the incoming SolrInputDocument (SID), does something to it, removes 
the original field, and adds the modified version back to SID?
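The alternative described above can be sketched with a plain dict standing in for SolrInputDocument (a real implementation would subclass UpdateRequestProcessor in Java; everything here is illustrative).

```python
def process_document(doc, field, transform):
    """Sketch of a custom update-processor step: read a field from the
    incoming document, remove the original, and add back a modified value."""
    original = doc.pop(field, None)        # remove the original field
    if original is not None:
        doc[field] = transform(original)   # add the modified version back
    return doc

doc = {"id": "123", "title": "hello world"}
print(process_document(doc, "title", str.upper))
# {'id': '123', 'title': 'HELLO WORLD'}
```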


 Support for TokenFilters that may modify input documents
 

 Key: SOLR-1536
 URL: https://issues.apache.org/jira/browse/SOLR-1536
 Project: Solr
  Issue Type: New Feature
  Components: Analysis
Affects Versions: 1.5
Reporter: Andrzej Bialecki 
 Attachments: altering.patch


 In some scenarios it's useful to be able to create or modify fields in the 
 input document based on analysis of other fields of this document. This need 
 arises e.g. when indexing multilingual documents, or when doing NLP 
 processing such as NER. However, currently this is not possible to do.
 This issue provides an implementation of this functionality that consists of 
 the following parts:
 * DocumentAlteringFilterFactory - abstract superclass that indicates that 
 TokenFilter-s created from this factory may modify fields in a 
 SolrInputDocument.
 * TypeAsFieldFilterFactory - example implementation that illustrates this 
 concept, with a JUnit test.
 * DocumentBuilder modifications to support this functionality.




Avro in Solr

2009-11-03 Thread Otis Gospodnetic
Hello,

Avro is still young, from what I know, but I'm wondering if anyone has any 
thoughts on whether there is a place or need for Avro in Solr?

http://www.cloudera.com/blog/2009/11/02/avro-a-format-for-big-data/


Otis
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR



Re: Avro in Solr

2009-11-03 Thread Otis Gospodnetic
I don't know yet.

Otis


- Original Message 
 From: Noble Paul നോബിള്‍  नोब्ळ् noble.p...@corp.aol.com
 To: solr-dev@lucene.apache.org
 Sent: Tue, November 3, 2009 9:58:38 PM
 Subject: Re: Avro in Solr
 
 Structured formats have a lot of limitations when it comes to Solr.
 The number and name of fields in any document is completely arbitrary
 in Solr. Is it possible to represent such a datastructure in avro?
 
 On Wed, Nov 4, 2009 at 3:43 AM, Otis Gospodnetic
 wrote:
  Hello,
 
  Avro is still young, from what I know, but I'm wondering if anyone has any 
 thoughts on whether there is a place or need for Avro in Solr?
 
  http://www.cloudera.com/blog/2009/11/02/avro-a-format-for-big-data/
 
 
  Otis
  --
  Sematext is hiring -- http://sematext.com/about/jobs.html?mls
  Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR
 
 
 
 
 
 -- 
 -
 Noble Paul | Principal Engineer| AOL | http://aol.com



[jira] Resolved: (SOLR-1541) lowering ranking of certain documents while search

2009-11-03 Thread Otis Gospodnetic (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Otis Gospodnetic resolved SOLR-1541.


Resolution: Invalid

 lowering ranking of certain documents while search
 --

 Key: SOLR-1541
 URL: https://issues.apache.org/jira/browse/SOLR-1541
 Project: Solr
  Issue Type: Wish
  Components: search
Reporter: arvind

 The requirement is as below:
 Suppose, there are some documents already stored in Solr. These 
 documents/records belong to various sources like, source1, source2 etc 
 (stored in 'Source' Solr field). Now, when user searches for documents 
 (simple text search) then, is there any possibilities in Solr so that results 
 of certain sources always come with lower rank? (ie, such sources always come 
 in trailing pages).
 I believe, there should be some way for this in functional query but not sure!
 Any help on this is greatly appreciated.
 Thanks in advance.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1533) Partition data directories into multiple bucket directories

2009-10-29 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12771480#action_12771480
 ] 

Otis Gospodnetic commented on SOLR-1533:


Trying to understand the need for this (I might have missed the discussion on 
the ML?).
Isn't the creator of the core in control of the data dir ( 
http://wiki.apache.org/solr/CoreAdmin#CREATE ) and thus their distribution?
Or is the goal of this to remove the logic and knowledge from the client and 
let Solr control where core's data is going to be placed, depending on the 
core data distribution policy?


 Partition data directories into multiple bucket directories
 -

 Key: SOLR-1533
 URL: https://issues.apache.org/jira/browse/SOLR-1533
 Project: Solr
  Issue Type: New Feature
  Components: multicore
Reporter: Shalin Shekhar Mangar
 Fix For: 1.5


 Provide a way to partition data directories into multiple bucket 
 directories. For example, instead of creating 10,000 data directories inside 
 one base data directory, Solr can assign a core to one of 4 base directories, 
 thereby distributing them.
 The underlying problem is that with large number of indexes, we see slower 
 and slower system performance as one goes on increasing the number of cores, 
 thereby increasing the number of directories in the single data directory.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
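The bucketing idea in SOLR-1533 above can be sketched in a few lines: deterministically map a core name to one of N base directories so no single directory accumulates thousands of entries. The hashing scheme, class name, and directory layout below are illustrative assumptions, not Solr's actual implementation.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class CoreBuckets {
    // Number of base data directories to spread cores over (an assumption).
    static final int NUM_BUCKETS = 4;

    // Map a core name to a stable bucket directory under baseDir.
    static Path dataDirFor(String coreName, String baseDir) {
        // floorMod keeps the bucket non-negative even for negative hashCodes.
        int bucket = Math.floorMod(coreName.hashCode(), NUM_BUCKETS);
        return Paths.get(baseDir, "bucket" + bucket, coreName, "data");
    }

    public static void main(String[] args) {
        System.out.println(dataDirFor("core0001", "/var/solr"));
        System.out.println(dataDirFor("core0002", "/var/solr"));
    }
}
```

Because the mapping depends only on the core name, every node computes the same data directory without any shared state.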



[jira] Commented: (SOLR-1335) load core properties from a properties file

2009-08-14 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12743310#action_12743310
 ] 

Otis Gospodnetic commented on SOLR-1335:


Mind including an example properties file, so we can see what's in it?

 load core properties from a properties file
 ---

 Key: SOLR-1335
 URL: https://issues.apache.org/jira/browse/SOLR-1335
 Project: Solr
  Issue Type: New Feature
Reporter: Noble Paul
Assignee: Noble Paul
 Fix For: 1.4

 Attachments: SOLR-1335.patch, SOLR-1335.patch, SOLR-1335.patch


 There are a few ways of loading properties at runtime:
 # using an env property in the command line
 # if you use multicore, dropping it in the solr.xml
 If not, the only way is to keep a separate solrconfig.xml for each instance.
 #1 is error prone if the user fails to start with the correct system 
 property. 
 In our case we have four different configurations for the same deployment. 
 And we have to disable replication of solrconfig.xml. 
 It would be nice if I could distribute four properties files so that our ops 
 can drop the right one and start Solr. Or it is possible for operations to 
 edit a properties file, but it is risky to edit solrconfig.xml if they do not 
 understand Solr.
 I propose a properties file in the instancedir as solrcore.properties . If 
 present would be loaded and added as core specific properties.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
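The solrcore.properties proposal above boils down to loading a standard `java.util.Properties` file from the instance dir and overlaying the entries as core-specific properties. A minimal sketch, assuming the file name from the proposal; the surrounding method is hypothetical, not Solr's loader.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

public class CoreProps {

    // Load core-specific properties from a reader over the standard
    // java.util.Properties format (what solrcore.properties would use).
    static Properties load(Reader source) {
        Properties props = new Properties();
        try {
            props.load(source);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        // Stand-in for reading <instanceDir>/solrcore.properties from disk.
        Properties p = load(new StringReader("solr.env=prod\nreplication.enable=false\n"));
        System.out.println(p.getProperty("solr.env"));
    }
}
```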



[jira] Commented: (SOLR-1274) Provide multiple output formats in extract-only mode for tika handler

2009-08-03 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12738793#action_12738793
 ] 

Otis Gospodnetic commented on SOLR-1274:


Try:
{code}
if ("text".equals(extractFormat)) {
{code}

:)
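The point of the flipped comparison is null-safety: with the string literal as the receiver, a null `extractFormat` simply compares unequal instead of throwing a NullPointerException. A standalone sketch; the method name is illustrative, not from the Solr source.

```java
public class NullSafeEquals {

    // Literal-first comparison: never throws, even for null input.
    static boolean isTextFormat(String extractFormat) {
        return "text".equals(extractFormat);
    }

    public static void main(String[] args) {
        System.out.println(isTextFormat("text"));  // true
        System.out.println(isTextFormat("xml"));   // false
        System.out.println(isTextFormat(null));    // false, no NPE
    }
}
```

The reverse form, `extractFormat.equals("text")`, would throw as soon as the request omits the parameter.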


 Provide multiple output formats in extract-only mode for tika handler
 -

 Key: SOLR-1274
 URL: https://issues.apache.org/jira/browse/SOLR-1274
 Project: Solr
  Issue Type: New Feature
Affects Versions: 1.4
Reporter: Peter Wolanin
Priority: Minor
 Fix For: 1.4

 Attachments: SOLR-1274.patch


 The proposed feature is to accept a URL parameter when using extract-only 
 mode to specify an output format.  This parameter might just overload the 
 existing ext.extract.only so that one can optionally specify a format, e.g. 
 false|true|xml|text  where true and xml give the same response (i.e. xml 
 remains the default)
 I had been assuming that I could choose among possible tika output
 formats when using the extracting request handler in extract-only mode
 as if from the CLI with the tika jar:
-x or --xmlOutput XHTML content (default)
-h or --html   Output HTML content
-t or --text   Output plain text content
-m or --metadata   Output only metadata
 However, looking at the docs and source, it seems that only the xml
 option is available (hard-coded) in ExtractingDocumentLoader.java
 {code}
 serializer = new XMLSerializer(writer, new OutputFormat("XML", "UTF-8", true));
 {code}
 Providing at least a plain-text response seems to work if you change the 
 serializer to a TextSerializer (org.apache.xml.serialize.TextSerializer).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: trie fields default in example schema

2009-08-02 Thread Otis Gospodnetic
Would it make sense to add a new tint(eger) type instead of renaming 
integer to pinteger? (thinking about people upgrading to Solr 1.4).

Otis
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR




- Original Message 
 From: Yonik Seeley yo...@lucidimagination.com
 To: solr-dev@lucene.apache.org
 Sent: Sunday, August 2, 2009 3:01:09 PM
 Subject: trie fields default in example schema
 
 I'm working on a jumbo trie patch (just many smaller trie related
 issues at once) - SOLR-1288.
 
 Anyway, I think support will be good enough for 1.4 that we should
 make types like integer in the example schema be based on the trie
 fields.  The current integer fields should be renamed to pinteger
 (for plain integer), and have a recommended use only for compatibility
 with other/older indexes.  People have mistakenly used the plain
 integer in the past based on the name, so I think we should fix the
 naming.
 
 The trie based fields should have lower memory footprint in the
 fieldcache and are faster for a lookup (the only reason to use plain
 ints in the past)... sint uses StringIndex for historical reasons - we
 had no other option... we could upgrade the existing sint fields, but
 it wouldn't be quite 100% compatible and there's little reason since
 we have the trie fields now.
 
 -Yonik
 http://www.lucidimagination.com
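The renaming Yonik describes might look like the following in the example schema.xml: "tint" backed by the trie implementation, with the plain variant kept under a "pinteger" name for compatibility. This is a hypothetical sketch; the attribute details are assumptions in the style of the 1.4-era example schema, not the actual committed change.

```xml
<!-- Trie-based integer: lower fieldcache footprint, fast range queries. -->
<fieldType name="tint" class="solr.TrieIntField" precisionStep="8"
           omitNorms="true" positionIncrementGap="0"/>
<!-- Plain integer, kept only for compatibility with other/older indexes. -->
<fieldType name="pinteger" class="solr.IntField" omitNorms="true"/>
```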



[jira] Commented: (SOLR-1293) Support for large no:of cores and faster loading/unloading of cores

2009-08-01 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12737838#action_12737838
 ] 

Otis Gospodnetic commented on SOLR-1293:


Do you have any thoughts on handling the situation where each core belongs to a 
different party and each party has access *only* to its own core via Solr Admin 
(i.e. doesn't see all the other cores hosted by the instance)?  Only the 
privileged administrator user can see and access all cores.

Have you done any work on this or is it on your TODO?


 Support for large no:of cores and faster loading/unloading of cores
 ---

 Key: SOLR-1293
 URL: https://issues.apache.org/jira/browse/SOLR-1293
 Project: Solr
  Issue Type: New Feature
Reporter: Noble Paul
 Fix For: 1.5

 Attachments: SOLR-1293.patch


 Solr , currently ,is not very suitable for a large no:of homogeneous cores 
 where you require fast/frequent loading/unloading of cores . usually a core 
 is required to be loaded just to fire a search query or to just index one 
 document
 The requirements of such a system are.
 * Very efficient loading of cores . Solr cannot afford to read and parse and 
 create Schema, SolrConfig Objects for each core each time the core has to be 
 loaded ( SOLR-919 , SOLR-920)
 * START STOP core . Currently it is only possible to unload a core (SOLR-880)
 * Automatic loading of cores . If a core is present and it is not loaded and 
 a request comes for that load it automatically before serving up a request
 * As there are a large no:of cores , all the cores cannot be kept loaded 
 always. There has to be an upper limit beyond which we need to unload a few 
 cores (probably the least recently used ones)
 * Automatic allotment of dataDir for cores. If the no:of cores is too high all 
 the cores' dataDirs cannot live in the same dir. There is an upper limit on 
 the no:of dirs you can create in a unix dir w/o affecting performance

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
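The "unload the least recently used cores" requirement above maps naturally onto an access-ordered `LinkedHashMap` with a size cap. Purely an illustrative sketch; Solr's actual transient-core handling is not shown here, and the class name is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LruCoreCache<V> extends LinkedHashMap<String, V> {
    private final int maxLoaded;

    public LruCoreCache(int maxLoaded) {
        // accessOrder=true makes iteration order least-recently-used first.
        super(16, 0.75f, true);
        this.maxLoaded = maxLoaded;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<String, V> eldest) {
        // Returning true evicts (i.e. "unloads") the LRU core on insert.
        return size() > maxLoaded;
    }

    public static void main(String[] args) {
        LruCoreCache<String> cores = new LruCoreCache<>(2);
        cores.put("core-a", "A");
        cores.put("core-b", "B");
        cores.get("core-a");        // touch core-a, so core-b becomes eldest
        cores.put("core-c", "C");   // evicts core-b
        System.out.println(cores.keySet());  // [core-a, core-c]
    }
}
```

A real implementation would also have to close the evicted core's searcher and writer on eviction, not just drop the map entry.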



[jira] Commented: (SOLR-1293) Support for large no:of cores and faster loading/unloading of cores

2009-08-01 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12737844#action_12737844
 ] 

Otis Gospodnetic commented on SOLR-1293:


OK, thanks.
When you go to your Solr Admin page today, it lists all cores, even if there 
are 1 of them?


 Support for large no:of cores and faster loading/unloading of cores
 ---

 Key: SOLR-1293
 URL: https://issues.apache.org/jira/browse/SOLR-1293
 Project: Solr
  Issue Type: New Feature
Reporter: Noble Paul
 Fix For: 1.5

 Attachments: SOLR-1293.patch


 Solr , currently ,is not very suitable for a large no:of homogeneous cores 
 where you require fast/frequent loading/unloading of cores . usually a core 
 is required to be loaded just to fire a search query or to just index one 
 document
 The requirements of such a system are.
 * Very efficient loading of cores . Solr cannot afford to read and parse and 
 create Schema, SolrConfig Objects for each core each time the core has to be 
 loaded ( SOLR-919 , SOLR-920)
 * START STOP core . Currently it is only possible to unload a core (SOLR-880)
 * Automatic loading of cores . If a core is present and it is not loaded and 
 a request comes for that load it automatically before serving up a request
 * As there are a large no:of cores , all the cores cannot be kept loaded 
 always. There has to be an upper limit beyond which we need to unload a few 
 cores (probably the least recently used ones)
 * Automatic allotment of dataDir for cores. If the no:of cores is too high al 
 the cores' dataDirs cannot live in the same dir. There is an upper limit on 
 the no:of dirs you can create in a unix dir w/o affecting performance

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (SOLR-908) Port of Nutch CommonGrams filter to Solr

2009-08-01 Thread Otis Gospodnetic (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Otis Gospodnetic reassigned SOLR-908:
-

Assignee: Shalin Shekhar Mangar

I won't get to it before going on vacation.  Assigning to you if you want it.

 Port of Nutch  CommonGrams filter to Solr
 -

 Key: SOLR-908
 URL: https://issues.apache.org/jira/browse/SOLR-908
 Project: Solr
  Issue Type: Wish
  Components: Analysis
Reporter: Tom Burton-West
Assignee: Shalin Shekhar Mangar
Priority: Minor
 Attachments: CommonGramsPort.zip, SOLR-908.patch, SOLR-908.patch, 
 SOLR-908.patch, SOLR-908.patch, SOLR-908.patch


 Phrase queries containing common words are extremely slow.  We are reluctant 
 to just use stop words due to various problems with false hits and some 
 things becoming impossible to search with stop words turned on. (For example 
 to be or not to be, the who, man in the moon vs man on the moon etc.) 
  
 Several postings regarding slow phrase queries have suggested using the 
 approach used by Nutch.  Perhaps someone with more Java/Solr experience might 
 take this on.
 It should be possible to port the Nutch CommonGrams code to Solr  and create 
 a suitable Solr FilterFactory so that it could be used in Solr by listing it 
 in the Solr schema.xml.
 Construct n-grams for frequently occurring terms and phrases while indexing. 
 Optimize phrase queries to use the n-grams. Single terms are still indexed 
 too, with n-grams overlaid.
 http://lucene.apache.org/nutch/apidocs-0.8.x/org/apache/nutch/analysis/CommonGrams.html

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
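The CommonGrams idea described in the issue above can be sketched as follows: whenever a token is adjacent to a very common word, also emit the joined bigram (e.g. "man_in"), so phrase queries can match the rarer n-grams instead of scanning huge postings for "in"/"the". The word list, joining scheme, and class name are toy assumptions, not the Nutch or Solr implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class CommonGramsSketch {
    // Tiny stand-in for a real common-words list.
    static final Set<String> COMMON = Set.of("in", "the", "on", "to", "a");

    static List<String> commonGrams(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            out.add(tokens.get(i));  // single terms are still indexed
            if (i + 1 < tokens.size()
                    && (COMMON.contains(tokens.get(i)) || COMMON.contains(tokens.get(i + 1)))) {
                // Overlay the bigram alongside the single terms.
                out.add(tokens.get(i) + "_" + tokens.get(i + 1));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(commonGrams(List.of("man", "in", "the", "moon")));
    }
}
```

A phrase query for "man in the moon" can then be rewritten to the much rarer terms man_in, in_the, the_moon, which is why the approach speeds up phrase queries without dropping stop words entirely.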



[jira] Commented: (SOLR-1305) Notification based replication instead of polling

2009-07-23 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12734498#action_12734498
 ] 

Otis Gospodnetic commented on SOLR-1305:


With a little help from Zookeeper, is that the plan?

 Notification based replication instead of polling
 -

 Key: SOLR-1305
 URL: https://issues.apache.org/jira/browse/SOLR-1305
 Project: Solr
  Issue Type: New Feature
  Components: replication (java)
Reporter: Noble Paul
 Fix For: 1.5


 Currently the only way for the slave to know about the availability of new 
 commit points is by polling. This means slaves must poll very frequently 
 to ensure that they get the commit point immediately. If changes to the 
 master are less frequent, this can be an unnecessary overhead. It would 
 be nice if the slave could register itself with the master for notification on 
 availability of new changes. After receiving the notification, the slave can 
 trigger a poll and do what it does now. This may require SOLR-727 so that 
 the slave can register its url with the master.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (SOLR-1303) Wildcard queries on fields with LowerCaseFilterFactory not being lowercased.

2009-07-22 Thread Otis Gospodnetic (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Otis Gospodnetic resolved SOLR-1303.


Resolution: Invalid

I think that's due to wildcard queries not being analyzed (and thus lowercased 
to match your indexed tokens).  Explanation is in the Lucene FAQ Wiki page.
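Since wildcard terms bypass analysis, the usual workaround is for the client to lowercase the raw term before appending the wildcard, matching what LowerCaseFilterFactory did at index time. This is a client-side convention (an assumption, not a Solr feature):

```java
import java.util.Locale;

public class WildcardLowercase {

    // Lowercase the user's input the way the index-time analyzer would have,
    // since Solr/Lucene will not analyze the wildcard term itself.
    static String lowercaseWildcard(String userInput) {
        return userInput.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(lowercaseWildcard("K*"));  // k*
    }
}
```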

 Wildcard queries on fields with LowerCaseFilterFactory not being lowercased.
 

 Key: SOLR-1303
 URL: https://issues.apache.org/jira/browse/SOLR-1303
 Project: Solr
  Issue Type: Bug
  Components: search
Affects Versions: 1.3
Reporter: Matt Schraeder
Priority: Minor

 I have a field defined as follows:
 <fieldType name="keyword" class="solr.TextField" sortMissingLast="true" 
 omitNorms="true">
   <analyzer>
 <tokenizer class="solr.KeywordTokenizerFactory"/>
 <filter class="solr.LowerCaseFilterFactory" />
 <filter class="solr.TrimFilterFactory" />
   </analyzer>
 </fieldType>
 <field name="reviews" type="keyword" index="true" stored="true" 
 multiValued="true" />
 The data being index is a single letter followed by a space, a +,-,M, or A 
 ... so basically two characters.
 When I do the following queries:
 reviews: K+
 reviews: k+
 I get results as expected. However, when I replace the + in the query with a 
 * or ?, then the uppercase version no longer works, only the lowercase.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (SOLR-1304) Make it possible to force replication of at least some of the config files even if the index hasn't changed

2009-07-22 Thread Otis Gospodnetic (JIRA)
Make it possible to force replication of at least some of the config files even 
if the index hasn't changed
---

 Key: SOLR-1304
 URL: https://issues.apache.org/jira/browse/SOLR-1304
 Project: Solr
  Issue Type: Improvement
  Components: replication (java)
Reporter: Otis Gospodnetic
Priority: Minor
 Fix For: 1.5


From http://markmail.org/thread/vpk2fsjns7u2uopd

Here is a use case:
* Index is mostly static (nightly updates)
* elevate.xml needs to be changed throughout the day
* elevate.xml needs to be pushed to slaves and solr needs to reload it

This is currently not possible because replication will happen only if the index
changed in some way. You can't force a commit to fake index change. So one has
to either:
* add/delete dummy docs on master to force index change
* write an external script that copies the config file to slaves


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1304) Make it possible to force replication of at least some of the config files even if the index hasn't changed

2009-07-22 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12734332#action_12734332
 ] 

Otis Gospodnetic commented on SOLR-1304:


From Paul:
+1

We should have a separate attributes in the master other than the standard
<str name="confFiles">a.xml</str>

say

<str name="realTimeConfFiles">b.xml</str>

the files specified in this can be replicated always irrespective of the index


 Make it possible to force replication of at least some of the config files 
 even if the index hasn't changed
 ---

 Key: SOLR-1304
 URL: https://issues.apache.org/jira/browse/SOLR-1304
 Project: Solr
  Issue Type: Improvement
  Components: replication (java)
Reporter: Otis Gospodnetic
Priority: Minor
 Fix For: 1.5


 From http://markmail.org/thread/vpk2fsjns7u2uopd
 Here is a use case:
 * Index is mostly static (nightly updates)
 * elevate.xml needs to be changed throughout the day
 * elevate.xml needs to be pushed to slaves and solr needs to reload it
 This is currently not possible because replication will happen only if the 
 index
 changed in some way. You can't force a commit to fake index change. So one has
 to either:
 * add/delete dummy docs on master to force index change
 * write an external script that copies the config file to slaves

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1304) Make it possible to force replication of at least some of the config files even if the index hasn't changed

2009-07-22 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12734334#action_12734334
 ] 

Otis Gospodnetic commented on SOLR-1304:


Would it make more sense for the caller to specify (in the request) which files 
to replicate, thus giving it full control over what to replicate when?  Maybe 
the realTimeConfFiles should then not list all conf files that should always 
be replicated, but instead list all the conf files that are *allowed* to be 
replicated when the caller requests some of them to be replicated?


 Make it possible to force replication of at least some of the config files 
 even if the index hasn't changed
 ---

 Key: SOLR-1304
 URL: https://issues.apache.org/jira/browse/SOLR-1304
 Project: Solr
  Issue Type: Improvement
  Components: replication (java)
Reporter: Otis Gospodnetic
Priority: Minor
 Fix For: 1.5


 From http://markmail.org/thread/vpk2fsjns7u2uopd
 Here is a use case:
 * Index is mostly static (nightly updates)
 * elevate.xml needs to be changed throughout the day
 * elevate.xml needs to be pushed to slaves and solr needs to reload it
 This is currently not possible because replication will happen only if the 
 index
 changed in some way. You can't force a commit to fake index change. So one has
 to either:
 * add/delete dummy docs on master to force index change
 * write an external script that copies the config file to slaves

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: Forcing config replication without index change

2009-07-21 Thread Otis Gospodnetic

OK, I'll create a JIRA for 1.5 tomorrow.

 Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
 From: Mark Miller markrmil...@gmail.com
 To: solr-dev@lucene.apache.org
 Sent: Thursday, July 16, 2009 11:37:52 AM
 Subject: Re: Forcing config replication without index change
 
 bq. Shouldn't it be possible to force replication of at least *some* of the
 config files even if the index hasn't changed?
 Indeed. Perhaps another call? forceIndexFetch? it replicates configs whether
 the index has changed or not, but wouldn't replicate the index if it didn't
 need to?
 
 Or a separate call altogether? fetchConfig, that just updates the configs?
 
 
 
 On Thu, Jul 16, 2009 at 3:00 PM, Otis Gospodnetic 
 otis_gospodne...@yahoo.com wrote:
 
 
  Hi,
 
  Shouldn't it be possible to force replication of at least *some* of the
  config files even if the index hasn't changed?
  (see Paul Noble's comment on 
  http://markmail.org/message/hgdwumfuuwixfxvq and 
 the 4-message thread)
 
  Here is a use case:
  * Index is mostly static (nightly updates)
  * elevate.xml needs to be changed throughout the day
  * elevate.xml needs to be pushed to slaves and solr needs to reload it
 
  This is currently not possible because replication will happen only if the
  index changed in some way.  You can't force a commit to fake index change.
   So one has to either:
  * add/delete dummy docs on master to force index change
  * write an external script that copies the config file to slaves
 
 
  Shouldn't it be possible to force replication of at least *some* of the
  config files even if the index hasn't changed?
 
  Thanks,
  Otis
  --
  Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
 
 
 
 
 -- 
 -- 
 - Mark
 
 http://www.lucidimagination.com



Forcing config replication without index change

2009-07-16 Thread Otis Gospodnetic

Hi,

Shouldn't it be possible to force replication of at least *some* of the config 
files even if the index hasn't changed?
(see Paul Noble's comment on http://markmail.org/message/hgdwumfuuwixfxvq and 
the 4-message thread)

Here is a use case:
* Index is mostly static (nightly updates)
* elevate.xml needs to be changed throughout the day
* elevate.xml needs to be pushed to slaves and solr needs to reload it

This is currently not possible because replication will happen only if the 
index changed in some way.  You can't force a commit to fake index change.  So 
one has to either:
* add/delete dummy docs on master to force index change
* write an external script that copies the config file to slaves


Shouldn't it be possible to force replication of at least *some* of the config 
files even if the index hasn't changed?

Thanks,
Otis 
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



[jira] Commented: (SOLR-1041) dataDir is not set relative to instanceDir

2009-07-16 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12731994#action_12731994
 ] 

Otis Gospodnetic commented on SOLR-1041:


I worked around it by using the relative directory in instanceDir instead of 
using the absolute directory.  I think one should be able to use either an 
absolute or a relative directory.

If it matter, note that I don't have dataDir in cores' solrconfig.xml files or 
in solr.xml, so Solr uses defaults (data/) for that.


 dataDir is not set relative to instanceDir 
 ---

 Key: SOLR-1041
 URL: https://issues.apache.org/jira/browse/SOLR-1041
 Project: Solr
  Issue Type: Bug
Affects Versions: 1.4
Reporter: Noble Paul
Assignee: Shalin Shekhar Mangar
 Fix For: 1.4

 Attachments: SOLR-1041.patch, SOLR-1041.patch


 see the mail thread. http://markmail.org/thread/ebd7vumj3uyzpyt6
 A recent bug fix has broken the feature. Now it is always relative to current 
 working directory for single core

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
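The "either absolute or relative" expectation from the comment above falls out naturally from `java.nio.file.Path.resolve`: resolving a dataDir against the instanceDir returns the dataDir unchanged when it is absolute, and an instanceDir-relative path otherwise. Method names here are illustrative, not from SolrCore.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class DataDirResolve {

    // Path.resolve returns the argument as-is when it is already absolute,
    // otherwise joins it onto the instance directory.
    static Path resolveDataDir(String instanceDir, String dataDir) {
        return Paths.get(instanceDir).resolve(dataDir);
    }

    public static void main(String[] args) {
        System.out.println(resolveDataDir("/opt/solr/core1", "data"));
        System.out.println(resolveDataDir("/opt/solr/core1", "/mnt/idx/core1"));
    }
}
```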



[jira] Commented: (SOLR-1275) Add expungeDeletes to DirectUpdateHandler2

2009-07-16 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12731999#action_12731999
 ] 

Otis Gospodnetic commented on SOLR-1275:


Patch looks good to me (though I have not tested it either)

 Add expungeDeletes to DirectUpdateHandler2
 --

 Key: SOLR-1275
 URL: https://issues.apache.org/jira/browse/SOLR-1275
 Project: Solr
  Issue Type: Improvement
  Components: update
Affects Versions: 1.3
Reporter: Jason Rutherglen
Assignee: Noble Paul
Priority: Trivial
 Fix For: 1.4

 Attachments: SOLR-1275.patch

   Original Estimate: 48h
  Remaining Estimate: 48h

 expungeDeletes is a useful method somewhat like optimize is offered by 
 IndexWriter that can be implemented in DirectUpdateHandler2.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (SOLR-1192) solr.NGramFilterFactory stops to index the content if it find a token smaller than minim ngram size

2009-07-15 Thread Otis Gospodnetic (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Otis Gospodnetic resolved SOLR-1192.


Resolution: Fixed

Should be taken care of with Lucene upgrade now.

 solr.NGramFilterFactory stops to index the content if it find a token smaller 
 than minim ngram size
 ---

 Key: SOLR-1192
 URL: https://issues.apache.org/jira/browse/SOLR-1192
 Project: Solr
  Issue Type: Bug
  Components: Analysis
Affects Versions: 1.3
 Environment: any
Reporter: viobade
Assignee: Otis Gospodnetic
 Fix For: 1.4


 If a field is split into tokens (by a tokenizer) and the NGramFilterFactory is 
 then applied to these tokens, indexing goes well while the length of the 
 tokens is greater than or equal to the minimum ngram size (usually 3). 
 Otherwise indexing breaks at this point and the rest of the tokens are no 
 longer indexed. This behaviour can be easily observed with the analysis tool 
 in the Solr admin interface.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
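The expected (post-fix) behavior reported in SOLR-1192/LUCENE-1491 is that a token shorter than the minimum gram size should simply produce no grams, and analysis should continue with later tokens rather than stop. A plain sketch of character n-gram generation, not the Lucene filter itself:

```java
import java.util.ArrayList;
import java.util.List;

public class NGramSketch {

    // All character n-grams of the token with lengths in [min, max].
    static List<String> ngrams(String token, int min, int max) {
        List<String> grams = new ArrayList<>();
        for (int n = min; n <= max; n++) {
            for (int i = 0; i + n <= token.length(); i++) {
                grams.add(token.substring(i, i + n));
            }
        }
        // Empty (not an error) when token.length() < min, so the
        // analysis chain can keep going with the next token.
        return grams;
    }

    public static void main(String[] args) {
        System.out.println(ngrams("to", 3, 4));    // [] -- short token, keep going
        System.out.println(ngrams("solr", 3, 4));  // [sol, olr, solr]
    }
}
```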



[jira] Commented: (SOLR-1192) solr.NGramFilterFactory stops to index the content if it find a token smaller than minim ngram size

2009-07-14 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12731082#action_12731082
 ] 

Otis Gospodnetic commented on SOLR-1192:


LUCENE-1491 fix is in Lucene repository now, so as soon as we pull new Lucene 
jars into Solr, I'll mark this as fixed.  Feel free to test with local copies 
of the Lucene nightly jars tomorrow and report back.


 solr.NGramFilterFactory stops to index the content if it find a token smaller 
 than minim ngram size
 ---

 Key: SOLR-1192
 URL: https://issues.apache.org/jira/browse/SOLR-1192
 Project: Solr
  Issue Type: Bug
  Components: Analysis
Affects Versions: 1.3
 Environment: any
Reporter: viobade
Assignee: Otis Gospodnetic
 Fix For: 1.4


 If a field is split into tokens (by a tokenizer) and the NGramFilterFactory is 
 then applied to these tokens, indexing goes well while the length of the 
 tokens is greater than or equal to the minimum ngram size (usually 3). 
 Otherwise indexing breaks at this point and the rest of the tokens are no 
 longer indexed. This behaviour can be easily observed with the analysis tool 
 in the Solr admin interface.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-862) Solr must declare crypto usage pending SOLR-284

2009-07-14 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12731084#action_12731084
 ] 

Otis Gospodnetic commented on SOLR-862:
---

Did this already happen?

 Solr must declare crypto usage pending SOLR-284
 ---

 Key: SOLR-862
 URL: https://issues.apache.org/jira/browse/SOLR-862
 Project: Solr
  Issue Type: Bug
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Blocker
 Fix For: 1.4


 Since Solr will be shipping Tika in 1.4, which uses PDFBox, which uses 
 BouncyCastle, Solr must declare its crypto usage per ASF guidelines.
 See http://www.apache.org/dev/crypto.html
 and https://issues.apache.org/jira/browse/NUTCH-621 for references and 
 examples of what to do.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: Using Synonyms is actually narrowing the result set in some cases

2009-07-08 Thread Otis Gospodnetic

Raj, could you please use the solr-user list for this?
When reposting there, please include debugQuery=true output for both queries.



Thanks,
Otis --
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
 From: rajkovuru r...@elance.com
 To: solr-dev@lucene.apache.org
 Sent: Tuesday, July 7, 2009 3:41:13 PM
 Subject: Using Synonyms is actually narrowing the result set in some cases
 
 
 Hi, 
 
 I recently introduced a small set of synonyms to be expanded at query time.
 I didn't and would not want to modify the index so applied synonyms to
 query. 
 
 Synonyms match correctly and the query is indeed expanded; however in some
 cases, usually with multi-word synonyms, the query returns fewer results than
 it would without synonyms. 
 
 Any pointers to where the problem could be?
 
 
 Example: 
 
 before synonyms 
 
 search for crm about 1000 results 
 
 synonyms implemented 
 
 crm, customer relationship management
 
 search for crm about 200 results 
 
 One would expect solr to return more results as a result of using synonyms
 but the effect is exactly the opposite.
 
 
 Thanks 
 Rah
 -- 
 View this message in context: 
 http://www.nabble.com/Using-Synonyms-is-actually-narrowing-the-result-set-in-some-cases-tp24380034p24380034.html
 Sent from the Solr - Dev mailing list archive at Nabble.com.
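A toy model of why query-time multi-word synonyms can *narrow* results, as reported above: if "crm" is expanded into the tokens of "customer relationship management" within a single token stream, the parsed query can end up *requiring* the extra terms instead of offering them as alternatives. Entirely illustrative; this is not Solr's query parser or its synonym filter.

```java
import java.util.List;
import java.util.Set;

public class SynonymNarrowing {

    // Correct OR semantics: any alternative matches.
    static boolean matchesAny(Set<String> docTerms, List<String> alternatives) {
        return alternatives.stream().anyMatch(docTerms::contains);
    }

    // Broken "sausaged" expansion: every expanded token required.
    static boolean matchesAll(Set<String> docTerms, List<String> expandedTokens) {
        return docTerms.containsAll(expandedTokens);
    }

    public static void main(String[] args) {
        Set<String> doc = Set.of("crm", "software");
        // OR expansion keeps the doc in the result set...
        System.out.println(matchesAny(doc, List.of("crm", "customer")));
        // ...while requiring every expanded token drops it.
        System.out.println(matchesAll(doc, List.of("crm", "customer", "relationship", "management")));
    }
}
```

Checking the parsed query with debugQuery=true (as suggested in the reply) shows which of the two shapes the expansion actually produced.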



Re: subdirectories under lib

2009-07-06 Thread Otis Gospodnetic

This sounds good to me and I like Yonik's idea, too.

 Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
 From: Erik Hatcher e...@ehatchersolutions.com
 To: solr-dev@lucene.apache.org
 Sent: Monday, July 6, 2009 8:41:19 AM
 Subject: Re: subdirectories under lib
 
 Another option is to have a config option for the lib directories (plural) 
 allowing multiple to be specified that can live anywhere, not just under 
 solr-home.
 
 Erik
 
 On Jul 4, 2009, at 12:03 PM, Yonik Seeley wrote:
 
  How hard would it be to allow subdirectories under example/solr/lib?
  
  Seems like it would be nice to allow jars to be partitioned, so
  everything related to solr cell could be put under the
  solr/lib/solrcell directory.  Then extracting request handler could be
  defined as lazy and we could simply tell people to
  remove solr/lib/solrcell if you don't need it.
  
  -Yonik
  http://www.lucidimagination.com



[jira] Commented: (SOLR-908) Port of Nutch CommonGrams filter to Solr

2009-07-01 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12726209#action_12726209
 ] 

Otis Gospodnetic commented on SOLR-908:
---

Thanks Tom.  TODOs are good reminders, so I'd say leave them.

 Port of Nutch CommonGrams filter to Solr
 -

 Key: SOLR-908
 URL: https://issues.apache.org/jira/browse/SOLR-908
 Project: Solr
  Issue Type: Wish
  Components: Analysis
Reporter: Tom Burton-West
Priority: Minor
 Attachments: CommonGramsPort.zip, SOLR-908.patch


 Phrase queries containing common words are extremely slow.  We are reluctant 
 to just use stop words due to various problems with false hits and some 
 things becoming impossible to search with stop words turned on. (For example 
 "to be or not to be", "the who", "man in the moon" vs "man on the moon", etc.) 
  
 Several postings regarding slow phrase queries have suggested using the 
 approach used by Nutch.  Perhaps someone with more Java/Solr experience might 
 take this on.
 It should be possible to port the Nutch CommonGrams code to Solr and create 
 a suitable Solr FilterFactory so that it could be used in Solr by listing it 
 in the Solr schema.xml.
 Construct n-grams for frequently occurring terms and phrases while indexing. 
 Optimize phrase queries to use the n-grams. Single terms are still indexed 
 too, with n-grams overlaid.
 http://lucene.apache.org/nutch/apidocs-0.8.x/org/apache/nutch/analysis/CommonGrams.html

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
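
The CommonGrams idea in the issue description can be sketched roughly as follows (hypothetical code, not the Nutch implementation; the "_" joiner and the class/method names are invented): whenever a token pair involves a common word, a joined bigram is emitted alongside the single terms, so a phrase query like "the who" can hit one pre-built gram instead of walking the huge postings list for "the".

```java
import java.util.*;

// Rough sketch of common-grams token generation: keep every single term,
// and additionally emit a joined bigram for each adjacent pair in which
// either member is a "common" (very frequent) word.
class CommonGramsSketch {
    static List<String> commonGrams(List<String> tokens, Set<String> common) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            out.add(tokens.get(i));                    // single term, always kept
            if (i + 1 < tokens.size()
                    && (common.contains(tokens.get(i))
                        || common.contains(tokens.get(i + 1)))) {
                out.add(tokens.get(i) + "_" + tokens.get(i + 1)); // overlaid gram
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> common = new HashSet<>(Arrays.asList("the", "who"));
        System.out.println(commonGrams(Arrays.asList("the", "who", "concert"), common));
    }
}
```

A phrase query optimizer can then rewrite "the who" into a lookup of the single term the_who.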



[jira] Commented: (SOLR-1198) confine all solrconfig.xml parsing to SolrConfig.java

2009-06-26 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12724581#action_12724581
 ] 

Otis Gospodnetic commented on SOLR-1198:


{quote}
My real objective is to make it possible to start solr w/o a single line of xml.
{quote}

Could you elaborate please?  Where would various configuration settings be 
specified?


 confine all solrconfig.xml parsing to SolrConfig.java
 -

 Key: SOLR-1198
 URL: https://issues.apache.org/jira/browse/SOLR-1198
 Project: Solr
  Issue Type: Improvement
Reporter: Noble Paul
Assignee: Noble Paul
 Fix For: 1.4

 Attachments: SOLR-1198.patch, SOLR-1198.patch, SOLR-1198.patch, 
 SOLR-1198.patch, SOLR-1198.patch, SOLR-1198.patch, SOLR-1198.patch


 Currently, xpath evaluations are spread across Solr code. It would be 
 cleaner if we could do it all in one place. All the parsing can be done in 
 SolrConfig.java
 another problem with the current design is that we are not able to benefit 
 from re-use of solrconfig object across cores. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (SOLR-1250) Seach the words having ampersand (&) symbol

2009-06-26 Thread Otis Gospodnetic (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Otis Gospodnetic resolved SOLR-1250.


Resolution: Invalid

Please ask on solr-user list.

 Seach the words having ampersand (&) symbol
 ---

 Key: SOLR-1250
 URL: https://issues.apache.org/jira/browse/SOLR-1250
 Project: Solr
  Issue Type: Task
  Components: search
Affects Versions: 1.3
 Environment: Linux
Reporter: Secpath
 Fix For: 1.3

   Original Estimate: 24h
  Remaining Estimate: 24h

 I am indexing titles in my index. My titles can also have special characters 
 like (+ - && || ! ( ) { } [ ] ^ " ~ * ? : \)
 When i am querieing the index to search with the matching titles , I am using 
 the escape sequence '\'
 as per the doc http://lucene.apache.org/java/2_3_2/queryparsersyntax.html
 It looks fine in most cases except when the title consists of the 
 character '&' or '&&'
 The query I used to search the index is as below in normal cases...
 http://myurl/solr/mdrs/select/?q=title:someTitle 
 How do I search my index to get the titles like jakarta & apache?
 I tried by giving the queries below:
 http://myurl/solr/mdrs/select/?q=title:jakarta & apache
 http://myurl/solr/mdrs/select/?q=title:jakarta && apache
 http://myurl/solr/mdrs/select/?q=title:jakarta \& apache
 http://myurl/solr/mdrs/select/?q=title:jakarta \&\& apache
 Each of the above queries gives errors... I am unable to search for my title 
 jakarta & apache.
 Please let me know how I can search for words having the ampersand (&) character

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
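
For reference, the characters listed in the issue are metacharacters of the Lucene query syntax, and Lucene's QueryParser.escape() puts a backslash before each of them. A minimal standalone sketch of that escaping (hypothetical helper, not the Lucene source) looks like this. Note that '&' must additionally be URL-encoded as %26 inside the q= parameter, or the servlet container will treat it as a parameter separator before Solr ever sees it:

```java
// Hypothetical helper mirroring what Lucene's QueryParser.escape() does:
// prefix every query-syntax metacharacter with a backslash, so a literal
// title such as "jakarta & apache" survives query parsing.
class QueryEscape {
    static String escape(String s) {
        String special = "+-&|!(){}[]^\"~*?:\\";   // Lucene query metacharacters
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (special.indexOf(c) >= 0) sb.append('\\');
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(QueryEscape.escape("jakarta & apache"));
    }
}
```

So the working request would carry q=title:jakarta%20%5C%26%20apache after both escaping and URL encoding.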



Re: lucene releases vs trunk

2009-06-25 Thread Otis Gospodnetic

I kind of agree... But will this (not) affect how quickly new features in 
Luceneland will get their Solr support?  In other words, if we have to wait for 
a proper Lucene release, doesn't that mean that:

1) Solr releases will depend on Lucene releases (unless there are some 
Solr-only changes that don't depend on newer version of Lucene)
2) Solr releases will lag Lucene releases quite a bit because only after Lucene 
has been released will Solr developers/contributors be able to start work on 
integrating new Lucene features into Solr?


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
 From: Yonik Seeley yo...@lucidimagination.com
 To: solr-dev@lucene.apache.org
 Sent: Thursday, June 25, 2009 7:18:31 AM
 Subject: lucene releases vs trunk
 
 For the next release cycle (presumably 1.5?) I think we should really
 try to stick to released versions of Lucene, and not use dev/trunk
 versions.
 Early in Solr's lifetime, Lucene trunk was more stable (APIs changed
 little, even on non-released versions), and Lucene releases were few
 and far between.
 Today, the pace of change in Lucene has quickened, and Lucene APIs are
 much more in flux until a release is made.  It's also now harder to
 support a Lucene dev release given the growth in complexity
 (particularly for indexing code).  Releases are made more often too,
 making using released versions more practical.
 Many of our users dislike our use of dev versions of Lucene too.
 
 And yes, 1.4 isn't out the door yet - but people often tend to hit the
 ground running on the next release.
 
 -Yonik
 http://www.lucidimagination.com



Re: lucene releases vs trunk

2009-06-25 Thread Otis Gospodnetic

Hello,


- Original Message 
 From: Yonik Seeley yo...@lucidimagination.com
 To: solr-dev@lucene.apache.org
 Sent: Thursday, June 25, 2009 1:41:39 PM
 Subject: Re: lucene releases vs trunk
 
 On Thu, Jun 25, 2009 at 1:29 PM, Chris Hostetter wrote:
  : This proposal was just for the next (1.5?) release cycle though.
 ...
  : I agree though - there is rapid movement in Lucene these days, and things 
 can
  : be pulled back or altered fairly easily during trunk dev. Sometimes even 
 index
  : format changing issues - which can be a real pain (having suffered that 
 first
  : hand in the past). The closer we can stay to actual Lucene releases in
  : general, the better I think.
 
  I suggest we not worry about it too much until the situation arises.
 
 I'm calling attention to it because I don't believe the move to
 2.9-dev was ever discussed on solr-dev.
 AFAIK it was committed as part of SOLR-805... something I missed, and
 I doubt I'm the only one.
 
 The default should be to use released Lucene versions, and we should
 reluctantly move off of that.
 
  Once upon a time the decision to bump the lucene-java rev in Solr was
   driven largely based on whether people thought that version had useful
  additions *and* was relatively solid.  My impression more recently is
  that people have been bumping the rev primarily with the
  features/improvements in mind, and less consideration of the stability
  probably due to the (completely valid) assumption that solr trunk doesn't
  *need* to be any more stable then the lucene-java trunk, so we might as
  well go ahead and rev and help shake things out.
 
 Right - if we're relatively sure that a Lucene release is imminent
 (and will happen before a Solr release), it's not such a bad idea to
 upgrade.

Aha, so this makes sense.  Stick with the stable version until we see Lucene is 
preparing for a release.  Then upgrade to the latest (nightly) Lucene and catch 
up with the goal of releasing Solr not too long after Lucene has been released.
Like that?

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch


[jira] Commented: (SOLR-908) Port of Nutch CommonGrams filter to Solr

2009-06-23 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12723418#action_12723418
 ] 

Otis Gospodnetic commented on SOLR-908:
---

I took a super quick look and noticed:
* not all classes have ASL (I think unit test classes need it, too)
* Mentions of "Copyright 2009, The Regents of The University of Michigan."  I 
have a feeling this would need to be removed
* @author and @version. I know we remove @author lines, and I'm not sure if 
@version is really desired

Looks like a very thorough and complete patch, but I haven't tried it yet.


 Port of Nutch CommonGrams filter to Solr
 -

 Key: SOLR-908
 URL: https://issues.apache.org/jira/browse/SOLR-908
 Project: Solr
  Issue Type: Wish
  Components: Analysis
Reporter: Tom Burton-West
Priority: Minor
 Attachments: CommonGramsPort.zip, SOLR-908.patch


 Phrase queries containing common words are extremely slow.  We are reluctant 
 to just use stop words due to various problems with false hits and some 
 things becoming impossible to search with stop words turned on. (For example 
 "to be or not to be", "the who", "man in the moon" vs "man on the moon", etc.) 
  
 Several postings regarding slow phrase queries have suggested using the 
 approach used by Nutch.  Perhaps someone with more Java/Solr experience might 
 take this on.
 It should be possible to port the Nutch CommonGrams code to Solr and create 
 a suitable Solr FilterFactory so that it could be used in Solr by listing it 
 in the Solr schema.xml.
 Construct n-grams for frequently occurring terms and phrases while indexing. 
 Optimize phrase queries to use the n-grams. Single terms are still indexed 
 too, with n-grams overlaid.
 http://lucene.apache.org/nutch/apidocs-0.8.x/org/apache/nutch/analysis/CommonGrams.html

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: SNMP monitoring

2009-06-18 Thread Otis Gospodnetic

Absolutely and thank you in advance!

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
 From: Development Team dev.and...@gmail.com
 To: solr-dev@lucene.apache.org
 Sent: Thursday, June 18, 2009 6:48:26 AM
 Subject: Re: SNMP monitoring
 
 Hi devs,
  A while ago I posted a question to the solr-users list asking about
 SNMP monitoring of Solr. I got one reply suggesting the use of JMX-SNMP
 bridges, but upon researching these I could find a) nothing that seemed
  particularly good, and/or b) none that were free/OSS.
  Since then I've found that deploying Solr in JBoss/Jetty with the
 JBoss-SNMP SAR was the easiest way to get this job done. --But it still
 wasn't easy.
 Thus, my question is: would anybody like me to write up a Solr-Wiki
 page on how to expose Solr stats through SNMP? It's a bit involved, and is
 JBoss-specific, however it is a useful feature that other Solr users may
 benefit from. Let me know.
 
 - Daryl.
 
 
 On Wed, Apr 15, 2009 at 3:18 PM, Development Team wrote:
 
  Hi everybody,
   How would I set up SNMP monitoring of my Solr server? I've done some
  searching of the wiki and Google and have come up with a blank. Any
  pointers?
 
  - Daryl.
 



[jira] Resolved: (SOLR-1100) Typo fixes for solrjs docs

2009-06-18 Thread Otis Gospodnetic (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Otis Gospodnetic resolved SOLR-1100.


   Resolution: Fixed
Fix Version/s: 1.4
 Assignee: Otis Gospodnetic

Thanks Eric.

Sending        javascript/src/clientside/AutocompleteWidget.js
Sending        javascript/src/clientside/CalendarWidget.js
Sending        javascript/src/clientside/FacetWidget.js
Sending        javascript/src/clientside/TagcloudWidget.js
Sending        javascript/src/core/AbstractServerSideWidget.js
Transmitting file data .
Committed revision 786134.


 Typo fixes for solrjs docs
 --

 Key: SOLR-1100
 URL: https://issues.apache.org/jira/browse/SOLR-1100
 Project: Solr
  Issue Type: Improvement
Reporter: Eric Pugh
Assignee: Otis Gospodnetic
Priority: Minor
 Fix For: 1.4

 Attachments: typos.patch


 Matthias suggested I put in a bug here for my small documentation fixes which 
 were done against http://solrstuff.org/svn/solrjs/trunk/.  Not sure if that 
 is the latest or what is in the ASF solr contrib/javascript directory.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1189) Support basic auth

2009-06-05 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12716621#action_12716621
 ] 

Otis Gospodnetic commented on SOLR-1189:


It would be good to have that in the example solrconfig.xml for people to see.


 Support basic auth
 --

 Key: SOLR-1189
 URL: https://issues.apache.org/jira/browse/SOLR-1189
 Project: Solr
  Issue Type: New Feature
  Components: replication (java)
Reporter: Matthew Gregg
Assignee: Shalin Shekhar Mangar
Priority: Minor
 Fix For: 1.4

 Attachments: SOLR-1189.patch


 It would be extremely useful if replication supported basic authentication.  
  Currently a basic-auth-protected master/slave cannot replicate.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1205) add a field alias feature

2009-06-05 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12716623#action_12716623
 ] 

Otis Gospodnetic commented on SOLR-1205:


Am I the only person who finds that {!foo=bar} syntax very hard to parse and 
understand?


 add a field alias feature
 -

 Key: SOLR-1205
 URL: https://issues.apache.org/jira/browse/SOLR-1205
 Project: Solr
  Issue Type: New Feature
Reporter: Noble Paul
 Fix For: 1.5


 A feature which is similar to the SQL 'as' can be helpful 
 see the mail thread
 http://www.lucidimagination.com/search/document/63b63edc15092922/customizing_results#63b63edc15092922
 it can be implemented as a separate request param
 say 
 {code}
 fl.alias=from_name1:to_name1&fl.alias=from_name2:to_name2
 {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
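
The proposed fl.alias=from:to parameters could be parsed along the following lines (illustrative sketch only; the parameter name and from:to syntax come from the issue description, while the class name and parsing code are invented, not from any patch):

```java
import java.util.*;

// Sketch: turn repeated fl.alias=from:to parameter values into a
// source-field -> returned-name map, preserving parameter order.
class FieldAliasParser {
    static Map<String, String> parse(String[] aliasParams) {
        Map<String, String> aliases = new LinkedHashMap<>();
        for (String p : aliasParams) {
            int colon = p.indexOf(':');
            if (colon <= 0 || colon == p.length() - 1)
                throw new IllegalArgumentException("expected from:to, got " + p);
            aliases.put(p.substring(0, colon), p.substring(colon + 1)); // from -> to
        }
        return aliases;
    }

    public static void main(String[] args) {
        System.out.println(parse(new String[] {"price:cost", "title:name"}));
    }
}
```

The response writer would then consult this map when emitting stored field names, much like SQL's "select price as cost".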



[jira] Commented: (SOLR-1145) Patch to set IndexWriter.defaultInfoStream from solr.xml

2009-06-04 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12716417#action_12716417
 ] 

Otis Gospodnetic commented on SOLR-1145:


I agree about this belonging to solrconfig.xml -- I bet < 50% of people use 
multicore.


 Patch to set IndexWriter.defaultInfoStream from solr.xml
 

 Key: SOLR-1145
 URL: https://issues.apache.org/jira/browse/SOLR-1145
 Project: Solr
  Issue Type: Improvement
Reporter: Chris Harris
 Fix For: 1.4

 Attachments: SOLR-1145.patch, SOLR-1145.patch


 Lucene IndexWriters use an infoStream to log detailed info about indexing 
 operations for debugging purpose. This patch is an extremely simple way to 
 allow logging this info to a file from within Solr: After applying the patch, 
 set the new defaultInfoStreamFilePath attribute of the solr element in 
 solr.xml to the path of the file where you'd like to save the logging 
 information.
 Note that, in a multi-core setup, all cores will end up logging to the same 
 infoStream log file. This may not be desired. (But it does justify putting 
 the setting in solr.xml rather than solrconfig.xml.)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-1192) solr.NGramFilterFactory stops to index the content if it find a token smaller than minim ngram size

2009-06-02 Thread Otis Gospodnetic (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Otis Gospodnetic updated SOLR-1192:
---


That stems from Lucene, see LUCENE-1491.


 solr.NGramFilterFactory stops to index the content if it find a token smaller 
 than minim ngram size
 ---

 Key: SOLR-1192
 URL: https://issues.apache.org/jira/browse/SOLR-1192
 Project: Solr
  Issue Type: Bug
  Components: Analysis
Affects Versions: 1.3
 Environment: any
Reporter: viobade
 Fix For: 1.3


 If a field is split into tokens (by a tokenizer) and the NGramFilterFactory 
 is then applied to these tokens, the indexing goes well while the length 
 of the tokens is greater than or equal to the minimum ngram size (usually 3). 
 Otherwise the indexing breaks at that point and the rest of the tokens are no 
 longer indexed. This behaviour can be easily observed with the analysis tool 
 in the Solr admin interface.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
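
The reported failure mode can be pictured with a small sketch (hypothetical code, not the Lucene/Solr filter): a token shorter than minGram legitimately yields zero grams, and the bug class here is treating that empty result as end-of-stream instead of moving on to the next token.

```java
import java.util.*;

// Sketch of n-gram generation over a token stream. A short token produces
// no grams; correct stream handling keeps consuming the following tokens
// rather than stopping (the behaviour this issue reports).
class NGramSketch {
    static List<String> ngrams(String token, int min, int max) {
        List<String> grams = new ArrayList<>();
        for (int n = min; n <= Math.min(max, token.length()); n++)
            for (int i = 0; i + n <= token.length(); i++)
                grams.add(token.substring(i, i + n));
        return grams;                       // empty when token.length() < min
    }

    // Robust handling: a short token yields no grams but does not stop the
    // analysis of later tokens.
    static List<String> analyze(List<String> tokens, int min, int max) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) out.addAll(ngrams(t, min, max));
        return out;
    }

    public static void main(String[] args) {
        // "to" is below minGram=3 and is skipped; "index" is still analyzed.
        System.out.println(analyze(Arrays.asList("to", "index"), 3, 3));
    }
}
```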



[jira] Updated: (SOLR-1192) solr.NGramFilterFactory stops to index the content if it find a token smaller than minim ngram size

2009-06-02 Thread Otis Gospodnetic (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Otis Gospodnetic updated SOLR-1192:
---

Fix Version/s: (was: 1.3)
   1.4

 solr.NGramFilterFactory stops to index the content if it find a token smaller 
 than minim ngram size
 ---

 Key: SOLR-1192
 URL: https://issues.apache.org/jira/browse/SOLR-1192
 Project: Solr
  Issue Type: Bug
  Components: Analysis
Affects Versions: 1.3
 Environment: any
Reporter: viobade
 Fix For: 1.4


 If a field is split into tokens (by a tokenizer) and the NGramFilterFactory 
 is then applied to these tokens, the indexing goes well while the length 
 of the tokens is greater than or equal to the minimum ngram size (usually 3). 
 Otherwise the indexing breaks at that point and the rest of the tokens are no 
 longer indexed. This behaviour can be easily observed with the analysis tool 
 in the Solr admin interface.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (SOLR-990) Add pid file to snapinstaller to skip script overruns, and recover from failure

2009-06-02 Thread Otis Gospodnetic (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Otis Gospodnetic resolved SOLR-990.
---

Resolution: Fixed

Thank you, Dan!

Sending        src/scripts/snapinstaller
Transmitting file data .
Committed revision 781069.


 Add pid file to snapinstaller to skip script overruns, and recover from 
 failure
 ---

 Key: SOLR-990
 URL: https://issues.apache.org/jira/browse/SOLR-990
 Project: Solr
  Issue Type: Improvement
  Components: replication (scripts)
Reporter: Dan Rosher
Assignee: Otis Gospodnetic
Priority: Minor
 Fix For: 1.4

 Attachments: SOLR-990.patch, SOLR-990.patch, SOLR-990.patch, 
 SOLR-990.patch


 The pid file will allow snapinstaller to be run as fast as possible without 
 overruns. It will also recover from the last failed run, should an older 
 snapinstaller process no longer be running. 
 Avoiding overruns means that snapinstaller can be run as fast as possible, 
 but without suffering from the performance issue described here:
 http://wiki.apache.org/solr/SolrPerformanceFactors#head-fc7f22035c493431d58c5404ab22aef0ee1b9909
  
 This means that one can do the following
 */1 * * * * /bin/snappuller && /bin/snapinstaller
 Even with a 'properly tuned' setup, there can be times where snapinstaller 
 can suffer from overruns due to a lack of resources, or an unoptimized index 
 using more resources etc.
 currently the pid will live in /tmp ... perhaps it should be in the logs dir?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
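
The pid-file guard described above is a shell-script change; purely for illustration, here is a minimal sketch of the same idea in Java (class and method names invented). Atomic file creation means an overlapping run detects the existing pid file and skips the run instead of overrunning; a real guard, as the patch description notes, would also check whether the recorded pid is still alive so a crashed run does not block future ones.

```java
import java.io.IOException;
import java.nio.file.*;

// Sketch of a pid-file guard against overlapping runs. Files.createFile
// is atomic: if the file already exists, another run owns the lock.
class PidFileGuard {
    static boolean tryAcquire(Path pidFile, long pid) throws IOException {
        try {
            Files.createFile(pidFile);                  // atomic create-or-fail
            Files.writeString(pidFile, Long.toString(pid));
            return true;                                // we own the lock
        } catch (FileAlreadyExistsException e) {
            return false;                               // overrun: skip this run
        }
    }

    static void release(Path pidFile) throws IOException {
        Files.deleteIfExists(pidFile);                  // allow the next run
    }

    public static void main(String[] args) throws IOException {
        Path p = Files.createTempDirectory("snap").resolve("snapinstaller.pid");
        System.out.println(tryAcquire(p, 1234) + " " + tryAcquire(p, 5678));
        release(p);
    }
}
```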



Re: update Lucene

2009-05-30 Thread Otis Gospodnetic

Clearly I meant ...along with *Lucene* jars :)

 Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
 From: Otis Gospodnetic otis_gospodne...@yahoo.com
 To: solr-dev@lucene.apache.org
 Sent: Wednesday, May 27, 2009 11:59:18 PM
 Subject: Re: update Lucene
 
 
 I wonder if it would be useful to commit Lucene's CHANGES.txt into Solr along 
 with Solr jars.  It would then be very easy to tell what changed in Lucene 
 since 
 the version Solr has and the current version of Lucene (or some newer 
 released 
 version, if we were able to be behind).
 
 Otis
 --
 Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
 
 
 
 - Original Message 
  From: Yonik Seeley 
  To: solr-dev@lucene.apache.org
  Sent: Wednesday, May 27, 2009 4:58:39 PM
  Subject: update Lucene
  
  I think we should upgrade Lucene again since the index file format has 
 changed:
  https://issues.apache.org/jira/browse/LUCENE-1654
  
  This also contains a fix for unifying the FieldCache and
  ExtendedFieldCache instances.
  
  $ svn diff -r r776177 CHANGES.txt
  Index: CHANGES.txt
  ===
  --- CHANGES.txt(revision 776177)
  +++ CHANGES.txt(working copy)
  @@ -27,7 +27,11 @@
    implement Searchable or extend Searcher, you should change your
   code to implement this method.  If you already extend
   IndexSearcher, no further changes are needed to use Collector.
  -(Shai Erera via Mike McCandless)
  +
   +Finally, the values Float.NaN, Float.NEGATIVE_INFINITY and
  +Float.POSITIVE_INFINITY are not valid scores.  Lucene uses these
  +values internally in certain places, so if you have hits with such
  +scores it will cause problems. (Shai Erera via Mike McCandless)
  
  Changes in runtime behavior
  
  @@ -107,10 +111,10 @@
  that's visited.  All core collectors now use this API.  (Mark
  Miller, Mike McCandless)
  
  -8. LUCENE-1546: Add IndexReader.flush(String commitUserData), allowing
  -   you to record an opaque commitUserData into the commit written by
  -   IndexReader.  This matches IndexWriter's commit methods.  (Jason
  -   Rutherglen via Mike McCandless)
  +8. LUCENE-1546: Add IndexReader.flush(Map commitUserData), allowing
   +   you to record an opaque commitUserData (maps String -> String) into
  +   the commit written by IndexReader.  This matches IndexWriter's
  +   commit methods.  (Jason Rutherglen via Mike McCandless)
  
  9. LUCENE-652: Added org.apache.lucene.document.CompressionTools, to
   enable compressing & decompressing binary content, external to
  @@ -135,6 +139,9 @@
   not make sense for all subclasses of MultiTermQuery. Check individual
   subclasses to see if they support #getTerm().  (Mark Miller)
  
  +14. LUCENE-1636: Make TokenFilter.input final so it's set only
  +once. (Wouter Heijke, Uwe Schindler via Mike McCandless).
  +
  Bug fixes
  
  1. LUCENE-1415: MultiPhraseQuery has incorrect hashCode() and equals()
  @@ -176,6 +183,9 @@
  sort) by doc Id in a consistent manner (i.e., if Sort.FIELD_DOC
  was used vs.
  when it wasn't). (Shai Erera via Michael McCandless)
  
  +10. LUCENE-1647: Fix case where IndexReader.undeleteAll would cause
  +the segment's deletion count to be incorrect. (Mike McCandless)
  +
New features
  
1. LUCENE-1411: Added expert API to open an IndexWriter on a prior
  @@ -186,10 +196,11 @@
   when building transactional support on top of Lucene.  (Mike
   McCandless)
  
  - 2. LUCENE-1382: Add an optional arbitrary String commitUserData to
  -IndexWriter.commit(), which is stored in the segments file and is
  -then retrievable via IndexReader.getCommitUserData instance and
  -static methods.  (Shalin Shekhar Mangar via Mike McCandless)
   + 2. LUCENE-1382: Add an optional arbitrary Map (String -> String)
  +commitUserData to IndexWriter.commit(), which is stored in the
  +segments file and is then retrievable via
  +IndexReader.getCommitUserData instance and static methods.
  +(Shalin Shekhar Mangar via Mike McCandless)
  
3. LUCENE-1406: Added Arabic analyzer.  (Robert Muir via Grant Ingersoll)
  
  @@ -311,6 +322,10 @@
  25. LUCENE-1634: Add calibrateSizeByDeletes to LogMergePolicy, to take
   deletions into account when considering merges.  (Yasuhiro Matsuda
   via Mike McCandless)
  +
  +26. LUCENE-1550: Added new n-gram based String distance measure for
  spell checking.
  +See the Javadocs for NGramDistance.java for a reference paper on
  why this is helpful (Tom Morton via Grant Ingersoll)
  +
  
  Optimizations
  
  
  -Yonik
  http://www.lucidimagination.com



Re: update Lucene

2009-05-27 Thread Otis Gospodnetic

I wonder if it would be useful to commit Lucene's CHANGES.txt into Solr along 
with Solr jars.  It would then be very easy to tell what changed in Lucene 
since the version Solr has and the current version of Lucene (or some newer 
released version, if we were able to be behind).

 Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
 From: Yonik Seeley yo...@lucidimagination.com
 To: solr-dev@lucene.apache.org
 Sent: Wednesday, May 27, 2009 4:58:39 PM
 Subject: update Lucene
 
 I think we should upgrade Lucene again since the index file format has 
 changed:
 https://issues.apache.org/jira/browse/LUCENE-1654
 
 This also contains a fix for unifying the FieldCache and
 ExtendedFieldCache instances.
 
 $ svn diff -r r776177 CHANGES.txt
 Index: CHANGES.txt
 ===
 --- CHANGES.txt(revision 776177)
 +++ CHANGES.txt(working copy)
 @@ -27,7 +27,11 @@
  implement Searchable or extend Searcher, you should change your
  code to implement this method.  If you already extend
  IndexSearcher, no further changes are needed to use Collector.
 -(Shai Erera via Mike McCandless)
 +
  +Finally, the values Float.NaN, Float.NEGATIVE_INFINITY and
 +Float.POSITIVE_INFINITY are not valid scores.  Lucene uses these
 +values internally in certain places, so if you have hits with such
 +scores it will cause problems. (Shai Erera via Mike McCandless)
 
 Changes in runtime behavior
 
 @@ -107,10 +111,10 @@
 that's visited.  All core collectors now use this API.  (Mark
 Miller, Mike McCandless)
 
 -8. LUCENE-1546: Add IndexReader.flush(String commitUserData), allowing
 -   you to record an opaque commitUserData into the commit written by
 -   IndexReader.  This matches IndexWriter's commit methods.  (Jason
 -   Rutherglen via Mike McCandless)
 +8. LUCENE-1546: Add IndexReader.flush(Map commitUserData), allowing
  +   you to record an opaque commitUserData (maps String -> String) into
 +   the commit written by IndexReader.  This matches IndexWriter's
 +   commit methods.  (Jason Rutherglen via Mike McCandless)
 
 9. LUCENE-652: Added org.apache.lucene.document.CompressionTools, to
  enable compressing & decompressing binary content, external to
 @@ -135,6 +139,9 @@
  not make sense for all subclasses of MultiTermQuery. Check individual
  subclasses to see if they support #getTerm().  (Mark Miller)
 
 +14. LUCENE-1636: Make TokenFilter.input final so it's set only
 +once. (Wouter Heijke, Uwe Schindler via Mike McCandless).
 +
 Bug fixes
 
 1. LUCENE-1415: MultiPhraseQuery has incorrect hashCode() and equals()
 @@ -176,6 +183,9 @@
 sort) by doc Id in a consistent manner (i.e., if Sort.FIELD_DOC
 was used vs.
 when it wasn't). (Shai Erera via Michael McCandless)
 
 +10. LUCENE-1647: Fix case where IndexReader.undeleteAll would cause
 +the segment's deletion count to be incorrect. (Mike McCandless)
 +
   New features
 
   1. LUCENE-1411: Added expert API to open an IndexWriter on a prior
 @@ -186,10 +196,11 @@
  when building transactional support on top of Lucene.  (Mike
  McCandless)
 
 - 2. LUCENE-1382: Add an optional arbitrary String commitUserData to
 -IndexWriter.commit(), which is stored in the segments file and is
 -then retrievable via IndexReader.getCommitUserData instance and
 -static methods.  (Shalin Shekhar Mangar via Mike McCandless)
  + 2. LUCENE-1382: Add an optional arbitrary Map (String -> String)
 +commitUserData to IndexWriter.commit(), which is stored in the
 +segments file and is then retrievable via
 +IndexReader.getCommitUserData instance and static methods.
 +(Shalin Shekhar Mangar via Mike McCandless)
 
   3. LUCENE-1406: Added Arabic analyzer.  (Robert Muir via Grant Ingersoll)
 
 @@ -311,6 +322,10 @@
 25. LUCENE-1634: Add calibrateSizeByDeletes to LogMergePolicy, to take
  deletions into account when considering merges.  (Yasuhiro Matsuda
  via Mike McCandless)
 +
 +26. LUCENE-1550: Added new n-gram based String distance measure for
 spell checking.
 +See the Javadocs for NGramDistance.java for a reference paper on
 why this is helpful (Tom Morton via Grant Ingersoll)
 +
 
 Optimizations
 
 
 -Yonik
 http://www.lucidimagination.com



[jira] Commented: (SOLR-920) Cache and reuse IndexSchema

2009-05-22 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12712113#action_12712113
 ] 

Otis Gospodnetic commented on SOLR-920:
---

So if my core has its own schema.xml in the right place (in conf/schema.xml), 
that schema will be used, not the shared one?

 Cache and reuse IndexSchema
 ---

 Key: SOLR-920
 URL: https://issues.apache.org/jira/browse/SOLR-920
 Project: Solr
  Issue Type: Improvement
Reporter: Noble Paul
Assignee: Noble Paul
 Attachments: SOLR-920.patch


 if there are 1000's of cores then the cost of loading/unloading schema.xml 
 can be prohibitive
 similar to SOLR-919 we can also cache the DOM object of schema.xml if the 
 location on disk is same.  All the dynamic properties can be replaced lazily 
 when they are read.
 We can go one step ahead in this case. The IndexSchema object is immutable. 
 So if there are no core properties then the same IndexSchema object can be 
 used across all the cores

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1142) faster example schema

2009-05-21 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12711814#action_12711814
 ] 

Otis Gospodnetic commented on SOLR-1142:


I'd comment-out dynamic fields and I'd leave uniqueKey as I bet 99% of users 
need it.

 faster example schema
 -

 Key: SOLR-1142
 URL: https://issues.apache.org/jira/browse/SOLR-1142
 Project: Solr
  Issue Type: Improvement
Reporter: Yonik Seeley
 Fix For: 1.4


 need faster example schema:
 http://www.lucidimagination.com/search/document/d46ea3fa441b6d94

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-920) Cache and reuse IndexSchema

2009-05-21 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12711856#action_12711856
 ] 

Otis Gospodnetic commented on SOLR-920:
---

Looks good to me.  What happens when a core has a copy of schema.xml in its 
conf/ dir and that schema.xml is potentially different from the shared one?


 Cache and reuse IndexSchema
 ---

 Key: SOLR-920
 URL: https://issues.apache.org/jira/browse/SOLR-920
 Project: Solr
  Issue Type: Improvement
Reporter: Noble Paul
Assignee: Noble Paul

 if there are 1000's of cores then the cost of loading/unloading schema.xml 
 can be prohibitive.
 Similar to SOLR-919, we can also cache the DOM object of schema.xml if the 
 location on disk is the same. All the dynamic properties can be replaced lazily 
 when they are read.
 We can go one step further in this case. The IndexSchema object is immutable, 
 so if there are no core properties then the same IndexSchema object can be 
 used across all the cores.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1147) QueryElevationComponent : updating elevate.xml through HTTP

2009-05-18 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12710397#action_12710397
 ] 

Otis Gospodnetic commented on SOLR-1147:


Nicolas - I think it may make sense to edit and rename/redescribe this issue 
now, if you are going to make this more generic.


 QueryElevationComponent : updating elevate.xml through HTTP
 ---

 Key: SOLR-1147
 URL: https://issues.apache.org/jira/browse/SOLR-1147
 Project: Solr
  Issue Type: Improvement
Affects Versions: 1.3, 1.4, 1.5
 Environment: Any
Reporter: Nicolas Pastorino
Priority: Minor
 Attachments: QueryElevationAdministrationRequestHandler.java, 
 QueryElevationAdministrationRequestHandler.java


 If one wants to update the configuration file for the 
 QueryElevationComponent, direct editing of the file is mandatory. Currently 
 the process seems to be: 
 # Replace elevate.xml in Solr's dataDir
 # Commit. It appears that when elevate.xml is in Solr's dataDir, and 
 solely in this case, committing triggers a reload of elevate.xml. This does 
 not happen when elevate.xml is stored in Solr's conf dir.
 As a system using Solr, I would find it handy to be able to push an updated 
 elevate.xml file/XML through HTTP, with an automatic reload of it. This would 
 remove the currently mandatory requirement of having a direct access to the 
 elevate.xml file, allowing more distributed architectures. This would also 
 increase the Query Elevation system's added value by making it dynamic, 
 configuration-wise.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1171) dynamic field name with spaces causes error

2009-05-16 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12710143#action_12710143
 ] 

Otis Gospodnetic commented on SOLR-1171:


Should field names with spaces be supported?  Are they supported in Lucene 
(ignoring the lack of support by the QP)?



 dynamic field name with spaces causes error
 ---

 Key: SOLR-1171
 URL: https://issues.apache.org/jira/browse/SOLR-1171
 Project: Solr
  Issue Type: Bug
Reporter: Ryan McKinley
 Fix For: 1.4


 Stumbled into this bug.  I have a dynamic field meta_set_*.  When I add the 
 field meta_set_gdal_NoData Value and try to open Luke, I get this exception:
 {panel}
 May 15, 2009 3:42:06 PM org.apache.solr.common.SolrException log
 SEVERE: org.apache.solr.common.SolrException: undefined field Value
   at 
 org.apache.solr.schema.IndexSchema.getDynamicFieldType(IndexSchema.java:1132)
   at 
 org.apache.solr.schema.IndexSchema.getFieldType(IndexSchema.java:1094)
   at 
 org.apache.solr.search.SolrQueryParser.getRangeQuery(SolrQueryParser.java:121)
   at org.apache.lucene.queryParser.QueryParser.Term(QueryParser.java:1514)
   at 
 org.apache.lucene.queryParser.QueryParser.Clause(QueryParser.java:1349)
   at 
 org.apache.lucene.queryParser.QueryParser.Query(QueryParser.java:1306)
   at 
 org.apache.lucene.queryParser.QueryParser.TopLevelQuery(QueryParser.java:1266)
   at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:172)
   at 
 org.apache.solr.handler.admin.LukeRequestHandler.getIndexedFieldsInfo(LukeRequestHandler.java:310)
   at 
 org.apache.solr.handler.admin.LukeRequestHandler.handleRequestBody(LukeRequestHandler.java:147)
   at 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1330)
 {panel}
 note the field is meta_set_gdal_NoData Value not Value
 I think the query parser is grabbing it...
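One way to see the escaping side of the problem: the Lucene query parser treats an unescaped space as a term separator, so a field name containing a space is split before the schema ever sees it. The helper below is only a hypothetical sketch of the escaping step (`QueryEscape` and `escapeFieldName` are made-up names); whether the parser then accepts the escaped name is exactly what this issue is about.

```java
// Hypothetical helper (not part of any patch): backslash-escape whitespace in
// a field name such as "meta_set_gdal_NoData Value" before it reaches the
// query parser, which otherwise treats the space as a term separator.
class QueryEscape {
    static String escapeFieldName(String field) {
        // "$0" re-inserts the matched whitespace character after a backslash
        return field.replaceAll("\\s", "\\\\$0");
    }
}
```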

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1149) Make QParserPlugin and related classes extendible

2009-05-13 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12709104#action_12709104
 ] 

Otis Gospodnetic commented on SOLR-1149:


It's set for release in 1.4, but subject to more review.
+1 from me.

 Make QParserPlugin and related classes extendible
 -

 Key: SOLR-1149
 URL: https://issues.apache.org/jira/browse/SOLR-1149
 Project: Solr
  Issue Type: Improvement
  Components: search
Reporter: Kaktu Chakarabati
 Fix For: 1.4

 Attachments: SOLR-1149.patch


 In a recent attempt to create a QParserPlugin which extends 
 DisMaxQParser/FunctionQParser functionality, 
 it became apparent that in the current state of these classes, it is not 
 straightforward and in fact impossible to seriously build
 upon the existing code. 
 To this end, I've refactored some of the involved classes which enabled me to 
 reuse existing logic to great results.
 I thought I will share these changes and comment on their nature in the hope 
 these will make sense to other solr developers/users, and
 at the very least cultivate a fruitful discussion about this particular area 
 of the solr codebase.
 The relevant changes are as follows:
 * Renamed DismaxQParser class to DisMaxQParser ( in accordance with the 
 apparent naming convention, e.g DisMaxQParserPlugin )
 * Moved DisMaxQParser to its own .java file, making it a public class rather 
 than its previous package-private visibility. This makes
   it possible for users to build upon its logic, which is considerable, and 
   to my mind is a good place to start a lot of custom
   QParser implementations.
 * Changed access modifiers for the QParser abstract base class to protected 
 (were package-private). Again as above, it makes this
   object usable by user-defined classes that wish to define custom QParser 
 classes. More generally, and on the philosophy-of-code 
   side of things, it seems misleading to define some class members as having 
 the default access modifier (package-private) and then
   letting other package-scope derived classes use these while not explicitly 
 allowing user-defined derived classes to make use of these members.
   In specific i'm thinking of how DisMaxQParser makes use of these members: 
 **not because it is derived from QParser, but because it
   simply resides in the same namespace**
 * Changed access modifier for the QueryParsing.StrParser inner class and its 
 constructors to public. Again as in above, same issue
   of having same-package classes enjoy the benefit of being in the same 
 namespace (FunctionQParser.parse() uses it like so), 
   while user-defined classes cannot. Particularly in this case it is pretty 
 bad since this class advertises itself as a collection of utilities
   for query parsing in general - great resource, should probably even live 
 elsewhere (common.utils?)
 * Changed Function.FunctionWeight inner class data member modifiers to 
 protected (were default - package-private). This allowed me
   to inherit from FunctionQuery as well as make use of its original 
 FunctionWeight inner class while overriding some of the latter's
   methods. This is in the same spirit of the changes above. Please also note 
 this follows the common Query/Weight implementation pattern
   in the lucene codebase, see for example the BooleanQuery/BooleanWeight code.
 All in all these are relatively minor changes which unlock a great deal of 
 functionality to 3rd party developers, which i think is
 ultimately a big part of what solr is all about - extendability. It is also 
 perhaps a cue for a more serious refactoring of the
 QParserPlugin hierarchy, although i will leave such bold exclamations to 
 another occasion.
 Attached is a patch file, having passed the usual coding-style/unit testing 
 cycle.
 -Chak

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1147) QueryElevationComponent : updating elevate.xml through HTTP

2009-05-13 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12709241#action_12709241
 ] 

Otis Gospodnetic commented on SOLR-1147:


I'm +1 for the idea of being able to push elevate config (and really other 
config files, too!) from a remote system into Solr.
I only skimmed the patch.
It would be good to add a unit test.  Could you do that?
You'll also want to add the ASL on top of the source code.  It may also be good 
to remove eZ publish references from the Javadoc (having that in the javadoc 
doesn't really help developers using Solr).
Is that  at the end of QueryElevationAdministrationRequestHandler.class +   
really needed?
Please note the bit about the code formatting here:
http://wiki.apache.org/solr/HowToContribute#head-59ae13df098fbdcc46abdf980aa8ee76d3ee2e3b

Thanks!


 QueryElevationComponent : updating elevate.xml through HTTP
 ---

 Key: SOLR-1147
 URL: https://issues.apache.org/jira/browse/SOLR-1147
 Project: Solr
  Issue Type: Improvement
Affects Versions: 1.3, 1.4, 1.5
 Environment: Any
Reporter: Nicolas Pastorino
Priority: Minor
 Attachments: QueryElevationAdministrationRequestHandler.java, 
 QueryElevationAdministrationRequestHandler.java


 If one wants to update the configuration file for the 
 QueryElevationComponent, direct edition of the file is mandatory. Currently 
 the process seems to be : 
 # Replace elevate.xml in Solr's dataDir
 # Commit. It appears that when having elevate.xml in Solr's dataDir, and 
 solely in this case, commiting triggers a reload of elevate.xml. This does 
 not happen when elevate.xml is stored in Solr's conf dir.
 As a system using Solr, i would find handy to be able to push an updated 
 elevate.xml file/XML through HTTP, with an automatic reload of it. This would 
 remove the currently mandatory requirement of having a direct access to the 
 elevate.xml file, allowing more distributed architectures. This would also 
 increase the Query Elevation system's added value by making it dynamic, 
 configuration-wise.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-822) CharFilter - normalize characters before tokenizer

2009-04-24 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12702434#action_12702434
 ] 

Otis Gospodnetic commented on SOLR-822:
---

Todd's comment from Oct 23, 2008 caught my attention:

{quote}
It should also work for existing filters like LowerCase. Seems like it has the 
potential to be faster then the filters, as it doesn't have to perform the same 
replacement multiple times if a particular character is replicated into 
multiple tokens, like in NGramTokenizer or CJKTokenizer. 
{quote}

Couldn't we replace LowerCaseFilter then?  Or does LCF still have some unique 
value?  Ah, it does - it makes it possible to put it *after* something like 
WordDelimiterFilterFactory.  Lowercasing at the very beginning would make it 
impossible for WDFF to do its job.  Never mind.  Leaving for posterity.
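The ordering point above can be made concrete with a toy example. This is not Solr's actual WordDelimiterFilter, just an illustration (class and method names are made up): splitting on case transitions only works if the input has not already been lowercased.

```java
import java.util.ArrayList;
import java.util.List;

// Toy illustration of why lowercasing must come AFTER word delimiting:
// case-transition splitting finds nothing in an already-lowercased token.
class CaseSplit {
    static List<String> splitOnCaseChange(String token) {
        List<String> parts = new ArrayList<>();
        int start = 0;
        for (int i = 1; i < token.length(); i++) {
            // split at a lower->upper boundary, e.g. between "Power" and "Shot"
            if (Character.isUpperCase(token.charAt(i))
                    && Character.isLowerCase(token.charAt(i - 1))) {
                parts.add(token.substring(start, i));
                start = i;
            }
        }
        parts.add(token.substring(start));
        return parts;
    }
}
```

"PowerShot" splits into two parts, while the pre-lowercased "powershot" yields a single token, which is why a char filter that lowercases up front would defeat WDFF.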

 CharFilter - normalize characters before tokenizer
 --

 Key: SOLR-822
 URL: https://issues.apache.org/jira/browse/SOLR-822
 Project: Solr
  Issue Type: New Feature
  Components: Analysis
Affects Versions: 1.3
Reporter: Koji Sekiguchi
Assignee: Koji Sekiguchi
Priority: Minor
 Fix For: 1.4

 Attachments: character-normalization.JPG, sample_mapping_ja.txt, 
 sample_mapping_ja.txt, SOLR-822-for-1.3.patch, SOLR-822-renameMethod.patch, 
 SOLR-822.patch, SOLR-822.patch, SOLR-822.patch, SOLR-822.patch, SOLR-822.patch


 A new plugin which can be placed in front of tokenizer/.
 {code:xml}
 <fieldType name="textCharNorm" class="solr.TextField" positionIncrementGap="100">
   <analyzer>
     <charFilter class="solr.MappingCharFilterFactory" mapping="mapping_ja.txt"/>
     <tokenizer class="solr.MappingCJKTokenizerFactory"/>
     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
     <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
 </fieldType>
 {code}
 <charFilter/> can be multiple (chained). I'll post a JPEG file to show 
 character normalization sample soon.
 MOTIVATION:
 In Japan, there are two types of tokenizers -- N-gram (CJKTokenizer) and 
 Morphological Analyzer.
 When we use morphological analyzer, because the analyzer uses Japanese 
 dictionary to detect terms,
 we need to normalize characters before tokenization.
 I'll post a patch soon, too.
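The core idea of a mapping char filter can be sketched in a few lines. This is illustrative only (`CharNorm` is a made-up name): Solr's MappingCharFilterFactory streams through a Reader and supports multi-character mappings with offset correction, whereas this toy version only substitutes single characters via a lookup table before tokenization.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: substitute characters via a lookup table before the
// tokenizer ever sees the text, e.g. mapping full-width Japanese characters
// to their ASCII equivalents so dictionary lookup can succeed.
class CharNorm {
    static String normalize(String input, Map<Character, Character> mapping) {
        StringBuilder out = new StringBuilder(input.length());
        for (char c : input.toCharArray()) {
            // unmapped characters pass through unchanged
            out.append(mapping.getOrDefault(c, c));
        }
        return out.toString();
    }
}
```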

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-633) QParser for use with user-entered query which recognizes subphrases as well as allowing some other customizations on per field basis

2009-04-17 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12700422#action_12700422
 ] 

Otis Gospodnetic commented on SOLR-633:
---

This description could sure use an example! :)  I read it 3 times and still 
don't have a good picture of what this is really about.


 QParser for use with user-entered query which recognizes subphrases as well 
 as allowing some other customizations on per field basis
 

 Key: SOLR-633
 URL: https://issues.apache.org/jira/browse/SOLR-633
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 1.4
 Environment: All
Reporter: Preetam Rao
Priority: Minor
 Fix For: 1.5


 Create a request handler (actually a QParser) for use with user-entered 
 queries, with the following features:
 a) Take a user query string and try to match it against multiple fields, 
 while recognizing sub-phrase matches.
 b) For each field give the below parameters:
1) phraseBoost - the factor which decides how good a n token sub phrase 
 match is compared to n-1 token sub-phrase match.
2) maxScoreOnly - If there are multiple sub-phrase matches pick, only the 
 highest
3) ignoreDuplicates - If the same sub-phrase query matches multiple times, 
 pick only one.
4) disableOtherScoreFactors - Ignore tf, query norm, idf and any other 
 parameters which are not relevant.
 c) Try to provide all the parameters similar to dismax. Reuse or extend 
 dismax.  
 Other suggestions and feedback appreciated :-)
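Since the description above asks for an example: the sub-phrase recognition in (a) could work roughly like the sketch below (assumed semantics, not the proposed parser; `SubPhrases` is a made-up name). Every contiguous sub-phrase is enumerated, longest first, so each can become a phrase query whose boost depends on its token count, per the phraseBoost parameter in (b).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch of the sub-phrase idea: enumerate every contiguous sub-phrase of the
// user's query, longest first, so each can be turned into a phrase query with
// a length-dependent boost.
class SubPhrases {
    static List<String> subPhrases(String query, int minLen) {
        String[] tokens = query.trim().split("\\s+");
        List<String> phrases = new ArrayList<>();
        for (int len = tokens.length; len >= minLen; len--) {
            for (int start = 0; start + len <= tokens.length; start++) {
                phrases.add(String.join(" ",
                        Arrays.copyOfRange(tokens, start, start + len)));
            }
        }
        return phrases;
    }
}
```

For the query "new york city" with a minimum length of 2, this yields "new york city", "new york", and "york city"; a 3-token match would then outscore a 2-token match by the phraseBoost factor.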

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: Contributing Translations

2009-04-16 Thread Otis Gospodnetic

I like the multilingualness in general... but in this case I think Grant is 
correct about non-primary language docs getting outdated quickly.  It's hard to 
keep even just English docs up to date!  And stale, incorrect docs are worse 
than no docs.

 Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
 From: Grant Ingersoll gsing...@apache.org
 To: solr-dev@lucene.apache.org
 Sent: Monday, April 13, 2009 3:11:33 PM
 Subject: Re: Contributing Translations
 
 First off, let me say that I would love to see translations of Solr docs.
 
 My main concern is one of maintainability.  If we agree to commit 
 translations, 
 then we as committers need to be able to maintain them as well.  I am not 
 sure 
 which is worse, no translations or out of date translations.
 
 Say, for example, that I make a patch that changes how the spell checker 
 works 
 in Solr.  As an English speaker, I can easily update the English docs as part 
 of 
 my patch, but I wouldn't even know where to begin with, say, Swahili (picking 
 a 
 language I feel safe saying that none of our committers speak for an example, 
 not b/c anyone is proposing a Swahili translation).  So, now, it is up to the 
 community to fix that documentation.  Which, is, of course, fine, except I'd 
 venture to say most committers wouldn't even be in the position to know 
 whether 
 the patch is good, so we'd have to take it on faith.  Committing on faith 
 isn't 
 usually a good thing.
 
 We should look into how other Apache projects handle it before committing to 
 saying we are going to support other languages.  I can ask over on 
 commun...@apache.org if people would like.
 
 On Apr 9, 2009, at 10:40 PM, Green Crescent Translations wrote:
 
  Hello,
  
  I'm a project manager for Green Crescent Translations and I'm always 
  looking 
 to assist the open source community by providing translations of web sites, 
 manuals, user interfaces and such.  If you're interested, please let us know. 
  
 We'd be happy to translate you web site documentation into needed languages.  
 Just let me know which languages and what texts are essential and we'd be 
 happy 
 to help.
  
  Many thanks,
  
  Jonathan
  
  
  



[jira] Commented: (SOLR-634) Solr user interface

2009-04-02 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12695093#action_12695093
 ] 

Otis Gospodnetic commented on SOLR-634:
---

Lars, how come you opted to use HTTPClient directly instead of using SolrJ? (I 
see no mention of solrj in the manual either).  Or perhaps you have a 
SolrAdapter version that uses SolrJ by now?  Thanks. 

 Solr user interface
 ---

 Key: SOLR-634
 URL: https://issues.apache.org/jira/browse/SOLR-634
 Project: Solr
  Issue Type: New Feature
  Components: web gui
Reporter: Lars Kotthoff
 Attachments: SOLR-634.patch, solr-ui.tar.gz


 Provide an example user interface for Solr (web application) for people to 
 try out Solr's capabilities.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: replication (request handler) Qtime goes mad?

2009-03-26 Thread Otis Gospodnetic

Could you please re-send your message to solr-user instead?

Thanks,
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
 From: sunnyfr johanna...@gmail.com
 To: solr-dev@lucene.apache.org
 Sent: Thursday, March 26, 2009 9:29:40 AM
 Subject: replication (request handler) Qtime goes mad?
 
 
 Hi,
 
 Just applied replication by requestHandler.
 And since then the QTime went mad and can reach long times: 
 <int name="QTime">9068</int>
 Without this replication Qtime can be around 1sec. 
 
 I've 14M docs stored, for 11G, so not a lot of data stored.
 I've servers with 8G and Tomcat uses 7G.
 I'm updating every 30min, which is about 50,000 docs.
 Have a look as well at my CPUs, which are also quite full. 
 
 Have you an idea? Do I miss a patch ? 
 Thanks a lot,
 
 Solr Specification Version: 1.3.0.2009.01.22.13.51.22
 Solr Implementation Version: 1.4-dev exported - root - 2009-01-22 13:51:22
 
 http://www.nabble.com/file/p22722028/cpu_.jpg cpu_.jpg 
 -- 
 View this message in context: 
 http://www.nabble.com/replication-%28request-handler%29-Qtime-goes-mad--tp22722028p22722028.html
 Sent from the Solr - Dev mailing list archive at Nabble.com.



Re: LinkedIn open source project: kamikaze/lucene-ext

2009-03-25 Thread Otis Gospodnetic

Hi,

At which point would you say the number of cached bitsets should be considered 
excessive?  Simply a function of bitset size (index size) and memory/JVM heap?

 Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
 From: Jason Rutherglen jason.rutherg...@gmail.com
 To: solr-dev@lucene.apache.org
 Sent: Tuesday, March 24, 2009 2:48:25 PM
 Subject: Re: LinkedIn open source project: kamikaze/lucene-ext
 
 http://bobo-browse.wiki.sourceforge.net/
 
 For faceting, the Bobo library from LinkedIn may be useful in cases where
 the number of cached bitsets is excessive.
 
 On Sun, Mar 22, 2009 at 8:35 PM, Lance Norskog wrote:
 
  LinkedIn open-sourced a pile of DocSet compression implementations as
  Lucene-Ext, or kamikaze:
  http://code.google.com/p/lucene-ext/wiki/Kamikaze
 
  Has anyone looked at using these in Solr?
 
  --
  Lance Norskog
  goks...@gmail.com
  650-922-8831 (US)
 



[jira] Commented: (SOLR-1079) Rename omitTf to omitTermFreqAndPositions

2009-03-23 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12688382#action_12688382
 ] 

Otis Gospodnetic commented on SOLR-1079:


I agree with shorter is better... though we should avoid cryptic or misleading. 
 I think omitTf is misleading.  I'd rather we think of something that's maybe 
less descriptive (since one will need to look at the docs anyway), but not 
misleading (making the person think looking at the docs is not necessary)

maybe omitTerm<something>?  Info?  That would sort of match Lucene's TermInfo 
object (which doesn't encompass Payloads, though).


 Rename omitTf to omitTermFreqAndPositions
 -

 Key: SOLR-1079
 URL: https://issues.apache.org/jira/browse/SOLR-1079
 Project: Solr
  Issue Type: Improvement
  Components: documentation, update
Reporter: Shalin Shekhar Mangar
Assignee: Shalin Shekhar Mangar
Priority: Minor
 Fix For: 1.4


 LUCENE-1561 has renamed omitTf.
 See 
 http://www.lucidimagination.com/search/document/376c1c12dd464164/lucene_1561_and_omittf

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: JCache API and EHCache

2009-03-23 Thread Otis Gospodnetic

Want to open a JIRA issue (Enhancement?)

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
 From: KaktuChakarabati jimmoe...@gmail.com
 To: solr-dev@lucene.apache.org
 Sent: Monday, March 23, 2009 3:12:16 PM
 Subject: JCache API and EHCache
 
 
 Hey,
 What do you guys think about overhauling the caching layer to be compliant
 with the upcoming Jcache api? (jsr-107)
 In specific, I've been experimenting some with ehcache
 (http://ehcache.sourceforge.net/ , Apache OS license) and it seems to be a
 very comprehensive implementation, as well as fully compliant with the API. 
 I think the benefits are numerous: in respect to ehcache itself, it seems to
 be a very mature implementation, supporting most classical cache schemes as
 well as some interesting distributed cache options (and of course,
 performance-wise it's very attractive in terms of reported multi-CPU scaling
 performance and some of the benchmark figures they show). 
 
 Further, abstracting away the caches to use the jcache api would probably
 make it easier in the future to make the whole caching layer more easily
 swappable with some other implementations that will probably crop up.
 
 Maybe for the 1.5 roadmap? just a thought...
 
 Chak
 
 -- 
 View this message in context: 
 http://www.nabble.com/JCache-API-and-EHCache-tp22667097p22667097.html
 Sent from the Solr - Dev mailing list archive at Nabble.com.
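The swappable caching layer proposed above boils down to coding against a small cache interface rather than a concrete implementation. The sketch below is illustrative only (the names are made up, and this is not the JSR-107 API itself): a minimal interface with an LRU implementation backed by an access-ordered LinkedHashMap, the kind of thing either ehcache or Solr's own caches could sit behind.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: a tiny pluggable cache abstraction in the spirit of the
// JSR-107 proposal, with one interchangeable backing implementation.
interface SimpleCache<K, V> {
    V get(K key);
    void put(K key, V value);
}

class LruCache<K, V> implements SimpleCache<K, V> {
    private final Map<K, V> map;

    LruCache(final int capacity) {
        // access-order LinkedHashMap evicts the least recently used entry
        map = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > capacity;
            }
        };
    }

    public V get(K key) { return map.get(key); }
    public void put(K key, V value) { map.put(key, value); }
}
```

Code that depends only on `SimpleCache` can be switched to a distributed or disk-backed implementation without touching the callers, which is the main benefit the message argues for.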



[jira] Commented: (SOLR-1065) Add a ContentStreamDataSource to DIH to accept post data

2009-03-19 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12683502#action_12683502
 ] 

Otis Gospodnetic commented on SOLR-1065:


{quote}
 The regular update handler can only handle XML in the standard format. With DIH you 
 can post any XML or any other file. Moreover, DIH lets you have custom 
 transformations to the data.

 It is also possible to mix the uploaded data with other DataSources (DB) before 
 creating the documents.
{quote}

Is there a reason why this can't be added to the core update handler?


 Add a ContentStreamDataSource to DIH to accept post data
 

 Key: SOLR-1065
 URL: https://issues.apache.org/jira/browse/SOLR-1065
 Project: Solr
  Issue Type: New Feature
  Components: contrib - DataImportHandler
Reporter: Noble Paul
Assignee: Shalin Shekhar Mangar
 Fix For: 1.4

 Attachments: SOLR-1065.patch, SOLR-1065.patch, SOLR-1065.patch


 It is a common requirement to push data to DIH. Currently it is not possible. 
 If we have a ContentStreamDataSource it can easily solve this problem.
 Sample configuration:
 {code:xml}
 <dataSource type="ContentStreamDataSource"/>
 {code}
 This datasource does not need any extra configuration. Make a normal POST 
 request with the data as the body. The params remain the same.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: faster example schema

2009-03-10 Thread Otis Gospodnetic

+1


I think we could try just commenting out the kitchen sink portions and avoid 
maintaining 2 config files.
 
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
 From: Shalin Shekhar Mangar shalinman...@gmail.com
 To: solr-dev@lucene.apache.org; yo...@lucidimagination.com
 Sent: Sunday, March 8, 2009 1:20:50 AM
 Subject: Re: faster example schema
 
 +1
 
 perhaps schema.xml and schema-example.xml?
 
 On Sat, Mar 7, 2009 at 8:42 PM, Yonik Seeley wrote:
 
  I've occasionally run across people going with another search engine
  because it was faster at indexing.
  The example schema that people may be using as a base to do their
  benchmarking (with perhaps minimal modifications) is slow.
  There are many people out there that check what's fastest first, and
  *then* check if it is satisfactory to meet their needs in other areas.
 
  With very simple synthetic test documents (just a few fields each) and
  the CSV loader, I've personally seen the indexing rate go from
  ~330/sec to ~3000/sec, when I removed the default field values, term
  vectors, copyFields, etc.  The default example schema should still be
  able to show how something can be done, but that doesn't mean it needs
  to be enabled by default.
 
  So what do people think about speeding up the default/example schema before
  1.4?
 
  -Yonik
  http://www.lucidimagination.com
 
 
 
 
 -- 
 Regards,
 Shalin Shekhar Mangar.



[jira] Updated: (SOLR-346) need to improve snapinstaller to ignore non-snapshots in data directory

2009-03-02 Thread Otis Gospodnetic (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Otis Gospodnetic updated SOLR-346:
--

Fix Version/s: (was: 1.3.1)
   1.4

 need to improve snapinstaller to ignore non-snapshots in data directory
 ---

 Key: SOLR-346
 URL: https://issues.apache.org/jira/browse/SOLR-346
 Project: Solr
  Issue Type: Improvement
  Components: replication (scripts)
Affects Versions: 1.2, 1.3
Reporter: Bill Au
Assignee: Bill Au
Priority: Minor
 Fix For: 1.4

 Attachments: solr-346.patch


 http://www.mail-archive.com/solr-u...@lucene.apache.org/msg05734.html
  latest snapshot /opt/solr/data/temp-snapshot.20070816120113 already
  installed
 A directory in the Solr data directory is causing snapinstaller to fail.  
 Snapinstaller should be improved to ignore as many non-snapshots as possible. 
  It can use a regular expression to look for snapshot.dd where d 
 is a digit.
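The regular-expression check suggested above can be sketched as follows, assuming the usual snapshot.yyyyMMddHHmmss naming (a 14-digit timestamp suffix); the class and method names are made up, and the real fix lives in the shell scripts rather than Java.

```java
import java.util.regex.Pattern;

// Hedged sketch of the filtering idea: accept only names that are exactly
// "snapshot." followed by a 14-digit timestamp, so stray directories such as
// "temp-snapshot.20070816120113" in the data directory are ignored.
class SnapshotFilter {
    private static final Pattern SNAPSHOT = Pattern.compile("^snapshot\\.\\d{14}$");

    static boolean isSnapshot(String name) {
        return SNAPSHOT.matcher(name).matches();
    }
}
```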

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Reopened: (SOLR-346) need to improve snapinstaller to ignore non-snapshots in data directory

2009-03-02 Thread Otis Gospodnetic (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Otis Gospodnetic reopened SOLR-346:
---


 need to improve snapinstaller to ignore non-snapshots in data directory
 ---

 Key: SOLR-346
 URL: https://issues.apache.org/jira/browse/SOLR-346
 Project: Solr
  Issue Type: Improvement
  Components: replication (scripts)
Affects Versions: 1.2, 1.3
Reporter: Bill Au
Assignee: Bill Au
Priority: Minor
 Fix For: 1.4

 Attachments: solr-346.patch


 http://www.mail-archive.com/solr-u...@lucene.apache.org/msg05734.html
  latest snapshot /opt/solr/data/temp-snapshot.20070816120113 already
  installed
 A directory in the Solr data directory is causing snapinstaller to fail.  
 Snapinstaller should be improved to ignore as many non-snapshots as possible. 
  It can use a regular expression to look for snapshot.dd where d 
 is a digit.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: An ExternalIndexField implementation with multicore

2009-02-19 Thread Otis Gospodnetic
Isn't Lucene's ParallelReader meant to address such use cases?  Don't ask me 
for details, the actual use of PR always seemed a bit fuzzy to me because of 
its requirement to keep docIDs in sync.

 Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch





From: Noble Paul നോബിള്‍  नोब्ळ् noble.p...@gmail.com
To: solr-dev@lucene.apache.org
Sent: Thursday, February 19, 2009 8:12:28 PM
Subject: An ExternalIndexField implementation with multicore

hi,

Just the way we have an ExternalFileField is it possible to refer to a
field (ExternalIndexField) in another index ( which lives in another
core)?

I would not want to search on that field but I may wish to use it to
filter or sort or as a ValueSource in a Function

The usecase is as follows.
--
I have a large index with huge docs which changes less frequently
(think of a mailbox). The user may arbitrarily apply/remove tags on
that. but I may not wish to reindex the mails where the tags are
applied. I want to just add a small doc mail-unique_id and the tag
into another index in another core. When I query, I wish to  apply a
filter of the label or when i retrieve the mail details I want to get
the tags (stored field) applied to that.

Another one.
I have a huge index of products which the users can vote up or down
(say popularity). I may want to add the popularity of the item
into another index and when I query I wish to sort by the popularity.

the commits on the other external index will be more frequent than the
main index.

What are the challenges in implementing something like this? I wish to
raise a Jira issue if it looks feasible


-- 
--Noble Paul
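Stripped of the Lucene/Solr machinery, the use cases above amount to keeping the fast-changing values in a side structure keyed by the main index's unique id. The sketch below is illustrative only (all names are made up, and it ignores the real problem of doing this efficiently against index segments, which is where ParallelReader's docID-sync requirement comes in).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only, not a Solr API: keep fast-changing values
// (tags, votes) in a side map keyed by the main index's unique id, and
// consult it at query time for sorting/filtering instead of reindexing
// the large documents.
class ExternalValues {
    private final Map<String, Integer> popularity = new HashMap<>();

    void update(String docId, int votes) {
        popularity.put(docId, votes);
    }

    // sort ids by popularity, highest first; unknown ids default to 0
    List<String> sortByPopularity(List<String> docIds) {
        List<String> sorted = new ArrayList<>(docIds);
        sorted.sort(Comparator
                .comparingInt((String id) -> popularity.getOrDefault(id, 0))
                .reversed());
        return sorted;
    }
}
```

Commits on this side structure can then be as frequent as needed without touching the large, rarely-changing main index, which is the crux of the proposal.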


[jira] Updated: (SOLR-952) duplicated code in (Default)SolrHighlighter and HighlightingUtils

2009-02-19 Thread Otis Gospodnetic (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Otis Gospodnetic updated SOLR-952:
--

  Description: A large quantity of code is duplicated between the 
deprecated HighlightingUtils class and the newer SolrHighlighter and 
DefaultSolrHighlighter (which have been getting bug fixes and enhancements). 
The Utils class is no longer used anywhere in Solr, but people writing plugins 
may be taking advantage of it, so it should be cleaned up.  (was: 
A large quantity of code is duplicated between the deprecated HighlightingUtils 
class and the newer SolrHighlighter and DefaultSolrHighlighter (which have been 
getting bug fixes and enhancements). The Utils class is no longer used anywhere 
in Solr, but people writing plugins may be taking advantage of it, so it should 
be cleaned up.)
Fix Version/s: 1.4

 duplicated code in (Default)SolrHighlighter and HighlightingUtils
 -

 Key: SOLR-952
 URL: https://issues.apache.org/jira/browse/SOLR-952
 Project: Solr
  Issue Type: Bug
  Components: highlighter
Affects Versions: 1.4
Reporter: Chris Harris
Priority: Minor
 Fix For: 1.4

 Attachments: SOLR-952.patch


 A large quantity of code is duplicated between the deprecated 
 HighlightingUtils class and the newer SolrHighlighter and 
 DefaultSolrHighlighter (which have been getting bug fixes and enhancements). 
 The Utils class is no longer used anywhere in Solr, but people writing 
 plugins may be taking advantage of it, so it should be cleaned up.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1022) suggest multiValued for ignored field

2009-02-17 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12674507#action_12674507
 ] 

Otis Gospodnetic commented on SOLR-1022:


I must have missed some related thread on the ML... but can you explain what 
you mean by an unmatched multi-valued field?
And what does ignored field mean?  Thanks.

 suggest multiValued for ignored field
 -

 Key: SOLR-1022
 URL: https://issues.apache.org/jira/browse/SOLR-1022
 Project: Solr
  Issue Type: Improvement
  Components: update
Affects Versions: 1.3
 Environment: Mac OS 10.5 java 1.5
Reporter: Peter Wolanin
Priority: Minor
 Fix For: 1.4

 Attachments: SOLR-1022.patch

   Original Estimate: 1h
  Remaining Estimate: 1h

 We are actually using the suggested ignored field in the schema.  I have 
 found, however, that Solr still throws an error 400 if I send in an unmatched 
 multi-valued field.
 It seems that if I set this ignored field to be multiValued then a document 
 with unrecognized single or multiple value fields is successfully indexed.
 Attached patch alters this suggested item in the schema.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-670) UpdateHandler must provide a rollback feature

2009-02-10 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12672441#action_12672441
 ] 

Otis Gospodnetic commented on SOLR-670:
---

Is it possible that the new rollback causes the IndexWriter to be closed on 
error, which then causes the following error next time you try to add a (valid) 
document?

Feb 10, 2009 5:46:28 PM org.apache.solr.update.processor.LogUpdateProcessor 
finish
INFO: {} 0 1
Feb 10, 2009 5:46:28 PM org.apache.solr.common.SolrException log
SEVERE: org.apache.lucene.store.AlreadyClosedException: this IndexWriter is 
closed
at org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:397)
at org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:402)
at 
org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2108)
at 
org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:218)
at 
org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:60)
at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:140)
at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
at 
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1313)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
at 
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
at 
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at 
org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
at 
org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
at 
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
at 
org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at 
org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
at org.mortbay.jetty.Server.handle(Server.java:285)
at 
org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
at 
org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:835)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:641)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:202)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
at 
org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
at 
org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)

After rollback is invoked, is one supposed to execute some other command to get 
Solr in a healthy state?
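For reference, a rollback over HTTP amounts to posting a `<rollback/>` command to the update handler. A minimal sketch that only builds the request (the URL is an assumption; nothing is actually sent here):

```python
# Sketch: constructing the HTTP rollback request a client would send.
import urllib.request

def build_rollback_request(update_url="http://localhost:8983/solr/update"):
    # POSTing the <rollback/> command as an XML update message.
    return urllib.request.Request(
        update_url,
        data=b"<rollback/>",
        headers={"Content-Type": "text/xml"},
    )

req = build_rollback_request()
assert req.get_method() == "POST" and req.data == b"<rollback/>"
```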


 UpdateHandler must provide a rollback feature
 -

 Key: SOLR-670
 URL: https://issues.apache.org/jira/browse/SOLR-670
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 1.3
Reporter: Noble Paul
Assignee: Shalin Shekhar Mangar
 Fix For: 1.4

 Attachments: SOLR-670.patch, SOLR-670.patch, SOLR-670.patch


 Lucene IndexWriter already has a rollback method. There should be a 
 counterpart for the same in _UpdateHandler_  so that users can do a rollback 
 over http 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-670) UpdateHandler must provide a rollback feature

2009-02-10 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12672523#action_12672523
 ] 

Otis Gospodnetic commented on SOLR-670:
---

That was with Solr trunk (svn up-ed right before trying).
I did not call commit after rollback when that happened, though I *think* I 
tried adding commit, too, and that didn't do anything either.


 UpdateHandler must provide a rollback feature
 -

 Key: SOLR-670
 URL: https://issues.apache.org/jira/browse/SOLR-670
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 1.3
Reporter: Noble Paul
Assignee: Shalin Shekhar Mangar
 Fix For: 1.4

 Attachments: SOLR-670.patch, SOLR-670.patch, SOLR-670.patch


 Lucene IndexWriter already has a rollback method. There should be a 
 counterpart for the same in _UpdateHandler_  so that users can do a rollback 
 over http 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (SOLR-1005) DoubleMetaphone Filter Produces NullpointerException on zero-length token

2009-02-06 Thread Otis Gospodnetic (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Otis Gospodnetic resolved SOLR-1005.


Resolution: Fixed
  Assignee: Otis Gospodnetic

Thanks Michael.

Committed revision 741721.


 DoubleMetaphone Filter Produces NullpointerException on zero-length token
 -

 Key: SOLR-1005
 URL: https://issues.apache.org/jira/browse/SOLR-1005
 Project: Solr
  Issue Type: Bug
  Components: Analysis
Affects Versions: 1.4
 Environment: jdk 1.6.10, tomcat 6.x
Reporter: Michael Henson
Assignee: Otis Gospodnetic
 Attachments: solr-1005.zip


 If any token given to the DoubleMetaphoneFilter is empty (Token exists, 0 
 length), then the encoder will return null instead of a metaphone encoded 
 string. The current code assumes that there will always be a valid object 
 returned.
 Proposed solution: Make sure 0-length tokens are skipped at the top branch 
 where the code checks whether or not we have a Token object at all.
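The proposed guard can be sketched in miniature (function names are hypothetical, and a trivial stand-in replaces the real metaphone encoder):

```python
# Sketch of the proposed fix: skip zero-length tokens before handing
# them to the encoder, since the encoder returns null for empty input.
def double_metaphone_safe(encode, tokens):
    encoded = []
    for tok in tokens:
        if len(tok) == 0:      # the guard the patch proposes
            continue
        encoded.append(encode(tok))
    return encoded

fake_encode = lambda s: s.upper()  # stand-in for the real encoder
assert double_metaphone_safe(fake_encode, ["abc", "", "de"]) == ["ABC", "DE"]
```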

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: [Solr Wiki] Update of LBHttpSolrServer by OtisGospodnetic

2009-02-03 Thread Otis Gospodnetic
I'd simply address that first.  I feel that's the first question people will 
ask (themselves).  But, sorry for the interruption. :)

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
 From: Noble Paul നോബിള്‍  नोब्ळ् noble.p...@gmail.com
 To: solr-dev@lucene.apache.org
 Sent: Tuesday, February 3, 2009 12:50:24 PM
 Subject: Re: [Solr Wiki] Update of LBHttpSolrServer by OtisGospodnetic
 
 isn't this the same as the When to use this? section ?
 why do we need a separate section?
 
 On Tue, Feb 3, 2009 at 9:50 PM, Apache Wiki wrote:
  Dear Wiki user,
 
  You have subscribed to a wiki page or wiki category on Solr Wiki for 
  change 
 notification.
 
  The following page has been changed by OtisGospodnetic:
  http://wiki.apache.org/solr/LBHttpSolrServer
 
  --
   == What is LBHttpSolrServer? ==
  - LB!HttpSolrServer or !LoadBalanced !HttpSolrServer is just a wrapper to 
 !CommonsHttpSolrServer. This is useful when you have multiple !SolrServers 
 and 
 the requests need to be Load Balanced among them. it offers automatic 
 failover 
 when a server goes down and it detects when the server  comes back up
  + LB!HttpSolrServer or !LoadBalanced !HttpSolrServer is just a wrapper to 
 !CommonsHttpSolrServer. This is useful when you have multiple !SolrServers 
 and 
 the requests need to be Load Balanced among them. it offers automatic 
 failover 
 when a server goes down and it detects when the server comes back up.
  +
  + TODO: address Why would I use LBHttpSolrServer instead of existing hw/sf 
 LB-s.
 
   == How to use? ==
   {{{
 
 
 
 
 -- 
 --Noble Paul



[jira] Commented: (SOLR-844) A SolrServer impl to front-end multiple urls

2009-02-03 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12670042#action_12670042
 ] 

Otis Gospodnetic commented on SOLR-844:
---

Good comment from Wunder's made on the ML:
{quote}
This would be useful if there was search-specific balancing,
like always send the same query back to the same server. That
can make your cache far more effective.

wunder
{quote}
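Wunder's idea amounts to hashing the query to choose a server, so repeat queries land on the same per-server cache. A toy sketch (the helper name is hypothetical, not part of the patch):

```python
# Sketch of "same query -> same server" routing: hash the query string
# and use it to pick a server deterministically.
import hashlib

def pick_server(query, servers):
    h = int(hashlib.md5(query.encode("utf-8")).hexdigest(), 16)
    return servers[h % len(servers)]

servers = ["http://s1/solr", "http://s2/solr", "http://s3/solr"]
# The same query always routes to the same server:
assert pick_server("ipod", servers) == pick_server("ipod", servers)
assert pick_server("ipod", servers) in servers
```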

 A SolrServer impl to front-end multiple urls
 

 Key: SOLR-844
 URL: https://issues.apache.org/jira/browse/SOLR-844
 Project: Solr
  Issue Type: New Feature
  Components: clients - java
Affects Versions: 1.3
Reporter: Noble Paul
Assignee: Shalin Shekhar Mangar
 Fix For: 1.4

 Attachments: SOLR-844.patch, SOLR-844.patch, SOLR-844.patch, 
 SOLR-844.patch


 Currently a {{CommonsHttpSolrServer}} can talk to only one server. This 
 demands that the user have a LoadBalancer or do the round-robin on their own. 
 We must have a {{LBHttpSolrServer}} which automatically does 
 load balancing between multiple hosts. This can be backed by the 
 {{CommonsHttpSolrServer}}.
 This can have the following other features:
 * Automatic failover
 * Optionally take in a file/url containing the urls of servers so that 
 the server list can be automatically updated by periodically loading the 
 config
 * Support for adding/removing servers during runtime
 * Pluggable load-balancing mechanism (round-robin, weighted round-robin, 
 random, etc.)
 * Pluggable failover mechanisms
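A minimal sketch of the round-robin-with-failover behavior the feature list describes (class and method names are hypothetical, not the SolrJ API):

```python
# Sketch: rotate through servers, skipping ones marked down, and fail
# only when no live server remains.
class RoundRobinBalancer:
    def __init__(self, servers):
        self.servers = list(servers)
        self.down = set()
        self._i = 0

    def mark_down(self, server):
        self.down.add(server)

    def mark_up(self, server):
        self.down.discard(server)

    def next_server(self):
        for _ in range(len(self.servers)):
            server = self.servers[self._i % len(self.servers)]
            self._i += 1
            if server not in self.down:
                return server
        raise RuntimeError("no live servers")

lb = RoundRobinBalancer(["http://s1/solr", "http://s2/solr"])
assert lb.next_server() == "http://s1/solr"
lb.mark_down("http://s2/solr")
assert lb.next_server() == "http://s1/solr"  # s2 is skipped while down
```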

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-844) A SolrServer impl to front-end multiple urls

2009-01-22 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12666296#action_12666296
 ] 

Otis Gospodnetic commented on SOLR-844:
---

I'm not sure there is a clear consensus about this functionality being a good 
thing.  Perhaps we can get more people's opinions?


 A SolrServer impl to front-end multiple urls
 

 Key: SOLR-844
 URL: https://issues.apache.org/jira/browse/SOLR-844
 Project: Solr
  Issue Type: New Feature
  Components: clients - java
Affects Versions: 1.3
Reporter: Noble Paul
Assignee: Shalin Shekhar Mangar
 Fix For: 1.4

 Attachments: SOLR-844.patch, SOLR-844.patch, SOLR-844.patch


 Currently a {{CommonsHttpSolrServer}} can talk to only one server. This 
 demands that the user have a LoadBalancer or do the round-robin on their own. 
 We must have a {{LBHttpSolrServer}} which automatically does 
 load balancing between multiple hosts. This can be backed by the 
 {{CommonsHttpSolrServer}}.
 This can have the following other features:
 * Automatic failover
 * Optionally take in a file/url containing the urls of servers so that 
 the server list can be automatically updated by periodically loading the 
 config
 * Support for adding/removing servers during runtime
 * Pluggable load-balancing mechanism (round-robin, weighted round-robin, 
 random, etc.)
 * Pluggable failover mechanisms

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Issue Comment Edited: (SOLR-844) A SolrServer impl to front-end multiple urls

2009-01-22 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12666296#action_12666296
 ] 

otis edited comment on SOLR-844 at 1/22/09 1:12 PM:


I'm not sure there is a clear consensus about this functionality being a good 
thing (also 0 votes).  Perhaps we can get more people's opinions?


  was (Author: otis):
I'm not sure there is a clear consensus about this functionality being a 
good thing.  Perhaps we can get more people's opinions?

  
 A SolrServer impl to front-end multiple urls
 

 Key: SOLR-844
 URL: https://issues.apache.org/jira/browse/SOLR-844
 Project: Solr
  Issue Type: New Feature
  Components: clients - java
Affects Versions: 1.3
Reporter: Noble Paul
Assignee: Shalin Shekhar Mangar
 Fix For: 1.4

 Attachments: SOLR-844.patch, SOLR-844.patch, SOLR-844.patch


 Currently a {{CommonsHttpSolrServer}} can talk to only one server. This 
 demands that the user have a LoadBalancer or do the round-robin on their own. 
 We must have a {{LBHttpSolrServer}} which automatically does 
 load balancing between multiple hosts. This can be backed by the 
 {{CommonsHttpSolrServer}}.
 This can have the following other features:
 * Automatic failover
 * Optionally take in a file/url containing the urls of servers so that 
 the server list can be automatically updated by periodically loading the 
 config
 * Support for adding/removing servers during runtime
 * Pluggable load-balancing mechanism (round-robin, weighted round-robin, 
 random, etc.)
 * Pluggable failover mechanisms

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (SOLR-960) CommonsHttpSolrServer - documentation - phase II (Addition of log in setMaxRetries as a warning for out of range input)

2009-01-14 Thread Otis Gospodnetic (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Otis Gospodnetic resolved SOLR-960.
---

Resolution: Fixed

I'll assume the int-to-long change is fine.

Sending
src/solrj/org/apache/solr/client/solrj/impl/CommonsHttpSolrServer.java
Transmitting file data .
Committed revision 734606.


 CommonsHttpSolrServer - documentation - phase II  (Addition of log in 
 setMaxRetries as a warning for out of range input) 
 -

 Key: SOLR-960
 URL: https://issues.apache.org/jira/browse/SOLR-960
 Project: Solr
  Issue Type: Improvement
  Components: clients - java
Reporter: Kay Kay
Priority: Minor
 Fix For: 1.4

 Attachments: SOLR-960.patch

   Original Estimate: 1h
  Remaining Estimate: 1h

 Add javadoc for : 
 CommonsHttpSolrServer#AGENT
 CommonsHttpSolrServer#_invariantParams
 CommonsHttpSolrServer#_followRedirects
 CommonsHttpSolrServer#_allowCompression , _maxRetries 
 #setConnectionTimeout, #setSoTimeout
 #setConnectionManagerTimeout(int) deprecated in favor of 
 #setConnectionManagerTimeout(long) with the same API as in HttpClient 3.1 . 
 #setMaxRetries - there would be a warning in the log message if the maximum 
 retries were < 1, to keep the programmer explicitly aware of the same. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (SOLR-849) Add bwlimit support to snappuller

2009-01-13 Thread Otis Gospodnetic (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Otis Gospodnetic resolved SOLR-849.
---

Resolution: Won't Fix

No need for this since we are moving away from shell script-based replication, 
most likely.


 Add bwlimit support to snappuller
 -

 Key: SOLR-849
 URL: https://issues.apache.org/jira/browse/SOLR-849
 Project: Solr
  Issue Type: Improvement
  Components: replication (scripts)
Reporter: Otis Gospodnetic
Priority: Minor
 Attachments: SOLR-849.patch


 From http://markmail.org/message/njnbh5gbb2mvfe24

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-849) Add bwlimit support to snappuller

2009-01-13 Thread Otis Gospodnetic (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Otis Gospodnetic updated SOLR-849:
--

Assignee: Otis Gospodnetic

 Add bwlimit support to snappuller
 -

 Key: SOLR-849
 URL: https://issues.apache.org/jira/browse/SOLR-849
 Project: Solr
  Issue Type: Improvement
  Components: replication (scripts)
Reporter: Otis Gospodnetic
Assignee: Otis Gospodnetic
Priority: Minor
 Attachments: SOLR-849.patch


 From http://markmail.org/message/njnbh5gbb2mvfe24

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (SOLR-958) CommonsHttpSolrServer - documentation ..

2009-01-13 Thread Otis Gospodnetic (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Otis Gospodnetic resolved SOLR-958.
---

Resolution: Fixed
  Assignee: Otis Gospodnetic

Thanks!

Sending
src/solrj/org/apache/solr/client/solrj/impl/CommonsHttpSolrServer.java
Transmitting file data .
Committed revision 734326.


 CommonsHttpSolrServer - documentation .. 
 -

 Key: SOLR-958
 URL: https://issues.apache.org/jira/browse/SOLR-958
 Project: Solr
  Issue Type: Bug
  Components: documentation
Reporter: Kay Kay
Assignee: Otis Gospodnetic
Priority: Minor
 Fix For: 1.4

 Attachments: SOLR-958.patch

   Original Estimate: 0.17h
  Remaining Estimate: 0.17h

 clarification about the ResponseParser member, useMultiPartPost 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (SOLR-957) CommonParams#VERSION : Inconsistent doc

2009-01-13 Thread Otis Gospodnetic (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Otis Gospodnetic resolved SOLR-957.
---

Resolution: Fixed

Thanks Kay.
Sendingsrc/common/org/apache/solr/common/params/CommonParams.java
Transmitting file data .
Committed revision 734329.


 CommonParams#VERSION :  Inconsistent doc
 

 Key: SOLR-957
 URL: https://issues.apache.org/jira/browse/SOLR-957
 Project: Solr
  Issue Type: Bug
  Components: documentation
Reporter: Kay Kay
Priority: Minor
 Fix For: 1.4

 Attachments: SOLR-957.patch

   Original Estimate: 1h
  Remaining Estimate: 1h

 The doc for VERSION (in CommonParams) seems to be copied from the previous 
 field. (totally unrelated ). 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (SOLR-956) SolrParams#getFieldInt(String, String) - inconsistent documentation

2009-01-13 Thread Otis Gospodnetic (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Otis Gospodnetic resolved SOLR-956.
---

Resolution: Fixed
  Assignee: Otis Gospodnetic

Thanks Kay.

Sendingsrc/common/org/apache/solr/common/params/SolrParams.java
Transmitting file data .
Committed revision 734330.


 SolrParams#getFieldInt(String, String)  - inconsistent documentation 
 -

 Key: SOLR-956
 URL: https://issues.apache.org/jira/browse/SOLR-956
 Project: Solr
  Issue Type: Improvement
  Components: documentation
Reporter: Kay Kay
Assignee: Otis Gospodnetic
 Fix For: 1.4

 Attachments: SOLR-956.patch

   Original Estimate: 1h
  Remaining Estimate: 1h

 SolrParams#getFieldInt(String, String) documentation says it returns def. if 
 the value does not exist. 
 There is no def passed to the method, so this seems inconsistent with 
 what the method does: it returns null if the field/param does not exist. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (SOLR-954) SolrQuery - better cross-referential documentation / fix inconsistent cross-reference links .

2009-01-13 Thread Otis Gospodnetic (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Otis Gospodnetic resolved SOLR-954.
---

Resolution: Fixed
  Assignee: Otis Gospodnetic

Thanks Kay.

Sendingsrc/solrj/org/apache/solr/client/solrj/SolrQuery.java
Transmitting file data .
Committed revision 734332.


 SolrQuery - better cross-referential documentation / fix inconsistent 
 cross-reference links .
 -

 Key: SOLR-954
 URL: https://issues.apache.org/jira/browse/SOLR-954
 Project: Solr
  Issue Type: Improvement
  Components: documentation
Affects Versions: 1.3
 Environment: Tomcat 6, Java 6 
Reporter: Kay Kay
Assignee: Otis Gospodnetic
Priority: Minor
 Fix For: 1.4

 Attachments: SOLR-954.patch, SOLR-954.patch

   Original Estimate: 3h
  Remaining Estimate: 3h

 SolrQuery methods need quite a bit of documentation as the javadoc appears to 
 be blank at the moment and comments for some deprecated methods point to 
 non-existent methods.  Patch relevant to documentation available herewith. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (SOLR-953) Small simplification for LuceneGapFragmenter.isNewFragment

2009-01-13 Thread Otis Gospodnetic (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Otis Gospodnetic resolved SOLR-953.
---

Resolution: Fixed
  Assignee: Otis Gospodnetic

Thanks Chris.

Sendingsrc/java/org/apache/solr/highlight/GapFragmenter.java
Transmitting file data .
Committed revision 734336.


 Small simplification for LuceneGapFragmenter.isNewFragment
 --

 Key: SOLR-953
 URL: https://issues.apache.org/jira/browse/SOLR-953
 Project: Solr
  Issue Type: Improvement
  Components: highlighter
Affects Versions: 1.4
Reporter: Chris Harris
Assignee: Otis Gospodnetic
Priority: Minor
 Attachments: SOLR-953.patch


 This little patch makes the code for LuceneGapFragmenter.isNewFragment(Token) 
 slightly more intuitive.
 The method currently features the line
 {code}
 fragOffsetAccum += token.endOffset() - fragOffsetAccum;
 {code}
 This can be simplified, though, to just
 {code}
 fragOffsetAccum = token.endOffset();
 {code}
 Maybe it's just me, but I find the latter expression's intent to be 
 sufficiently clearer than the former to warrant committing such a change.
 This patch makes this simplification. Also, if you do make this 
 simplification, then it doesn't really make sense to think of fragOffsetAccum 
 as an accumulator anymore, so in the patch we rename the variable to just 
 fragOffset.
 Tests from HighlighterTest.java pass with the patch applied.
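The claimed equivalence of the two statements can be checked mechanically; a small sketch:

```python
# Check that `fragOffsetAccum += token.endOffset() - fragOffsetAccum`
# always ends at token.endOffset(), whatever the prior value was.
def accum_form(frag_offset_accum, end_offset):
    frag_offset_accum += end_offset - frag_offset_accum  # original code
    return frag_offset_accum

def assign_form(_frag_offset, end_offset):
    return end_offset                                    # simplified code

for prior in (0, 17, 100):
    for end in (3, 42):
        assert accum_form(prior, end) == assign_form(prior, end) == end
```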

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-308) Add a field that generates an unique id when you have none in your data to index

2009-01-09 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12662626#action_12662626
 ] 

Otis Gospodnetic commented on SOLR-308:
---

Lance - anyone can add/modify a Wiki page.  Do you mind adding info about this 
field type?


 Add a field that generates an unique id when you have none in your data to 
 index
 

 Key: SOLR-308
 URL: https://issues.apache.org/jira/browse/SOLR-308
 Project: Solr
  Issue Type: New Feature
  Components: search
Reporter: Thomas Peuss
Assignee: Hoss Man
Priority: Minor
 Fix For: 1.3

 Attachments: UUIDField.patch, UUIDField.patch, UUIDField.patch, 
 UUIDField.patch, UUIDField.patch


 This patch adds a field that generates a unique id when you have no unique 
 id in the data you want to index.
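The core idea can be sketched outside Solr in a few lines (the helper name is hypothetical; UUIDField itself does this at the schema level):

```python
# Sketch: supply a generated UUID when a document carries no id of its own.
import uuid

def ensure_id(doc):
    if "id" not in doc:
        doc["id"] = str(uuid.uuid4())  # random type-4 UUID string
    return doc

doc = ensure_id({"title": "no id here"})
assert "id" in doc and len(doc["id"]) == 36  # canonical UUID text length
```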

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: error: exceeded limit of maxWarmingSearcher

2009-01-05 Thread Otis Gospodnetic
Doesn't that mean you are doing something that causes searchers to warm up 
(e.g. running snap* scripts or your new replication equivalent), and doing that 
so frequently that by the time you trigger the third warm-up the first two 
searchers are still warming up?
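The failure mode Otis describes can be sketched as a toy simulation (all numbers here are arbitrary, not Solr defaults):

```python
# Toy model: commits arriving faster than searchers finish warming, so a
# third warm-up is requested while two are still in flight.
MAX_WARMING = 2
WARMUP_TIME = 10  # time units one searcher takes to warm

def simulate(commit_times):
    warming = []  # finish times of searchers still warming
    for t in sorted(commit_times):
        warming = [f for f in warming if f > t]  # drop finished warmers
        if len(warming) >= MAX_WARMING:
            return "exceeded limit of maxWarmingSearchers=%d" % MAX_WARMING
        warming.append(t + WARMUP_TIME)
    return "ok"

assert simulate([0, 20, 40]) == "ok"               # commits spaced out
assert simulate([0, 1, 2]).startswith("exceeded")  # three rapid commits
```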


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
 From: Noble Paul നോബിള്‍ नोब्ळ् noble.p...@gmail.com
 To: solr-dev@lucene.apache.org
 Sent: Monday, January 5, 2009 11:10:06 PM
 Subject: error: exceeded limit of maxWarmingSearcher
 
 I have implemented the javabin update functionality (SOLR-8965) and
 the LargeVolumeJetty testcase is failing with the following message.
 
 
 exceeded limit of maxWarmingSearchers=2, try again later.
 Jan 5, 2009 5:44:40 PM
 org.apache.solr.update.processor.LogUpdateProcessor finish
 INFO: {} 0 15
 Jan 5, 2009 5:44:40 PM org.apache.solr.common.SolrException log
 SEVERE: org.apache.solr.common.SolrException: Error opening new
 searcher. exceeded limit of maxWarmingSearchers=2, try again later.
 at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1050)
 at 
 org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:350)
 at 
 org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:78)
 at 
 org.apache.solr.handler.BinaryUpdateRequestHandler.parseAndLoadDocs(BinaryUpdateRequestHandler.java:95)
 
 Can anyone point me to what I may be doing wrong?
 -- 
 --Noble Paul



Re: [jira] Commented: (SOLR-934) Enable importing of mails into a solr index through DIH.

2009-01-02 Thread Otis Gospodnetic
Quick clarifications:

- Droids: http://incubator.apache.org/droids/index.html
- DIH: http://wiki.apache.org/solr/DataImportHandler
- Solr + Tika: http://wiki.apache.org/solr/ExtractingRequestHandler


Otis 
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
 From: Ben Johnson ben.john...@jandpconsulting.co.uk
 To: solr-dev@lucene.apache.org
 Sent: Thursday, January 1, 2009 6:00:43 PM
 Subject: Re: [jira] Commented: (SOLR-934) Enable importing of mails into a 
 solr index through DIH.
 
 I'm watching this issue with interest, but I'm having trouble understanding 
 the 
 bigger picture.  I am prototyping a system that uses Restlet to store and 
 index 
 objects (mainly MS Office and OpenOffice documents and emails), so I am 
 planning 
 to use Solr with Tika to index the objects.
 
 I know nothing about DIH (Distributed Index Handler?), so I'm not sure what 
 role 
 it plays with Solr.  Is it a vendor-specific technology (from Autonomy)?  
 What 
 does it do?  Do you give it objects to index and it handles them by passing 
 it 
 to one or more Solr/Tika indexing servers?  And are you thinking that this 
 would 
 therefore be a good place to not only index the objects, but also pass the 
 information about the digital content to DROID?
 
 Reading a bit about DROID (from TNA, The National Archives), it seems like it 
 is 
 used to capture information about the digital content of objects stored in a 
 content repository.  How does this fit with Solr?  I thought Solr with Tika 
 just 
 did the indexing of text-based objects, but the actual storage of the objects 
 would be elsewhere (probably in the file system). From what I can tell, DROID 
 would operate on the file system objects, not the indexing information.  Have 
 I 
 got this right?
 
 Ideally, I would also like to convert any suitable content into PDF/A format 
 for 
 long-term archival - probably not relevant to this issue, but I thought I'd 
 mention it in case you see an application of this as part of email and 
 attachment storage.
 
 Sorry for all the questions, but hopefully someone could clarify this for me!
 
 Thanks very much
 Ben Johnson
 
 --
 From: Grant Ingersoll (JIRA) 
 Sent: Thursday, January 01, 2009 7:07 PM
 To: 
 Subject: [jira] Commented: (SOLR-934) Enable importing of mails into a solr 
 index through DIH.
 
  
 [ 
 https://issues.apache.org/jira/browse/SOLR-934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12660210#action_12660210
  
 ]
  
  Grant Ingersoll commented on SOLR-934:
  --
  
  Would it make more sense for DIH to farm out it's content acquisition to a 
 library like Droids?  Then, we could have real crawling, etc. all through a 
 pluggable connector framework.
  
  Enable importing of mails into a solr index through DIH.
  
  
  Key: SOLR-934
  URL: https://issues.apache.org/jira/browse/SOLR-934
  Project: Solr
   Issue Type: New Feature
   Components: contrib - DataImportHandler
 Affects Versions: 1.4
 Reporter: Preetam Rao
 Assignee: Shalin Shekhar Mangar
  Fix For: 1.4
  
  Attachments: SOLR-934.patch, SOLR-934.patch
  
Original Estimate: 24h
   Remaining Estimate: 24h
  
  Enable importing of mails into solr through DIH. Take one or more mailbox 
 credentials, download and index their content along with the content from 
 attachments. The folders to fetch can be made configurable based on various 
 criteria. Apache Tika is used for extracting content from different kinds of 
 attachments. JavaMail is used for mail box related operations like fetching 
 mails, filtering them etc.
  The basic configuration for one mail box is as below:
  {code:xml}
  <entity processor="MailEntityProcessor"
  password="something" host="imap.gmail.com"
  protocol="imaps"/>
  {code}
  The below is the list of all configuration available:
  {color:green}Required{color}
  -
  *user*
  *pwd*
  *protocol*  (only imaps supported now)
  *host*
  {color:green}Optional{color}
  -
  *folders* - comma separated list of folders.
  If not specified, default folder is used. Nested folders can be specified 
 like a/b/c
  *recurse* - index subfolders. Defaults to true.
  *exclude* - comma separated list of patterns.
  *include* - comma separated list of patterns.
  *batchSize* - mails to fetch at once in a given folder.
  Only headers can be prefetched in Javamail IMAP.
  *readTimeout* - defaults to 6ms
  *connectTimeout* - defaults to 3ms
  *fetchSize* - IMAP config. 32KB default
  *fetchMailsSince* -
  date/time in milliseconds, mails received after which will be fetched. 
  Useful for delta import.
  *customFilter* - class name.
  {code}
  import javax.mail.Folder;
  import javax.mail.SearchTerm;
  clz 
