Solr Slow in Unix

2009-07-16 Thread Anand Kumar Prabhakar

I'm running a Solr instance in Apache Tomcat 6 on a Solaris box. The QTimes
are high compared to the same configuration on a Windows machine. Can anyone
help with the configurations I can check to improve the performance?
-- 
View this message in context: 
http://www.nabble.com/Solr-Slow-in-Unix-tp24512286p24512286.html
Sent from the Solr - Dev mailing list archive at Nabble.com.



Solr nightly build failure

2009-07-16 Thread solr-dev

init-forrest-entities:
[mkdir] Created dir: /tmp/apache-solr-nightly/build
[mkdir] Created dir: /tmp/apache-solr-nightly/build/web

compile-solrj:
[mkdir] Created dir: /tmp/apache-solr-nightly/build/solrj
[javac] Compiling 83 source files to /tmp/apache-solr-nightly/build/solrj
[javac] Note: Some input files use or override a deprecated API.
[javac] Note: Recompile with -Xlint:deprecation for details.
[javac] Note: Some input files use unchecked or unsafe operations.
[javac] Note: Recompile with -Xlint:unchecked for details.

compile:
[mkdir] Created dir: /tmp/apache-solr-nightly/build/solr
[javac] Compiling 363 source files to /tmp/apache-solr-nightly/build/solr
[javac] Note: Some input files use or override a deprecated API.
[javac] Note: Recompile with -Xlint:deprecation for details.
[javac] Note: Some input files use unchecked or unsafe operations.
[javac] Note: Recompile with -Xlint:unchecked for details.

compileTests:
[mkdir] Created dir: /tmp/apache-solr-nightly/build/tests
[javac] Compiling 161 source files to /tmp/apache-solr-nightly/build/tests
[javac] Note: Some input files use or override a deprecated API.
[javac] Note: Recompile with -Xlint:deprecation for details.
[javac] Note: Some input files use unchecked or unsafe operations.
[javac] Note: Recompile with -Xlint:unchecked for details.

junit:
[mkdir] Created dir: /tmp/apache-solr-nightly/build/test-results
[junit] Running org.apache.solr.BasicFunctionalityTest
[junit] Tests run: 19, Failures: 0, Errors: 0, Time elapsed: 34.434 sec
[junit] Running org.apache.solr.ConvertedLegacyTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 18.866 sec
[junit] Running org.apache.solr.DisMaxRequestHandlerTest
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 19.113 sec
[junit] Running org.apache.solr.EchoParamsTest
[junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 6.624 sec
[junit] Running org.apache.solr.OutputWriterTest
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 10.401 sec
[junit] Running org.apache.solr.SampleTest
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 8.039 sec
[junit] Running org.apache.solr.SolrInfoMBeanTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 5.449 sec
[junit] Running org.apache.solr.TestDistributedSearch
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 89.175 sec
[junit] Running org.apache.solr.TestTrie
[junit] Tests run: 7, Failures: 0, Errors: 0, Time elapsed: 21.283 sec
[junit] Running org.apache.solr.analysis.DoubleMetaphoneFilterFactoryTest
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.542 sec
[junit] Running org.apache.solr.analysis.DoubleMetaphoneFilterTest
[junit] Tests run: 6, Failures: 0, Errors: 0, Time elapsed: 0.446 sec
[junit] Running org.apache.solr.analysis.EnglishPorterFilterFactoryTest
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 3.702 sec
[junit] Running org.apache.solr.analysis.HTMLStripReaderTest
[junit] Tests run: 9, Failures: 0, Errors: 0, Time elapsed: 0.989 sec
[junit] Running org.apache.solr.analysis.LengthFilterTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 2.421 sec
[junit] Running org.apache.solr.analysis.SnowballPorterFilterFactoryTest
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 2.175 sec
[junit] Running org.apache.solr.analysis.TestBufferedTokenStream
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 3.057 sec
[junit] Running org.apache.solr.analysis.TestCapitalizationFilter
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 2.857 sec
[junit] Running org.apache.solr.analysis.TestHyphenatedWordsFilter
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 2.042 sec
[junit] Running org.apache.solr.analysis.TestKeepFilterFactory
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 4.447 sec
[junit] Running org.apache.solr.analysis.TestKeepWordFilter
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 1.428 sec
[junit] Running org.apache.solr.analysis.TestMappingCharFilterFactory
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.647 sec
[junit] Running org.apache.solr.analysis.TestPatternReplaceFilter
[junit] Tests run: 5, Failures: 0, Errors: 0, Time elapsed: 4.766 sec
[junit] Running org.apache.solr.analysis.TestPatternTokenizerFactory
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 2.368 sec
[junit] Running org.apache.solr.analysis.TestPhoneticFilter
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 2.25 sec
[junit] Running org.apache.solr.analysis.TestRemoveDuplicatesTokenFilter
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 3.26 sec
[junit] Running org.apache.solr.analysis.TestS

[jira] Updated: (SOLR-1286) DIH: The commit parameter is always defaulting to "true" even if "false" is explicitly passed in.

2009-07-16 Thread Noble Paul (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul updated SOLR-1286:
-

Attachment: SOLR-1286.patch

The last patch did not have the tests.

> DIH: The commit parameter is always defaulting to "true" even if "false" is 
> explicitly passed in.
> -
>
> Key: SOLR-1286
> URL: https://issues.apache.org/jira/browse/SOLR-1286
> Project: Solr
>  Issue Type: Bug
>  Components: contrib - DataImportHandler
>Reporter: Jay Hill
>Assignee: Erik Hatcher
> Fix For: 1.4
>
> Attachments: SOLR-1286.patch, SOLR-1286.patch, SOLR-1286.patch, 
> SOLR-1286.patch, SOLR-1286.patch
>
>
> I've tried running full and delta imports with commit=false so that the 
> autoCommit will manage all commits to the index. However setting commit=false 
> doesn't have any effect: 
> curl 
> 'http://localhost:8080/solr/dataimporter?command=full-import&commit=false'

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1286) DIH: The commit parameter is always defaulting to "true" even if "false" is explicitly passed in.

2009-07-16 Thread Jay Hill (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12731955#action_12731955
 ] 

Jay Hill commented on SOLR-1286:


Noble and Erik, thanks for the quick response. I just applied the patch and 
rebuilt. It doesn't matter what I enter as a param for commit; when the finish 
method executes, requestParameters.commit always equals true:

Using: curl 
'http://localhost:8080/solr/indexer/books?command=full-import&commit=false'

The commit is still occurring. I just woke up and tested, so I'll dig in a 
little more to try to find out what's up.

> DIH: The commit parameter is always defaulting to "true" even if "false" is 
> explicitly passed in.
> -
>
> Key: SOLR-1286
> URL: https://issues.apache.org/jira/browse/SOLR-1286
> Project: Solr
>  Issue Type: Bug
>  Components: contrib - DataImportHandler
>Reporter: Jay Hill
>Assignee: Erik Hatcher
> Fix For: 1.4
>
> Attachments: SOLR-1286.patch, SOLR-1286.patch, SOLR-1286.patch, 
> SOLR-1286.patch, SOLR-1286.patch
>
>
> I've tried running full and delta imports with commit=false so that the 
> autoCommit will manage all commits to the index. However setting commit=false 
> doesn't have any effect: 
> curl 
> 'http://localhost:8080/solr/dataimporter?command=full-import&commit=false'

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-750) DateField.parseMath doesn't handle non-existent Z

2009-07-16 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12731957#action_12731957
 ] 

David Smiley commented on SOLR-750:
---

Ignoring the long-gone circumstances in which I encountered and reported this 
issue originally...
I do feel strongly that Solr shouldn't force me to specify a Z when Solr 
doesn't really have any time zone support.  And as such it shouldn't emit the 
"Z" in date output either.  If I'm using Solr and want to feed it dates in a 
particular time zone, or perhaps a local-time of day, and clients expect this, 
then why should Solr force me to specify a timezone?  I find it irritating.

> DateField.parseMath doesn't handle non-existent Z
> -
>
> Key: SOLR-750
> URL: https://issues.apache.org/jira/browse/SOLR-750
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 1.3
>Reporter: David Smiley
>Priority: Minor
> Attachments: SOLR-750_DateField_no_Z.patch
>
>   Original Estimate: 0.25h
>  Remaining Estimate: 0.25h
>
> I've run into situations when trying to use SOLR-540 (wildcard highlight 
> spec) such that if it attempts to highlight a date field, I get a stack trace 
> from DateField.parseMath puking because there isn't a "Z" at the end of an 
> otherwise good date-time string.  It was very easy to fix the code to make it 
> react gracefully to no Z.  Attached is the patch.  This bug isn't really 
> related to SOLR-540, so please apply it without waiting for 540.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Reopened: (SOLR-750) DateField.parseMath doesn't handle non-existent Z

2009-07-16 Thread David Smiley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley reopened SOLR-750:
---


> DateField.parseMath doesn't handle non-existent Z
> -
>
> Key: SOLR-750
> URL: https://issues.apache.org/jira/browse/SOLR-750
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 1.3
>Reporter: David Smiley
>Priority: Minor
> Attachments: SOLR-750_DateField_no_Z.patch
>
>   Original Estimate: 0.25h
>  Remaining Estimate: 0.25h
>
> I've run into situations when trying to use SOLR-540 (wildcard highlight 
> spec) such that if it attempts to highlight a date field, I get a stack trace 
> from DateField.parseMath puking because there isn't a "Z" at the end of an 
> otherwise good date-time string.  It was very easy to fix the code to make it 
> react gracefully to no Z.  Attached is the patch.  This bug isn't really 
> related to SOLR-540, so please apply it without waiting for 540.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1286) DIH: The commit parameter is always defaulting to "true" even if "false" is explicitly passed in.

2009-07-16 Thread Jay Hill (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12731966#action_12731966
 ] 

Jay Hill commented on SOLR-1286:


Looking at the handleRequestBody method of DataImportHandler, it looks like it 
is getting the correct value for "commit" from the request, but during the 
mapping to create the DataImporter.RequestParams object commit is always 
getting set as "true":

SolrParams params = req.getParams();
System.out.println(" ---From request : " + params.get("commit"));
DataImporter.RequestParams requestParams = new DataImporter.RequestParams(getParamsMap(params));
System.out.println(" ---RequestParams: " + requestParams.commit);

the output:
---From request : false
---RequestParams: true

still digging.


> DIH: The commit parameter is always defaulting to "true" even if "false" is 
> explicitly passed in.
> -
>
> Key: SOLR-1286
> URL: https://issues.apache.org/jira/browse/SOLR-1286
> Project: Solr
>  Issue Type: Bug
>  Components: contrib - DataImportHandler
>Reporter: Jay Hill
>Assignee: Erik Hatcher
> Fix For: 1.4
>
> Attachments: SOLR-1286.patch, SOLR-1286.patch, SOLR-1286.patch, 
> SOLR-1286.patch, SOLR-1286.patch
>
>
> I've tried running full and delta imports with commit=false so that the 
> autoCommit will manage all commits to the index. However setting commit=false 
> doesn't have any effect: 
> curl 
> 'http://localhost:8080/solr/dataimporter?command=full-import&commit=false'

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Forcing config replication without index change

2009-07-16 Thread Otis Gospodnetic

Hi,

Shouldn't it be possible to force replication of at least *some* of the config 
files even if the index hasn't changed?
(see Noble Paul's comment on http://markmail.org/message/hgdwumfuuwixfxvq and 
the 4-message thread)

Here is a use case:
* Index is mostly static (nightly updates)
* elevate.xml needs to be changed throughout the day
* elevate.xml needs to be pushed to slaves and solr needs to reload it

This is currently not possible because replication will happen only if the 
index changed in some way.  You can't force a commit to fake index change.  So 
one has to either:
* add/delete dummy docs on master to force index change
* write an external script that copies the config file to slaves


Shouldn't it be possible to force replication of at least *some* of the config 
files even if the index hasn't changed?

Thanks,
Otis 
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



Re: Solr Slow in Unix

2009-07-16 Thread Yonik Seeley
On Thu, Jul 16, 2009 at 4:18 AM, Anand Kumar
Prabhakar wrote:
> I'm running a Solr instance in Apache Tomcat 6 on a Solaris box. The QTimes
> are high compared to the same configuration on a Windows machine. Can anyone
> help with the configurations I can check to improve the performance?

What's the hardware actually look like on each machine?

-Yonik
http://www.lucidimagination.com


[jira] Commented: (SOLR-1041) dataDir is not set relative to instanceDir

2009-07-16 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12731994#action_12731994
 ] 

Otis Gospodnetic commented on SOLR-1041:


I worked around it by using a relative directory in instanceDir instead of 
an absolute one.  I think one should be able to use either an absolute or a 
relative directory.

If it matters, note that I don't have dataDir in the cores' solrconfig.xml 
files or in solr.xml, so Solr uses the default (data/) for that.


> dataDir is not set relative to instanceDir 
> ---
>
> Key: SOLR-1041
> URL: https://issues.apache.org/jira/browse/SOLR-1041
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 1.4
>Reporter: Noble Paul
>Assignee: Shalin Shekhar Mangar
> Fix For: 1.4
>
> Attachments: SOLR-1041.patch, SOLR-1041.patch
>
>
> see the mail thread. http://markmail.org/thread/ebd7vumj3uyzpyt6
> A recent bug fix has broken the feature. Now it is always relative to current 
> working directory for single core

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: Forcing config replication without index change

2009-07-16 Thread Mark Miller
bq. Shouldn't it be possible to force replication of at least *some* of the
config files even if the index hasn't changed?
Indeed. Perhaps another call, forceIndexFetch? It would replicate configs
whether or not the index has changed, but wouldn't replicate the index if it
didn't need to.

Or a separate call altogether, fetchConfig, that just updates the configs?
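For illustration, such a call might mirror the existing replication commands
(a hypothetical command; it does not exist yet):
curl 'http://slave:8983/solr/replication?command=fetchConfig'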



On Thu, Jul 16, 2009 at 3:00 PM, Otis Gospodnetic <
otis_gospodne...@yahoo.com> wrote:

>
> Hi,
>
> Shouldn't it be possible to force replication of at least *some* of the
> config files even if the index hasn't changed?
> (see Noble Paul's comment on http://markmail.org/message/hgdwumfuuwixfxvq and
> the 4-message thread)
>
> Here is a use case:
> * Index is mostly static (nightly updates)
> * elevate.xml needs to be changed throughout the day
> * elevate.xml needs to be pushed to slaves and solr needs to reload it
>
> This is currently not possible because replication will happen only if the
> index changed in some way.  You can't force a commit to fake index change.
>  So one has to either:
> * add/delete dummy docs on master to force index change
> * write an external script that copies the config file to slaves
>
>
> Shouldn't it be possible to force replication of at least *some* of the
> config files even if the index hasn't changed?
>
> Thanks,
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
>


-- 
- Mark

http://www.lucidimagination.com


Re: Solr Slow in Unix

2009-07-16 Thread Walter Underwood
In particular, are you using local disc or network storage? --wunder

On 7/16/09 8:24 AM, "Yonik Seeley"  wrote:

> On Thu, Jul 16, 2009 at 4:18 AM, Anand Kumar
> Prabhakar wrote:
>> I'm running a Solr instance in Apache Tomcat 6 on a Solaris box. The QTimes
>> are high compared to the same configuration on a Windows machine. Can anyone
>> help with the configurations I can check to improve the performance?
> 
> What's the hardware actually look like on each machine?
> 
> -Yonik
> http://www.lucidimagination.com



[jira] Commented: (SOLR-1279) ApostropheTokenizer

2009-07-16 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12731998#action_12731998
 ] 

Otis Gospodnetic commented on SOLR-1279:


Boris, please let us know if WordDelimiterFilter works for you.
If it does not and this new code is needed, could you please:
* add the ASL to the top
* write a bit of javadoc (your description from this issue is good)
* write a unit test

Thanks for your help!

> ApostropheTokenizer
> ---
>
> Key: SOLR-1279
> URL: https://issues.apache.org/jira/browse/SOLR-1279
> Project: Solr
>  Issue Type: New Feature
>  Components: Analysis
>Reporter: Sergey Borisov
>Priority: Minor
> Fix For: 1.5
>
> Attachments: ApostropheTokenizer.zip
>
>
> ApostropheTokenizer creates extra tokens during the analysis stage for 
> fields containing apostrophes. The reason for adding this is to ensure that 
> documents that differ only by an apostrophe have the same relevancy score. 
> For example, if a document contains the string "McDonald's", it will be 
> tokenized as "McDonald's McDonalds". This way a search against either 
> "McDonald's" or "McDonalds" will produce a similar score.
> This code handles up to two apostrophes in a token.
> To use this tokenizer, add the following line to schema.xml:
> 
>   
> ...
> 
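As a rough illustration of the described behavior, a minimal Java sketch (a
hypothetical helper, not the attached code):

  import java.util.ArrayList;
  import java.util.List;

  // For a token containing apostrophes, emit both the original form and the
  // form with the apostrophes removed, so "McDonald's" also indexes "McDonalds".
  static List<String> apostropheVariants(String token) {
    List<String> variants = new ArrayList<String>();
    variants.add(token);
    if (token.indexOf('\'') >= 0) {
      variants.add(token.replace("'", ""));
    }
    return variants;
  }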

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1275) Add expungeDeletes to DirectUpdateHandler2

2009-07-16 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12731999#action_12731999
 ] 

Otis Gospodnetic commented on SOLR-1275:


Patch looks good to me (though I have not tested it).

> Add expungeDeletes to DirectUpdateHandler2
> --
>
> Key: SOLR-1275
> URL: https://issues.apache.org/jira/browse/SOLR-1275
> Project: Solr
>  Issue Type: Improvement
>  Components: update
>Affects Versions: 1.3
>Reporter: Jason Rutherglen
>Assignee: Noble Paul
>Priority: Trivial
> Fix For: 1.4
>
> Attachments: SOLR-1275.patch
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> expungeDeletes is a useful method somewhat like optimize is offered by 
> IndexWriter that can be implemented in DirectUpdateHandler2.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



MLT termOffsets, What is it used for ?

2009-07-16 Thread Emmanuel Castro Santana

Hi all,

We are trying to set up schema.xml for a better similarity search using
MoreLikeThisHandler, and we wonder what this termOffsets option is actually
used for. All we know so far is that it increases storage costs, but how
exactly?
We have been struggling to find this information for a while; if any of you
could give us a hand, I would really appreciate it.

Thanks in advance.
Emmanuel Santana
-- 
View this message in context: 
http://www.nabble.com/MLT-termOffsets%2C-What-is-it-used-for---tp24520786p24520786.html
Sent from the Solr - Dev mailing list archive at Nabble.com.



[jira] Commented: (SOLR-1275) Add expungeDeletes to DirectUpdateHandler2

2009-07-16 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12732032#action_12732032
 ] 

Jason Rutherglen commented on SOLR-1275:


I can add some tests? (I'd like to get familiar with Solr's test framework)

> Add expungeDeletes to DirectUpdateHandler2
> --
>
> Key: SOLR-1275
> URL: https://issues.apache.org/jira/browse/SOLR-1275
> Project: Solr
>  Issue Type: Improvement
>  Components: update
>Affects Versions: 1.3
>Reporter: Jason Rutherglen
>Assignee: Noble Paul
>Priority: Trivial
> Fix For: 1.4
>
> Attachments: SOLR-1275.patch
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> expungeDeletes is a useful method somewhat like optimize is offered by 
> IndexWriter that can be implemented in DirectUpdateHandler2.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-1286) DIH: The commit parameter is always defaulting to "true" even if "false" is explicitly passed in.

2009-07-16 Thread Jay Hill (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Hill updated SOLR-1286:
---

Attachment: SOLR-1286.patch

Found the problem. There was a test to see if there was a value set for 
"optimize"; if so, no matter what it was, "commit" was set to "true":

  if (requestParams.containsKey("optimize")) {
    optimize = Boolean.parseBoolean((String) requestParams.get("optimize"));
    commit = true;
  }

But we had optimize=false set as an invariant, so the simple presence of a 
value (false) caused a commit to happen. Changed it to this:

  if (requestParams.containsKey("optimize")) {
    optimize = Boolean.parseBoolean((String) requestParams.get("optimize"));
    if (optimize)
      commit = true;
  }

I think that should do it.
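A sketch of the kind of assertion a test for this could make, assuming
RequestParams accepts the params map as the debug snippet earlier in this
issue shows:

  // commit=false must survive even when optimize=false is also present.
  Map<String, Object> p = new HashMap<String, Object>();
  p.put("commit", "false");
  p.put("optimize", "false");
  DataImporter.RequestParams rp = new DataImporter.RequestParams(p);
  assertFalse(rp.commit);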

> DIH: The commit parameter is always defaulting to "true" even if "false" is 
> explicitly passed in.
> -
>
> Key: SOLR-1286
> URL: https://issues.apache.org/jira/browse/SOLR-1286
> Project: Solr
>  Issue Type: Bug
>  Components: contrib - DataImportHandler
>Reporter: Jay Hill
>Assignee: Erik Hatcher
> Fix For: 1.4
>
> Attachments: SOLR-1286.patch, SOLR-1286.patch, SOLR-1286.patch, 
> SOLR-1286.patch, SOLR-1286.patch, SOLR-1286.patch
>
>
> I've tried running full and delta imports with commit=false so that the 
> autoCommit will manage all commits to the index. However setting commit=false 
> doesn't have any effect: 
> curl 
> 'http://localhost:8080/solr/dataimporter?command=full-import&commit=false'

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (SOLR-1288) better Trie* integration

2009-07-16 Thread Yonik Seeley (JIRA)
better Trie* integration


 Key: SOLR-1288
 URL: https://issues.apache.org/jira/browse/SOLR-1288
 Project: Solr
  Issue Type: Bug
Affects Versions: 1.4
Reporter: Yonik Seeley
 Fix For: 1.4


Improve support for the Trie* fields up to the level of Solr's existing numeric 
types.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1288) better Trie* integration

2009-07-16 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12732087#action_12732087
 ] 

Yonik Seeley commented on SOLR-1288:


At the top of my list is:
 - no toExternal(), etc... this causes toString() to not print out nice values 
in debugging... this will cause failures in other places

There are probably others... I haven't checked, but does date faceting work on 
trie-type dates?

Should the trie-based date type subclass DateField so we can have a common base 
class for all date-related classes?  Seems like this would make it easier for 
separate components (like date faceting) to deal with.

> better Trie* integration
> 
>
> Key: SOLR-1288
> URL: https://issues.apache.org/jira/browse/SOLR-1288
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 1.4
>Reporter: Yonik Seeley
> Fix For: 1.4
>
>
> Improve support for the Trie* fields up to the level of Solr's existing 
> numeric types.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1284) Use and implement new non-deprecated DocIdSetIterator methods

2009-07-16 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12732090#action_12732090
 ] 

Shalin Shekhar Mangar commented on SOLR-1284:
-

Thanks for looking into this, Yonik. I should really have kept it on hold for 
you to review this stuff.

> Use and implement new non-deprecated DocIdSetIterator methods
> -
>
> Key: SOLR-1284
> URL: https://issues.apache.org/jira/browse/SOLR-1284
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: 1.4
>Reporter: Yonik Seeley
>Assignee: Yonik Seeley
> Fix For: 1.4
>
>
> next() and skipTo() should be changed to nextDoc() and advance()
> background: 
> http://search.lucidimagination.com/search/document/9962d317a2811096/latest_lucene_update

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1288) better Trie* integration

2009-07-16 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12732096#action_12732096
 ] 

Shalin Shekhar Mangar commented on SOLR-1288:
-

bq. does date faceting work on trie-type dates?

Faceting itself does not work with any trie fields because of the encoded 
values in the index. So we need to implement indexedToReadable in TrieField. 
Also, since trie fields index each value at multiple precisions, we need to 
short-cut faceting to use only the highest-precision value.
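A rough sketch of what such an indexedToReadable override might look like for
a trie int field, assuming the Lucene 2.9 NumericUtils prefix coding (not the
committed implementation):

  // Decode the prefix-coded indexed term back into a human-readable number.
  public String indexedToReadable(String indexedForm) {
    return Integer.toString(NumericUtils.prefixCodedToInt(indexedForm));
  }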

> better Trie* integration
> 
>
> Key: SOLR-1288
> URL: https://issues.apache.org/jira/browse/SOLR-1288
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 1.4
>Reporter: Yonik Seeley
> Fix For: 1.4
>
>
> Improve support for the Trie* fields up to the level of Solr's existing 
> numeric types.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1284) Use and implement new non-deprecated DocIdSetIterator methods

2009-07-16 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12732094#action_12732094
 ] 

Yonik Seeley commented on SOLR-1284:


Nah, that's fine - I didn't realize how much you did get done.  I didn't think 
it was easily doable in patches, and was planning something along the lines of 
updating the Lucene jars in the morning and having everyone who could work on 
it until everything was working.  But this worked out just fine.

> Use and implement new non-deprecated DocIdSetIterator methods
> -
>
> Key: SOLR-1284
> URL: https://issues.apache.org/jira/browse/SOLR-1284
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: 1.4
>Reporter: Yonik Seeley
>Assignee: Yonik Seeley
> Fix For: 1.4
>
>
> next() and skipTo() should be changed to nextDoc() and advance()
> background: 
> http://search.lucidimagination.com/search/document/9962d317a2811096/latest_lucene_update

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1284) Use and implement new non-deprecated DocIdSetIterator methods

2009-07-16 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12732119#action_12732119
 ] 

Yonik Seeley commented on SOLR-1284:


Oh, yeah - the other Lucene change that I thought might hide some lurking bugs 
was that now a MultiReader is *always* used (simplifies reopen logic).

> Use and implement new non-deprecated DocIdSetIterator methods
> -
>
> Key: SOLR-1284
> URL: https://issues.apache.org/jira/browse/SOLR-1284
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: 1.4
>Reporter: Yonik Seeley
>Assignee: Yonik Seeley
> Fix For: 1.4
>
>
> next() and skipTo() should be changed to nextDoc() and advance()
> background: 
> http://search.lucidimagination.com/search/document/9962d317a2811096/latest_lucene_update

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1283) Mark Invalid error on indexing

2009-07-16 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12732126#action_12732126
 ] 

Grant Ingersoll commented on SOLR-1283:
---

We should make the buffer size configurable, I guess.  However, there's always 
the potential to go past it or to use up a lot of memory in the meantime (if 
one is expecting really large files).
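For reference, a minimal standalone sketch (arbitrary buffer sizes) of how
BufferedReader.reset() ends up throwing "Mark invalid" once reading runs past
the mark's read-ahead limit:

  import java.io.BufferedReader;
  import java.io.CharArrayReader;
  import java.io.IOException;

  public class MarkInvalidDemo {
    public static void main(String[] args) throws IOException {
      char[] data = new char[1000];
      java.util.Arrays.fill(data, 'x');
      // Small internal buffer so the read-ahead limit is easy to exceed.
      BufferedReader r = new BufferedReader(new CharArrayReader(data), 16);
      r.mark(8);   // mark is only guaranteed for the next 8 chars
      r.skip(100); // read well past the limit...
      r.reset();   // ...throws java.io.IOException: Mark invalid
    }
  }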

> Mark Invalid error on indexing
> --
>
> Key: SOLR-1283
> URL: https://issues.apache.org/jira/browse/SOLR-1283
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 1.3
> Environment: Ubuntu 8.04, Sun Java 6
>Reporter: solrize
>
> When indexing large (1 megabyte) documents I get a lot of exceptions with 
> stack traces like the below.  It happens both in the Solr 1.3 release and in 
> the July 9 1.4 nightly.  I believe this to NOT be the same issue as SOLR-42.  
> I found some further discussion on solr-user: 
> http://www.nabble.com/IOException:-Mark-invalid-while-analyzing-HTML-td17052153.html
>  
> In that discussion, Grant asked the original poster to open a Jira issue, but 
> I didn't see one so I'm opening one; please feel free to merge or close if 
> it's redundant. 
> My stack trace follows.
> Jul 15, 2009 8:36:42 AM org.apache.solr.core.SolrCore execute
> INFO: [] webapp=/solr path=/update params={} status=500 QTime=3 
> Jul 15, 2009 8:36:42 AM org.apache.solr.common.SolrException log
> SEVERE: java.io.IOException: Mark invalid
> at java.io.BufferedReader.reset(BufferedReader.java:485)
> at 
> org.apache.solr.analysis.HTMLStripReader.restoreState(HTMLStripReader.java:171)
> at 
> org.apache.solr.analysis.HTMLStripReader.read(HTMLStripReader.java:728)
> at 
> org.apache.solr.analysis.HTMLStripReader.read(HTMLStripReader.java:742)
> at java.io.Reader.read(Reader.java:123)
> at 
> org.apache.lucene.analysis.CharTokenizer.next(CharTokenizer.java:108)
> at org.apache.lucene.analysis.StopFilter.next(StopFilter.java:178)
> at 
> org.apache.lucene.analysis.standard.StandardFilter.next(StandardFilter.java:84)
> at 
> org.apache.lucene.analysis.LowerCaseFilter.next(LowerCaseFilter.java:53)
> at 
> org.apache.solr.analysis.WordDelimiterFilter.next(WordDelimiterFilter.java:347)
> at 
> org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:159)
> at 
> org.apache.lucene.index.DocFieldConsumersPerField.processFields(DocFieldConsumersPerField.java:36)
> at 
> org.apache.lucene.index.DocFieldProcessorPerThread.processDocument(DocFieldProcessorPerThread.java:234)
> at 
> org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:765)
> at 
> org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:748)
>   at 
> org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2512)
>   at 
> org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2484)
>   at 
> org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:240)
>   at 
> org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:61)
>   at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:140)
>   at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
>   at 
> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
>   at 
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
>   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1292)
>   at 
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
>   at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
>   at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
>   at 
> org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
>   at 
> org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
>   at 
> org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
>   at 
> org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
>   at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
>   at 
> org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
>   at 
> org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
>   at 
> org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
>   at org.mortbay.jetty.Server.handle(Server.java:285)
>   at 
> org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
>   at 
> org.mortbay.jetty.HttpConnection$RequestHa

[jira] Commented: (SOLR-1284) Use and implement new non-deprecated DocIdSetIterator methods

2009-07-16 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12732132#action_12732132
 ] 

Yonik Seeley commented on SOLR-1284:


bq. Oh, yeah - the other Lucene change that I thought might hide some lurking 
bugs was that now a MultiReader is always used (simplifies reopen logic). 

Found it and committed - it was in the distributed search code looking up sort 
field values.  It had assumed that there were no single-segment MultiReaders.  
It would have caused FieldCache instantiation at two different levels again 
(doubling the memory size).
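In other words (hypothetical reader and field names), asking FieldCache for
the same field at both levels creates two independent cache entries:

  // Entry keyed on the top-level MultiReader:
  int[] top = FieldCache.DEFAULT.getInts(multiReader, "price");
  // Separate entry keyed on the leaf reader, for the same underlying data:
  int[] leaf = FieldCache.DEFAULT.getInts(segmentReader, "price");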

> Use and implement new non-deprecated DocIdSetIterator methods
> -
>
> Key: SOLR-1284
> URL: https://issues.apache.org/jira/browse/SOLR-1284
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: 1.4
>Reporter: Yonik Seeley
>Assignee: Yonik Seeley
> Fix For: 1.4
>
>
> next() and skipTo() should be changed to nextDoc() and advance()
> background: 
> http://search.lucidimagination.com/search/document/9962d317a2811096/latest_lucene_update

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (SOLR-1284) Use and implement new non-deprecated DocIdSetIterator methods

2009-07-16 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley resolved SOLR-1284.


Resolution: Fixed

OK, closing. I'm reasonably confident that we implement and use advance() and 
friends everywhere it's important.
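For reference, the new iteration idiom looks roughly like this (hypothetical
iterator and consumer names):

  // Old style: while (iter.next()) { int doc = iter.doc(); ... }
  // New style: nextDoc()/advance() return the doc id directly, with
  // DocIdSetIterator.NO_MORE_DOCS as the end sentinel.
  int doc;
  while ((doc = iter.nextDoc()) != DocIdSetIterator.NO_MORE_DOCS) {
    process(doc); // hypothetical consumer
  }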

> Use and implement new non-deprecated DocIdSetIterator methods
> -
>
> Key: SOLR-1284
> URL: https://issues.apache.org/jira/browse/SOLR-1284
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: 1.4
>Reporter: Yonik Seeley
>Assignee: Yonik Seeley
> Fix For: 1.4
>
>
> next() and skipTo() should be changed to nextDoc() and advance()
> background: 
> http://search.lucidimagination.com/search/document/9962d317a2811096/latest_lucene_update

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1283) Mark Invalid error on indexing

2009-07-16 Thread solrize (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12732206#action_12732206
 ] 

solrize commented on SOLR-1283:
---

Right now I'm getting a ton of these errors.  It doesn't seem strictly 
dependent on the doc size.  If I can crank up the buffer size enough that the 
error happens only occasionally instead of frequently, that would be a big 
improvement over the present situation.  Thanks!

> Mark Invalid error on indexing
> --
>
> Key: SOLR-1283
> URL: https://issues.apache.org/jira/browse/SOLR-1283
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 1.3
> Environment: Ubuntu 8.04, Sun Java 6
>Reporter: solrize
>
> When indexing large (1 megabyte) documents I get a lot of exceptions with 
> stack traces like the below.  It happens both in the Solr 1.3 release and in 
> the July 9 1.4 nightly.  I believe this to NOT be the same issue as SOLR-42.  
> I found some further discussion on solr-user: 
> http://www.nabble.com/IOException:-Mark-invalid-while-analyzing-HTML-td17052153.html
>  
> In that discussion, Grant asked the original poster to open a Jira issue, but 
> I didn't see one so I'm opening one; please feel free to merge or close if 
> it's redundant. 
> My stack trace follows.
> Jul 15, 2009 8:36:42 AM org.apache.solr.core.SolrCore execute
> INFO: [] webapp=/solr path=/update params={} status=500 QTime=3 
> Jul 15, 2009 8:36:42 AM org.apache.solr.common.SolrException log
> SEVERE: java.io.IOException: Mark invalid
> at java.io.BufferedReader.reset(BufferedReader.java:485)
> at 
> org.apache.solr.analysis.HTMLStripReader.restoreState(HTMLStripReader.java:171)
> at 
> org.apache.solr.analysis.HTMLStripReader.read(HTMLStripReader.java:728)
> at 
> org.apache.solr.analysis.HTMLStripReader.read(HTMLStripReader.java:742)
> at java.io.Reader.read(Reader.java:123)
> at 
> org.apache.lucene.analysis.CharTokenizer.next(CharTokenizer.java:108)
> at org.apache.lucene.analysis.StopFilter.next(StopFilter.java:178)
> at 
> org.apache.lucene.analysis.standard.StandardFilter.next(StandardFilter.java:84)
> at 
> org.apache.lucene.analysis.LowerCaseFilter.next(LowerCaseFilter.java:53)
> at 
> org.apache.solr.analysis.WordDelimiterFilter.next(WordDelimiterFilter.java:347)
> at 
> org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:159)
> at 
> org.apache.lucene.index.DocFieldConsumersPerField.processFields(DocFieldConsumersPerField.java:36)
> at 
> org.apache.lucene.index.DocFieldProcessorPerThread.processDocument(DocFieldProcessorPerThread.java:234)
> at 
> org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:765)
> at 
> org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:748)
>   at 
> org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2512)
>   at 
> org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2484)
>   at 
> org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:240)
>   at 
> org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:61)
>   at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:140)
>   at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
>   at 
> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
>   at 
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
>   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1292)
>   at 
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
>   at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
>   at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
>   at 
> org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
>   at 
> org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
>   at 
> org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
>   at 
> org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
>   at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
>   at 
> org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
>   at 
> org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
>   at 
> org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
>   at org.mortbay.jetty.Server.handle(Server.java:285)
>   at 
> org.mortbay.jetty.HttpConnection.handleRequest(HttpConnectio

[jira] Created: (SOLR-1289) facet.method=enum for Trie*

2009-07-16 Thread Yonik Seeley (JIRA)
facet.method=enum for Trie*
---

 Key: SOLR-1289
 URL: https://issues.apache.org/jira/browse/SOLR-1289
 Project: Solr
  Issue Type: Bug
Reporter: Yonik Seeley
 Fix For: 1.5


Implement enum faceting method for Trie*

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (SOLR-1290) Implement single-valued FieldCache faceting for Trie*

2009-07-16 Thread Yonik Seeley (JIRA)
Implement single-valued FieldCache faceting for Trie*
-

 Key: SOLR-1290
 URL: https://issues.apache.org/jira/browse/SOLR-1290
 Project: Solr
  Issue Type: Bug
Reporter: Yonik Seeley
 Fix For: 1.5


Implement single-valued FieldCache faceting for Trie* on its native FieldCache 
form.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Issue Comment Edited: (SOLR-1229) deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value

2009-07-16 Thread Lance Norskog (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12731780#action_12731780
 ] 

Lance Norskog edited comment on SOLR-1229 at 7/16/09 3:35 PM:
--

Ok - these run. Thanks.

Just to make sure I understand. The 'pk' attribute declares 2 things:  
1) that this column must exist for a document to be generated, and 
2) that this entity is the level where documents are created. Is this true?

tmpid appears as an unused name merely so that ${x.id} is sent into solr_id. 
Maybe name="" would be more clear for this purpose?

Lance





  was (Author: lancenorskog):
Ok - these run. Thanks.

Just to make sure I understand. The 'pk' attribute declares 2 things:  
1) that this column must exist for a document to be generated, and 
2) that this entity is the level where documents are created. Is this true?

tmpid appears as an unused name merely so that ${x.id} is sent into solr_id. 
Maybe name="" would be more clear for this purpose?

Something is documented on the wiki but not used: multiple PKs in one entity.

On the wiki page, see the config file after "Writing a huge deltaQuery" - there 
is an attribute: 
{{{pk="ITEM_ID, CATEGORY_ID"}}}
There is code to parse this in DataImporter.InitEntity() and store the list in 
Entity.primaryKeys. But the list of PKs is never used. 

I think the use case for this is that the user requires more fields besides the 
uniqueKey for a document.  Is this right? This is definitely on my list of 
must-have features. The second field may or may not be declared "required" in 
the schema, so looking at the schema is not good enough. The field has to be 
declared "required" in the dataconfig.

Lance




  
> deletedPkQuery feature does not work when pk and uniqueKey field do not have 
> the same value
> ---
>
> Key: SOLR-1229
> URL: https://issues.apache.org/jira/browse/SOLR-1229
> Project: Solr
>  Issue Type: Bug
>  Components: contrib - DataImportHandler
>Affects Versions: 1.4
>Reporter: Erik Hatcher
>Assignee: Erik Hatcher
> Fix For: 1.4
>
> Attachments: SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch, 
> SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch, tests.patch
>
>
> Problem doing a delta-import such that records marked as "deleted" in the 
> database are removed from Solr using deletedPkQuery.
> Here's a config I'm using against a mocked test database:
> {code:xml}
> 
>  
>  
>pk="board_id"
>transformer="TemplateTransformer"
>deletedPkQuery="select board_id from boards where deleted = 'Y'"
>query="select * from boards where deleted = 'N'"
>deltaImportQuery="select * from boards where deleted = 'N'"
>deltaQuery="select * from boards where deleted = 'N'"
>preImportDeleteQuery="datasource:board">
>  
>  
>  
>
>  
> 
> {code}
> Note that the uniqueKey in Solr is the "id" field.  And its value is a 
> template board-.
> I noticed the javadoc comment in DocBuilder#collectDelta says "Note: In 
> our definition, unique key of Solr document is the primary key of the top 
> level entity".  This of course isn't really an appropriate assumption.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1229) deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value

2009-07-16 Thread Lance Norskog (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12732213#action_12732213
 ] 

Lance Norskog commented on SOLR-1229:
-

Something is documented on the wiki but not used: multiple PKs in one entity.

On the wiki page, see the config file after "Writing a huge deltaQuery" - there 
is an attribute: 
{pk="ITEM_ID, CATEGORY_ID"}
There is code to parse this in DataImporter.InitEntity() and store the list in 
Entity.primaryKeys. But this list of PKs is never used.

I think the use case for this is that the user requires more fields besides the 
uniqueKey for a document. Is this right? This is definitely on my list of 
must-have features. The second field may or may not be declared "required" in 
the schema, so looking at the schema is not good enough. The field has to be 
declared "required" in the dataconfig.

Lance

(I split this comment out from the previous since they are not related. 
However, this is another feature inside the same bit of code upon which we are 
endlessly chewing.)

> deletedPkQuery feature does not work when pk and uniqueKey field do not have 
> the same value
> ---
>
> Key: SOLR-1229
> URL: https://issues.apache.org/jira/browse/SOLR-1229
> Project: Solr
>  Issue Type: Bug
>  Components: contrib - DataImportHandler
>Affects Versions: 1.4
>Reporter: Erik Hatcher
>Assignee: Erik Hatcher
> Fix For: 1.4
>
> Attachments: SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch, 
> SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch, tests.patch
>
>
> Problem doing a delta-import such that records marked as "deleted" in the 
> database are removed from Solr using deletedPkQuery.
> Here's a config I'm using against a mocked test database:
> {code:xml}
> 
>  
>  
>pk="board_id"
>transformer="TemplateTransformer"
>deletedPkQuery="select board_id from boards where deleted = 'Y'"
>query="select * from boards where deleted = 'N'"
>deltaImportQuery="select * from boards where deleted = 'N'"
>deltaQuery="select * from boards where deleted = 'N'"
>preImportDeleteQuery="datasource:board">
>  
>  
>  
>
>  
> 
> {code}
> Note that the uniqueKey in Solr is the "id" field.  And its value is a 
> template board-.
> I noticed the javadoc comment in DocBuilder#collectDelta says "Note: In 
> our definition, unique key of Solr document is the primary key of the top 
> level entity".  This of course isn't really an appropriate assumption.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Issue Comment Edited: (SOLR-1229) deletedPkQuery feature does not work when pk and uniqueKey field do not have the same value

2009-07-16 Thread Lance Norskog (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12732213#action_12732213
 ] 

Lance Norskog edited comment on SOLR-1229 at 7/16/09 3:38 PM:
--

Something is documented on the wiki but not used: multiple PKs in one entity.

On the wiki page, see the config file after "Writing a huge deltaQuery" - there 
is an attribute: 
{pk="ITEM_ID, CATEGORY_ID"}
There is code to parse this in DataImporter.InitEntity() and store the list in 
Entity.primaryKeys. But this list of PKs is never used.

I think the use case for this is that the user requires more fields besides the 
uniqueKey for a document. Is this right? This is definitely on my list of 
must-have features. The second field may or may not be declared "required" in 
the schema, so looking at the schema is not good enough. The field has to be 
declared "required" in the dataconfig.

Lance

(I split this comment out from the previous since they are not related. 
However, this is another feature inside the same bit of code upon which we are 
ceaselessly chewing.)

  was (Author: lancenorskog):
Something is documented on the wiki but not used: multiple PKs in one 
entity.

On the wiki page, see the config file after "Writing a huge deltaQuery" - there 
is an attribute: 
{pk="ITEM_ID, CATEGORY_ID"}
There is code to parse this in DataImporter.InitEntity() and store the list in 
Entity.primaryKeys. But this list of PKs is never used.

I think the use case for this is that the user requires more fields besides the 
uniqueKey for a document. Is this right? This is definitely on my list of 
must-have features. The second field may or may not be declared "required" in 
the schema, so looking at the schema is not good enough. The field has to be 
declared "required" in the dataconfig.

Lance

(I split this comment out from the previous since they are not related. 
However, this is another feature inside the same bit of code upon which we are 
endlessly chewing.)
  
> deletedPkQuery feature does not work when pk and uniqueKey field do not have 
> the same value
> ---
>
> Key: SOLR-1229
> URL: https://issues.apache.org/jira/browse/SOLR-1229
> Project: Solr
>  Issue Type: Bug
>  Components: contrib - DataImportHandler
>Affects Versions: 1.4
>Reporter: Erik Hatcher
>Assignee: Erik Hatcher
> Fix For: 1.4
>
> Attachments: SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch, 
> SOLR-1229.patch, SOLR-1229.patch, SOLR-1229.patch, tests.patch
>
>
> Problem doing a delta-import such that records marked as "deleted" in the 
> database are removed from Solr using deletedPkQuery.
> Here's a config I'm using against a mocked test database:
> {code:xml}
> 
>  
>  
>pk="board_id"
>transformer="TemplateTransformer"
>deletedPkQuery="select board_id from boards where deleted = 'Y'"
>query="select * from boards where deleted = 'N'"
>deltaImportQuery="select * from boards where deleted = 'N'"
>deltaQuery="select * from boards where deleted = 'N'"
>preImportDeleteQuery="datasource:board">
>  
>  
>  
>
>  
> 
> {code}
> Note that the uniqueKey in Solr is the "id" field.  And its value is a 
> template board-.
> I noticed the javadoc comment in DocBuilder#collectDelta says "Note: In 
> our definition, unique key of Solr document is the primary key of the top 
> level entity".  This of course isn't really an appropriate assumption.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1270) The FloatField (and probably others) field type takes any string value at index, but JSON writer outputs as numeric without checking

2009-07-16 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12732220#action_12732220
 ] 

Hoss Man commented on SOLR-1270:



* FloatField is doing what it's supposed to (being fast and trusting the user 
input)
* JSONWriter is doing what it's supposed to (being fast and trusting that the 
data in the fields is valid)

The only way (I know of) to get invalid JSON output from the JSONWriter from a 
numeric field is for the client to index invalid data.

Garbage in, garbage out.

If there is a way to make JSONWriter error on a *legal* float value, then there 
is a bug in JSONWriter and we should fix it.  But if the problem is users 
indexing bogus data, then either the users should clean up their data, or we 
should add a "ParanoidFloatField" that validates the input (at the expense of 
performance).

If you have suggestions for documentation improvements to make users more aware 
of their responsibilities when using FloatField (and IntField, etc...) they 
would definitely be appreciated.
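For illustration, the validation such a "ParanoidFloatField" might do at index
time could be as simple as this (a hypothetical sketch, not existing Solr code):

  // Reject non-numeric input up front instead of trusting it.
  try {
    Float.parseFloat(externalValue); // externalValue: the raw string being indexed
  } catch (NumberFormatException e) {
    throw new SolrException(SolrException.ErrorCode.BAD_REQUEST,
        "Invalid float value: " + externalValue);
  }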



> The FloatField (and probably others) field type takes any string value at 
> index, but JSON writer outputs as numeric without checking
> 
>
> Key: SOLR-1270
> URL: https://issues.apache.org/jira/browse/SOLR-1270
> Project: Solr
>  Issue Type: Bug
>  Components: search
>Affects Versions: 1.2, 1.3, 1.4
> Environment: ubuntu 8.04, sun java 6, tomcat 5.5
>Reporter: Donovan Jimenez
>Priority: Minor
>
> The FloatField field type takes any string value at index time. These values 
> aren't necessarily valid JSON numerics, but the JSON writer does not check 
> their validity before writing them out as JSON numerics.
> I'm aware of the SortableFloatField, which does do index-time verification and 
> conversion of the value, but the way the JSON writer works seemed like 
> either a bug that needs to be addressed or perhaps a gotcha that needs to be 
> better documented?
> This issue originally came from my php client issue tracker: 
> http://code.google.com/p/solr-php-client/issues/detail?id=13

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (SOLR-907) Include access to Lucene Source with 1.4 release

2009-07-16 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man resolved SOLR-907.
---

Resolution: Invalid
  Assignee: Hoss Man

Resolving as invalid, since what was asked for (a way to know exactly which 
"version" of Lucene is included - official, or svnversion if unofficial) has 
already been part of Solr for some time.

If someone wants to take this further (auto-fetching the source jars from maven 
or whatnot), feel free to reopen.

> Include access to Lucene Source with 1.4 release
> 
>
> Key: SOLR-907
> URL: https://issues.apache.org/jira/browse/SOLR-907
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Affects Versions: 1.4
>Reporter: Todd Feak
>Assignee: Hoss Man
>Priority: Minor
>
> If Solr 1.4 release with a non-release version of Lucene, please include some 
> way to access the exact source code for the Lucene libraries that are 
> included with Solr. This could take the form of Maven2 Repo source files, a 
> subversion location and revision number, including the source with the 
> distribution, etc.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (SOLR-750) DateField.parseMath doesn't handle non-existent Z

2009-07-16 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man resolved SOLR-750.
---

Resolution: Invalid

Solr does have timezone support: it supports UTC ... that may sound like a 
cop-out answer, but it's true.  DateField specifies that it only accepts 
UTC-formatted dates and stores the UTC date in the index; it knows that a date 
it receives as input is UTC because it ends in "Z".

In the future, DateField might start allowing documents to be indexed with 
alternate timezone specifiers and convert to UTC internally before writing to 
the index; or new options might get added at some point to allow query clients 
to specify what timezone they are in, and Solr could convert all the internal 
dates to that timezone for them, etc... 

...if/when features like those get implemented, they can only work if there is 
a standardized internal format, and at the moment the only way DateField can 
ensure a standardized internal format is to force the clients updating the 
index to send only UTC dates.

bq. If I'm using Solr and want to feed it dates in a particular time zone, or 
perhaps a local-time of day, and clients expect this, then why should Solr 
force me to specify a timezone? I find it irritating.

there's nothing to stop you from lying to Solr about the timezone.  If all of 
the update/search clients for your instance are in on the secret that the times 
are really GMT-0730 even though Solr thinks they are UTC, then no one gets hurt.

But if we dropped the requirement that date inputs have the "Z" suffix, people 
would assume they can index stuff like 1995-12-31T23:59:59-07:30 and then be 
confused when it doesn't work.
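
For a client living in another timezone, the conversion is small. A minimal 
JDK sketch (nothing here is Solr API; the format string is the canonical form 
DateField expects):

    import java.text.SimpleDateFormat;
    import java.util.Date;
    import java.util.TimeZone;

    public class ToSolrDate {
        public static void main(String[] args) {
            // DateField only accepts UTC with a trailing "Z"; format the
            // client's local Date in UTC before sending it to Solr.
            SimpleDateFormat f = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'");
            f.setTimeZone(TimeZone.getTimeZone("UTC"));
            System.out.println(f.format(new Date())); // e.g. 2009-07-16T22:15:08Z
        }
    }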

> DateField.parseMath doesn't handle non-existent Z
> -
>
> Key: SOLR-750
> URL: https://issues.apache.org/jira/browse/SOLR-750
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 1.3
>Reporter: David Smiley
>Priority: Minor
> Attachments: SOLR-750_DateField_no_Z.patch
>
>   Original Estimate: 0.25h
>  Remaining Estimate: 0.25h
>
> I've run into situations when trying to use SOLR-540 (wildcard highlight 
> spec) such that if it attempts to highlight a date field, I get a stack trace 
> from DateField.parseMath puking because there isn't a "Z" at the end of an 
> otherwise good date-time string.  It was very easy to fix the code to make it 
> react gracefully to a missing Z.  Attached is the patch.  This bug isn't 
> really related to SOLR-540, so please apply it without waiting for 540.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (SOLR-1271) Stopwords search with function query(_val_) in Solr

2009-07-16 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man resolved SOLR-1271.


Resolution: Invalid

1) There's no bug here...

* the query parser ignores the stopwords completely when parsing, so your first 
example produces an empty query, which doesn't match anything
* function queries match all docs with varying scores (based on the function), 
so your second example produces a query that matches everything.

2) in the future, if you are getting behavior that doesn't make sense (but 
doesn't produce an actual error message), please post a question to solr-user 
describing what you are trying to do and the behavior you are seeing before 
filing a bug in Jira.
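
To make point 1 concrete, a hedged SolrJ sketch (URL and field names are 
illustrative): a bare _val_ query matches every document, while adding a 
required term clause restricts the match set before the function scores it.

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

    public class FunctionQueryDemo {
        public static void main(String[] args) throws Exception {
            CommonsHttpSolrServer server =
                new CommonsHttpSolrServer("http://localhost:8983/solr");

            // A bare function query: matches *every* document, scored by rord().
            SolrQuery matchesAll = new SolrQuery("_val_:\"rord(popularity)\"");

            // Anchored by a required term clause: only docs matching the term,
            // re-scored by the function.
            SolrQuery restricted = new SolrQuery("+name:ipod _val_:\"rord(popularity)\"");

            System.out.println(server.query(matchesAll).getResults().getNumFound());
            System.out.println(server.query(restricted).getResults().getNumFound());
        }
    }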

> Stopwords search with function query(_val_) in Solr
> ---
>
> Key: SOLR-1271
> URL: https://issues.apache.org/jira/browse/SOLR-1271
> Project: Solr
>  Issue Type: Bug
>  Components: search
>Affects Versions: 1.3
>Reporter: arvind
>
> Consider the following cases:
> q=stopword1+stopword2 gives no results, which is correct.
> Now, if we modify the above query to use a function query like
> q=stopword1+stopword2 _val_:"rord(field)"  then Solr gives some results, but 
> ideally it should not.
> Can anybody please have a look at this issue?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (SOLR-1291) implement Trie.toInternal/toExternal and friends

2009-07-16 Thread Yonik Seeley (JIRA)
implement Trie.toInternal/toExternal and friends


 Key: SOLR-1291
 URL: https://issues.apache.org/jira/browse/SOLR-1291
 Project: Solr
  Issue Type: Bug
Affects Versions: 1.4
Reporter: Yonik Seeley
 Fix For: 1.4


TrieField needs to implement toInternal and friends or else it breaks for a lot 
of Solr features.
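
Roughly, toInternal maps the user-visible value to the indexed term and 
toExternal maps it back; for trie fields the indexed form is the prefix-coded 
encoding. A sketch of the int case only (illustrative, assuming Lucene 2.9's 
NumericUtils; method shapes simplified, this is not the actual patch):

    import org.apache.lucene.util.NumericUtils;

    public class TrieIntCodec {
        // toInternal: external string -> the prefix-coded term that is
        // actually indexed, so features that look up terms directly
        // (sorting, faceting, terms component) can find them.
        static String toInternal(String val) {
            return NumericUtils.intToPrefixCoded(Integer.parseInt(val));
        }

        // toExternal: indexed term -> human-readable string
        static String toExternal(String indexed) {
            return Integer.toString(NumericUtils.prefixCodedToInt(indexed));
        }
    }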

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (SOLR-769) Support Document and Search Result clustering

2009-07-16 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley reassigned SOLR-769:
-

Assignee: (was: Yonik Seeley)

Un-assigning myself since I'm not sure when I'll be able to get back to this.
Issues remaining:
 - classloading issues after the handler was removed from solr.war
 - possible packaging issues that Grant brought up (the downloaded jars 
shouldn't be shipped)
 - update the Wiki once classloading works and we can generate the new example 
output


> Support Document and Search Result clustering
> -
>
> Key: SOLR-769
> URL: https://issues.apache.org/jira/browse/SOLR-769
> Project: Solr
>  Issue Type: New Feature
>Reporter: Grant Ingersoll
>Priority: Minor
> Fix For: 1.4
>
> Attachments: clustering-componet-shard.patch, clustering-libs.tar, 
> clustering-libs.tar, SOLR-769-analyzerClass.patch, SOLR-769-lib.zip, 
> SOLR-769.patch, SOLR-769.patch, SOLR-769.patch, SOLR-769.patch, 
> SOLR-769.patch, SOLR-769.patch, SOLR-769.patch, SOLR-769.patch, 
> SOLR-769.patch, SOLR-769.patch, SOLR-769.patch, SOLR-769.patch, 
> SOLR-769.patch, SOLR-769.tar, SOLR-769.zip, subcluster-flattening.patch
>
>
> Clustering is a useful tool for working with documents and search results, 
> similar to the notion of dynamic faceting.  Carrot2 
> (http://project.carrot2.org/) is a nice, BSD-licensed, library for doing 
> search results clustering.  Mahout (http://lucene.apache.org/mahout) is well 
> suited for whole-corpus clustering.  
> The patch lays out a contrib module that starts off w/ an integration of a 
> SearchComponent for doing clustering and an implementation using Carrot.  In 
> search results mode, it will use the DocList as the input for the cluster.   
> While Carrot2 comes w/ a Solr input component, it is not the same as the 
> SearchComponent that I have in that the Carrot example actually submits a 
> query to Solr, whereas my SearchComponent is just chained into the Component 
> list and uses the ResponseBuilder to add in the cluster results.
> While not fully fleshed out yet, the collection based mode will take in a 
> list of ids or just use the whole collection and will produce clusters.  
> Since this is a longer, typically offline task, there will need to be some 
> type of storage mechanism (and replication??) for the clusters.  I _may_ 
> push this off to a separate JIRA issue, but I at least want to present the 
> use case as part of the design of this component/contrib.  It may even make 
> sense that we split this out, such that the building piece is something like 
> an UpdateProcessor and then the SearchComponent just acts as a lookup 
> mechanism.
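
A hedged sketch of the SearchComponent shape described above, assuming the 
Solr 1.4 component API (class and output names are illustrative; the Carrot2 
call itself is elided):

    import java.io.IOException;
    import org.apache.solr.common.util.NamedList;
    import org.apache.solr.common.util.SimpleOrderedMap;
    import org.apache.solr.handler.component.ResponseBuilder;
    import org.apache.solr.handler.component.SearchComponent;
    import org.apache.solr.search.DocList;

    public class ClusteringComponent extends SearchComponent {
        @Override
        public void prepare(ResponseBuilder rb) throws IOException {
            // nothing to set up for this sketch
        }

        @Override
        public void process(ResponseBuilder rb) throws IOException {
            // Chained after QueryComponent, so the search results
            // (the DocList) are already on the ResponseBuilder.
            DocList docs = rb.getResults().docList;
            NamedList<Object> clusters = new SimpleOrderedMap<Object>();
            // ... feed docs to Carrot2 and add cluster labels/members ...
            rb.rsp.add("clusters", clusters);
        }

        @Override public String getDescription() { return "search-result clustering (sketch)"; }
        @Override public String getVersion()     { return "1.0"; }
        @Override public String getSourceId()    { return "sketch"; }
        @Override public String getSource()      { return "sketch"; }
    }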

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Issue Comment Edited: (SOLR-769) Support Document and Search Result clustering

2009-07-16 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12732277#action_12732277
 ] 

Yonik Seeley edited comment on SOLR-769 at 7/16/09 5:53 PM:


un-assigning myself since I'm not sure when I'll be able to get back to this.
Issues remaining:
 - classloading issues after the handler was removed from solr.war
 - possible packaging issues that Grant brought up (the downloaded jars 
shouldn't be shipped)
 - update the Wiki once classloading works and we can generate the new example 
output


  was (Author: ysee...@gmail.com):
Assigning myself since I'm not sure when I'll be able to get back to this.
Issues remaining:
 - classloading issues after the handler was removed from solr.war
 - possible packaging issues that Grant brought up (the downloaded jars 
shouldn't be shipped)
 - update the Wiki once classloading works and we can generate the new example 
output

  
> Support Document and Search Result clustering
> -
>
> Key: SOLR-769
> URL: https://issues.apache.org/jira/browse/SOLR-769
> Project: Solr
>  Issue Type: New Feature
>Reporter: Grant Ingersoll
>Priority: Minor
> Fix For: 1.4
>
> Attachments: clustering-componet-shard.patch, clustering-libs.tar, 
> clustering-libs.tar, SOLR-769-analyzerClass.patch, SOLR-769-lib.zip, 
> SOLR-769.patch, SOLR-769.patch, SOLR-769.patch, SOLR-769.patch, 
> SOLR-769.patch, SOLR-769.patch, SOLR-769.patch, SOLR-769.patch, 
> SOLR-769.patch, SOLR-769.patch, SOLR-769.patch, SOLR-769.patch, 
> SOLR-769.patch, SOLR-769.tar, SOLR-769.zip, subcluster-flattening.patch
>
>
> Clustering is a useful tool for working with documents and search results, 
> similar to the notion of dynamic faceting.  Carrot2 
> (http://project.carrot2.org/) is a nice, BSD-licensed, library for doing 
> search results clustering.  Mahout (http://lucene.apache.org/mahout) is well 
> suited for whole-corpus clustering.  
> The patch lays out a contrib module that starts off w/ an integration of a 
> SearchComponent for doing clustering and an implementation using Carrot.  In 
> search results mode, it will use the DocList as the input for the cluster.   
> While Carrot2 comes w/ a Solr input component, it is not the same as the 
> SearchComponent that I have in that the Carrot example actually submits a 
> query to Solr, whereas my SearchComponent is just chained into the Component 
> list and uses the ResponseBuilder to add in the cluster results.
> While not fully fleshed out yet, the collection based mode will take in a 
> list of ids or just use the whole collection and will produce clusters.  
> Since this is a longer, typically offline task, there will need to be some 
> type of storage mechanism (and replication??) for the clusters.  I _may_ 
> push this off to a separate JIRA issue, but I at least want to present the 
> use case as part of the design of this component/contrib.  It may even make 
> sense that we split this out, such that the building piece is something like 
> an UpdateProcessor and then the SearchComponent just acts as a lookup 
> mechanism.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (SOLR-1292) show lucene fieldcache entries and sizes

2009-07-16 Thread Yonik Seeley (JIRA)
show lucene fieldcache entries and sizes


 Key: SOLR-1292
 URL: https://issues.apache.org/jira/browse/SOLR-1292
 Project: Solr
  Issue Type: Bug
Reporter: Yonik Seeley
 Fix For: 1.4


See LUCENE-1749, FieldCache introspection API

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (SOLR-1292) show lucene fieldcache entries and sizes

2009-07-16 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley reassigned SOLR-1292:
--

Assignee: Mark Miller

> show lucene fieldcache entries and sizes
> 
>
> Key: SOLR-1292
> URL: https://issues.apache.org/jira/browse/SOLR-1292
> Project: Solr
>  Issue Type: Bug
>Reporter: Yonik Seeley
>Assignee: Mark Miller
> Fix For: 1.4
>
>
> See LUCENE-1749, FieldCache introspection API

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1270) The FloatField (and probably others) field type takes any string value at index, but JSON writer outputs as numeric without checking

2009-07-16 Thread Matt Schraeder (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12732294#action_12732294
 ] 

Matt Schraeder commented on SOLR-1270:
--

The data being indexed is valid.  It is a float value less than 1, e.g. "0.0" 
or "0.5".  The JSONWriter is outputting ".5" and ".0" rather than including 
the leading zero.  This produces invalid JSON, because ".5" is not a valid 
float in JSON: you need the leading 0 before the decimal point.

You can verify this in the example code that Donovan wrote.
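
To see the mismatch independently of PHP, a small self-contained check against 
the json.org number grammar (the regex below is a hand-rolled approximation of 
that grammar, not Solr code):

    public class JsonNumberCheck {
        // json.org requires at least one digit before the decimal point.
        static final String JSON_NUMBER =
            "-?(0|[1-9][0-9]*)(\\.[0-9]+)?([eE][+-]?[0-9]+)?";

        public static void main(String[] args) {
            for (String s : new String[] {"0.5", ".5", "0.0", ".0", "1.5"}) {
                System.out.println(s + " -> " + s.matches(JSON_NUMBER));
            }
            // prints: 0.5 -> true, .5 -> false, 0.0 -> true,
            //         .0 -> false, 1.5 -> true
        }
    }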

> The FloatField (and probably others) field type takes any string value at 
> index, but JSON writer outputs as numeric without checking
> 
>
> Key: SOLR-1270
> URL: https://issues.apache.org/jira/browse/SOLR-1270
> Project: Solr
>  Issue Type: Bug
>  Components: search
>Affects Versions: 1.2, 1.3, 1.4
> Environment: ubuntu 8.04, sun java 6, tomcat 5.5
>Reporter: Donovan Jimenez
>Priority: Minor
>
> The FloatField field type takes any string value at index time. These values 
> aren't necessarily valid JSON numerics, but the JSON writer does not check 
> their validity before writing them out as JSON numerics.
> I'm aware of the SortableFloatField, which does do index-time verification and 
> conversion of the value, but the way the JSON writer is working seemed like 
> either a bug that needs to be addressed or perhaps a gotcha that needs to be 
> better documented.
> This issue originally came from my php client issue tracker: 
> http://code.google.com/p/solr-php-client/issues/detail?id=13

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1277) Implement a Solr specific naming service (using Zookeeper)

2009-07-16 Thread Linbin Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12732318#action_12732318
 ] 

Linbin Chen commented on SOLR-1277:
---

good idea and good project



> Implement a Solr specific naming service (using Zookeeper)
> --
>
> Key: SOLR-1277
> URL: https://issues.apache.org/jira/browse/SOLR-1277
> Project: Solr
>  Issue Type: New Feature
>Affects Versions: 1.4
>Reporter: Jason Rutherglen
>Assignee: Grant Ingersoll
>Priority: Minor
> Fix For: 1.5
>
> Attachments: log4j-1.2.15.jar, SOLR-1277.patch, zookeeper-3.2.0.jar
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> The goal is to give Solr server clusters self-healing attributes:
> if a server fails, indexing and searching don't stop and
> all of the partitions remain searchable. For configuration, the goal is the
> ability to centrally deploy a new configuration without servers
> going offline.
> We can start with basic failover and go from there.
> Features:
> * Automatic failover (i.e. when a server fails, clients stop
> trying to index to or search it)
> * Centralized configuration management (i.e. new solrconfig.xml
> or schema.xml propagates to a live Solr cluster)
> * Optionally allow shards of a partition to be moved to another
> server (i.e. if a server gets hot, move the hot segments out to
> cooler servers). Ideally we'd have a way to detect hot segments
> and move them seamlessly. With NRT this becomes somewhat more
> difficult but not impossible?
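
As a rough illustration of the centralized-configuration item in the list 
above, each Solr node could watch a config znode using the attached ZooKeeper 
client (paths and hosts are illustrative; this is a sketch, not the patch):

    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.WatchedEvent;
    import org.apache.zookeeper.Watcher;
    import org.apache.zookeeper.ZooKeeper;

    public class ConfigWatcher implements Watcher {
        private final ZooKeeper zk;

        public ConfigWatcher(String zkHosts) throws Exception {
            // e.g. "zk1:2181,zk2:2181"; 3-second session timeout
            zk = new ZooKeeper(zkHosts, 3000, this);
        }

        // Fetch the centrally stored config and leave a watch behind.
        public byte[] fetchConfig() throws KeeperException, InterruptedException {
            return zk.getData("/solr/conf/solrconfig.xml", this, null);
        }

        public void process(WatchedEvent event) {
            if (event.getType() == Event.EventType.NodeDataChanged) {
                // a new config was published centrally: re-fetch it and
                // reload the core, without taking the server offline
            }
        }
    }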

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1270) The FloatField (and probably others) field type takes any string value at index, but JSON writer outputs as numeric without checking

2009-07-16 Thread Donovan Jimenez (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12732330#action_12732330
 ] 

Donovan Jimenez commented on SOLR-1270:
---

Matt is referring to the code I posted in 
http://code.google.com/p/solr-php-client/issues/detail?id=13#c8 where I indexed 
the PHP values null, 0, "0", and ".0" as FloatField.

I agree with Hoss here: the ".0" or null IS garbage according to the JSON 
numeric grammar (json.org) and the Java floating-point literal specification 
(http://java.sun.com/docs/books/jls/second_edition/html/lexical.doc.html#230798).

What I don't understand is why it wouldn't be beneficial for the FloatField 
type to use Float.parseFloat() for value checking at index time. My opinion is 
that the pro of directly letting the user know their input document does not 
match the expectations of their schema outweighs the con of the time it takes 
to parse the value.  It doesn't remove the need for separate documentation, but 
getting the error at index time makes more sense than getting a JSON parsing 
error when querying: the cause and effect become less detached.
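
One caveat, easy to verify with plain JDK code: Float.parseFloat alone catches 
true garbage but happily accepts ".5", so restoring the leading zero would 
also need re-serialization, e.g. via Float.toString. A quick demonstration:

    public class ParseFloatDemo {
        public static void main(String[] args) {
            // parseFloat catches true garbage...
            try {
                Float.parseFloat("garbage");
            } catch (NumberFormatException e) {
                System.out.println("rejected: garbage");
            }
            // ...but it accepts ".5", so the leading zero only comes back
            // if the value is re-serialized:
            System.out.println(Float.parseFloat(".5"));                 // 0.5
            System.out.println(Float.toString(Float.parseFloat(".5"))); // "0.5"
        }
    }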


> The FloatField (and probably others) field type takes any string value at 
> index, but JSON writer outputs as numeric without checking
> 
>
> Key: SOLR-1270
> URL: https://issues.apache.org/jira/browse/SOLR-1270
> Project: Solr
>  Issue Type: Bug
>  Components: search
>Affects Versions: 1.2, 1.3, 1.4
> Environment: ubuntu 8.04, sun java 6, tomcat 5.5
>Reporter: Donovan Jimenez
>Priority: Minor
>
> The FloatField field type takes any string value at index time. These values 
> aren't necessarily valid JSON numerics, but the JSON writer does not check 
> their validity before writing them out as JSON numerics.
> I'm aware of the SortableFloatField, which does do index-time verification and 
> conversion of the value, but the way the JSON writer is working seemed like 
> either a bug that needs to be addressed or perhaps a gotcha that needs to be 
> better documented.
> This issue originally came from my php client issue tracker: 
> http://code.google.com/p/solr-php-client/issues/detail?id=13

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1272) Java Replication does not log actions

2009-07-16 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12732340#action_12732340
 ] 

Noble Paul commented on SOLR-1272:
--

is this a bug? should we close it?

> Java Replication does not log actions
> -
>
> Key: SOLR-1272
> URL: https://issues.apache.org/jira/browse/SOLR-1272
> Project: Solr
>  Issue Type: Improvement
>  Components: replication (java)
>Affects Versions: 1.4
>Reporter: Lance Norskog
>Assignee: Noble Paul
> Fix For: 1.4
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> Java Replication actions are not logged. There is no trail of full and 
> partial replications.
> All full and partial replications, failed replications, and communication 
> failures should be logged in solr/logs/ the way that the script replication 
> system logs activity.
> This is a basic requirement for production use. If such a log does exist, 
> please document it on the wiki.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1270) The FloatField (and probably others) field type takes any string value at index, but JSON writer outputs as numeric without checking

2009-07-16 Thread Matt Schraeder (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12732344#action_12732344
 ] 

Matt Schraeder commented on SOLR-1270:
--

Let me clarify a bit, because I don't think I came across the way I meant to.  
There are two issues at work here.

1) The index lets you add invalid data. Solr should either do its best to 
parse the value as the float it's expecting, or it should throw an error 
saying you gave it invalid data that doesn't match the field.  If speed is 
more important than verification, that's your decision and I can agree with 
that.

2) When I WAS passing in valid data, I was passing in small float values such 
as 0.0 and 0.5, i.e. any value < 1.  Solr's JSONWriter wasn't returning these 
values as 0.0 or 0.5, which would be the proper return; it was returning them 
without the leading 0.  Without the leading 0, PHP's JSON decode fails because 
the value it receives is ".0" or ".5", and it interprets a leading period as a 
string rather than a decimal.  As long as there was a digit before the decimal 
point it was fine (1.5 came out as 1.5 and was interpreted as a float by JSON 
decode properly).

These, in my opinion, are the same bug: the JSON writer is returning invalid 
JSON.  In issue 1, yes, it's because of invalid data in the index; if the index 
is bad I cannot expect the JSON to be valid.  In issue 2 the data in Solr is 
valid and stored/returned properly as 0.5 with the leading 0, but the JSON is 
missing the leading 0, which breaks validation and keeps the user from being 
able to properly decode the JSON string.
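
A minimal sketch of the kind of writer-side guard this suggests (illustrative 
only, not the actual JSONWriter; it assumes the field's stored value arrives 
as a raw string):

    public class JsonFloatGuard {
        // Canonicalize a FloatField's raw stored string before emitting it
        // as a JSON numeric; fall back to a quoted JSON string if it isn't
        // a float at all.
        static String toJsonFloat(String raw) {
            if (raw == null) return "null";
            try {
                return Float.toString(Float.parseFloat(raw)); // ".5" -> "0.5"
            } catch (NumberFormatException e) {
                return "\"" + raw + "\""; // not a float: emit a JSON string
            }
        }
    }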

> The FloatField (and probably others) field type takes any string value at 
> index, but JSON writer outputs as numeric without checking
> 
>
> Key: SOLR-1270
> URL: https://issues.apache.org/jira/browse/SOLR-1270
> Project: Solr
>  Issue Type: Bug
>  Components: search
>Affects Versions: 1.2, 1.3, 1.4
> Environment: ubuntu 8.04, sun java 6, tomcat 5.5
>Reporter: Donovan Jimenez
>Priority: Minor
>
> The FloatField field type takes any string value at index time. These values 
> aren't necessarily valid JSON numerics, but the JSON writer does not check 
> their validity before writing them out as JSON numerics.
> I'm aware of the SortableFloatField, which does do index-time verification and 
> conversion of the value, but the way the JSON writer is working seemed like 
> either a bug that needs to be addressed or perhaps a gotcha that needs to be 
> better documented.
> This issue originally came from my php client issue tracker: 
> http://code.google.com/p/solr-php-client/issues/detail?id=13

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.