Re: lucene4.0 release

2012-07-05 Thread Andi Vajda

On Jul 6, 2012, at 0:27, Roman Chyla roman.ch...@gmail.com wrote:

 Lucene 4.0 is in alpha release and we would like to start working with
 pylucene4.0 already. I checked out the pylucene trunk and made the
 necessary changes so that it compiles. Would it be possible to
 incorporate (some of) these changes?

Absolutely, please send a patch to the list or file a bug and attach it there.

The issue with a PyLucene 4.0 release is not so much getting it to compile and 
run but rewriting all the tests and samples (originally ported from Java) since 
the Lucene api changed in many ways. That's a large amount of work and some of 
the new analyzer/tokenizer framework stuff needs some new jcc support for 
generating classes on the fly. I've got that written to some extent already but 
porting the samples and tests again is daunting.

Andi..

 
 Thanks,
 
  Roman


Re: lucene4.0 release

2012-07-05 Thread Roman Chyla
The patch probably didn't make it to the list; I'll file a ticket later.

It is definitely a lot of work with the python code. I have gone through
1.5 test cases now, and it is just 'unpleasant', so many API changes
out there - but I'll try to convert more

roman

On Thu, Jul 5, 2012 at 7:48 PM, Andi Vajda va...@apache.org wrote:

 On Jul 6, 2012, at 0:27, Roman Chyla roman.ch...@gmail.com wrote:

 Lucene 4.0 is in alpha release and we would like to start working with
 pylucene4.0 already. I checked out the pylucene trunk and made the
 necessary changes so that it compiles. Would it be possible to
 incorporate (some of) these changes?

 Absolutely, please send a patch to the list or file a bug and attach it there.

 The issue with a PyLucene 4.0 release is not so much getting it to compile 
 and run but rewriting all the tests and samples (originally ported from Java) 
 since the Lucene api changed in many ways. That's a large amount of work and 
 some of the new analyzer/tokenizer framework stuff needs some new jcc support 
 for generating classes on the fly. I've got that written to some extent 
 already but porting the samples and tests again is daunting.

 Andi..


 Thanks,

  Roman


[jira] [Comment Edited] (SOLR-3377) eDismax: A fielded query wrapped by parens is not recognized

2012-07-05 Thread Bernd Fehling (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406866#comment-13406866
 ] 

Bernd Fehling edited comment on SOLR-3377 at 7/5/12 6:10 AM:
-

I was willing to supply a final fix for this and was hoping that it would make it 
to release 4.x.
But unfortunately:
- I got no enhanced unit test
- no one committed this/my patch either
- the problem is still there

So, as I said, I was willing, that's true, but I gave up on this and am now 
thinking about switching to ElasticSearch because they really appreciate any help.


  was (Author: befehl):
I was willing to supply a final fix for this and was hoping that it would 
make it to release 4.x.
But unfortunately:
- I got no enhanced unit test
- no one committed this/my patch either
- the problem is still there
So, as I said, I was willing, that's true, but I gave up on this and am now 
thinking about switching to ElasticSearch because they really appreciate any help.

  
 eDismax: A fielded query wrapped by parens is not recognized
 

 Key: SOLR-3377
 URL: https://issues.apache.org/jira/browse/SOLR-3377
 Project: Solr
  Issue Type: Bug
  Components: query parsers
Affects Versions: 3.6
Reporter: Jan Høydahl
 Fix For: 4.0, 3.6.1

 Attachments: SOLR-3377.patch, SOLR-3377.patch, SOLR-3377.patch, 
 SOLR-3377.patch


 As reported by bernd on the user list, a query like this
 {{q=(name:test)}}
 will yield 0 hits in 3.6 while it worked in 3.5. It works without the parens.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3377) eDismax: A fielded query wrapped by parens is not recognized

2012-07-05 Thread Bernd Fehling (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406866#comment-13406866
 ] 

Bernd Fehling commented on SOLR-3377:
-

I was willing to supply a final fix for this and was hoping that it would make it 
to release 4.x.
But unfortunately:
- I got no enhanced unit test
- no one committed this/my patch either
- the problem is still there
So, as I said, I was willing, that's true, but I gave up on this and am now 
thinking about switching to ElasticSearch because they really appreciate any help.


 eDismax: A fielded query wrapped by parens is not recognized
 

 Key: SOLR-3377
 URL: https://issues.apache.org/jira/browse/SOLR-3377
 Project: Solr
  Issue Type: Bug
  Components: query parsers
Affects Versions: 3.6
Reporter: Jan Høydahl
 Fix For: 4.0, 3.6.1

 Attachments: SOLR-3377.patch, SOLR-3377.patch, SOLR-3377.patch, 
 SOLR-3377.patch


 As reported by bernd on the user list, a query like this
 {{q=(name:test)}}
 will yield 0 hits in 3.6 while it worked in 3.5. It works without the parens.




Re: [JENKINS] Lucene-trunk - Build # 1981 - Still Failing

2012-07-05 Thread Chris Male
Thanks Uwe.

Maybe I'm blind but I can't really see any inner classes in this test (or
the classes it extends).  The file names seem to contain the parameters
used to run the test method. I'm not sure where these values are taken from,
so I don't know how to compress them.

On Thu, Jul 5, 2012 at 5:55 PM, Uwe Schindler u...@thetaphi.de wrote:

 Hi Chris,


 See my mail from yesterday: Clover does not have a problem with the code,
 it is more that the code nesting in this test is so deep, that the filename
 generated is too long for the underlying os.


 Maybe make the test simpler with less code nesting (inner-inner-inner
 classes, overly long variable names). The file name seems to be generated by
 that.


 -

 Uwe Schindler

 H.-H.-Meier-Allee 63, D-28213 Bremen

 http://www.thetaphi.de

 eMail: u...@thetaphi.de


 *From:* Chris Male [mailto:gento...@gmail.com]
 *Sent:* Thursday, July 05, 2012 7:41 AM
 *To:* dev@lucene.apache.org
 *Subject:* Re: [JENKINS] Lucene-trunk - Build # 1981 - Still Failing


 I don't really get what is going on here, apart from that Clover is
 failing due to this test.  The file it is looking for seems crazy, is that
 expected?


 Having looked at the test, I haven't come across the @ParametersFactory
 annotation before but maybe that somehow doesn't work with Clover?

 On Thu, Jul 5, 2012 at 5:34 PM, Apache Jenkins Server 
 jenk...@builds.apache.org wrote:

 Build: https://builds.apache.org/job/Lucene-trunk/1981/

 All tests passed

 Build Log:
 [...truncated 37937 lines...]







 


 --
 Chris Male




-- 
Chris Male | Software Developer | DutchWorks | www.dutchworks.nl


RE: [JENKINS] Lucene-trunk - Build # 1981 - Still Failing

2012-07-05 Thread Uwe Schindler
Let's trick clover, by adding magic comments:

 

///CLOVER:OFF

. code .

///CLOVER:ON

 

See
https://confluence.atlassian.com/display/CLOVER026/Using+Source+Directives
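
A minimal sketch of how the two directives sit in a Java source file, assuming the behaviour described on the Clover page linked above (code between the two comments is not instrumented). The class and method names here are made up for illustration and are not taken from the failing test.

```java
// Hypothetical class used only to show where the directives go.
final class CloverDirectiveExample {

  ///CLOVER:OFF
  // Everything between the OFF and ON comments is meant to be skipped by
  // Clover's instrumentation; to javac these are ordinary line comments.
  static int notInstrumented(int x) {
    return x + 1;
  }
  ///CLOVER:ON

  // Code after the ON directive is instrumented as usual.
  static int instrumented(int x) {
    return x * 2;
  }
}
```

Because the directives are plain comments, the code compiles and behaves the same whether or not Clover is part of the build.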

 

Uwe

 

-

Uwe Schindler

H.-H.-Meier-Allee 63, D-28213 Bremen

http://www.thetaphi.de

eMail: u...@thetaphi.de

 

From: Chris Male [mailto:gento...@gmail.com] 
Sent: Thursday, July 05, 2012 8:14 AM
To: dev@lucene.apache.org
Subject: Re: [JENKINS] Lucene-trunk - Build # 1981 - Still Failing

 

Thanks Uwe.

 

Maybe I'm blind but I can't really see any inner classes in this test (or
the classes it extends).  The file names seem to contain the parameters
used to run the test method. I'm not sure where these values are taken from,
so I don't know how to compress them.

On Thu, Jul 5, 2012 at 5:55 PM, Uwe Schindler u...@thetaphi.de wrote:

Hi Chris,

 

See my mail from yesterday: Clover does not have a problem with the code, it
is more that the code nesting in this test is so deep, that the filename
generated is too long for the underlying os.

 

Maybe make the test simpler with less code nesting (inner-inner-inner classes,
overly long variable names). The file name seems to be generated by that.

 

-

Uwe Schindler

H.-H.-Meier-Allee 63, D-28213 Bremen

http://www.thetaphi.de

eMail: u...@thetaphi.de

 

From: Chris Male [mailto:gento...@gmail.com] 
Sent: Thursday, July 05, 2012 7:41 AM
To: dev@lucene.apache.org
Subject: Re: [JENKINS] Lucene-trunk - Build # 1981 - Still Failing

 

I don't really get what is going on here, apart from that Clover is failing
due to this test.  The file it is looking for seems crazy, is that expected?

 

Having looked at the test, I haven't come across the @ParametersFactory
annotation before but maybe that somehow doesn't work with Clover?

On Thu, Jul 5, 2012 at 5:34 PM, Apache Jenkins Server
jenk...@builds.apache.org wrote:

Build: https://builds.apache.org/job/Lucene-trunk/1981/

All tests passed

Build Log:
[...truncated 37937 lines...]









 

-- 
Chris Male





 

-- 
Chris Male | Software Developer | DutchWorks | www.dutchworks.nl



[jira] [Commented] (LUCENE-4190) IndexWriter deletes non-Lucene files

2012-07-05 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406874#comment-13406874
 ] 

Shai Erera commented on LUCENE-4190:


What if we had an object called IndexFileNames with a method accept(String 
name), that returns true if the file is recognized, false otherwise - that 
could give applications a way to create a recognized-set of index files:
* Lucene would provide a DefaultIndexFileNames which recognizes all non-codec 
files
* Either the app would provide an extension to the default (or a wrapper) which 
recognizes its codec files as well
** Or, we make the Codec responsible for recognizing files too, and then the 
code would just query the Codec for non-default index files.

Either way, it seems like we can very easily recognize what are index files and 
what aren't.

When files need to be deleted, it seems simple as well:
* Lucene lists all files in the directory
* Any file that is referenced by the index (I assume we still know which files 
are needed right?) is kept
* Any other file is queried against IndexFileNames.accept and if it is 
accepted, it's deleted, otherwise it's left alone.

Since this looks too simple to me, I'm assuming that I'm missing something. If 
so, can someone please clarify the problem to me?
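
The proposal above can be sketched roughly as follows. This is only an illustration under stated assumptions, not Lucene code: the names IndexFileNameFilter, DefaultIndexFileNameFilter and StaleFileSelector are hypothetical (chosen so they do not clash with Lucene's existing IndexFileNames class), and the extension set passed to the default filter is a placeholder for whatever the core or a codec would actually declare.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical interface standing in for the proposed accept(String name) object.
interface IndexFileNameFilter {
  boolean accept(String name);
}

// Hypothetical default filter; the extension set it is given is illustrative only.
final class DefaultIndexFileNameFilter implements IndexFileNameFilter {
  private final Set<String> extensions;

  DefaultIndexFileNameFilter(Set<String> extensions) {
    this.extensions = new HashSet<>(extensions);
  }

  @Override
  public boolean accept(String name) {
    // Commit point files.
    if (name.equals("segments.gen") || name.startsWith("segments_")) {
      return true;
    }
    // Per-segment files: leading underscore plus a recognized extension.
    int dot = name.lastIndexOf('.');
    return name.startsWith("_") && dot > 0 && extensions.contains(name.substring(dot + 1));
  }
}

final class StaleFileSelector {
  // Returns the files that may be deleted: recognized by the filter but no
  // longer referenced by the index. Unrecognized files are never returned.
  static List<String> selectDeletable(List<String> listing,
                                      Set<String> referenced,
                                      IndexFileNameFilter filter) {
    List<String> deletable = new ArrayList<>();
    for (String name : listing) {
      if (referenced.contains(name)) {
        continue; // still needed by the index
      }
      if (filter.accept(name)) {
        deletable.add(name); // recognized index file, but stale
      }
      // anything else is a foreign file and is left alone
    }
    return deletable;
  }
}
```

A codec-aware variant would wrap or extend the default filter, or the selector would ask each codec in turn, as the two bullets above suggest.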

 IndexWriter deletes non-Lucene files
 

 Key: LUCENE-4190
 URL: https://issues.apache.org/jira/browse/LUCENE-4190
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Michael McCandless
Assignee: Robert Muir
 Fix For: 4.0, 5.0

 Attachments: LUCENE-4190.patch, LUCENE-4190.patch


 Carl Austin raised a good issue in a comment on my Lucene 4.0.0 alpha blog 
 post: 
 http://blog.mikemccandless.com/2012/07/lucene-400-alpha-at-long-last.html
 IndexWriter will now (as of 4.0) delete all foreign files from the index 
 directory.  We made this change because Codecs are free to write to any files 
 now, so the space of filenames is hard to bound.
 But if the user accidentally uses the wrong directory (eg c:/) then we will 
 in fact delete important stuff.
 I think we can at least use some simple criteria (must start with _, maybe 
 must fit certain pattern eg _base36(_X).Y), so we are much less likely to 
 delete a non-Lucene file
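
 One possible reading of the "_base36(_X).Y" criterion, written as a regular expression. This is a sketch under assumptions: the issue text does not say what X and Y may contain, so here X is taken to be any run of characters without a dot and Y any non-empty extension. The class name is hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical helper; the pattern is an assumed interpretation of "_base36(_X).Y".
final class SegmentFileNamePattern {
  // "_" + base-36 segment id + optional "_suffix" + "." + extension
  private static final Pattern SEGMENT_FILE =
      Pattern.compile("_[0-9a-z]+(_[^.]+)?\\.[^.]+");

  static boolean looksLikeSegmentFile(String name) {
    // matches() requires the whole name to fit the pattern.
    return SEGMENT_FILE.matcher(name).matches();
  }
}
```

 A check like this only narrows the set of files that can be deleted. A foreign file that happens to start with an underscore and lowercase letters would still match, which fits the "much less likely" wording above rather than a guarantee.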




[jira] [Commented] (LUCENE-4190) IndexWriter deletes non-Lucene files

2012-07-05 Thread Gilad Barkai (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406877#comment-13406877
 ] 

Gilad Barkai commented on LUCENE-4190:
--

Perhaps out of context, but here goes..
Users sometimes do stupid things, me included, such as putting the index in a 
non-dedicated-directory. But should they pay the penalty just because the code 
should not get overly complicated?

Codecs create their own files, and no one seems able to control what files they 
create (other than in assert?). Then, is it possible for the codec to handle 
the removal of the files it created? 

That would make codecs work the same way the Index handles the 'core' index 
files - each codec will be able to erase its own.
Another closely related option - let IW consult with the codecs about 
'non-core-files' and see which one should/could be removed.

I only suggest this because I fear for users' files which might get erased. 

Disclosure:
It'll be ages before I understand Lucene 4 half as much as I do Lucene 3.6 (not 
that that's much), so forgive me if I stepped on anyone's toes, or just 
described how to implement a time machine :)

 IndexWriter deletes non-Lucene files
 

 Key: LUCENE-4190
 URL: https://issues.apache.org/jira/browse/LUCENE-4190
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Michael McCandless
Assignee: Robert Muir
 Fix For: 4.0, 5.0

 Attachments: LUCENE-4190.patch, LUCENE-4190.patch


 Carl Austin raised a good issue in a comment on my Lucene 4.0.0 alpha blog 
 post: 
 http://blog.mikemccandless.com/2012/07/lucene-400-alpha-at-long-last.html
 IndexWriter will now (as of 4.0) delete all foreign files from the index 
 directory.  We made this change because Codecs are free to write to any files 
 now, so the space of filenames is hard to bound.
 But if the user accidentally uses the wrong directory (eg c:/) then we will 
 in fact delete important stuff.
 I think we can at least use some simple criteria (must start with _, maybe 
 must fit certain pattern eg _base36(_X).Y), so we are much less likely to 
 delete a non-Lucene file




[jira] [Commented] (LUCENE-4190) IndexWriter deletes non-Lucene files

2012-07-05 Thread selckin (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406879#comment-13406879
 ] 

selckin commented on LUCENE-4190:
-

what if you accidentally call deleteAll() on your production index, maybe old 
commit points should not be deleted until after a period of 30 days

 IndexWriter deletes non-Lucene files
 

 Key: LUCENE-4190
 URL: https://issues.apache.org/jira/browse/LUCENE-4190
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Michael McCandless
Assignee: Robert Muir
 Fix For: 4.0, 5.0

 Attachments: LUCENE-4190.patch, LUCENE-4190.patch


 Carl Austin raised a good issue in a comment on my Lucene 4.0.0 alpha blog 
 post: 
 http://blog.mikemccandless.com/2012/07/lucene-400-alpha-at-long-last.html
 IndexWriter will now (as of 4.0) delete all foreign files from the index 
 directory.  We made this change because Codecs are free to write to any files 
 now, so the space of filenames is hard to bound.
 But if the user accidentally uses the wrong directory (eg c:/) then we will 
 in fact delete important stuff.
 I think we can at least use some simple criteria (must start with _, maybe 
 must fit certain pattern eg _base36(_X).Y), so we are much less likely to 
 delete a non-Lucene file




[jira] [Comment Edited] (LUCENE-4190) IndexWriter deletes non-Lucene files

2012-07-05 Thread selckin (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406879#comment-13406879
 ] 

selckin edited comment on LUCENE-4190 at 7/5/12 6:54 AM:
-

edit: remove unhelpful sarcasm, sorry

  was (Author: selckin):
what if you accidentally call deleteAll() on your production index, maybe 
old commit points should not be deleted until after a period of 30 days
  
 IndexWriter deletes non-Lucene files
 

 Key: LUCENE-4190
 URL: https://issues.apache.org/jira/browse/LUCENE-4190
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Michael McCandless
Assignee: Robert Muir
 Fix For: 4.0, 5.0

 Attachments: LUCENE-4190.patch, LUCENE-4190.patch


 Carl Austin raised a good issue in a comment on my Lucene 4.0.0 alpha blog 
 post: 
 http://blog.mikemccandless.com/2012/07/lucene-400-alpha-at-long-last.html
 IndexWriter will now (as of 4.0) delete all foreign files from the index 
 directory.  We made this change because Codecs are free to write to any files 
 now, so the space of filenames is hard to bound.
 But if the user accidentally uses the wrong directory (eg c:/) then we will 
 in fact delete important stuff.
 I think we can at least use some simple criteria (must start with _, maybe 
 must fit certain pattern eg _base36(_X).Y), so we are much less likely to 
 delete a non-Lucene file




[jira] [Commented] (LUCENE-4094) Randomize file.encoding

2012-07-05 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406902#comment-13406902
 ] 

Dawid Weiss commented on LUCENE-4094:
-

Follow-up discussion wrt overriding file.encoding:
http://markmail.org/message/q4eeac7q6fjalbtd

 Randomize file.encoding
 ---

 Key: LUCENE-4094
 URL: https://issues.apache.org/jira/browse/LUCENE-4094
 Project: Lucene - Java
  Issue Type: Sub-task
  Components: general/test
Reporter: Dawid Weiss
Assignee: Dawid Weiss
Priority: Trivial

 Stated in the code:
 {code}
 // TODO we can't randomize this yet (it drives ant crazy) but this makes tests reproduce
 // in case machines have different default charsets...
 sb.append(" -Dargs=\"-Dfile.encoding=" + System.getProperty("file.encoding") + "\"");
 {code}
 But this should work without any problems with junit4 because communication 
 streams are separate and we're decoding output properly (or so I hope). 
 Try and see what happens :)




[jira] [Commented] (SOLR-3377) eDismax: A fielded query wrapped by parens is not recognized

2012-07-05 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-3377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406923#comment-13406923
 ] 

Jan Høydahl commented on SOLR-3377:
---

Bernd, I agree that this is a bug that absolutely should be fixed.

I have followed it through this far but have not yet had the chance to go the 
last mile until committing, but I am definitely keen to pick it up again after 
summer holidays and parental leave, hopefully before. The reason I unassigned 
myself is to signal to the other committers that I'm not actively working on 
this and let others step in if they wish.

This is the way Apache works - we are all volunteers, and I am sure that with 
some patience this will make it through in time for 4.0 final. You've done a 
great job so far with the patch. It may be final and good to go, but 
personally I'd write some more tests since this particular area has been 
lacking - before committing.

 eDismax: A fielded query wrapped by parens is not recognized
 

 Key: SOLR-3377
 URL: https://issues.apache.org/jira/browse/SOLR-3377
 Project: Solr
  Issue Type: Bug
  Components: query parsers
Affects Versions: 3.6
Reporter: Jan Høydahl
 Fix For: 4.0, 3.6.1

 Attachments: SOLR-3377.patch, SOLR-3377.patch, SOLR-3377.patch, 
 SOLR-3377.patch


 As reported by bernd on the user list, a query like this
 {{q=(name:test)}}
 will yield 0 hits in 3.6 while it worked in 3.5. It works without the parens.




[jira] [Updated] (SOLR-3377) eDismax: A fielded query wrapped by parens is not recognized

2012-07-05 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/SOLR-3377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Høydahl updated SOLR-3377:
--

Priority: Critical  (was: Major)

Upgrading priority to signal the severity - i.e. a valid user query may return 
0 hits, which may be pretty critical for some.

 eDismax: A fielded query wrapped by parens is not recognized
 

 Key: SOLR-3377
 URL: https://issues.apache.org/jira/browse/SOLR-3377
 Project: Solr
  Issue Type: Bug
  Components: query parsers
Affects Versions: 3.6
Reporter: Jan Høydahl
Priority: Critical
 Fix For: 4.0, 3.6.1

 Attachments: SOLR-3377.patch, SOLR-3377.patch, SOLR-3377.patch, 
 SOLR-3377.patch


 As reported by bernd on the user list, a query like this
 {{q=(name:test)}}
 will yield 0 hits in 3.6 while it worked in 3.5. It works without the parens.




[jira] [Commented] (LUCENE-4094) Randomize file.encoding

2012-07-05 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406946#comment-13406946
 ] 

Robert Muir commented on LUCENE-4094:
-

I totally disagree with everything the jdk developers are saying. They tend to 
just whine when we find bugs in their shit.

We should continue to do this: it's important to seek out these default charset 
bugs (this is because of their stupid design).


 Randomize file.encoding
 ---

 Key: LUCENE-4094
 URL: https://issues.apache.org/jira/browse/LUCENE-4094
 Project: Lucene - Java
  Issue Type: Sub-task
  Components: general/test
Reporter: Dawid Weiss
Assignee: Dawid Weiss
Priority: Trivial

 Stated in the code:
 {code}
 // TODO we can't randomize this yet (it drives ant crazy) but this makes tests reproduce
 // in case machines have different default charsets...
 sb.append(" -Dargs=\"-Dfile.encoding=" + System.getProperty("file.encoding") + "\"");
 {code}
 But this should work without any problems with junit4 because communication 
 streams are separate and we're decoding output properly (or so I hope). 
 Try and see what happens :)




[jira] [Resolved] (LUCENE-4191) Lucene doc pages redirect to api-4_0_0-ALPHA which results in 404

2012-07-05 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved LUCENE-4191.
-

Resolution: Won't Fix

Don't use these /api links

 Lucene doc pages redirect to api-4_0_0-ALPHA which results in 404
 ---

 Key: LUCENE-4191
 URL: https://issues.apache.org/jira/browse/LUCENE-4191
 Project: Lucene - Java
  Issue Type: Bug
Affects Versions: 3.6
Reporter: Chaim Peck
  Labels: documentation

 Try to go to this URL:
 http://lucene.apache.org/solr/api/org/apache/solr/analysis/BaseTokenFilterFactory.html
 The result is that you will be redirected here, which is a 404:
 http://lucene.apache.org/solr/api-4_0_0-ALPHA/org/apache/solr/analysis/BaseTokenFilterFactory.html
 You can still get to the page from google cache:
 http://webcache.googleusercontent.com/search?q=cache:mCJCac4iZ0QJ:lucene.apache.org/solr/api/org/apache/solr/analysis/BaseTokenFilterFactory.html+&cd=1&hl=en&ct=clnk&gl=us




[JENKINS] Lucene-4.x - Build # 27 - Still Failing

2012-07-05 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-4.x/27/

All tests passed

Build Log:
[...truncated 38243 lines...]




[jira] [Commented] (LUCENE-4094) Randomize file.encoding

2012-07-05 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406949#comment-13406949
 ] 

Dawid Weiss commented on LUCENE-4094:
-

I understand their argument (combination not encountered in practice) but I 
disagree with the claim it should justify crappy code. The default charset 
should be independent of the OS-filesystem interaction. It should just work 
with UTF-16.

Anyway, when I run our stuff with enforced UTF-16 lots of weird things start to 
happen: new FileReader(file), benchmarks that run forever (will provide a seed) and 
such. I'll commit these in one by one and then we can start testing/fixing locally.
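
To illustrate why new FileReader(file) is sensitive to file.encoding: it decodes with the JVM's default charset, so the same file reads differently on machines with different defaults. A minimal sketch of the usual alternative, passing the charset explicitly, is below. The class and method names are hypothetical and this is not the actual change made in Lucene.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper showing I/O that does not depend on file.encoding.
final class ExplicitCharsetIo {

  // Writes text as UTF-8 regardless of the default charset.
  static void write(File file, String text) throws IOException {
    try (Writer w = new OutputStreamWriter(new FileOutputStream(file), StandardCharsets.UTF_8)) {
      w.write(text);
    }
  }

  // Reads text as UTF-8 regardless of the default charset.
  static String read(File file) throws IOException {
    StringBuilder sb = new StringBuilder();
    try (Reader r = new InputStreamReader(new FileInputStream(file), StandardCharsets.UTF_8)) {
      char[] buf = new char[1024];
      int n;
      while ((n = r.read(buf)) != -1) {
        sb.append(buf, 0, n);
      }
    }
    return sb.toString();
  }

  // Writes text to a temporary file and reads it back.
  static String roundTrip(String text) {
    try {
      File tmp = File.createTempFile("charset-demo", ".txt");
      try {
        write(tmp, text);
        return read(tmp);
      } finally {
        tmp.delete();
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

With both sides pinned to one charset, the round trip gives back the same string whatever -Dfile.encoding is set to, which is the property a randomized file.encoding is meant to check.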

 Randomize file.encoding
 ---

 Key: LUCENE-4094
 URL: https://issues.apache.org/jira/browse/LUCENE-4094
 Project: Lucene - Java
  Issue Type: Sub-task
  Components: general/test
Reporter: Dawid Weiss
Assignee: Dawid Weiss
Priority: Trivial

 Stated in the code:
 {code}
 // TODO we can't randomize this yet (it drives ant crazy) but this makes tests reproduce
 // in case machines have different default charsets...
 sb.append(" -Dargs=\"-Dfile.encoding=" + System.getProperty("file.encoding") + "\"");
 {code}
 But this should work without any problems with junit4 because communication 
 streams are separate and we're decoding output properly (or so I hope). 
 Try and see what happens :)




[jira] [Commented] (LUCENE-4190) IndexWriter deletes non-Lucene files

2012-07-05 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406951#comment-13406951
 ] 

Robert Muir commented on LUCENE-4190:
-

Again: I am totally against complicated file handling here for this reason. 
People can handle this some other way in their apps.
We *HAVE* to keep this kind of code simple and maintainable in lucene.

It was a mistake to start a slippery slope by being friendly at all to this. (I 
reverted)

 IndexWriter deletes non-Lucene files
 

 Key: LUCENE-4190
 URL: https://issues.apache.org/jira/browse/LUCENE-4190
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Michael McCandless
Assignee: Robert Muir
 Fix For: 4.0, 5.0

 Attachments: LUCENE-4190.patch, LUCENE-4190.patch


 Carl Austin raised a good issue in a comment on my Lucene 4.0.0 alpha blog 
 post: 
 http://blog.mikemccandless.com/2012/07/lucene-400-alpha-at-long-last.html
 IndexWriter will now (as of 4.0) delete all foreign files from the index 
 directory.  We made this change because Codecs are free to write to any files 
 now, so the space of filenames is hard to bound.
 But if the user accidentally uses the wrong directory (eg c:/) then we will 
 in fact delete important stuff.
 I think we can at least use some simple criteria (must start with _, maybe 
 must fit certain pattern eg _base36(_X).Y), so we are much less likely to 
 delete a non-Lucene file




[jira] [Created] (LUCENE-4193) Update Lucene FAQ regarding index-time field boosting

2012-07-05 Thread Elmer van Chastelet (JIRA)
Elmer van Chastelet created LUCENE-4193:
---

 Summary: Update Lucene FAQ regarding index-time field boosting
 Key: LUCENE-4193
 URL: https://issues.apache.org/jira/browse/LUCENE-4193
 Project: Lucene - Java
  Issue Type: Improvement
  Components: general/website
Reporter: Elmer van Chastelet
Priority: Minor


Current FAQ says the following regarding index-time field boosts:

{quote}Index time field boosts are worthless if you set them on every 
document.{quote}
see [the 
FAQ|http://wiki.apache.org/lucene-java/LuceneFAQ#What_is_the_difference_between_field_.28or_document.29_boosting_and_query_boosting.3F].

I think this should be changed to {quote}Index time field boosts are worthless 
if you set them on every document _and solely search on this field at query 
time_.{quote}

Because, when searching on _multiple_ fields, a match in a properly index-time 
boosted field will score higher than a match in a non-boosted field.
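
The argument can be checked with a toy calculation. The model below is an assumption made for illustration (field score = boost * raw match score, document score = sum over fields); it is not Lucene's actual Similarity, and all names are hypothetical.

```java
/** Toy scoring model (an assumption, not Lucene's Similarity) to illustrate the FAQ point. */
final class BoostDemo {
    private BoostDemo() {}

    /** Score contributed by one field: the index-time boost times the raw match score. */
    static double fieldScore(double boost, double rawMatch) {
        return boost * rawMatch;
    }

    /** Score of a document when both "title" (boosted) and "body" (boost 1.0) are searched. */
    static double docScore(double titleBoost, double titleMatch, double bodyMatch) {
        return fieldScore(titleBoost, titleMatch) + fieldScore(1.0, bodyMatch);
    }

    public static void main(String[] args) {
        // Single-field search: the same boost on every document keeps the order unchanged.
        System.out.println(fieldScore(3.0, 1.0) > fieldScore(3.0, 0.5));
        // Multi-field search: document A matches in title, document B matches in body.
        System.out.println("no boost: A=" + docScore(1.0, 1.0, 0.0) + " B=" + docScore(1.0, 0.0, 1.2));
        System.out.println("boost 3:  A=" + docScore(3.0, 1.0, 0.0) + " B=" + docScore(3.0, 0.0, 1.2));
    }
}
```

In this model a uniform title boost never reorders a title-only search, but it does move the title match ahead of the body match once both fields are queried.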

See [this discussion|https://forum.hibernate.org/viewtopic.php?f=9&t=1016615] 
on Hibernate Search forums.

Not sure if there are more places where similar statements are made.




[JENKINS] Solr-4.x - Build # 28 - Still Failing

2012-07-05 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Solr-4.x/28/

No tests ran.

Build Log:
[...truncated 8055 lines...]


[jira] [Resolved] (LUCENE-4110) Report long periods of forked jvm inactivity (hung tests/ suites).

2012-07-05 Thread Dawid Weiss (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dawid Weiss resolved LUCENE-4110.
-

Resolution: Fixed

 Report long periods of forked jvm inactivity (hung tests/ suites).
 --

 Key: LUCENE-4110
 URL: https://issues.apache.org/jira/browse/LUCENE-4110
 Project: Lucene - Java
  Issue Type: Sub-task
  Components: general/test
Reporter: Dawid Weiss
Assignee: Dawid Weiss
 Fix For: 5.0


 https://github.com/carrotsearch/randomizedtesting/issues/106
 I'll see what can be done about it (had some thoughts on the way back to the 
 hotel and I think it's doable).




[jira] [Resolved] (LUCENE-4189) Test output should include timestamps (start/end for each test/ suite).

2012-07-05 Thread Dawid Weiss (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dawid Weiss resolved LUCENE-4189.
-

Resolution: Fixed

 Test output should include timestamps (start/end for each test/ suite).
 ---

 Key: LUCENE-4189
 URL: https://issues.apache.org/jira/browse/LUCENE-4189
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Dawid Weiss
Assignee: Dawid Weiss
Priority: Trivial
 Fix For: 4.0, 5.0


 This adds more verboseness to the output -- should this be optional 
 (overrideable using local properties but defaulting to 'off')?
 {code}
[junit4] [11:54:50.259] Suite: org.apache.lucene.index.TestDeletionPolicy
[junit4] [11:54:53.706] Completed in 3.45s, 6 tests
[junit4]  
[junit4] [11:54:53.709] Suite: org.apache.lucene.util.TestVirtualMethod
[junit4] [11:54:53.725] Completed in 0.02s, 2 tests
[junit4]  
[junit4] [11:54:53.728] Suite: org.apache.lucene.index.TestRollingUpdates
[junit4] [11:54:55.700] Completed in 1.97s, 2 tests
[junit4]  
[junit4] [11:54:55.721] Suite: 
 org.apache.lucene.index.TestIndexWriterExceptions
[junit4] [11:55:02.394] Completed in 6.67s, 24 tests
[junit4]  
[junit4] [11:55:02.398] Suite: org.apache.lucene.index.TestNoDeletionPolicy
[junit4] [11:55:02.548] Completed in 0.15s, 4 tests
 ...
 {code}




[jira] [Updated] (LUCENE-4110) Report long periods of forked jvm inactivity (hung tests/ suites).

2012-07-05 Thread Dawid Weiss (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dawid Weiss updated LUCENE-4110:


Fix Version/s: 4.0

 Report long periods of forked jvm inactivity (hung tests/ suites).
 --

 Key: LUCENE-4110
 URL: https://issues.apache.org/jira/browse/LUCENE-4110
 Project: Lucene - Java
  Issue Type: Sub-task
  Components: general/test
Reporter: Dawid Weiss
Assignee: Dawid Weiss
 Fix For: 4.0, 5.0


 https://github.com/carrotsearch/randomizedtesting/issues/106
 I'll see what can be done about it (had some thoughts on the way back to the 
 hotel and I think it's doable).




Clean your workspace (jars update)

2012-07-05 Thread Dawid Weiss
I've updated randomizedtesting to 1.6.0. ant clean resolve, please.
Also, a few requested things have made it into this commit:

* Timestamps on suites/tests (disabled by default, enable in your
local props or -Dtests.timestamps=on).
  https://issues.apache.org/jira/browse/LUCENE-4189

* Long-running/ hung tests will report back to the console now (every
60 seconds).
  https://issues.apache.org/jira/browse/LUCENE-4110

* [IMPORTANT] The forked JVM's file.encoding property will be
randomized among the following:
  US-ASCII, ISO-8859-1, UTF-8, or your platform's default.

The last one is an important change and it may (will) break tests.
Please help out in fixing default encoding-sensitive things both in
tests and in code. If you have a Windows machine (or Java 1.6 JVM) you can go with:

ant -Dtests.file.encoding=UTF-16

this will most likely break anything that expects lower ASCII range
(which is unfortunately the same in all the above randomized
encodings).
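
Why UTF-16 is the harsher test can be seen in a small sketch. The class and method names here are hypothetical; only standard JDK charset names are used.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

/** Hypothetical demo: compares the bytes a string encodes to under two charsets. */
final class AsciiRangeDemo {
    private AsciiRangeDemo() {}

    /** Returns true if the text encodes to identical bytes under both named charsets. */
    static boolean sameBytes(String text, String charsetA, String charsetB) {
        byte[] a = text.getBytes(Charset.forName(charsetA));
        byte[] b = text.getBytes(Charset.forName(charsetB));
        return Arrays.equals(a, b);
    }

    public static void main(String[] args) {
        // Plain ASCII text looks the same in the three randomized encodings...
        System.out.println(sameBytes("lucene", "US-ASCII", "UTF-8"));
        System.out.println(sameBytes("lucene", "ISO-8859-1", "UTF-8"));
        // ...but not in UTF-16, and non-ASCII text differs even between the first three.
        System.out.println(sameBytes("lucene", "UTF-8", "UTF-16"));
        System.out.println(sameBytes("\u00e9", "ISO-8859-1", "UTF-8"));
    }
}
```

Code that only ever sees lower-ASCII data will therefore pass under all three randomized encodings even if it relies on the platform default.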

Any problems, requests, ideas, feedback -- speak up.

Dawid




[jira] [Updated] (LUCENE-4094) Randomize file.encoding

2012-07-05 Thread Dawid Weiss (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dawid Weiss updated LUCENE-4094:


Fix Version/s: 5.0
   4.0

 Randomize file.encoding
 ---

 Key: LUCENE-4094
 URL: https://issues.apache.org/jira/browse/LUCENE-4094
 Project: Lucene - Java
  Issue Type: Sub-task
  Components: general/test
Reporter: Dawid Weiss
Assignee: Dawid Weiss
Priority: Trivial
 Fix For: 4.0, 5.0


 Stated in the code:
 {code}
 // TODO we can't randomize this yet (it drives ant crazy) but this makes tests reproduce
 // in case machines have different default charsets...
 sb.append(" -Dargs=\"-Dfile.encoding=" + System.getProperty("file.encoding") + "\"");
 {code}
 But this should work without any problems with junit4 because communication 
 streams are separate and we're decoding output properly (or so I hope). 
 Try and see what happens :)




[jira] [Resolved] (LUCENE-4094) Randomize file.encoding

2012-07-05 Thread Dawid Weiss (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dawid Weiss resolved LUCENE-4094.
-

Resolution: Fixed

 Randomize file.encoding
 ---

 Key: LUCENE-4094
 URL: https://issues.apache.org/jira/browse/LUCENE-4094
 Project: Lucene - Java
  Issue Type: Sub-task
  Components: general/test
Reporter: Dawid Weiss
Assignee: Dawid Weiss
Priority: Trivial
 Fix For: 4.0, 5.0


 Stated in the code:
 {code}
 // TODO we can't randomize this yet (it drives ant crazy) but this makes tests reproduce
 // in case machines have different default charsets...
 sb.append(" -Dargs=\"-Dfile.encoding=" + System.getProperty("file.encoding") + "\"");
 {code}
 But this should work without any problems with junit4 because communication 
 streams are separate and we're decoding output properly (or so I hope). 
 Try and see what happens :)




Re: [JENKINS] Solr-4.x - Build # 28 - Still Failing

2012-07-05 Thread Dawid Weiss
This has to do with ivy -- I admit I don't know what's happening. The
configuration gets read once but it isn't persisted for other ivy
tasks (even though the property clearly is), so you end up with:

ivy-configure:

resolve:
[ivy:retrieve] :: loading settings :: url =
jar:file:/home/hudson/.ant/lib/ivy-2.2.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
[ivy:retrieve]
[ivy:retrieve] :: problems summary ::
[ivy:retrieve]  WARNINGS
[ivy:retrieve]  module not found:
com.carrotsearch.randomizedtesting#junit4-ant;1.6.0
[ivy:retrieve]   local: tried
[ivy:retrieve]  
/home/hudson/.ivy2/local/com.carrotsearch.randomizedtesting/junit4-ant/1.6.0/ivys/ivy.xml
[ivy:retrieve]-- artifact
com.carrotsearch.randomizedtesting#junit4-ant;1.6.0!junit4-ant.jar:
[ivy:retrieve]  
/home/hudson/.ivy2/local/com.carrotsearch.randomizedtesting/junit4-ant/1.6.0/jars/junit4-ant.jar
[ivy:retrieve]   shared: tried
[ivy:retrieve]  
/home/hudson/.ivy2/shared/com.carrotsearch.randomizedtesting/junit4-ant/1.6.0/ivys/ivy.xml
[ivy:retrieve]-- artifact
com.carrotsearch.randomizedtesting#junit4-ant;1.6.0!junit4-ant.jar:
[ivy:retrieve]  
/home/hudson/.ivy2/shared/com.carrotsearch.randomizedtesting/junit4-ant/1.6.0/jars/junit4-ant.jar
[ivy:retrieve]   public: tried
[ivy:retrieve]  
http://repo1.maven.org/maven2/com/carrotsearch/randomizedtesting/junit4-ant/1.6.0/junit4-ant-1.6.0.pom
[ivy:retrieve]-- artifact
com.carrotsearch.randomizedtesting#junit4-ant;1.6.0!junit4-ant.jar:
[ivy:retrieve]  
http://repo1.maven.org/maven2/com/carrotsearch/randomizedtesting/junit4-ant/1.6.0/junit4-ant-1.6.0.jar
[ivy:retrieve]  module not found:
com.carrotsearch.randomizedtesting#randomizedtesting-runner;1.6.0
[ivy:retrieve]   local: tried

Note that sonatype's release repository is NOT tried, it just checks
the default chain. I'll provide a workaround fix in a second but I
don't know how to fix it in a proper way.

Dawid

On Thu, Jul 5, 2012 at 12:47 PM, Apache Jenkins Server
jenk...@builds.apache.org wrote:
 Build: https://builds.apache.org/job/Solr-4.x/28/

 No tests ran.

 Build Log:
 [...truncated 8055 lines...]




[jira] [Commented] (LUCENE-4100) Maxscore - Efficient Scoring

2012-07-05 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406992#comment-13406992
 ] 

Robert Muir commented on LUCENE-4100:
-

Hello, thank you for working on this! 

I have just taken a rough glance at the code, and think we should probably look 
at what API changes would make this sort of thing fit better into Lucene and 
make it easier to implement.

Random thoughts:

Specifically: what you are doing in the PostingsWriter is similar to computing 
impacts (I don't have a copy of
the paper so admittedly don't know the exact algorithm you are using). But it 
seems to me that you are putting 
a maxScore in the term dictionary metadata for all of the term's postings (as a 
float).

With the tool you provide, this works because you have access to e.g. the 
segment's length normalization information
etc (your postingswriter takes a reader). But we would have to think about how 
to give postingswriters access to this 
on flush... it seems possible to me though.

Giving the postingswriter full statistics (e.g. docfreq) for Similarity 
computation seems difficult: while I think
we could accum this stuff in FreqProxTermsWriter before we flush to the codec, 
it wouldn't solve the problem at merge time,
so you would have to do a 2-pass merge in the codec somehow...

But the alternative of splitting the impact (tf/norm) from the 
document-independent weight (e.g. IDF) isn't that pretty
either, because it limits the scoring systems (Similarity implementations) that 
could use the optimization.

as many terms will be low frequency (e.g. docfreq=1), I think it's not
worth it to encode the maxscore for these low-freq terms: we could save space 
by omitting maxscore for low-freq terms 
and just treat it as infinitely large?

the opposite problem: is it really optimal to encode maxscore for the entire 
term? or would it be better for high-freq
terms to encode maxScore for a range of postings (e.g. block). This way, you 
could skip over ranges of postings that cannot
compete (rather than limiting the optimization to an entire term). A codec 
could put this information into a block header,
or at certain intervals, into the skip data, etc.
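
A minimal sketch of the per-block idea, assuming a hypothetical in-memory Block type and a known minimum competitive score. None of these names exist in Lucene or in the attached patch, and a real codec would read the bound from a block header or skip data instead.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of skipping posting blocks by their maximum score. */
final class BlockMaxSkipDemo {
    private BlockMaxSkipDemo() {}

    /** Assumed block layout: doc ids plus an upper bound on any score inside the block. */
    static final class Block {
        final int[] docIds;
        final float maxScore;

        Block(int[] docIds, float maxScore) {
            this.docIds = docIds;
            this.maxScore = maxScore;
        }
    }

    /** Returns the doc ids that still need scoring; blocks that cannot compete are skipped whole. */
    static List<Integer> candidates(List<Block> blocks, float minCompetitiveScore) {
        List<Integer> result = new ArrayList<Integer>();
        for (Block block : blocks) {
            if (block.maxScore <= minCompetitiveScore) {
                continue; // no posting in this block can beat the current threshold
            }
            for (int docId : block.docIds) {
                result.add(docId);
            }
        }
        return result;
    }
}
```

With a single per-term bound the same test can only accept or reject the whole postings list; per-block bounds let a mostly weak list still contribute its few strong ranges.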

do we really need a full 4-byte float? How well would the algorithm work with 
degraded precision: e.g. something like
SmallFloat. (I think this SmallFloat currently computes a lower bound, we would 
have to bump to the next byte to make an upper bound).

another idea: it might be nice if this optimization could sit underneath the 
codec, such that you don't need a special
Scorer. One idea here would be for your collector to set an attribute on the 
DocsEnum (maxScore): of course a normal
codec would totally ignore this and proceed as today. But codecs like this one 
could return NO_MORE_DOCS when postings
for that term can no longer compete. I'm just not positive if this algorithm 
can be refactored in this way, and this
would also require some clean way of getting these attributes from Collector -> 
Scorer -> DocsEnum. Currently Scorer
is in the way here :)

Just some random thoughts, I'll try to get a copy of this paper so I have a 
better idea what's going on with this particular
optimization...

 Maxscore - Efficient Scoring
 

 Key: LUCENE-4100
 URL: https://issues.apache.org/jira/browse/LUCENE-4100
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/codecs, core/query/scoring, core/search
Affects Versions: 4.0
Reporter: Stefan Pohl
  Labels: api-change, patch, performance
 Fix For: 4.0

 Attachments: contrib_maxscore.tgz, maxscore.patch


 At Berlin Buzzwords 2012, I will be presenting 'maxscore', an efficient 
 algorithm first published in the IR domain in 1995 by H. Turtle & J. Flood, 
 that I find deserves more attention among Lucene users (and developers).
 I implemented a proof of concept and did some performance measurements with 
 example queries and lucenebench, the package of Mike McCandless, resulting in 
 very significant speedups.
 This ticket is to get started the discussion on including the implementation 
 into Lucene's codebase. Because the technique requires awareness about it 
 from the Lucene user/developer, it seems best to become a contrib/module 
 package so that it consciously can be chosen to be used.




Re: [JENKINS] Solr-4.x - Build # 28 - Still Failing

2012-07-05 Thread Dawid Weiss
Ok, I know why this is happening. We have antcall and ant calls
across build files. Sometimes these calls pass only selected
properties (propertysets) but do not pass references. As in here:

<ant dir="${common.dir}" target="default" inheritall="false">
  <propertyset refid="uptodate.and.compiled.properties"/>
</ant>

ivy:configure stores the default configuration as a reference so the
property will be passed down but the reference will not. I don't know how
to fix it cleanly so I'll just leave my workaround patch in (which
re-reads the configuration every time, unfortunately).

Dawid

On Thu, Jul 5, 2012 at 12:52 PM, Dawid Weiss
dawid.we...@cs.put.poznan.pl wrote:
 This has to do with ivy -- I admit I don't know what's happening. The
 configuration gets read once but it isn't persisted for other ivy
 tasks (even though the property clearly is): So you end up with:

 ivy-configure:

 resolve:
 [ivy:retrieve] :: loading settings :: url =
 jar:file:/home/hudson/.ant/lib/ivy-2.2.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
 [ivy:retrieve]
 [ivy:retrieve] :: problems summary ::
 [ivy:retrieve]  WARNINGS
 [ivy:retrieve]  module not found:
 com.carrotsearch.randomizedtesting#junit4-ant;1.6.0
 [ivy:retrieve]   local: tried
 [ivy:retrieve]
 /home/hudson/.ivy2/local/com.carrotsearch.randomizedtesting/junit4-ant/1.6.0/ivys/ivy.xml
 [ivy:retrieve]-- artifact
 com.carrotsearch.randomizedtesting#junit4-ant;1.6.0!junit4-ant.jar:
 [ivy:retrieve]
 /home/hudson/.ivy2/local/com.carrotsearch.randomizedtesting/junit4-ant/1.6.0/jars/junit4-ant.jar
 [ivy:retrieve]   shared: tried
 [ivy:retrieve]
 /home/hudson/.ivy2/shared/com.carrotsearch.randomizedtesting/junit4-ant/1.6.0/ivys/ivy.xml
 [ivy:retrieve]-- artifact
 com.carrotsearch.randomizedtesting#junit4-ant;1.6.0!junit4-ant.jar:
 [ivy:retrieve]
 /home/hudson/.ivy2/shared/com.carrotsearch.randomizedtesting/junit4-ant/1.6.0/jars/junit4-ant.jar
 [ivy:retrieve]   public: tried
 [ivy:retrieve]
 http://repo1.maven.org/maven2/com/carrotsearch/randomizedtesting/junit4-ant/1.6.0/junit4-ant-1.6.0.pom
 [ivy:retrieve]-- artifact
 com.carrotsearch.randomizedtesting#junit4-ant;1.6.0!junit4-ant.jar:
 [ivy:retrieve]
 http://repo1.maven.org/maven2/com/carrotsearch/randomizedtesting/junit4-ant/1.6.0/junit4-ant-1.6.0.jar
 [ivy:retrieve]  module not found:
 com.carrotsearch.randomizedtesting#randomizedtesting-runner;1.6.0
 [ivy:retrieve]   local: tried

 Note that sonatype's release repository is NOT tried, it just checks
 the default chain. I'll provide a workaround fix in a second but I
 don't know how to fix it in a proper way.

 Dawid

 On Thu, Jul 5, 2012 at 12:47 PM, Apache Jenkins Server
 jenk...@builds.apache.org wrote:
 Build: https://builds.apache.org/job/Solr-4.x/28/

 No tests ran.

 Build Log:
 [...truncated 8055 lines...]




Re: [JENKINS] Solr-4.x - Build # 28 - Still Failing

2012-07-05 Thread Chris Male
Is your workaround particularly slow?

On Thu, Jul 5, 2012 at 11:18 PM, Dawid Weiss
dawid.we...@cs.put.poznan.pl wrote:

 Ok, I know why this is happening. We have antcall and ant calls
 across build files. Sometimes these calls pass only selected
 properties (propertysets) but do not pass references. As in here:

 <ant dir="${common.dir}" target="default" inheritall="false">
   <propertyset refid="uptodate.and.compiled.properties"/>
 </ant>

 ivy:configure stores the default configuration as a reference so the
 property will be passed down but the reference not. I don't know how
 to fix it cleanly so I'll just leave my workaround patch in (which
 re-reads the configuration every time, unfortunately).

 Dawid

 On Thu, Jul 5, 2012 at 12:52 PM, Dawid Weiss
 dawid.we...@cs.put.poznan.pl wrote:
  This has to do with ivy -- I admit I don't know what's happening. The
  configuration gets read once but it isn't persisted for other ivy
  tasks (even though the property clearly is): So you end up with:
 
  ivy-configure:
 
  resolve:
  [ivy:retrieve] :: loading settings :: url =
 
 jar:file:/home/hudson/.ant/lib/ivy-2.2.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
  [ivy:retrieve]
  [ivy:retrieve] :: problems summary ::
  [ivy:retrieve]  WARNINGS
  [ivy:retrieve]  module not found:
  com.carrotsearch.randomizedtesting#junit4-ant;1.6.0
  [ivy:retrieve]   local: tried
  [ivy:retrieve]
 
 /home/hudson/.ivy2/local/com.carrotsearch.randomizedtesting/junit4-ant/1.6.0/ivys/ivy.xml
  [ivy:retrieve]-- artifact
  com.carrotsearch.randomizedtesting#junit4-ant;1.6.0!junit4-ant.jar:
  [ivy:retrieve]
 
 /home/hudson/.ivy2/local/com.carrotsearch.randomizedtesting/junit4-ant/1.6.0/jars/junit4-ant.jar
  [ivy:retrieve]   shared: tried
  [ivy:retrieve]
 
 /home/hudson/.ivy2/shared/com.carrotsearch.randomizedtesting/junit4-ant/1.6.0/ivys/ivy.xml
  [ivy:retrieve]-- artifact
  com.carrotsearch.randomizedtesting#junit4-ant;1.6.0!junit4-ant.jar:
  [ivy:retrieve]
 
 /home/hudson/.ivy2/shared/com.carrotsearch.randomizedtesting/junit4-ant/1.6.0/jars/junit4-ant.jar
  [ivy:retrieve]   public: tried
  [ivy:retrieve]
 
 http://repo1.maven.org/maven2/com/carrotsearch/randomizedtesting/junit4-ant/1.6.0/junit4-ant-1.6.0.pom
  [ivy:retrieve]-- artifact
  com.carrotsearch.randomizedtesting#junit4-ant;1.6.0!junit4-ant.jar:
  [ivy:retrieve]
 
 http://repo1.maven.org/maven2/com/carrotsearch/randomizedtesting/junit4-ant/1.6.0/junit4-ant-1.6.0.jar
  [ivy:retrieve]  module not found:
  com.carrotsearch.randomizedtesting#randomizedtesting-runner;1.6.0
  [ivy:retrieve]   local: tried
 
  Note that sonatype's release repository is NOT tried, it just checks
  the default chain. I'll provide a workaround fix in a second but I
  don't know how to fix it in a proper way.
 
  Dawid
 
  On Thu, Jul 5, 2012 at 12:47 PM, Apache Jenkins Server
  jenk...@builds.apache.org wrote:
  Build: https://builds.apache.org/job/Solr-4.x/28/
 
  No tests ran.
 
  Build Log:
  [...truncated 8055 lines...]
 




-- 
Chris Male


Re: [JENKINS] Solr-4.x - Build # 28 - Still Failing

2012-07-05 Thread Dawid Weiss
I don't think so -- it just reloads the config file over and over but
it's probably in the os cache anyway.

That property passing is broken as I explained. In general antcalls
lead to a big mess with references/ properties, I hate it (but don't
have an idea how to improve it).

Dawid

On Thu, Jul 5, 2012 at 1:21 PM, Chris Male gento...@gmail.com wrote:
 Is your workaround particularly slow?


 On Thu, Jul 5, 2012 at 11:18 PM, Dawid Weiss dawid.we...@cs.put.poznan.pl
 wrote:

 Ok, I know why this is happening. We have antcall and ant calls
 across build files. Sometimes these calls pass only selected
 properties (propertysets) but do not pass references. As in here:

 <ant dir="${common.dir}" target="default" inheritall="false">
   <propertyset refid="uptodate.and.compiled.properties"/>
 </ant>

 ivy:configure stores the default configuration as a reference so the
 property will be passed down but the reference not. I don't know how
 to fix it cleanly so I'll just leave my workaround patch in (which
 re-reads the configuration every time, unfortunately).

 Dawid

 On Thu, Jul 5, 2012 at 12:52 PM, Dawid Weiss
 dawid.we...@cs.put.poznan.pl wrote:
  This has to do with ivy -- I admit I don't know what's happening. The
  configuration gets read once but it isn't persisted for other ivy
  tasks (even though the property clearly is): So you end up with:
 
  ivy-configure:
 
  resolve:
  [ivy:retrieve] :: loading settings :: url =
 
  jar:file:/home/hudson/.ant/lib/ivy-2.2.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
  [ivy:retrieve]
  [ivy:retrieve] :: problems summary ::
  [ivy:retrieve]  WARNINGS
  [ivy:retrieve]  module not found:
  com.carrotsearch.randomizedtesting#junit4-ant;1.6.0
  [ivy:retrieve]   local: tried
  [ivy:retrieve]
 
  /home/hudson/.ivy2/local/com.carrotsearch.randomizedtesting/junit4-ant/1.6.0/ivys/ivy.xml
  [ivy:retrieve]-- artifact
  com.carrotsearch.randomizedtesting#junit4-ant;1.6.0!junit4-ant.jar:
  [ivy:retrieve]
 
  /home/hudson/.ivy2/local/com.carrotsearch.randomizedtesting/junit4-ant/1.6.0/jars/junit4-ant.jar
  [ivy:retrieve]   shared: tried
  [ivy:retrieve]
 
  /home/hudson/.ivy2/shared/com.carrotsearch.randomizedtesting/junit4-ant/1.6.0/ivys/ivy.xml
  [ivy:retrieve]-- artifact
  com.carrotsearch.randomizedtesting#junit4-ant;1.6.0!junit4-ant.jar:
  [ivy:retrieve]
 
  /home/hudson/.ivy2/shared/com.carrotsearch.randomizedtesting/junit4-ant/1.6.0/jars/junit4-ant.jar
  [ivy:retrieve]   public: tried
  [ivy:retrieve]
 
  http://repo1.maven.org/maven2/com/carrotsearch/randomizedtesting/junit4-ant/1.6.0/junit4-ant-1.6.0.pom
  [ivy:retrieve]-- artifact
  com.carrotsearch.randomizedtesting#junit4-ant;1.6.0!junit4-ant.jar:
  [ivy:retrieve]
 
  http://repo1.maven.org/maven2/com/carrotsearch/randomizedtesting/junit4-ant/1.6.0/junit4-ant-1.6.0.jar
  [ivy:retrieve]  module not found:
  com.carrotsearch.randomizedtesting#randomizedtesting-runner;1.6.0
  [ivy:retrieve]   local: tried
 
  Note that sonatype's release repository is NOT tried, it just checks
  the default chain. I'll provide a workaround fix in a second but I
  don't know how to fix it in a proper way.
 
  Dawid
 
  On Thu, Jul 5, 2012 at 12:47 PM, Apache Jenkins Server
  jenk...@builds.apache.org wrote:
  Build: https://builds.apache.org/job/Solr-4.x/28/
 
  No tests ran.
 
  Build Log:
  [...truncated 8055 lines...]
 




 --
 Chris Male




[jira] [Resolved] (LUCENE-4145) Unhandled exception from test framework (in json parsing of test output files?)

2012-07-05 Thread Dawid Weiss (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dawid Weiss resolved LUCENE-4145.
-

   Resolution: Fixed
Fix Version/s: 5.0
   4.0

Should be better now as events are not buffered on the client. I still wouldn't 
give my head for the -Dtests.iters=gazillion scenario because they're still 
buffered on the master (for reports, etc.)

As always, it's a tradeoff -- spilling those events to disk is possible but 
would increase the complexity a lot. Maybe an embedded simple db like hsqldb or 
something would help here, I don't know. Anyway, it doesn't make sense in 99% 
of situations (such a large iteration count or number of tests is rare).



 Unhandled exception from test framework (in json parsing of test output 
 files?)
 -

 Key: LUCENE-4145
 URL: https://issues.apache.org/jira/browse/LUCENE-4145
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Hoss Man
Assignee: Dawid Weiss
 Fix For: 4.0, 5.0


 Working on SOLR-3267 i got a weird exception printed to the junit output...
 {noformat}
[junit4] Unhandled exception in thread: Thread[pumper-events,5,main]
[junit4] 
 com.carrotsearch.ant.tasks.junit4.dependencies.com.google.gson.JsonParseException:
  No such reference: id#org.apache.solr.search.TestSort[3]
 ...
 {noformat}




[jira] [Commented] (LUCENE-4190) IndexWriter deletes non-Lucene files

2012-07-05 Thread Carl Austin (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407001#comment-13407001
 ] 

Carl Austin commented on LUCENE-4190:
-

I was the original commenter on the blog about this issue and have previously 
experienced the deletion of all files on a drive because of the exact same 
restriction - the fallout from this is massive.

The issue here is that many people who use lucene will not realise that this 
can happen, and this situation will occur sooner or later. You can't expect 
that every developer who uses lucene will understand every in and out, read 
every bit of javadoc fully or every release change note. Look at the number of 
posts to the mailing list that are just people who haven't fully read or 
understood something. I firmly believe that this has to be handled by the 
library such that a simple mistake or misunderstanding by a developer does not 
lead to the loss of important files.

 IndexWriter deletes non-Lucene files
 

 Key: LUCENE-4190
 URL: https://issues.apache.org/jira/browse/LUCENE-4190
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Michael McCandless
Assignee: Robert Muir
 Fix For: 4.0, 5.0

 Attachments: LUCENE-4190.patch, LUCENE-4190.patch


 Carl Austin raised a good issue in a comment on my Lucene 4.0.0 alpha blog 
 post: 
 http://blog.mikemccandless.com/2012/07/lucene-400-alpha-at-long-last.html
 IndexWriter will now (as of 4.0) delete all foreign files from the index 
 directory.  We made this change because Codecs are free to write to any files 
 now, so the space of filenames is hard to bound.
 But if the user accidentally uses the wrong directory (eg c:/) then we will 
 in fact delete important stuff.
 I think we can at least use some simple criteria (must start with _, maybe 
 must fit certain pattern eg _base36(_X).Y), so we are much less likely to 
 delete a non-Lucene file

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-4194) Fix default charset sensitive method calls

2012-07-05 Thread Dawid Weiss (JIRA)
Dawid Weiss created LUCENE-4194:
---

 Summary: Fix default charset sensitive method calls
 Key: LUCENE-4194
 URL: https://issues.apache.org/jira/browse/LUCENE-4194
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Dawid Weiss
Priority: Minor
 Fix For: 4.0, 5.0







[jira] [Commented] (LUCENE-4194) Fix default charset sensitive method calls

2012-07-05 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407018#comment-13407018
 ] 

Dawid Weiss commented on LUCENE-4194:
-

There are a number of places (in tests mostly) which call:
{code}
new FileReader(File)
String.getBytes()
new String(byte[])
{code}

The expected encoding should be provided explicitly, even if the content is
mostly ASCII.
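
As an illustration only (this is not code from the issue or from any attached patch), the three calls listed above could be replaced along these lines. The class and method names are hypothetical, and UTF-8 is an assumed choice; each call site would need the encoding its data actually uses.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;

// Hypothetical helper class for this sketch; not part of Lucene.
class ExplicitCharsetExample {
    // Assumed encoding for the sketch.
    static final Charset UTF_8 = Charset.forName("UTF-8");

    // Replaces new FileReader(file), which decodes with the platform default charset.
    static Reader openReader(File file) throws IOException {
        return new BufferedReader(
            new InputStreamReader(new FileInputStream(file), UTF_8));
    }

    // Replaces s.getBytes(), which encodes with the platform default charset.
    static byte[] encode(String s) {
        return s.getBytes(UTF_8);
    }

    // Replaces new String(bytes), which decodes with the platform default charset.
    static String decode(byte[] bytes) {
        return new String(bytes, UTF_8);
    }
}
```

With the charset passed in explicitly, the result no longer depends on the JVM's file.encoding setting.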




[jira] [Commented] (LUCENE-4194) Fix default charset sensitive method calls

2012-07-05 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407019#comment-13407019
 ] 

Dawid Weiss commented on LUCENE-4194:
-

Try running:
{noformat}
ant -Dtests.file.encoding=UTF-16 test
{noformat}
on Windows. This exposes most of these issues.
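
A rough sketch of why an unusual default encoding such as UTF-16 would surface these bugs, assuming the property above changes the default charset of the test JVM. The class and method names are hypothetical. Text written with one charset and read back with another only survives when the two happen to be compatible, which is often the case for ASCII-compatible defaults and not for UTF-16.

```java
import java.nio.charset.Charset;

// Hypothetical demo class for this sketch; not part of the Lucene test framework.
class CharsetMismatchDemo {
    // Encodes with one charset and decodes with another, which is what happens
    // when one side relies on the platform default and the other side does not.
    static String roundTrip(String text, Charset writeWith, Charset readWith) {
        return new String(text.getBytes(writeWith), readWith);
    }

    // True if the text comes back unchanged.
    static boolean survives(String text, Charset writeWith, Charset readWith) {
        return text.equals(roundTrip(text, writeWith, readWith));
    }
}
```

For plain ASCII text, a US-ASCII/UTF-8 mismatch goes unnoticed, while a UTF-8/UTF-16 mismatch corrupts the text immediately.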




[jira] [Updated] (LUCENE-4194) Fix default charset sensitive method calls

2012-07-05 Thread Dawid Weiss (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dawid Weiss updated LUCENE-4194:


Attachment: CropperCapture[2].png
CropperCapture[1].png

A list of files calling forbidden methods...




[jira] [Commented] (LUCENE-4190) IndexWriter deletes non-Lucene files

2012-07-05 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407027#comment-13407027
 ] 

Shai Erera commented on LUCENE-4190:


bq. We HAVE to keep this kind of code simple and maintainable in lucene.

Why? We write lots of other code that prevents users from shooting themselves
in the foot, so why make an exception here? Just because the code might get
complicated doesn't mean we don't need to write it.

While I agree with you that Lucene is not a file manager, I think it'd be good
if we could clean up after ourselves rather than delete everything that we
don't recognize.

Since you're more familiar than I am with the 4.0 internals, can you please
comment on the simple proposal I outlined above? Can it even work?




[jira] [Commented] (LUCENE-4190) IndexWriter deletes non-Lucene files

2012-07-05 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407033#comment-13407033
 ] 

Michael McCandless commented on LUCENE-4190:


I agree there is a real danger here if users accidentally point
IndexWriter at the wrong directory.  This was found/fixed way in the
past already: LUCENE-385.

But I also don't want to go back to the hairy files(), extensions() we
used to require of all codec components.

Yet I think there's a good middle ground: only allow a codec to write
to _seg.* or _seg_*.* files (ie the ones created by
IndexFileNames).  All of our codecs are (should be!) using IndexFileNames.*
to compute a file name to write to.

In reality a codec already isn't free to just write to any file,
because then it may conflict with another codec doing the same thing.
So de-facto codecs already have a private namespace, prefixed by
_seg and further refined by _N (ie when there are multiple postings
formats in a single codec).

Since a general codec must already obey its private namespace (to not
step on other codecs) I think it's fine to enforce it?
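
A minimal sketch of what enforcing such a namespace might look like, assuming segment names are lower-case base-36 strings after the leading underscore. The class name and the exact pattern are hypothetical illustrations, not the check from any patch on this issue.

```java
import java.util.regex.Pattern;

// Hypothetical check for this sketch; not part of Lucene.
class CodecFileNameCheck {
    // "_" + base-36 segment name, an optional "_suffix", then ".extension".
    // Covers the two shapes named above: _seg.* and _seg_*.*
    static final Pattern CODEC_FILE =
        Pattern.compile("_[0-9a-z]+(_[^.]*)?\\.[^.]+");

    // True if the name falls inside the assumed codec namespace.
    static boolean looksLikeCodecFile(String fileName) {
        return CODEC_FILE.matcher(fileName).matches();
    }
}
```

Under this assumed pattern a name such as report.doc is never a deletion candidate, although a user file that happens to be called something like _notes.doc would still match.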






[jira] [Commented] (LUCENE-4190) IndexWriter deletes non-Lucene files

2012-07-05 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407034#comment-13407034
 ] 

Robert Muir commented on LUCENE-4190:
-

The problem with a global filter is that which files a codec uses is an
implementation detail of the codec. Today, a codec can name files pretty much
whatever it wants (it must avoid _seg.cfs, segments_seg and segments.gen, of
course).

In general, other than in exceptional cases, we know which files a codec owns
because a codec writes the list of files that it uses for a segment into the
SegmentInfo
(http://lucene.apache.org/core/4_0_0-ALPHA/core/org/apache/lucene/codecs/lucene40/Lucene40SegmentInfoFormat.html).

The problem is these exceptional cases: how can IndexFileDeleter distinguish
between leftover partially written index files for a segment and some files of
the user, since it may not have the SegmentInfo (.si) for that segment?

Previous attempts at this still didn't work well:
* listing the extensions() in the codec is not great, e.g. Sep codec uses .doc
extension for documents!
* having the codec list the files it uses for a segment isn't easy and causes a
mess: previously files() had to be symmetric at read and write time and we
often had bugs in this, because the files used by the codec often depend on
various things like options the user chooses (e.g. did they enable term
vectors, payloads, etc.). I will do *anything* to prevent this from coming
back!

So in my opinion, the only real, third option is to restrict what file names a
codec can use, in a way that's not a huge imposition on the codec. My patch on
this issue (which people weren't happy with) did just this: it restricted file
names to begin with an underscore.
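
To make the underscore restriction concrete, here is a rough sketch of how a deleter could pick its candidates under that rule. It is an illustration under stated assumptions, not the patch itself: the class and method names are hypothetical, and the set of referenced files is assumed to be known already.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; not Lucene's IndexFileDeleter.
class UnderscoreOnlyDeleter {
    // Returns the files that may be deleted: only names starting with "_"
    // that no live segment references. Everything else is left alone,
    // whether or not the index knows what it is.
    static List<String> deletable(Collection<String> filesInDirectory,
                                  Set<String> referenced) {
        List<String> result = new ArrayList<String>();
        for (String name : filesInDirectory) {
            if (name.startsWith("_") && !referenced.contains(name)) {
                result.add(name);
            }
        }
        return result;
    }
}
```

Leftover partially written files are still cleaned up because they carry the prefix, while a user's notes.doc is never touched. Handling of segments_N and segments.gen is deliberately left out of this sketch.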





[jira] [Commented] (LUCENE-4190) IndexWriter deletes non-Lucene files

2012-07-05 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407040#comment-13407040
 ] 

Robert Muir commented on LUCENE-4190:
-

{quote}
Since a general codec must already obey its private namespace (to not
step on other codecs) I think it's fine to enforce it?
{quote}

The problem, it seems, is that people want a perfect solution. An imperfect
solution (_.*) seems to imply that it's a bug if Lucene deletes
_myImportantDocument.doc.

So if we insist on a perfect solution, then fine: the perfect solution I accept
is for Lucene to totally own the directory. Don't put files in there! Then the
behavior is clear, no bugs, we delete everything.





[jira] [Commented] (LUCENE-4191) Lucene doc pages redirect to api-4_0_0-ALPHA which results in 404

2012-07-05 Thread Chaim Peck (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407082#comment-13407082
 ] 

Chaim Peck commented on LUCENE-4191:


Then where does one go to find documentation?

The above link is the first hit when you google BaseTokenFilterFactory.

 Lucene doc pages redirect to api-4_0_0-ALPHA which results in 404
 ---

 Key: LUCENE-4191
 URL: https://issues.apache.org/jira/browse/LUCENE-4191
 Project: Lucene - Java
  Issue Type: Bug
Affects Versions: 3.6
Reporter: Chaim Peck
  Labels: documentation

 Try to go to this URL:
 http://lucene.apache.org/solr/api/org/apache/solr/analysis/BaseTokenFilterFactory.html
 The result is that you will be redirected here, which is a 404:
 http://lucene.apache.org/solr/api-4_0_0-ALPHA/org/apache/solr/analysis/BaseTokenFilterFactory.html
 You can still get to the page from google cache:
 http://webcache.googleusercontent.com/search?q=cache:mCJCac4iZ0QJ:lucene.apache.org/solr/api/org/apache/solr/analysis/BaseTokenFilterFactory.html+&cd=1&hl=en&ct=clnk&gl=us




[jira] [Commented] (LUCENE-4190) IndexWriter deletes non-Lucene files

2012-07-05 Thread Gilad Barkai (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407086#comment-13407086
 ] 

Gilad Barkai commented on LUCENE-4190:
--

{quote}
So if we insist on a perfect solution: then fine, the perfect solution I accept 
is for lucene to totally own
the directory, don't put files in there! Then the behavior is clear, no bugs, 
we delete everything.
{quote}

But then we're left with the original problem: should a poor user (say, me)
accidentally put an index in an already filled directory (say /tmp), the price
to pay is great.
Too great IMHO.




[jira] [Commented] (LUCENE-4190) IndexWriter deletes non-Lucene files

2012-07-05 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407088#comment-13407088
 ] 

Yonik Seeley commented on LUCENE-4190:
--

Robert, you completely ignored my explicit VETO.
We had consensus, and code was committed.  It's no longer your commit to do 
anything you want with over the objections of others.
Undoubtedly, you would now revert any commit I would make to rectify the 
situation and fix this bug.  So let's now take it to the PMC and codify if it's 
OK to ignore VETOs of other PMC members that you don't agree with.  Perhaps we 
need to update the rules we operate under.





[jira] [Commented] (LUCENE-4190) IndexWriter deletes non-Lucene files

2012-07-05 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407091#comment-13407091
 ] 

Robert Muir commented on LUCENE-4190:
-

You can't veto me backing out my own commit.




[jira] [Commented] (LUCENE-4190) IndexWriter deletes non-Lucene files

2012-07-05 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407094#comment-13407094
 ] 

Mark Miller commented on LUCENE-4190:
-

bq. pretty sure I have the right to revert my own commit.

Once something is in the code base, it doesn't matter who committed it - all 
the same rules apply. That it's your commit doesn't change anything. Unless you 
went insane and started threatening license revoking type...oh wait...

bq. I can declare the licensing of asl2 as a mistake and instead full gpl if we 
want to press the point?

You can't be on the PMC and play games like this if you ask me. Being on the 
PMC means you have an obligation to act above this. Are we going back to the 
revert wars now?

As far as I can tell you are trying to act like a dictator on this issue. You 
contribute a lot to Lucene, but you are not the dictator. Why do you need to 
*demand* that certain things happen as you prescribe? Why do you need to make 
threats about revoking licenses?

This issue should be about consensus, not bullying.

Are you kidding me dude?

I hope you start working with the community and stop trying to step on it. Your
stance is far too often: it's Robert's way or the highway, if you ask me.




[jira] [Commented] (LUCENE-4190) IndexWriter deletes non-Lucene files

2012-07-05 Thread Mark Harwood (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407099#comment-13407099
 ] 

Mark Harwood commented on LUCENE-4190:
--

-1 for merrily wiping contents of whatever directory a user happens to pick for 
an index location
+0 on requiring all codecs to declare filenames because I take on board Rob's 
points re complexity
+1 for the _* name-spacing proposal as a sensible compromise








[jira] [Commented] (LUCENE-4190) IndexWriter deletes non-Lucene files

2012-07-05 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407097#comment-13407097
 ] 

Robert Muir commented on LUCENE-4190:
-

That's OK, it's clearly another typical Solr-versus-Robert battle here, where
Mark+Yonik both gang up on me.

Another way to look at it: I committed the patch after Mike reviewed it,
because it looked like consensus.
There was then a ton of questions and commentary; arguably there wasn't really
consensus and I prematurely committed.

So I backed it out.




[JENKINS] Lucene-Solr-trunk-Linux-Java6-64 - Build # 1168 - Failure!

2012-07-05 Thread Policeman Jenkins Server
Build: http://jenkins.sd-datasolutions.de/job/Lucene-Solr-trunk-Linux-Java6-64/1168/

5 tests failed.
REGRESSION:  org.apache.solr.handler.dataimport.TestSqlEntityProcessorDelta2.testCompositePk_DeltaImport_add_delete

Error Message:
Exception during query

Stack Trace:
java.lang.RuntimeException: Exception during query
at __randomizedtesting.SeedInfo.seed([6E23FDC4CD0393D:D44317B3C5D732CF]:0)
at org.apache.solr.SolrTestCaseJ4.assertQ(SolrTestCaseJ4.java:461)
at org.apache.solr.SolrTestCaseJ4.assertQ(SolrTestCaseJ4.java:428)
at org.apache.solr.handler.dataimport.TestSqlEntityProcessorDelta2.testCompositePk_DeltaImport_add_delete(TestSqlEntityProcessorDelta2.java:282)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1995)
at com.carrotsearch.randomizedtesting.RandomizedRunner.access$1100(RandomizedRunner.java:132)
at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:818)
at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:877)
at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:891)
at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:32)
at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68)
at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:825)
at com.carrotsearch.randomizedtesting.RandomizedRunner.access$700(RandomizedRunner.java:132)
at com.carrotsearch.randomizedtesting.RandomizedRunner$3$1.run(RandomizedRunner.java:671)
at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:697)
at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:736)
at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:747)
at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68)
at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
at org.apache.lucene.util.TestRuleIcuHack$1.evaluate(TestRuleIcuHack.java:51)
at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at org.apache.lucene.util.TestRuleNoInstanceHooksOverrides$1.evaluate(TestRuleNoInstanceHooksOverrides.java:53)
at org.apache.lucene.util.TestRuleNoStaticHooksShadowing$1.evaluate(TestRuleNoStaticHooksShadowing.java:52)
at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:36)
at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55)
at com.carrotsearch.randomizedtesting.RandomizedRunner.runSuite(RandomizedRunner.java:605)
at com.carrotsearch.randomizedtesting.RandomizedRunner.access$400(RandomizedRunner.java:132)
at com.carrotsearch.randomizedtesting.RandomizedRunner$2.run(RandomizedRunner.java:551)
Caused by: java.lang.RuntimeException: REQUEST FAILED: xpath=//*[@numFound='1']
xml response was: ?xml version=1.0 encoding=UTF-8?
response
lst name=responseHeaderint 

[jira] [Commented] (LUCENE-4190) IndexWriter deletes non-Lucene files

2012-07-05 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407104#comment-13407104
 ] 

Mark Miller commented on LUCENE-4190:
-

bq. That's ok, it's clearly another typical Solr-versus-Robert battle here, where 
Mark+Yonik both gang up on me.

I don't have an opinion on this issue. A lot of smart people have already given 
input, and I was interested to read about it. I have not formulated my own 
opinion yet.

I also don't mind if you and Yonik have a disagreement or debate, as long as you 
act reasonably.

Anyone who ignores consensus and threatens to revoke a license does not have me 
on their side. That's part of the role of a PMC member IMO: to keep an eye out 
for unhealthy community behavior and point it out. All the disagreement in the 
world is fine, but you have to play in the sandbox.

 IndexWriter deletes non-Lucene files
 

 Key: LUCENE-4190
 URL: https://issues.apache.org/jira/browse/LUCENE-4190
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Michael McCandless
Assignee: Robert Muir
 Fix For: 4.0, 5.0

 Attachments: LUCENE-4190.patch, LUCENE-4190.patch


 Carl Austin raised a good issue in a comment on my Lucene 4.0.0 alpha blog 
 post: 
 http://blog.mikemccandless.com/2012/07/lucene-400-alpha-at-long-last.html
 IndexWriter will now (as of 4.0) delete all foreign files from the index 
 directory.  We made this change because Codecs are free to write to any files 
 now, so the space of filenames is hard to bound.
 But if the user accidentally uses the wrong directory (eg c:/) then we will 
 in fact delete important stuff.
 I think we can at least use some simple criteria (must start with _, maybe 
 must fit certain pattern eg _base36(_X).Y), so we are much less likely to 
 delete a non-Lucene file
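
The two criteria proposed above could be sketched roughly as follows. This is a minimal illustration, not code from the attached patch: the class name IndexFileNameFilter, both method names, and the regular expression (one possible reading of "_base36(_X).Y") are assumptions made for the example.

```java
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; not part of Lucene's API. */
final class IndexFileNameFilter {
    // Assumed reading of "_base36(_X).Y": an underscore, a base-36 segment id,
    // an optional "_suffix", then a dot and an extension.
    private static final Pattern SEGMENT_FILE =
        Pattern.compile("_[0-9a-z]+(_[0-9a-z]+)?\\.[0-9a-z]+");

    private IndexFileNameFilter() {}

    /** Loose criterion: the name merely starts with an underscore. */
    static boolean startsWithUnderscore(String fileName) {
        return fileName.startsWith("_");
    }

    /** Stricter criterion: the whole name fits the assumed pattern. */
    static boolean matchesSegmentPattern(String fileName) {
        return SEGMENT_FILE.matcher(fileName).matches();
    }
}
```

Under either criterion a name such as MyImportantDocument.doc would not be treated as an index file; the stricter one additionally rejects names like _notes that only happen to start with an underscore.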

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4190) IndexWriter deletes non-Lucene files

2012-07-05 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407107#comment-13407107
 ] 

Robert Muir commented on LUCENE-4190:
-

If you read through the issue, how is consensus being ignored?

Again: I committed the patch after Mike reviewed it, because it looked like 
consensus. But I think this was premature, because a lot of questions and 
comments came afterwards.

Backing it out is the right thing to do. It might be that we get consensus for 
this patch or something else, and it might even go right back in the way it was.

You just don't like the words I used.




[jira] [Created] (LUCENE-4195) Add javadocs to Codec package.html

2012-07-05 Thread Alan Woodward (JIRA)
Alan Woodward created LUCENE-4195:
-

 Summary: Add javadocs to Codec package.html
 Key: LUCENE-4195
 URL: https://issues.apache.org/jira/browse/LUCENE-4195
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/codecs
Affects Versions: 5.0
Reporter: Alan Woodward
Priority: Minor


The Codec package.html is pretty basic.  Add some overview information.




[jira] [Commented] (LUCENE-4190) IndexWriter deletes non-Lucene files

2012-07-05 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407115#comment-13407115
 ] 

Mark Miller commented on LUCENE-4190:
-

{quote}Another way to look at it: I committed the patch after Mike reviewed it, 
because it looked like consensus.
There were then a ton of questions and commentary; arguably there wasn't really 
consensus and I prematurely committed. So I backed it out. {quote}

That would have been a good argument, and much better than the alternative 
argument you took IMO.




[jira] [Updated] (LUCENE-4195) Add javadocs to Codec package.html

2012-07-05 Thread Alan Woodward (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Woodward updated LUCENE-4195:
--

Attachment: LUCENE-4195.patch

Here's some basic javadoc, telling users how to register new Codecs and 
PostingsFormats.  Pretty basic, but better than nothing!




[jira] [Commented] (LUCENE-4195) Add javadocs to Codec package.html

2012-07-05 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407117#comment-13407117
 ] 

Robert Muir commented on LUCENE-4195:
-

Awesome! Thanks for doing this.




[jira] [Resolved] (LUCENE-4195) Add javadocs to Codec package.html

2012-07-05 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved LUCENE-4195.
-

   Resolution: Fixed
Fix Version/s: 4.0

I committed this. Thanks again!




[jira] [Commented] (LUCENE-4190) IndexWriter deletes non-Lucene files

2012-07-05 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407145#comment-13407145
 ] 

Robert Muir commented on LUCENE-4190:
-

{quote}
The words, the threat, the quick action - correct - that's my problem.
{quote}

Right, I take offense at the idea that if I committed something too soon, I 
can't back it out.

Sure, it didn't help that I was already frustrated with the technical situation 
(I thought, and still do think, that the patch is a great compromise: an easy 
solution, low risk, simple, etc).

But I think if this situation happens, e.g. someone commits prematurely and then 
a bunch of comments on the issue make it clear there really isn't consensus, 
they have the right to back it out. In fact, I think it's the right thing to do.

And nobody can veto that.





[jira] [Commented] (LUCENE-4190) IndexWriter deletes non-Lucene files

2012-07-05 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407148#comment-13407148
 ] 

Yonik Seeley commented on LUCENE-4190:
--

bq. But I think this was premature, because a lot of questions and comments 
came afterwards.

Except that if you read back through the issue, that's not what happened.

Anyway, if you're interested in consensus now, I don't see anyone opposed to 
the underscore solution in the short term, even if some thought it didn't go 
far enough.  I didn't see anyone saying that deleting all files was preferable 
in the short term.
So if there are no objections, I'll re-commit the underscore fix (which there 
was consensus for), and then discussion can continue about better methods.

Robert, I'll repeat your own words back to you:
bq. We can maybe improve in the future besides the _ check, but I just think 
this is an easy improvement that will prevent most of the problems.




[jira] [Commented] (LUCENE-4190) IndexWriter deletes non-Lucene files

2012-07-05 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407151#comment-13407151
 ] 

Robert Muir commented on LUCENE-4190:
-

{quote}
I didn't see anyone saying that deleting all files was preferable in the short 
term.
{quote}

I'm not sure this is totally true.

Again, I think if it's required that we have a *perfect* solution, then deleting 
all files is preferable to the alternative of the codec having hairy code to 
detect whether it owns a file.

But this patch is a nice *imperfect* solution that probably prevents accidental 
deletion of MyImportantDocument.doc or whatever.





[jira] [Commented] (LUCENE-4190) IndexWriter deletes non-Lucene files

2012-07-05 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407157#comment-13407157
 ] 

Uwe Schindler commented on LUCENE-4190:
---

Hi,
I have been thinking about the whole thing for a while. My idea would limit us a 
bit more, but I really like Mike's proposal of fixed names. I would change the 
Directory class so that every method that handles or deletes files gets 2 
parameters: the segment name and one arbitrary codec-private file name. The 
directory is then responsible for creating the file name, prefixing it with _, 
and so on. A custom directory (like HBase) could use the segment name as the 
table name and the private file name as the identifier, so all segment files go 
into the same HBase table. The directory would then also be responsible for the 
cleanup/listing of files, where it would only return files matching the pattern.
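
A rough sketch of what such a two-parameter naming scheme might look like. Everything here is hypothetical: the class SegmentFileNaming, its methods, the "_segment.private" layout, and the allowed character set are assumptions for illustration and do not reflect the actual Directory API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical naming helper; not the real Directory class. */
final class SegmentFileNaming {
    // Assumed character set for both the segment name and the private name.
    private static final Pattern PART = Pattern.compile("[0-9a-z]+");
    // Assumed on-disk layout produced by fileName(): "_<segment>.<private>".
    private static final Pattern OWNED = Pattern.compile("_[0-9a-z]+\\.[0-9a-z]+");

    private SegmentFileNaming() {}

    /** Builds the physical name; the caller never chooses the prefix itself. */
    static String fileName(String segmentName, String privateName) {
        if (!PART.matcher(segmentName).matches() || !PART.matcher(privateName).matches()) {
            throw new IllegalArgumentException(
                "invalid name parts: " + segmentName + ", " + privateName);
        }
        return "_" + segmentName + "." + privateName;
    }

    /** Returns only the names that fit the layout, ignoring foreign files. */
    static List<String> listOwned(String[] allFiles) {
        List<String> owned = new ArrayList<>();
        for (String name : allFiles) {
            if (OWNED.matcher(name).matches()) {
                owned.add(name);
            }
        }
        return owned;
    }
}
```

Because the physical name is always derived from the two parts, a cleanup that walks only listOwned(...) would never see a foreign file in this sketch.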

For the index-wide metadata like the segments file we would then unfortunately 
need a special method to get an IndexOutput :(

If we keep the current one-filename approach, I would make the format fixed, so 
it throws IOException if the filename is invalid. Assert makes no sense here as it 
does not prevent people from doing the wrong thing. Then really nothing can 
create invalid files, deleting by _[0-9a-z_]+ works, and all would be happy.

Alternatively, we could switch to the following:
- If we create a *new* index, we enforce that listFiles returns an empty list (., 
.. excluded, but that's done already), otherwise we throw 
IOException(directory not empty).
- If there is a segment file already there, we can delete everything not 
allowed in an index.
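
The two rules above might be combined along these lines. This is only a sketch under stated assumptions: NewIndexGuard and mayDeleteForeignFiles are made-up names, the listing is passed in as a plain String[] rather than read from a Directory, and "a segment file is already there" is approximated by a name starting with "segments".

```java
import java.io.IOException;

/** Hypothetical guard for illustration; not part of IndexWriter. */
final class NewIndexGuard {
    private NewIndexGuard() {}

    /**
     * Decides whether cleaning up foreign files would be acceptable.
     *
     * @param listing file names found in the target directory
     * @return true if an existing index was detected, false if the directory is empty
     * @throws IOException if the directory holds files but no existing index
     */
    static boolean mayDeleteForeignFiles(String[] listing) throws IOException {
        boolean hasSegmentsFile = false;
        String firstOther = null;
        for (String name : listing) {
            if (name.equals(".") || name.equals("..")) {
                continue; // directory self/parent entries do not count
            }
            if (name.startsWith("segments")) { // assumed marker of an existing index
                hasSegmentsFile = true;
            } else if (firstOther == null) {
                firstOther = name;
            }
        }
        if (hasSegmentsFile) {
            return true;
        }
        if (firstOther != null) {
            throw new IOException("directory not empty: " + firstOther);
        }
        return false;
    }
}
```

Note that in this sketch any stray file in an otherwise empty directory triggers the exception, including files the operating system may have created on its own.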

Uwe




[jira] [Commented] (LUCENE-4190) IndexWriter deletes non-Lucene files

2012-07-05 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407155#comment-13407155
 ] 

Robert Muir commented on LUCENE-4190:
-

{quote}
So if there are no objections, I'll re-commit the underscore fix (which there 
was consensus for), and then discussion can continue about better methods.
{quote}

I don't object to the patch being committed (though I think it would be good to 
wait a little bit), but I am still very, very concerned that it starts a 
slippery slope back to Codec.files().





[jira] [Commented] (LUCENE-4190) IndexWriter deletes non-Lucene files

2012-07-05 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407162#comment-13407162
 ] 

Robert Muir commented on LUCENE-4190:
-

{quote}
I have been thinking about the whole thing for a while. My idea would limit us a 
bit more, but I really like Mike's proposal of fixed names. I would change the 
Directory class so that every method that handles or deletes files gets 2 
parameters: the segment name and one arbitrary codec-private file name. The 
directory is then responsible for creating the file name, prefixing it with _, 
and so on. A custom directory (like HBase) could use the segment name as the 
table name and the private file name as the identifier, so all segment files go 
into the same HBase table. The directory would then also be responsible for the 
cleanup/listing of files, where it would only return files matching the pattern.
{quote}

I'm not sure matching _[0-9a-z_]+ is really that big of an improvement over 
just the underscore. But I don't think we need to refactor Directory.java to do 
this; we could just change the underscore check to a regular expression.

{quote}
Assert makes no sense here as it does not prevent people from doing the wrong 
thing.
{quote}

I don't agree: at first I thought to do a hard check, but this is only really 
necessary for codec developers. So an assert is enough, because you catch it 
when developing your codec (it's either going to work, or completely not work 
here).

{quote}
If we create a new index, we enforce that listFiles returns an empty list (., .. 
excluded, but that's done already), otherwise we throw IOException(directory 
not empty).
{quote}

I thought about this, but I have concerns about things like .DS_Store and 
.nfsX or other files that some system could be creating behind the scenes, etc.







[jira] [Commented] (LUCENE-4190) IndexWriter deletes non-Lucene files

2012-07-05 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407167#comment-13407167
 ] 

Robert Muir commented on LUCENE-4190:
-

Just to mention: the reason I don't like the Directory refactoring is 
some of the crazy things we do (look at CompoundFileDirectory and also 
IndexWriter copySegmentAsIs, etc).

This is basically what I think we should avoid: adding a lot of risky 
complexity for little gain.




[jira] [Commented] (LUCENE-4190) IndexWriter deletes non-Lucene files

2012-07-05 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407170#comment-13407170
 ] 

Uwe Schindler commented on LUCENE-4190:
---

{quote}
bq. Assert makes no sense here as it does not prevent people from doing the 
wrong thing.

I don't agree: at first I thought to do a hard check, but this is only really 
necessary for codec developers. So an assert is enough, because you catch it 
when developing your codec (it's either going to work, or completely not work 
here).
{quote}

Why not make it a hard check? Otherwise one could write a file without _ and 
schwupps, it's wech (German for: whoosh, it's gone) :) Why only an assert? If we 
require that all files start with _, let's enforce it; otherwise delete all files 
like we do currently. Using an assert would get my -1 to commit this again.
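
The difference between the two styles being debated can be illustrated with a small sketch. The class FileNameChecks and both method names are made up for this example and do not come from the patch; the check itself is reduced to the leading-underscore rule. The relevant Java behavior is that assert statements are skipped unless the JVM is started with assertions enabled (for example with -ea), whereas an explicit check always runs.

```java
import java.io.IOException;

/** Hypothetical comparison of the two check styles; not code from the patch. */
final class FileNameChecks {
    private FileNameChecks() {}

    /** Assert-only variant: a no-op unless the JVM runs with assertions enabled. */
    static void assertValid(String fileName) {
        assert fileName.startsWith("_") : "invalid file name: " + fileName;
    }

    /** Hard-check variant: always evaluated, regardless of JVM flags. */
    static void requireValid(String fileName) throws IOException {
        if (!fileName.startsWith("_")) {
            throw new IOException("invalid file name: " + fileName);
        }
    }
}
```

With assertions disabled, assertValid("notes.txt") returns silently, while requireValid("notes.txt") throws in every configuration.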




[jira] [Commented] (LUCENE-4190) IndexWriter deletes non-Lucene files

2012-07-05 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13407177#comment-13407177
 ] 

Robert Muir commented on LUCENE-4190:
-

I'm not 1000% determined for it to only be an assert, but then we should change 
how the code works to make
sure that the check is not too expensive. The current assert makes 
SegmentInfo.addFiles/addFile very expensive.




[jira] [Comment Edited] (LUCENE-4190) IndexWriter deletes non-Lucene files

2012-07-05 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13407177#comment-13407177
 ] 

Robert Muir edited comment on LUCENE-4190 at 7/5/12 2:42 PM:
-

I'm not 1000% determined for it to only be an assert, but then we should change 
how the code works to make
sure that the check is not too expensive. The current assert makes 
SegmentInfo.addFiles/addFile very expensive (if it's turned directly into a hard 
check).


  was (Author: rcmuir):
I'm not 1000% determined for it to only be an assert, but then we should 
change how the code works to make
sure that the check is not too expensive. The current assert makes 
SegmentInfo.addFiles/addFile very expensive.

  


[jira] [Commented] (LUCENE-4190) IndexWriter deletes non-Lucene files

2012-07-05 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13407181#comment-13407181
 ] 

Uwe Schindler commented on LUCENE-4190:
---

String.startsWith("_") with a static string is cheap (a few CPU cycles, as it 
only needs to compare the length and one char). Please don't tell me that 
SI.addFiles is called in inner loops like Scorers! Not doing this check is 
stupid.

BTW: In CFSDirectory the assert about double entries on reading the dir should 
also throw CorruptIndexException, because a CFS with duplicate file names is 
broken. This check is even cheaper. I am planning to open a new issue to turn 
all those I/O related checks into hard checks; asserts are not appropriate here.
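
A minimal sketch of the difference being argued here, assuming a hypothetical SegmentFiles class (this is not Lucene's SegmentInfo, nor the code in the attached patch). The assert variant is skipped whenever the JVM runs without -ea, while the hard check always runs and costs one startsWith call per file name.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for a per-segment file list; not Lucene's SegmentInfo.
final class SegmentFiles {
  private final Set<String> files = new HashSet<String>();

  // Assert-only variant: silently accepts bad names unless the JVM runs with -ea.
  void addFileAsserted(String name) {
    assert name.startsWith("_") : "invalid codec filename: " + name;
    files.add(name);
  }

  // Hard-check variant: always rejects names that do not start with "_".
  void addFileChecked(String name) {
    if (!name.startsWith("_")) {
      throw new IllegalArgumentException(
          "invalid codec filename '" + name + "', must start with '_'");
    }
    files.add(name);
  }

  boolean contains(String name) {
    return files.contains(name);
  }
}
```

With the hard check, a call such as addFileChecked("boot.ini") fails immediately in production, which is the behaviour requested in the comment above.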



[jira] [Commented] (LUCENE-4190) IndexWriter deletes non-Lucene files

2012-07-05 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13407185#comment-13407185
 ] 

Robert Muir commented on LUCENE-4190:
-

Uwe, no, I mean that we check the entire list each time.

So if someone were to call addFile(), addFile(), addFile(), that would be very 
bad runtime. I'll update the patch.
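
To make the cost concern concrete, here is a hedged illustration with a hypothetical FileListCost class (the names are invented for this sketch and do not come from the patch). Re-validating the whole list on every add is quadratic in the number of files, while validating only the new entry is linear.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; the names do not come from the patch.
final class FileListCost {
  final List<String> files = new ArrayList<String>();
  long checksPerformed = 0;

  private void check(String name) {
    checksPerformed++;
    if (!name.startsWith("_")) {
      throw new IllegalArgumentException("invalid filename: " + name);
    }
  }

  // Re-validates the whole list on every call: n calls cost n*(n+1)/2 checks.
  void addFileRecheckAll(String name) {
    files.add(name);
    for (String f : files) {
      check(f);
    }
  }

  // Validates only the new entry: n calls cost n checks.
  void addFileCheckNew(String name) {
    check(name);
    files.add(name);
  }
}
```

For 100 files, the first variant performs 5050 checks and the second performs 100.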



[JENKINS] Lucene-Solr-trunk-Windows-Java6-64 - Build # 696 - Failure!

2012-07-05 Thread Policeman Jenkins Server
Build: 
http://jenkins.sd-datasolutions.de/job/Lucene-Solr-trunk-Windows-Java6-64/696/

1 tests failed.
FAILED:  
junit.framework.TestSuite.org.apache.solr.handler.TestReplicationHandler

Error Message:
ERROR: SolrIndexSearcher opens=74 closes=72

Stack Trace:
java.lang.AssertionError: ERROR: SolrIndexSearcher opens=74 closes=72
at __randomizedtesting.SeedInfo.seed([8F8AC7455F54278A]:0)
at org.junit.Assert.fail(Assert.java:93)
at org.apache.solr.SolrTestCaseJ4.endTrackingSearchers(SolrTestCaseJ4.java:191)
at org.apache.solr.SolrTestCaseJ4.afterClass(SolrTestCaseJ4.java:82)
at sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1995)
at com.carrotsearch.randomizedtesting.RandomizedRunner.access$1100(RandomizedRunner.java:132)
at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:754)
at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68)
at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
at org.apache.lucene.util.TestRuleIcuHack$1.evaluate(TestRuleIcuHack.java:51)
at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at org.apache.lucene.util.TestRuleNoInstanceHooksOverrides$1.evaluate(TestRuleNoInstanceHooksOverrides.java:53)
at org.apache.lucene.util.TestRuleNoStaticHooksShadowing$1.evaluate(TestRuleNoStaticHooksShadowing.java:52)
at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:36)
at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55)
at com.carrotsearch.randomizedtesting.RandomizedRunner.runSuite(RandomizedRunner.java:605)
at com.carrotsearch.randomizedtesting.RandomizedRunner.access$400(RandomizedRunner.java:132)
at com.carrotsearch.randomizedtesting.RandomizedRunner$2.run(RandomizedRunner.java:551)




Build Log:
[...truncated 7663 lines...]
[junit4:junit4]   2 161993 T2156 oas.SolrTestCaseJ4.endTrackingSearchers 
SEVERE ERROR: SolrIndexSearcher opens=74 closes=72
[junit4:junit4]   2 NOTE: test params are: codec=Appending, 
sim=RandomSimilarityProvider(queryNorm=false,coord=false): {}, locale=lv, 
timezone=Africa/Blantyre
[junit4:junit4]   2 NOTE: Windows 7 6.1 amd64/Sun Microsystems Inc. 1.6.0_32 
(64-bit)/cpus=2,threads=1,free=53764744,total=238157824
[junit4:junit4]   2 NOTE: All tests run in this JVM: 
[TestGermanLightStemFilterFactory, TestPropInjectDefaults, 
SolrCmdDistributorTest, TestTrimFilterFactory, TestSolrQueryParser, 
LukeRequestHandlerTest, SolrIndexConfigTest, PrimitiveFieldTypeTest, 
DistanceFunctionTest, MultiTermTest, TestThaiWordFilterFactory, 
SpatialFilterTest, FileBasedSpellCheckerTest, LengthFilterTest, 
TestRemoteStreaming, TestSort, PrimUtilsTest, TestConfig, 
XmlUpdateRequestHandlerTest, XsltUpdateRequestHandlerTest, 
LeaderElectionIntegrationTest, SolrCoreCheckLockOnStartupTest, 
TestKeywordMarkerFilterFactory, TestCJKWidthFilterFactory, TestCodecSupport, 
TestEnglishMinimalStemFilterFactory, FieldAnalysisRequestHandlerTest, 
FieldMutatingUpdateProcessorTest, TestDelimitedPayloadTokenFilterFactory, 
TestSwedishLightStemFilterFactory, TestSearchPerf, TestBinaryField, 
BadIndexSchemaTest, TestTurkishLowerCaseFilterFactory, TestDistributedSearch, 
TestMappingCharFilterFactory, TestKeepFilterFactory, TestFaceting, 
FullSolrCloudTest, IndexBasedSpellCheckerTest, IndexSchemaTest, 
IndexReaderFactoryTest, TestHashPartitioner, TestIndexingPerformance, 
CoreAdminHandlerTest, SearchHandlerTest, 
TestHyphenationCompoundWordTokenFilterFactory, TestPropInject, 
TestFrenchMinimalStemFilterFactory, TestPortugueseMinimalStemFilterFactory, 
TestElisionFilterFactory, TestCapitalizationFilterFactory, 
TestItalianLightStemFilterFactory, SnowballPorterFilterFactoryTest, 
TestQuerySenderNoQuery, TestHindiFilters, TestStandardFactories, 
ZkSolrClientTest, TestCJKBigramFilterFactory, CloudStateUpdateTest, 
TestDFRSimilarityFactory, RAMDirectoryFactoryTest, 
OpenExchangeRatesOrgProviderTest, TestDefaultSimilarityFactory, 
TestKStemFilterFactory, 

[jira] [Updated] (LUCENE-4190) IndexWriter deletes non-Lucene files

2012-07-05 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-4190:


Attachment: LUCENE-4190.patch

Updated patch with the _ check turned into a hard check.

 IndexWriter deletes non-Lucene files
 

 Key: LUCENE-4190
 URL: https://issues.apache.org/jira/browse/LUCENE-4190
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Michael McCandless
Assignee: Robert Muir
 Fix For: 4.0, 5.0

 Attachments: LUCENE-4190.patch, LUCENE-4190.patch, LUCENE-4190.patch




[jira] [Created] (LUCENE-4196) Turn asserts in I/O related code into hard checks

2012-07-05 Thread Uwe Schindler (JIRA)
Uwe Schindler created LUCENE-4196:
-

 Summary: Turn asserts in I/O related code into hard checks
 Key: LUCENE-4196
 URL: https://issues.apache.org/jira/browse/LUCENE-4196
 Project: Lucene - Java
  Issue Type: Task
  Components: core/index
Affects Versions: 4.0-ALPHA
Reporter: Uwe Schindler
 Fix For: 4.0


In lots of codecs we only assert that, e.g., some things inside files are 
correctly in bounds, which leads to security problems (ok, not as bad as 
C-style buffer overflows), but e.g. allocating a large array after reading a 
VInt from a file header and then hitting an OOM is a security issue. So we have 
to turn all those contracts for files into hard checks, especially as a simple 
check in most cases doesn't cost anything (and it costs no more than the assert 
itself, as the assert also takes CPU power, because it needs to check a static 
final class field once).
Of course we cannot check values we read when reading postings, but the simple 
checks that any postings file has a correct header and something like a positive 
number of elements, or number of elements < file size, ..., that a bit-field 
only contains valid bits in StoredFieldsReader, or non-duplicate filenames (CFS) 
are very important. We had those checks in 3.x, but in 4.0, Mike changed all of 
those to asserts during the flex development (in my opinion with no real 
reason).
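
The kind of check described above could look roughly like the following sketch. It assumes a made-up file layout (a 4-byte element count followed by 8-byte entries) and uses plain java.io rather than Lucene's IndexInput, so HeaderCheck and its format are purely illustrative. Lucene code would presumably throw CorruptIndexException; a plain IOException keeps the sketch free of dependencies.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

// Hypothetical file format: a 4-byte element count followed by 8-byte entries.
final class HeaderCheck {
  static long[] readEntries(byte[] file) throws IOException {
    DataInputStream in = new DataInputStream(new ByteArrayInputStream(file));
    int count = in.readInt();
    long remaining = file.length - 4L;
    // Hard check instead of an assert: validate the count before allocating.
    if (count < 0 || count * 8L > remaining) {
      throw new IOException("corrupt header: count=" + count
          + " but only " + remaining + " bytes follow");
    }
    long[] entries = new long[count];
    for (int i = 0; i < count; i++) {
      entries[i] = in.readLong();
    }
    return entries;
  }
}
```

A header claiming Integer.MAX_VALUE entries in a 4-byte file is rejected with an exception instead of triggering an 16 GB allocation attempt.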




[jira] [Commented] (LUCENE-4196) Turn asserts in I/O related code into hard checks

2012-07-05 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13407194#comment-13407194
 ] 

Robert Muir commented on LUCENE-4196:
-

I agree; sometimes I added asserts that I feel should be real checks, but I feel 
it's safest to just do the assert.

Lucene3xNormsProducer:119 is an example. If it fails, it means you have a 
corrupt .nrm file with wrong norms, mismatched for different fields.




[jira] [Commented] (LUCENE-4190) IndexWriter deletes non-Lucene files

2012-07-05 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13407196#comment-13407196
 ] 

Uwe Schindler commented on LUCENE-4190:
---

New patch looks good. I was not aware that the previous one was iterating over 
all files each time. As the SegmentInfo internal list should not be available 
outside, we have no problem with anybody else changing it in an uncontrolled way.

See also issue LUCENE-4196.




RE: Question about solr config files encoding.

2012-07-05 Thread Uwe Schindler
Config files are XML and I changed them to be handled by the XML parser 
(InputStreams), so the XML parser reads the encoding from the header.

But JSON is defined to be UTF-8, so we must supply the encoding 
(IOUtils.UTF8_CHARSET).
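
As a rough sketch of what supplying the encoding means for a JSON stream: the reader is constructed with an explicit charset instead of relying on the platform default. The example assumes Java 7's StandardCharsets.UTF_8 in place of IOUtils.UTF8_CHARSET so that it has no Lucene dependency; Utf8ReaderExample is a hypothetical name.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

final class Utf8ReaderExample {
  // Decodes the stream as UTF-8 regardless of the JVM's file.encoding setting.
  static Reader open(InputStream in) {
    return new InputStreamReader(in, StandardCharsets.UTF_8);
  }

  // Drains a reader into a String; helper for the demonstration only.
  static String readAll(Reader reader) throws IOException {
    StringBuilder sb = new StringBuilder();
    char[] buf = new char[1024];
    int n;
    while ((n = reader.read(buf)) != -1) {
      sb.append(buf, 0, n);
    }
    return sb.toString();
  }
}
```

A reader opened this way could then be handed to the JSON parser in place of the single-argument InputStreamReader.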

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


 -Original Message-
 From: Dawid Weiss [mailto:dawid.we...@gmail.com]
 Sent: Thursday, July 05, 2012 5:00 PM
 To: dev@lucene.apache.org
 Subject: Question about solr config files encoding.
 
 Guys should the encoding of config files really be platform-dependent?
 Currently Solr tests fail massively on setup because of things like
 this:
 
 public OpenExchangeRates(InputStream ratesStream) throws IOException {
   parser = new JSONParser(new InputStreamReader(ratesStream));
 
 this reader, when confronted with UTF-16 as file.encoding results in funky
 exceptions like:
 
 Caused by: org.apache.noggit.JSONParser$ParseException: JSON Parse
 Error: char=笊,position=0 BEFORE='笊'
 AFTER='†≤楳捬慩浥爢㨠≔桩猠摡瑡⁩猠捯汬散瑥搠晲潭⁶慲楯畳⁰牯癩摥牳⁡
 湤⁰牯癩摥搠晲'
  at org.apache.noggit.JSONParser.err(JSONParser.java:221)
  at org.apache.noggit.JSONParser.next(JSONParser.java:620)
  at org.apache.noggit.JSONParser.nextEvent(JSONParser.java:661)
  at
 org.apache.solr.schema.OpenExchangeRatesOrgProvider$OpenExchangeRates.
 init(OpenExchangeRatesOrgProvider.java:189)
  at
 org.apache.solr.schema.OpenExchangeRatesOrgProvider.reload(OpenExchang
 eRatesOrgProvider.java:129)
 
 Can we fix the encoding of these input files to UTF-8 or something?
 According to JSON RFC:
 
 http://tools.ietf.org/html/rfc4627#section-3
 
 JSON text SHALL be encoded in Unicode.  The default encoding is
UTF-8.
 
Since the first two characters of a JSON text will always be ASCII
characters [RFC0020], it is possible to determine whether an octet
stream is UTF-8, UTF-16 (BE or LE), or UTF-32 (BE or LE) by looking
at the pattern of nulls in the first four octets.
 
00 00 00 xx  UTF-32BE
00 xx 00 xx  UTF-16BE
xx 00 00 00  UTF-32LE
xx 00 xx 00  UTF-16LE
xx xx xx xx  UTF-8
 
 We could just enforce/require UTF-8? Alternatively, auto-detect this from a
 binary stream as a custom Reader class.
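
 The auto-detection alternative could be sketched as follows, directly following the null-pattern table quoted from the RFC. JsonEncodingSniffer is a hypothetical helper, it returns charset names rather than Charset objects, and it does not handle byte order marks.

```java
final class JsonEncodingSniffer {
  // Returns the charset name implied by the pattern of nulls in the first
  // four octets, per the table in RFC 4627 section 3. Needs at least 4 bytes.
  static String detect(byte[] b) {
    if (b.length < 4) {
      throw new IllegalArgumentException("need at least 4 bytes");
    }
    boolean z0 = b[0] == 0, z1 = b[1] == 0, z2 = b[2] == 0, z3 = b[3] == 0;
    if (z0 && z1 && z2 && !z3) return "UTF-32BE";   // 00 00 00 xx
    if (z0 && !z1 && z2 && !z3) return "UTF-16BE";  // 00 xx 00 xx
    if (!z0 && z1 && z2 && z3) return "UTF-32LE";   // xx 00 00 00
    if (!z0 && z1 && !z2 && z3) return "UTF-16LE";  // xx 00 xx 00
    return "UTF-8";                                  // xx xx xx xx
  }
}
```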
 
 Dawid
 



[jira] [Commented] (LUCENE-4196) Turn asserts in I/O related code into hard checks

2012-07-05 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13407207#comment-13407207
 ] 

Robert Muir commented on LUCENE-4196:
-

That one is a good example of something we should watch out for. I think it's OK 
because it uses IndexInput.length, but we should make sure we don't directly turn 
asserts that use things like Directory.fileExists or Directory.fileLength into 
real checks; it could cause problems for NFS (LUCENE-3727).






Re: Question about solr config files encoding.

2012-07-05 Thread Dawid Weiss
 But JSON is defined to be UTF-8, so we must supply the encoding 
 (IOUtils.UTF8_CHARSET).

That RFC says it can be any unicode... this said I agree with you that
we can probably assume it's UTF-8 and not worry about anything else.

Dawid




RE: Question about solr config files encoding.

2012-07-05 Thread Uwe Schindler
3.  Encoding

   JSON text SHALL be encoded in Unicode.  The default encoding is
   UTF-8.

   Since the first two characters of a JSON text will always be ASCII
   characters [RFC0020], it is possible to determine whether an octet
   stream is UTF-8, UTF-16 (BE or LE), or UTF-32 (BE or LE) by looking
   at the pattern of nulls in the first four octets.

   00 00 00 xx  UTF-32BE
   00 xx 00 xx  UTF-16BE
   xx 00 00 00  UTF-32LE
   xx 00 xx 00  UTF-16LE
   xx xx xx xx  UTF-8

:-)

I think we can safely assume it is UTF-8, otherwise we must do the same shit 
as XML parsers with mark() on BufferedInputStream. Most libraries out there 
can only read UTF-8, and Solr itself produces only UTF-8 JSON, right? Those 
tests only check the response from Solr.
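
For reference, the mark()-based peeking alluded to above might look like this sketch: read the first few bytes to sniff the encoding, then reset so the real decoder sees the stream from the start. PeekExample is a hypothetical helper, not code from Solr or any XML parser.

```java
import java.io.BufferedInputStream;
import java.io.IOException;

final class PeekExample {
  // Reads up to n bytes without consuming them, using mark()/reset().
  static byte[] peek(BufferedInputStream in, int n) throws IOException {
    in.mark(n);
    byte[] buf = new byte[n];
    int total = 0;
    while (total < n) {
      int r = in.read(buf, total, n - total);
      if (r == -1) {
        break; // stream is shorter than n bytes
      }
      total += r;
    }
    in.reset(); // rewind so the caller can read from the beginning again
    byte[] result = new byte[total];
    System.arraycopy(buf, 0, result, 0, total);
    return result;
  }
}
```

Assuming UTF-8 avoids this extra step entirely, which is the trade-off the message argues for.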

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de





Re: Question about solr config files encoding.

2012-07-05 Thread Yonik Seeley
On Thu, Jul 5, 2012 at 10:59 AM, Dawid Weiss dawid.we...@gmail.com wrote:
 According to JSON RFC:

 http://tools.ietf.org/html/rfc4627#section-3

 JSON text SHALL be encoded in Unicode.

One of my little pet peeves with the RFC - I think this was a bad
requirement.  JSON should have been text, and then there should have
been an optional way to detect encoding if other mechanisms don't
cover it (like HTTP headers, etc).  This effectively means that
something like [hi] is not valid JSON for many of you reading this
email (if your email client is internally representing it as something
other than unicode encoded for example).


 We could just enforce/require UTF-8?

Yes, Solr has normally always required/assumed UTF-8 for config files.
 It's simply an oversight in any places that don't.

-Yonik
http://lucidimagination.com




RE: Question about solr config files encoding.

2012-07-05 Thread Uwe Schindler
Let me just add:

Solr's XML files are parsed according to the XML spec, so you can choose any
charset; you only have to declare it according to the XML spec! Also, an XML
POST to the update handler can be in any encoding (it does not need to be
declared in the header anymore; the <?xml ...?> declaration is fine). There is
already a test! I fixed all this in endless sessions, but I was happy to do it,
as my favourite data format is XML :-) [I refuse to fix this for DIH, but
that's another story, SOLR-2347].
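
A small sketch of the behaviour described here, using only the JDK's built-in JAXP parser (XmlEncodingExample is a hypothetical name, and this is not Solr's loading code): when the parser is given raw bytes rather than a Reader, it takes the charset from the XML declaration.

```java
import java.io.ByteArrayInputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

final class XmlEncodingExample {
  // Parses raw bytes; the parser picks the charset from the XML declaration.
  static String rootText(byte[] xmlBytes) throws Exception {
    DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
    Document doc = builder.parse(new ByteArrayInputStream(xmlBytes));
    return doc.getDocumentElement().getTextContent();
  }
}
```

Feeding it an ISO-8859-1 encoded document that declares encoding="ISO-8859-1" yields correctly decoded text, with no charset passed by the caller.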

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de





[jira] [Commented] (LUCENE-4188) Storing Shapes shouldn't be Strategy dependent

2012-07-05 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13407221#comment-13407221
 ] 

David Smiley commented on LUCENE-4188:
--

RE createStoredField():
bq. I don't really like this. It is barely an improvement on the current code. 
The whole point of this issue is that the storing of Shapes shouldn't be 
related to Strategys. I think we should be explicit and require the consumer 
code (Solr or something else) decides how it wants to store Shapes. If you want 
a convenience method then it should be static, illustrating it is a utility 
that the Strategys cannot override. Ideally I would like it somewhere else 
entirely.

The client doesn't have to use this method, but in all tests + the Solr 
adapters I don't think there's a reason not to.  I found it to be useful, and 
to provide a place to document how it is recommended to store the shape (notice 
I even included the one-liner source in the javadocs).  An advantage of it 
being an instance method on the Strategy is that it has convenient access to 
both the field name and the SpatialContext.  I could make this method final, and I 
could add more documentation that makes it clear that the user is free to store 
the shape in any way they wish since the spatial module doesn't care.
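
A minimal sketch of the option described above: a final convenience method on the strategy that has the field name and a shape-to-string conversion at hand. This does not use the real Lucene spatial classes; SketchStrategy, StoredShapeField and the Function standing in for the SpatialContext are all hypothetical names chosen for illustration.

```java
import java.util.function.Function;

/** Hypothetical stand-in for a stored field: a field name plus a string value. */
final class StoredShapeField {
    final String name;
    final String value;

    StoredShapeField(String name, String value) {
        this.name = name;
        this.value = value;
    }
}

/** Hypothetical stand-in for a spatial strategy; not the real SpatialStrategy API. */
class SketchStrategy<S> {
    private final String fieldName;
    // Plays the role the SpatialContext has in the discussion: shape -> string.
    private final Function<S, String> shapeToString;

    SketchStrategy(String fieldName, Function<S, String> shapeToString) {
        this.fieldName = fieldName;
        this.shapeToString = shapeToString;
    }

    /**
     * Convenience one-liner. It is final, so subclasses can change how a shape
     * is indexed but not how this helper stores it. Callers remain free to
     * ignore it and store the shape in any other way.
     */
    final StoredShapeField createStoredField(S shape) {
        return new StoredShapeField(fieldName, shapeToString.apply(shape));
    }
}
```

A static utility taking the field name and the conversion as arguments would be the alternative asked for in the quoted comment; the instance method only saves the caller from passing those two values.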

 Storing Shapes shouldn't be Strategy dependent
 --

 Key: LUCENE-4188
 URL: https://issues.apache.org/jira/browse/LUCENE-4188
 Project: Lucene - Java
  Issue Type: Bug
  Components: modules/spatial
Reporter: Chris Male
Assignee: David Smiley
 Attachments: LUCENE-4188_remove_field_storage_from_createField.patch


 The logic for storing Shape representations seems to be different for each 
 Strategy.  The PrefixTreeStrategy impls store the Shape in WKT, which is nice 
 if you're using WKT but not much help if you're not.  BBoxStrategy doesn't 
 actually store the Shape itself, but a representation of the bounding box.  
 TwoDoubles seems to follow the PrefixTreeStrategy approach, which is 
 surprising since it only indexes Points and they could be stored without 
 using WKT.
 I think we need to consider what storing a Shape means.  If we want to store 
 the Shape itself, then that logic should be standardised and done outside of 
 the Strategys since it is not really related to them.  If we want to store 
 the terms being used by the Strategys to make Shapes queryable, then we need 
 to change the logic in the Strategys to actually do this.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira






RE: Question about solr config files encoding.

2012-07-05 Thread Uwe Schindler
 updatehandler can be any encoding (it does not need to be declared in header

...HTTP header..., sorry

  -Original Message-
  From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik
  Seeley
  Sent: Thursday, July 05, 2012 5:43 PM
  To: dev@lucene.apache.org
  Subject: Re: Question about solr config files encoding.
 
  On Thu, Jul 5, 2012 at 10:59 AM, Dawid Weiss dawid.we...@gmail.com
  wrote:
   According to JSON RFC:
  
   http://tools.ietf.org/html/rfc4627#section-3
  
   JSON text SHALL be encoded in Unicode.
 
  One of my little pet peeves with the RFC - I think this was a bad requirement.
  JSON should have been text, and then there should have been an optional way
  to detect encoding if other mechanisms don't cover it (like HTTP headers, etc).
  This effectively means that something like [hi] is not valid JSON for many of
  you reading this email (if your email client is internally representing it as
  something other than unicode encoded for example).
 
 
   We could just enforce/require UTF-8?
 
  Yes, Solr has normally always required/assumed UTF-8 for config files.
   It's simply an oversight in any places that don't.
 
  -Yonik
  http://lucidimagination.com
 



Re: Question about solr config files encoding.

2012-07-05 Thread Dawid Weiss
Sure, I don't have a problem with XML. I'll assume UTF-8 for json and
go through the issues later today.

Dawid
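
For illustration only, "assume UTF-8 for json" could look like the sketch below: decode the raw bytes strictly as UTF-8 and fail loudly on anything else, rather than relying on the platform default charset. Utf8Config and decodeStrict are hypothetical names, not Solr code.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper; not part of Solr. */
final class Utf8Config {
    /** Decodes bytes as UTF-8 and rejects malformed input instead of replacing it. */
    static String decodeStrict(byte[] bytes) {
        try {
            return StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes))
                    .toString();
        } catch (CharacterCodingException e) {
            // Unchecked so that callers do not need a throws clause.
            throw new IllegalArgumentException("config is not valid UTF-8", e);
        }
    }
}
```

The point of the strict decoder is that a file saved in another encoding is reported as an error instead of being read with silently substituted characters.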

On Thu, Jul 5, 2012 at 5:47 PM, Uwe Schindler u...@thetaphi.de wrote:
 I just add:

 Solr's XML files are parsed according to XML spec, so you can choose any
 charset, you only have to define it according to XML spec! Also XML POST to
 updatehandler can be any encoding (it does not need to be declared in header
 anymore, the <?xml ...?> header is fine). There is already a test! I fixed all
 this in endless sessions, but I was happy to do it, as my favourite data
 format is: XML :-) [I refuse to fix this for DIH, but that's another story,
 SOLR-2347].

 Uwe

 -
 Uwe Schindler
 H.-H.-Meier-Allee 63, D-28213 Bremen
 http://www.thetaphi.de
 eMail: u...@thetaphi.de
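
As a small illustration of the XML point above (not Solr code): when a JAXP parser is handed raw bytes, it picks the charset from the XML declaration, so the caller does not have to state the encoding. XmlDeclaredEncoding and rootText are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

/** Hypothetical helper showing that the parser honours the declared encoding. */
final class XmlDeclaredEncoding {
    /** Parses raw XML bytes and returns the text content of the root element. */
    static String rootText(byte[] xmlBytes) {
        try {
            // An InputStream (not a Reader) is passed on purpose, so that the
            // parser itself decides how to decode the bytes.
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xmlBytes));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><name>Caf\u00e9</name>";
        byte[] latin1 = xml.getBytes(java.nio.charset.StandardCharsets.ISO_8859_1);
        System.out.println(rootText(latin1));
    }
}
```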


 -Original Message-
 From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik
 Seeley
 Sent: Thursday, July 05, 2012 5:43 PM
 To: dev@lucene.apache.org
 Subject: Re: Question about solr config files encoding.

 On Thu, Jul 5, 2012 at 10:59 AM, Dawid Weiss dawid.we...@gmail.com
 wrote:
  According to JSON RFC:
 
  http://tools.ietf.org/html/rfc4627#section-3
 
  JSON text SHALL be encoded in Unicode.

 One of my little pet peeves with the RFC - I think this was a bad requirement.
 JSON should have been text, and then there should have been an optional way
 to detect encoding if other mechanisms don't cover it (like HTTP headers, etc).
 This effectively means that something like [hi] is not valid JSON for many of
 you reading this email (if your email client is internally representing it as
 something other than unicode encoded for example).


  We could just enforce/require UTF-8?

 Yes, Solr has normally always required/assumed UTF-8 for config files.
  It's simply an oversight in any places that don't.

 -Yonik
 http://lucidimagination.com




[JENKINS] Lucene-Solr-4.x-Linux-Java8-64 - Build # 5 - Failure!

2012-07-05 Thread Policeman Jenkins Server
Build: http://jenkins.sd-datasolutions.de/job/Lucene-Solr-4.x-Linux-Java8-64/5/

1 tests failed.
REGRESSION:  org.apache.solr.handler.component.SpellCheckComponentTest.testPerDictionary

Error Message:
mismatch: '0'!='2' @ spellcheck/suggestions/bar/startOffset

Stack Trace:
java.lang.RuntimeException: mismatch: '0'!='2' @ spellcheck/suggestions/bar/startOffset
at __randomizedtesting.SeedInfo.seed([9AA8B04990EA81A6:5DA644E1988694EE]:0)
at org.apache.solr.SolrTestCaseJ4.assertJQ(SolrTestCaseJ4.java:547)
at org.apache.solr.SolrTestCaseJ4.assertJQ(SolrTestCaseJ4.java:495)
at org.apache.solr.handler.component.SpellCheckComponentTest.testPerDictionary(SpellCheckComponentTest.java:102)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:474)
at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1995)
at com.carrotsearch.randomizedtesting.RandomizedRunner.access$1100(RandomizedRunner.java:132)
at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:818)
at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:877)
at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:891)
at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:32)
at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68)
at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:825)
at com.carrotsearch.randomizedtesting.RandomizedRunner.access$700(RandomizedRunner.java:132)
at com.carrotsearch.randomizedtesting.RandomizedRunner$3$1.run(RandomizedRunner.java:671)
at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:697)
at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:736)
at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:747)
at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68)
at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
at org.apache.lucene.util.TestRuleIcuHack$1.evaluate(TestRuleIcuHack.java:51)
at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at org.apache.lucene.util.TestRuleNoInstanceHooksOverrides$1.evaluate(TestRuleNoInstanceHooksOverrides.java:53)
at org.apache.lucene.util.TestRuleNoStaticHooksShadowing$1.evaluate(TestRuleNoStaticHooksShadowing.java:52)
at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:36)
at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55)
at com.carrotsearch.randomizedtesting.RandomizedRunner.runSuite(RandomizedRunner.java:605)
at com.carrotsearch.randomizedtesting.RandomizedRunner.access$400(RandomizedRunner.java:132)
at com.carrotsearch.randomizedtesting.RandomizedRunner$2.run(RandomizedRunner.java:551)




Build Log:
[...truncated 8725 lines...]
BUILD FAILED
/mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux-Java8-64/checkout/build.xml:29:
 The following error occurred while 

[jira] [Updated] (SOLR-3355) Add shard name to SolrCore statistics

2012-07-05 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated SOLR-3355:
--

Fix Version/s: 5.0

I've added collection as well and wrote a couple tests for this - I'll commit 
shortly.

 Add shard name to SolrCore statistics
 -

 Key: SOLR-3355
 URL: https://issues.apache.org/jira/browse/SOLR-3355
 Project: Solr
  Issue Type: Improvement
  Components: SolrCloud
Reporter: Michael Garski
Assignee: Mark Miller
Priority: Trivial
 Fix For: 4.0, 5.0

 Attachments: SOLR-3355.patch


 The JMX stats of the core do not expose the shard name that it is hosting, 
 which could be of use.




[jira] [Updated] (SOLR-3355) Add shard name to SolrCore statistics

2012-07-05 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated SOLR-3355:
--

Attachment: SOLR-3355.patch

 Add shard name to SolrCore statistics
 -

 Key: SOLR-3355
 URL: https://issues.apache.org/jira/browse/SOLR-3355
 Project: Solr
  Issue Type: Improvement
  Components: SolrCloud
Reporter: Michael Garski
Assignee: Mark Miller
Priority: Trivial
 Fix For: 4.0, 5.0

 Attachments: SOLR-3355.patch, SOLR-3355.patch


 The JMX stats of the core do not expose the shard name that it is hosting, 
 which could be of use.




[JENKINS] Lucene-Solr-tests-only-4.x - Build # 207 - Failure

2012-07-05 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-4.x/207/

3 tests failed.
REGRESSION:  org.apache.solr.handler.dataimport.TestSqlEntityProcessorDelta3.testCompositePk_DeltaImport_delete

Error Message:
Exception during query

Stack Trace:
java.lang.RuntimeException: Exception during query
at __randomizedtesting.SeedInfo.seed([2AB24089FB28443:2396D216FACFAC2F]:0)
at org.apache.solr.SolrTestCaseJ4.assertQ(SolrTestCaseJ4.java:461)
at org.apache.solr.SolrTestCaseJ4.assertQ(SolrTestCaseJ4.java:428)
at org.apache.solr.handler.dataimport.TestSqlEntityProcessorDelta3.testCompositePk_DeltaImport_delete(TestSqlEntityProcessorDelta3.java:111)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1995)
at com.carrotsearch.randomizedtesting.RandomizedRunner.access$1100(RandomizedRunner.java:132)
at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:818)
at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:877)
at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:891)
at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:32)
at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68)
at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:825)
at com.carrotsearch.randomizedtesting.RandomizedRunner.access$700(RandomizedRunner.java:132)
at com.carrotsearch.randomizedtesting.RandomizedRunner$3$1.run(RandomizedRunner.java:671)
at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:697)
at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:736)
at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:747)
at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68)
at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
at org.apache.lucene.util.TestRuleIcuHack$1.evaluate(TestRuleIcuHack.java:51)
at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at org.apache.lucene.util.TestRuleNoInstanceHooksOverrides$1.evaluate(TestRuleNoInstanceHooksOverrides.java:53)
at org.apache.lucene.util.TestRuleNoStaticHooksShadowing$1.evaluate(TestRuleNoStaticHooksShadowing.java:52)
at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:36)
at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55)
at com.carrotsearch.randomizedtesting.RandomizedRunner.runSuite(RandomizedRunner.java:605)
at com.carrotsearch.randomizedtesting.RandomizedRunner.access$400(RandomizedRunner.java:132)
at com.carrotsearch.randomizedtesting.RandomizedRunner$2.run(RandomizedRunner.java:551)
Caused by: java.lang.RuntimeException: REQUEST FAILED: xpath=//*[@numFound='0']
xml response was: <?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int 

[jira] [Created] (LUCENE-4197) Small improvements to Lucene Spatial Module for v4

2012-07-05 Thread David Smiley (JIRA)
David Smiley created LUCENE-4197:


 Summary: Small improvements to Lucene Spatial Module for v4
 Key: LUCENE-4197
 URL: https://issues.apache.org/jira/browse/LUCENE-4197
 Project: Lucene - Java
  Issue Type: Improvement
  Components: modules/spatial
Reporter: David Smiley
 Fix For: 4.0


This issue is to capture small changes to the Lucene spatial module that don't 
deserve their own issue.




[jira] [Updated] (LUCENE-4197) Small improvements to Lucene Spatial Module for v4

2012-07-05 Thread David Smiley (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated LUCENE-4197:
-

Attachment: 
LUCENE-4197_SpatialArgs_doesn_t_need_overloaded_toString()_with_a_ctx_param_.patch

SpatialArgs.toString() shouldn't be overloaded with a ctx -- not needed for its 
purpose.  Nobody was calling it any way.  What instigated this finding was that 
this class depended on SimpleSpatialContext, gone in 0.3-SNAPSHOT of Spatial4j.

 Small improvements to Lucene Spatial Module for v4
 --

 Key: LUCENE-4197
 URL: https://issues.apache.org/jira/browse/LUCENE-4197
 Project: Lucene - Java
  Issue Type: Improvement
  Components: modules/spatial
Reporter: David Smiley
 Fix For: 4.0

 Attachments: 
 LUCENE-4197_SpatialArgs_doesn_t_need_overloaded_toString()_with_a_ctx_param_.patch


 This issue is to capture small changes to the Lucene spatial module that 
 don't deserve their own issue.




[jira] [Commented] (LUCENE-4190) IndexWriter deletes non-Lucene files

2012-07-05 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407365#comment-13407365
 ] 

Robert Muir commented on LUCENE-4190:
-

{quote}
I think that the way to bound the namespace of files is to put everything in 
a subdirectory of the index directory chosen by the user and control the name 
of that subdirectory, making it clear that this is semi-private to Lucene and 
that all files in that subdirectory are fair game.
{quote}

Well there are a couple challenges with that I think:
1. subdirectories currently are a foreign concept to Directory, we would have 
to make some serious changes there to support subdirectories.
2. Lucene 3.x and Lucene4-alpha indexes still need to be supported, and we don't 
want to leave behind baggage when we merge, so the transition would be tricky.
3. the user could also do this on their own, right? e.g. we still have the same 
situation we have currently, where anything in that directory can get deleted 
by lucene, it's just underneath another layer.


 IndexWriter deletes non-Lucene files
 

 Key: LUCENE-4190
 URL: https://issues.apache.org/jira/browse/LUCENE-4190
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Michael McCandless
Assignee: Robert Muir
 Fix For: 4.0, 5.0

 Attachments: LUCENE-4190.patch, LUCENE-4190.patch, LUCENE-4190.patch


 Carl Austin raised a good issue in a comment on my Lucene 4.0.0 alpha blog 
 post: 
 http://blog.mikemccandless.com/2012/07/lucene-400-alpha-at-long-last.html
 IndexWriter will now (as of 4.0) delete all foreign files from the index 
 directory.  We made this change because Codecs are free to write to any files 
 now, so the space of filenames is hard to bound.
 But if the user accidentally uses the wrong directory (eg c:/) then we will 
 in fact delete important stuff.
 I think we can at least use some simple criteria (must start with _, maybe 
 must fit certain pattern eg _base36(_X).Y), so we are much less likely to 
 delete a non-Lucene file
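
The filename criterion suggested in the description could be sketched as below. The regular expression is only one guess at what "_base36(_X).Y" might mean, and LuceneLikeFileNames is a hypothetical helper, not IndexWriter's actual logic. It deliberately covers nothing but the underscore-prefixed per-segment pattern named in the issue.

```java
import java.util.regex.Pattern;

/** Hypothetical filter; an illustration of the proposed criterion only. */
final class LuceneLikeFileNames {
    // "_" + base-36 segment number, optional "_suffix" parts, then ".extension".
    private static final Pattern SEGMENT_FILE =
            Pattern.compile("_[0-9a-z]+(_[0-9A-Za-z]+)*\\.[0-9A-Za-z]+");

    /** Returns true only if the whole name fits the assumed segment-file pattern. */
    static boolean looksLikeSegmentFile(String fileName) {
        return SEGMENT_FILE.matcher(fileName).matches();
    }

    public static void main(String[] args) {
        for (String name : new String[] {"_0.fdt", "_a_1.del", "boot.ini", "notes.txt"}) {
            System.out.println(name + " -> " + looksLikeSegmentFile(name));
        }
    }
}
```

With a check of this kind, a writer pointed at the wrong directory would skip files such as boot.ini or notes.txt, since neither starts with an underscore.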




[jira] [Resolved] (SOLR-3593) add /solr/api/index.html

2012-07-05 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man resolved SOLR-3593.


Resolution: Fixed

FWIW: i went ahead and used the documentation.html name after all because i 
realized it kept the page editing simpler.  it was easy to deal with the legacy 
/solr/api, /solr/api/ and /solr/api/index.html type top level links using 
redirects...



 add /solr/api/index.html
 --

 Key: SOLR-3593
 URL: https://issues.apache.org/jira/browse/SOLR-3593
 Project: Solr
  Issue Type: Bug
Reporter: Hoss Man
Assignee: Hoss Man

 solr historically only had one version of the javadocs on the site at a time.
 particularly now that we have 3.6.X and 4.X concurrently, this needs to 
 change.  both sets of javadoc are already on the site, and 
 /solr/tutorial.html already links to both versions appropriately but there 
 are still some improvements that should be made...
 * add a /solr/api/index.html file that mirrors the type of info listed on 
 /core/documentation.html
 ** we could use the same documentation.html name, but since historically 
 lots of people have bookmarked/linked to /solr/api reusing that path as the 
 landing page for finding docs about multiple versions seems better
 ** making this visible will probably mean needing to dial in the existing 
 /solr/api redirect more
 * update the Javadocs link in the right nav to link to this page




[jira] [Commented] (LUCENE-4191) Lucene doc pages redirect to api-4_0_0-ALPHA which results in 404

2012-07-05 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407386#comment-13407386
 ] 

Hoss Man commented on LUCENE-4191:
--

BaseTokenFilterFactory no longer exists in the latest version of Solr (most of 
the Factory concepts were refactored up into the Lucene-Core analysis-common 
module) and Google has not yet updated its crawl of solr javadocs.

Solr 3.6 javadocs are still available, or you can follow links from the Solr 
4.0-ALPHA javadocs over to the Lucene-Core javadocs for classes like 
TokenFilterFactory and AbstractAnalysisFactory ...

http://lucene.apache.org/solr/documentation.html


 Lucene doc pages redirect to api-4_0_0-ALPHA which results in 404
 ---

 Key: LUCENE-4191
 URL: https://issues.apache.org/jira/browse/LUCENE-4191
 Project: Lucene - Java
  Issue Type: Bug
Affects Versions: 3.6
Reporter: Chaim Peck
  Labels: documentation

 Try to go to this URL:
 http://lucene.apache.org/solr/api/org/apache/solr/analysis/BaseTokenFilterFactory.html
 The result is that you will be redirected here, which is a 404:
 http://lucene.apache.org/solr/api-4_0_0-ALPHA/org/apache/solr/analysis/BaseTokenFilterFactory.html
 You can still get to the page from google cache:
 http://webcache.googleusercontent.com/search?q=cache:mCJCac4iZ0QJ:lucene.apache.org/solr/api/org/apache/solr/analysis/BaseTokenFilterFactory.html+&cd=1&hl=en&ct=clnk&gl=us




RE: jira tracking of issues fixed in 4.0-ALPHA

2012-07-05 Thread Chris Hostetter

I've seen no comments on this: if anyone objects please speak up or i'll 
move forward with this soon.

: : I think for this case, the much easier fix would be to rename the 4.0 
: : version to 4.0-alpha and create a new 4.0 one. All not yet fixed would 
: : get this new version as fix version.
: 
: Doh! ... why didn't i think of that?
: 
: Anybody object to this sequence?
: 
: In Jira project admin for both SOLR and LUCENE...
:  1) delete version 4.0-ALPHA (it has no issues yet in either project)
:  2) rename version 4.0 to 4.0-ALPHA
:  3) add a new version 4.0
: 
: if/when we get to 4.0-BETA we can do the same thing


-Hoss




Re: Solr 3.6.0 javadocs are missing from the site

2012-07-05 Thread Chris Hostetter

: thats the whole problem with /api: its not defined at all.

it has been very clearly defined since it was created: the latest 
javadocs ... just because we no longer explicitly link to it, doesn't 
mean we should stop trying to live up to the point of the link -- 
especially not when it's so fucking easy to do.

: having shit like this just turns into 'lets blame the release manager
: when things change and its not the way i want'.

where do you get that anyone is going to blame release managers for 
something?  having this redirect isn't going to break anything, nor does 
it have anything to do with anything an RM should give a fuck about.

the only reason we had a hiccup with it on tuesday was because that was the 
day we made the change from only hosting a single copy of the solr javadocs, 
re-using a single path for each new version, to having multiple versions 
with distinct paths -- and when we made that change we did *NOT* have any 
redirect like this in place at all.

that change could have been made at any time, regardless of whether it 
involved a new release, regardless of whether it was done by an RM, and the 
problem would have been the same: the missing redirect meant old links 
broke.

that was a one time change, that will never affect any other release in 
the future ever again: we just keep adding new directories for the new 
docs.

having this redirect doesn't affect that in any way shape or form

: same goes for download redirect links (I will open an issue tomorrow:
: either we remove these download redirect links completely, or we fix
: them to take versions, because having to add ?'s with bogus stuff on

I already opened an issue for that when we noticed this during 3.6 .. no 
one who cares about the google analytics and understands javascript has 
bothered to pick it up...

https://issues.apache.org/jira/browse/LUCENE-3978


-Hoss




RE: jira tracking of issues fixed in 4.0-ALPHA

2012-07-05 Thread Uwe Schindler
+1, as it was my idea :)
--
Uwe Schindler
H.-H.-Meier-Allee 63, 28213 Bremen
http://www.thetaphi.de



Chris Hostetter hossman_luc...@fucit.org wrote:


I've seen no comments on this: if anyone objects please speak up or i'll 
move forward with this soon.

: : I think for this case, the much easier fix would be to rename the 4.0 
: : version to 4.0-alpha and create a new 4.0 one. All not yet fixed would 
: : get this new version as fix version.
: 
: Doh! ... why didn't i think of that?
: 
: Anybody object to this sequence?
: 
: In Jira project admin for both SOLR and LUCENE...
: 1) delete version 4.0-ALPHA (it has no issues yet in either project)
: 2) rename version 4.0 to 4.0-ALPHA
: 3) add a new version 4.0
: 
: if/when we get to 4.0-BETA we can do the same thing


-Hoss




[jira] [Commented] (SOLR-3594) SolrCore() doesn't wait SolrCore.getSearcher() to register _searcher

2012-07-05 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407416#comment-13407416
 ] 

Hoss Man commented on SOLR-3594:


bq. The real question here is unrelated to tests: should the SolrCore 
constructor wait for a searcher to be registered before returning?

i don't think so. Just because a searcher isn't available yet, doesn't mean the 
SolrCore is unusable - we shouldn't block other uses of the SolrCore just 
because a searcher isn't available yet. the first thread that attempts to use 
getSearcher() is what should block on the listeners (depending on the setting 
of useColdSearcher)

The test failure suggests to me that something is wonky with how we are 
tracking the searcher opens and doing cleanup -- either in SolrCore.close() or 
in the test framework itself.
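
A rough sketch of the behaviour argued for above, using a made-up LazySearcherCore class rather than the real SolrCore: the constructor kicks off warming and returns at once, and only a caller of getSearcher() blocks until the searcher is registered. The useColdSearcher setting is not modelled here, and the String result is a placeholder for a real searcher object.

```java
import java.util.concurrent.CompletableFuture;

/** Hypothetical stand-in for a core whose searcher is opened asynchronously. */
final class LazySearcherCore {
    private final CompletableFuture<String> searcher;

    LazySearcherCore(long warmupMillis) {
        // The constructor does not wait: warming (e.g. slow newSearcher
        // listeners) runs on another thread.
        this.searcher = CompletableFuture.supplyAsync(() -> {
            try {
                Thread.sleep(warmupMillis); // simulated listener work
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return "searcher-1";
        });
    }

    /** Non-blocking check, usable by code that does not need a searcher. */
    boolean isSearcherReady() {
        return searcher.isDone();
    }

    /** Blocks the calling thread until the searcher has been registered. */
    String getSearcher() {
        return searcher.join();
    }
}
```

A test written against such a core would call getSearcher() (or otherwise wait for registration) before closing it, which is one way to keep the open and close counts in step.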

 SolrCore() doesn't wait SolrCore.getSearcher() to register _searcher
 

 Key: SOLR-3594
 URL: https://issues.apache.org/jira/browse/SOLR-3594
 Project: Solr
  Issue Type: Bug
  Components: clients - java
Affects Versions: 3.4
Reporter: Egor Pahomov
Priority: Minor
  Labels: test
 Attachments: 3594.patch, testSearchersManagement.patch

   Original Estimate: 1h
  Remaining Estimate: 1h

 SolrCore() executes SolrCore.getSearcher(...) and returns without checking 
 whether getSearcher(...) has already registered _searcher. As a result, if we 
 have a SolrEventListener with a slow newSearcher(), the test can end before 
 _searcher is registered, and then the counts of searcher closes and searcher 
 opens don't match. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira






[jira] [Commented] (SOLR-2894) Implement distributed pivot faceting

2012-07-05 Thread Trey Grainger (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407459#comment-13407459
 ] 

Trey Grainger commented on SOLR-2894:
-

Hi Erik,

Sorry, I missed your original message asking me if I could test out the latest 
patch - I'd be happy to help.  I just tried both your patch and the April 25th 
patch against the Solr 4.0 Alpha revision and neither applied immediately.  
I'll see if I can find some time on Sunday to try to get a revision sorted out 
which will work with the current version.

I think there are some changes in the April 24th patch which may need to be 
re-applied if your changes were based upon the earlier patch.  I'll know more 
once I've had a chance to dig in later this weekend.

Thanks,

-Trey

 Implement distributed pivot faceting
 

 Key: SOLR-2894
 URL: https://issues.apache.org/jira/browse/SOLR-2894
 Project: Solr
  Issue Type: Improvement
Reporter: Erik Hatcher
Assignee: Erik Hatcher
 Fix For: 4.0

 Attachments: SOLR-2894.patch, SOLR-2894.patch, 
 distributed_pivot.patch, distributed_pivot.patch


 Following up on SOLR-792, pivot faceting currently only supports 
 undistributed mode.  Distributed pivot faceting needs to be implemented.




[jira] [Created] (LUCENE-4198) Allow codecs to index term impacts

2012-07-05 Thread Robert Muir (JIRA)
Robert Muir created LUCENE-4198:
---

 Summary: Allow codecs to index term impacts
 Key: LUCENE-4198
 URL: https://issues.apache.org/jira/browse/LUCENE-4198
 Project: Lucene - Java
  Issue Type: Sub-task
  Components: core/index
Reporter: Robert Muir


Subtask of LUCENE-4100.

That's an example of something similar to impact indexing (though his 
implementation currently stores a max for the entire term, the problem is the 
same).

We can imagine other similar algorithms too: I think the codec API should be 
able to support these.

Currently it really doesn't: Stefan worked around the problem by providing a 
tool to 'rewrite' your index; he passes the IndexReader and Similarity to it. 
But it would be better if we fixed the codec API.

One problem is that the Postings writer needs to have access to the Similarity. 
Another problem is that it needs access to the term and collection statistics 
up front, rather than after the fact.

This might have some cost (hopefully minimal), so I'm thinking to experiment in 
a branch with these changes and see if we can make it work well.
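
To illustrate why the statistics are needed up front, here is a small sketch 
in plain Java. It does not use Lucene's codec API: ImpactScorer and 
ImpactBufferingWriter are hypothetical names, and ImpactScorer merely stands in 
for whatever a Similarity would expose. The point it shows is that a per-term 
maximum impact can only be computed once docFreq is known, so this sketch 
buffers the term's postings first, which is the kind of cost mentioned above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical scoring hook; stands in for a Similarity. Not a Lucene type. */
interface ImpactScorer {
    float score(int freq, int docFreq, int docCount);
}

/** Buffers one term's postings so that docFreq is known before scoring. */
final class ImpactBufferingWriter {
    private final ImpactScorer scorer;
    private final int docCount;                            // collection statistic
    private final List<int[]> buffered = new ArrayList<>(); // {docId, freq} pairs

    ImpactBufferingWriter(ImpactScorer scorer, int docCount) {
        this.scorer = scorer;
        this.docCount = docCount;
    }

    /** Records a posting; it cannot be scored yet because docFreq is unknown. */
    void addPosting(int docId, int freq) {
        buffered.add(new int[] {docId, freq});
    }

    /** Called at the end of a term: docFreq is now known, so compute the max impact. */
    float finishTerm() {
        int docFreq = buffered.size();
        float max = 0f;
        for (int[] posting : buffered) {
            max = Math.max(max, scorer.score(posting[1], docFreq, docCount));
        }
        buffered.clear();
        return max;
    }
}
```

A writer that received docFreq and the collection statistics before the first 
posting could score each posting as it arrives and drop the buffer entirely.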




[jira] [Updated] (LUCENE-4198) Allow codecs to index term impacts

2012-07-05 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-4198:


Attachment: LUCENE-4198_flush.patch

Here's a patch fixing how we compute stats in FreqProxTermsWriter, but the 
codec API is unchanged.

Next I'll look at merge, which is trickier, and then see about changing the 
codec API.

 Allow codecs to index term impacts
 --

 Key: LUCENE-4198
 URL: https://issues.apache.org/jira/browse/LUCENE-4198
 Project: Lucene - Java
  Issue Type: Sub-task
  Components: core/index
Reporter: Robert Muir
 Attachments: LUCENE-4198_flush.patch


 Subtask of LUCENE-4100.
 That's an example of something similar to impact indexing (though his 
 implementation currently stores a max for the entire term, the problem is the 
 same).
 We can imagine other similar algorithms too: I think the codec API should be 
 able to support these.
 Currently it really doesn't: Stefan worked around the problem by providing a 
 tool to 'rewrite' your index; he passes the IndexReader and Similarity to it. 
 But it would be better if we fixed the codec API.
 One problem is that the Postings writer needs to have access to the 
 Similarity. Another problem is that it needs access to the term and 
 collection statistics up front, rather than after the fact.
 This might have some cost (hopefully minimal), so I'm thinking to experiment 
 in a branch with these changes and see if we can make it work well.




[jira] [Commented] (LUCENE-4100) Maxscore - Efficient Scoring

2012-07-05 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407472#comment-13407472
 ] 

Robert Muir commented on LUCENE-4100:
-

I spun off a sub-issue (LUCENE-4198) to see how we can first fix this Codec API 
so that you don't need an IndexRewriter and this patch could work live.

 Maxscore - Efficient Scoring
 

 Key: LUCENE-4100
 URL: https://issues.apache.org/jira/browse/LUCENE-4100
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/codecs, core/query/scoring, core/search
Affects Versions: 4.0
Reporter: Stefan Pohl
  Labels: api-change, patch, performance
 Fix For: 4.0

 Attachments: contrib_maxscore.tgz, maxscore.patch


 At Berlin Buzzwords 2012, I will be presenting 'maxscore', an efficient 
 algorithm first published in the IR domain in 1995 by H. Turtle & J. Flood, 
 that I find deserves more attention among Lucene users (and developers).
 I implemented a proof of concept and did some performance measurements with 
 example queries and lucenebench, the package of Mike McCandless, resulting in 
 very significant speedups.
 This ticket is to start the discussion on including the implementation 
 in Lucene's codebase. Because the technique requires awareness of it 
 from the Lucene user/developer, it seems best for it to become a 
 contrib/module package so that it can be chosen consciously.




Multi-thread UpdateProcessor

2012-07-05 Thread Mikhail Khludnev
Hello,

Most of the time, when single-threaded streaming
http://wiki.apache.org/solr/Solrj#Streaming_documents_for_an_update is used,
I see low CPU utilization on the Solr server. A reasonable response is to
use more threads to index faster, but that requires a more complicated
client side.
I propose to employ a special update processor which can fork the stream
processing onto many threads. If you like it, please vote for
https://issues.apache.org/jira/browse/SOLR-3585 .
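
As a rough illustration of the idea (not the SOLR-3585 patch and not Solr's
update processor API), the sketch below fans documents from one incoming
stream out to a pool of worker threads. ForkingProcessor, processAdd and
finish are hypothetical names, the downstream step is modelled as a plain
Consumer, and the semaphore that bounds in-flight documents is my own
assumption about how one might keep a fast reader from outrunning the workers.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical sketch: forks a single document stream onto several threads. */
final class ForkingProcessor<D> {
    private final ExecutorService pool;
    private final Semaphore inFlight;   // bounds documents queued or being processed
    private final Consumer<D> next;     // stands in for the rest of the update chain

    ForkingProcessor(int threads, int maxInFlight, Consumer<D> next) {
        this.pool = Executors.newFixedThreadPool(threads, r -> {
            Thread t = new Thread(r, "update-worker");
            t.setDaemon(true);
            return t;
        });
        this.inFlight = new Semaphore(maxInFlight);
        this.next = next;
    }

    /** Called by the single streaming thread; blocks when too much work is pending. */
    void processAdd(D doc) {
        inFlight.acquireUninterruptibly();
        pool.execute(() -> {
            try {
                next.accept(doc);
            } finally {
                inFlight.release();
            }
        });
    }

    /** Called at the end of the stream; waits until every document is processed. */
    void finish() {
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

The client keeps its simple single-threaded loop, calling processAdd for each
document and finish at the end. Note that this sketch does not preserve
document order and does not report worker failures back to the caller; a real
processor would need to decide how to handle both.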

Regards

-- 
Sincerely yours
Mikhail Khludnev
Tech Lead
Grid Dynamics

http://www.griddynamics.com
 mkhlud...@griddynamics.com

