Re: Classloaders, SPI, Directory ... OSGi?

2016-01-26 Thread Benson Margulies
On Tue, Jan 26, 2016 at 9:35 AM, Uwe Schindler  wrote:
> Hi,
>
>> Down at the bottom level of the SPI mechanism, the API accepts
>> ClassLoader objects. However, the Directory API does not have methods
>> that take a class loader, so, in practical terms, the SPI mechanism
>> always uses the Thread Context Class Loader.
>
> Not entirely true. We use context classloader as per ServiceLoader spec, but 
> we also inspect Lucene's class loader.
>
>> I just spent some time sorting out a muddle in an OSGi application
>> using Lucene, eventually resolved by setting the TCCL correctly.
>>
>> I wonder what people think of the following possible activities:
>>
>> 1: add an explicit ClassLoader to the Directory classes (and whatever
>> below them) to allow applications to take advantage of the SPI's API.
>
> Directory has nothing to do with SPI. It is only used by Analyzers and 
> DirectoryReader/IndexWriter.

The stacktrace in question enters through Directory. You open a
directory, it opens a segment, it tries to find a codec. It fails to
find the codec if the TCCL does not contain the right copy of the
lucene-codecs jar. So, I submit, it makes sense for the Directory.open
doc to carry a note about this; it could possibly make sense to accept
an explicit class loader, but I'm not motivated to push it. I accept
that the thing to note is more complex than what I wrote quickly in
this thread. I can fill in more details if you are interested.
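
For concreteness, here is a minimal sketch of the workaround described above: set the TCCL for the duration of one call and always restore it afterwards. It uses only the JDK. The class and method names are hypothetical, and which loader to pass (for example, one that can see both lucene-core and lucene-codecs) is an assumption that depends on the OSGi setup.

```java
import java.util.function.Supplier;

/** Hypothetical helper: run an action with a chosen Thread Context Class Loader. */
public final class ContextClassLoaderSwap {
    private ContextClassLoaderSwap() {}

    /**
     * Installs {@code loader} as the TCCL, runs {@code action}, and restores
     * the previous TCCL even if the action throws.
     */
    public static <T> T callWith(ClassLoader loader, Supplier<T> action) {
        Thread current = Thread.currentThread();
        ClassLoader previous = current.getContextClassLoader();
        current.setContextClassLoader(loader);
        try {
            // Any ServiceLoader-style lookup made in here sees `loader` as the TCCL.
            return action.get();
        } finally {
            current.setContextClassLoader(previous);
        }
    }
}
```

In an application, the action would wrap whatever call first triggers the codec lookup, such as opening the index. Restoring the loader in the finally block matters because container threads are usually pooled and reused.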

>
> But the problem is Codec.forName() and others are static methods, so it is 
> not easy to change this to other classloaders. This is similar to other stuff 
> like Java's Locales, Charsets, Image in Java, but also TIKA stuff. They all 
> depend on static SPI behaviour, not compatible with different class loaders.
>
> To tell Lucene to load its stuff from other classloaders, we have methods in 
> Codec, PostingsFormats,... to revisit classloaders and collect new SPIs from 
> there (and you can pass one).
>
>> 2: add a bit of javadoc to the Directory classes that reminds people
>> that these APIs consume the TCCL.
>
> As above, it does, but not only. Not sure what your problem is...But, I think 
> the issue is that OSGI tries to load every JAR file into its own classloader 
> and this breaks it. So lucene-core.jar will not see any other Jar files, so 
> it falls back to context class loader.
>
> It is well known that ServiceLoader and other Java stuff around that does not 
> play nicely with all OSGI complexity (see below).
>
>> 3: nothing.
>>
>> Meanwhile, I've been thinking about a proposal to make Lucene build
>> OSGi bundles itself, rather than leaving the job to Apache Servicemix.
>>
>> The hard part here is the OSGi principle that every Java _package_ has
>> a single home Jar file. I don't think I have the stomach lining to try
>> to talk anyone around here into some sort of major refactoring to that
>> end. The servicemix jars work around this adequately. One thing the
>> servicemix jars don't deal with is lucene-codecs.
>>
>> I guess I'd like to start by asking how the dev community feels about
>> the whole idea of getting native support for OSGi into Lucene. If
>> there's a strong reaction of _ugh_, I won't push it.
>
> _GHHHhhh_
>
> OSGI is incompatible with Lucene's packaging, sorry, that is hard to fix. And 
> in my personal opinion, people who want to make OSGI modules out of Lucene 
> may do this separately, but it will result in major havoc, sorry.
>
> This is the same discussion like people requesting "serialization". To make 
> Lucene work with your own OSGI stuff, you have to repackage your Lucene 
> dependencies into "one big" single JAR file, otherwise you would also get 
> huge slowdowns because of classloader separation (OSGI will add proxy classes 
> everywhere...).

It's not nearly as bad as that, it turns out.

Servicemix builds three OSGi bundles for Lucene: core, analyzers, and
queryparsers. It's not one big jar.

https://github.com/apache/servicemix-bundles/blob/master/lucene-5.4.0/pom.xml
or 
https://github.com/apache/servicemix-bundles/blob/master/lucene-analyzers-common-5.4.0/pom.xml.

So, ugh registered, but I ask you to look at those and see if they
reduce the size of the ugh at all in contemplating them as part of
Lucene itself.





>
> Uwe
>
>
>> -
>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: dev-h...@lucene.apache.org



Classloaders, SPI, Directory ... OSGi?

2016-01-26 Thread Benson Margulies
Folks,

Down at the bottom level of the SPI mechanism, the API accepts
ClassLoader objects. However, the Directory API does not have methods
that take a class loader, so, in practical terms, the SPI mechanism
always uses the Thread Context Class Loader.

I just spent some time sorting out a muddle in an OSGi application
using Lucene, eventually resolved by setting the TCCL correctly.

I wonder what people think of the following possible activities:

1: add an explicit ClassLoader to the Directory classes (and whatever
below them) to allow applications to take advantage of the SPI's API.

2: add a bit of javadoc to the Directory classes that reminds people
that these APIs consume the TCCL.

3: nothing.

Meanwhile, I've been thinking about a proposal to make Lucene build
OSGi bundles itself, rather than leaving the job to Apache Servicemix.

The hard part here is the OSGi principle that every Java _package_ has
a single home Jar file. I don't think I have the stomach lining to try
to talk anyone around here into some sort of major refactoring to that
end. The servicemix jars work around this adequately. One thing the
servicemix jars don't deal with is lucene-codecs.

I guess I'd like to start by asking how the dev community feels about
the whole idea of getting native support for OSGi into Lucene. If
there's a strong reaction of _ugh_, I won't push it.




Re: Sharing a class across cores

2015-11-11 Thread Benson Margulies
What is the connection between a blob of data and a class in a class
loader? Is it a class of your own that you're using to store the data?

Solr can't change fundamental facts about class loaders; if an object
of a class needs to be shared across class loaders, the class has to be
loaded into a common parent. If you don't want to do that broadly,
you will indeed need to factor out a jar for the job.

If it isn't a special class, but rather just an instance of some
boring ordinary class and your problem is sharing the _reference_,
consider JNDI.
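
To illustrate the common-parent point, here is a small JDK-only sketch. The two child loaders merely stand in for per-core loaders (an assumption for illustration, not Solr code), and the class and method names are hypothetical. Because neither child has URLs of its own, both delegate to the parent and receive the very same Class object, so any static state in that class exists once.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;

/** Hypothetical demo: a class defined by a common parent is shared by all child loaders. */
public final class SharedParentDemo {
    private SharedParentDemo() {}

    /** Returns true if two sibling child loaders resolve className to the same Class object. */
    public static boolean sameClassThroughBothChildren(String className) {
        ClassLoader parent = SharedParentDemo.class.getClassLoader();
        try (URLClassLoader coreA = new URLClassLoader(new URL[0], parent);
             URLClassLoader coreB = new URLClassLoader(new URL[0], parent)) {
            // Neither child can define the class itself, so both delegate upward.
            return coreA.loadClass(className) == coreB.loadClass(className);
        } catch (ClassNotFoundException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sameClassThroughBothChildren("java.util.HashMap"));
    }
}
```

The converse is what bites in the 40-core case: if each child loader defines the class from its own copy of the jar, each gets a distinct Class object with its own statics.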



On Wed, Nov 11, 2015 at 7:02 PM, Gus Heck  wrote:
> Yes asked by a colleague :). The chat session is now in our jira ticket :).
>
> However, my take on it is that this seems like a pretty broad brush to paint
> with to move *all* our classes up and out of the normal core loading
> process. I assume there are good reasons for segregating this stuff into
> separate class loaders to begin with. It would also be fairly burdensome to
> make a separate jar file to break out this one component...
>
> I really just want a way to stash the map in a place where other cores can
> see it (and thus I can appropriately synchronize things so that the loading
> only happens once). I'm asking because it seems like surely this must be a
> solved problem... if not, it might be easiest to just solve it by adding
> some sort of shared resources facility to CoreContainer?
>
> -Gus
>
> On Wed, Nov 11, 2015 at 6:54 PM, Shawn Heisey  wrote:
>>
>> On 11/11/2015 4:11 PM, Gus Heck wrote:
>> > I have a case where a component loads up a large CSV file (2.5 million
>> > lines) to build a map. This worked ok in a case where we had a single
>> > core, but it isn't working so well with 40 cores because each core loads
>> > a new copy of the component in a new classloader and I get 40 new
>> > versions of the same class each holding its own private static final
>> > map (one for each core). Each line is small, but a billion of anything
>> > gets kinda heavy. Is this the intended class loading behavior?
>> >
>> > Is there somewhere that one can cause a class to be loaded in a parent
>> > classloader above the core so that it's loaded just once? I want to load
>> > it in some way that leverages standard solr resource loading, so that
>> > I'm not hard coding or setting sysprops just to be able to find it.
>> >
>> > This is in a copy of trunk from about a month ago... so 6.x stuff is
>> > mostly available.
>>
>> This sounds like a question that I just recently answered on IRC.
>>
>> If you remove all <lib> elements from your solrconfig.xml files and
>> place all extra jars for Solr into ${solr.solr.home}/lib ... Solr will
>> load those jars before any cores are created and they will be available
>> to all cores.
>>
>> There is a minor bug with this that will be fixed in Solr 5.4.0.  It is
>> unlikely that this will affect third-party components, but be aware that
>> until 5.4, jars in that lib directory will be loaded twice by older 5.x
>> versions.
>>
>> https://issues.apache.org/jira/browse/SOLR-6188
>>
>> Thanks,
>> Shawn
>>
>>
>>
>
>
>
> --
> http://www.the111shift.com




Re: Solr search

2015-09-30 Thread Benson Margulies
On Wed, Sep 30, 2015 at 9:11 PM, vetrik kumaran murugesan
 wrote:
> Hi Upayavira,
>
> Thanks for your quick reply.
>
> Files above use GPL and MIT license , like junit4-ant 2.1.13 has  GPL and
> MIT.

There is no MIT or GPL license on Xerces or Ant; they are Apache products.
What makes you think otherwise?



>
> Regards,
>
> Vetrik
>
>
>
> 2015-09-30 17:27 GMT-05:00 Upayavira :
>>
>> On Wed, Sep 30, 2015, at 10:55 PM, vetrik kumaran murugesan wrote:
>>
>> Dear Team,
>>
>> I am wondering can we use Solr 5.3 search  server  without the following
>> jar files,
>>
>> commonsfileupload: 1.2.1
>> xerces : xercesImpl : 2.9.1
>> org.apache.ant : ant : 1.8.2
>>
>>
>> If yes, how can I do it? I am trying to evaluate Solr 5.3; your input is
>> valuable and appreciated.
>>
>>
>> Why would you want to do that?
>>
>> xerces is an XML parser. It wouldn't surprise me if Solr couldn't load its
>> configs without it. Seems kinda important.
>>
>> All the files you mention above are (as far as I understand) Apache code,
>> therefore Apache licensed. Why is their use a factor in your evaluation?
>>
>> Upayavira
>
>




Re: Ant/JUnit/Bash bug with RTL languages?

2015-08-21 Thread Benson Margulies
On Fri, Aug 21, 2015 at 10:18 AM, Mike Drob  wrote:
> Yea, I'm fully ready to hear that it's an issue with bash. Didn't mean to
> cast any aspersions on the test framework, mostly was curious if anybody had
> ever thought about this before.

Not bash, X windows or whatever implements your actual UI output :-)

>
> On Fri, Aug 21, 2015 at 9:13 AM, Benson Margulies 
> wrote:
>>
>> Isn't this all about how your console does the Unicode bidi algo, and
>> not about anything in the code?
>>
>>
>> On Fri, Aug 21, 2015 at 10:12 AM, Mike Drob  wrote:
>> > Hello,
>> >
>> > I noticed that when running tests, if the language selected is RTL then
>> > the
>> > "JUnit says hello" output is backwards. However, if I copy the output
>> > and
>> > try to paste it into firefox or gedit then the text is properly
>> > right-to-left.
>> >
>> > For example, when selecting hebrew, on my system it prints
>> > " says [shin-lamed-vav-mem]" instead of starting with [shin] on
>> > the
>> > right.
>> >
>> > This shouldn't be a high priority, since the tests themselves still
>> > pass,
>> > but I was wondering if that's something that we can fix or if the error
>> > is
>> > in a lower level - like the junit libs, or maybe even bash. Anybody have
>> > any
>> > ideas?
>> >
>> > Mike
>>
>>
>




Re: Ant/JUnit/Bash bug with RTL languages?

2015-08-21 Thread Benson Margulies
Isn't this all about how your console does the Unicode bidi algo, and
not about anything in the code?
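
A quick way to see that the code side is fine is to check that the string is stored in logical order and only needs reordering at display time. The sketch below uses the JDK's java.text.Bidi. The greeting text is an assumption reconstructed from the letters named in the report (shin, lamed, vav, final mem), and the class and method names are hypothetical.

```java
import java.text.Bidi;

/** Hypothetical demo: the string is in logical order; reordering is the display's job. */
public final class BidiOrderDemo {
    private BidiOrderDemo() {}

    // "JUnit says " followed by shin, lamed, vav, final mem, in logical (typing) order.
    static final String GREETING = "JUnit says \u05E9\u05DC\u05D5\u05DD";

    /** True if the text mixes left-to-right and right-to-left runs. */
    static boolean isMixed(String text) {
        return new Bidi(text, Bidi.DIRECTION_DEFAULT_LEFT_TO_RIGHT).isMixed();
    }

    /** Number of directional runs the bidi algorithm finds in the text. */
    static int runCount(String text) {
        return new Bidi(text, Bidi.DIRECTION_DEFAULT_LEFT_TO_RIGHT).getRunCount();
    }

    public static void main(String[] args) {
        System.out.println("mixed=" + isMixed(GREETING) + " runs=" + runCount(GREETING));
    }
}
```

A terminal that does not implement the bidi algorithm paints the Hebrew run left to right, which is why pasting the same characters into an application that does implement it shows them correctly.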


On Fri, Aug 21, 2015 at 10:12 AM, Mike Drob  wrote:
> Hello,
>
> I noticed that when running tests, if the language selected is RTL then the
> "JUnit says hello" output is backwards. However, if I copy the output and
> try to paste it into firefox or gedit then the text is properly
> right-to-left.
>
> For example, when selecting hebrew, on my system it prints
> " says [shin-lamed-vav-mem]" instead of starting with [shin] on the
> right.
>
> This shouldn't be a high priority, since the tests themselves still pass,
> but I was wondering if that's something that we can fix or if the error is
> in a lower level - like the junit libs, or maybe even bash. Anybody have any
> ideas?
>
> Mike




Re: Coverity scan results of Lucene

2015-07-14 Thread Benson Margulies
On Tue, Jul 14, 2015 at 10:59 AM, Rishabh Patel
 wrote:
> org/apache/lucene/facet/taxonomy/directory/DirectoryTaxonomyWriter.java:
>

What does coverity complain of? Why do you, personally, think that the
complaint is legitimate?


>
> On Tue, Jul 14, 2015 at 10:44 AM, Dawid Weiss  wrote:
>>
>> The 444 defects is an overwhelming number. Most of those automated
>> tools detect things that turn to be valid code (upon closer
>> inspection). Could you start by listing, say, the first 5 defects that
>> actually make sense and are indeed flawed code that should be fixed?
>>
>> Dawid
>>
>> On Tue, Jul 14, 2015 at 4:33 PM, Rishabh Patel
>>  wrote:
>> > Hello!
>> >
>> > I scanned the Lucene project with Coverity scanner. 444 defects have
>> > been
>> > detected.
>> > Please check the attached report on the breakup of the issues. Some of
>> > the
>> > issues are false positives.
>> >
>> > I would like to volunteer for fixing these defects.
>> >
>> > Before I start, could you please tell me whether I should I create a
>> > single
>> > JIRA for a kind of issue (e.g. "Concurrent data access" or "Null pointer
>> > exception") or should multiple issues be created according to the module
>> > of
>> > the files to be modified?
>> >
>> > --
>> > Sincerely,
>> > Rishabh Patel
>> >
>> >
>> >
>> >
>>
>
>
>
> --
> Sincerely,
> Rishabh Patel
>
>




Re: Custom PostingsFormat SPILoader issues

2015-03-13 Thread Benson Margulies
In Solr you need a CodecFactory to deliver a Codec that happens to use
your PostingsFormat. The CodecFactory can set any params you like.


On Fri, Mar 13, 2015 at 12:13 PM, Tom Burton-West  wrote:
> Thanks Uwe,
>
> I'm pretty much going from what Hoss told me in the thread
> here: http://lucene.472066.n3.nabble.com/How-to-configure-Solr-PostingsFormat-block-size-tt4179029.html
>
> All I am really trying to do is instantiate the regular
> Lucene41PostingsFormat with non-default minTermBlockSize and
> maxTermBlockSize parameters.  However, that apparently can't be done in
> schema.xml.   So Hoss suggested a wrapper class around PostingsFormat that
> instantiates the Lucene41PostingsFormat with the desired parameters:
>
> "where does that leave you as a solr user who wants to write a plugin, since
> Solr only allows you to configure the SPI name (no constructor args) via
> 'postingFormat="foo"' the answer is that instead of writing a subclass, you
> would have to write a small proxy class, something like...
>
> public final class MyPfWrapper extends PostingsFormat {
>   PostingsFormat pf = new Lucene50PostingsFormat(42, 9);
>   public MyPfWrapper() {
>     super("MyPfWrapper");
>   }
> 
> rest of code skipped.
>
> I don't really understand SPI and class loaders, but you are right this
> class is a subclass of PostingsFormat not Codecs.   So is there an issue
> with the whole idea, or is there just some subtlety of class loading and the
> SPILoader I'm not understanding?
>
> Tom
>
>
>
>
>
>
> On Fri, Mar 13, 2015 at 11:35 AM, Uwe Schindler  wrote:
>>
>> Hi,
>>
>>
>>
>> To me this looks like the implementing class is not a real subclass of
>> org.apache.lucene.codecs.Codec – because you said “PostingsFormat” not
>> “Codec” in your mail? If you just want to create your own PostingsFormat,
>> you have to put it into the other META-INF file for
>> org.apache.lucene.codecs.PostingsFormat. Creating your own codecs is in most
>> cases not needed, most people are only interested in postings formats.
>>
>>
>>
>> Another reason for this could be that the JAR file with the codec is in a
>> different classloader than the one of lucene-core.jar.
>>
>>
>>
>> Uwe
>>
>>
>>
>> -
>>
>> Uwe Schindler
>>
>> H.-H.-Meier-Allee 63, D-28213 Bremen
>>
>> http://www.thetaphi.de
>>
>> eMail: u...@thetaphi.de
>>
>>
>




Is the Solr CodecFactory doc a bit off-kilter?

2015-02-10 Thread Benson Margulies
http://wiki.apache.org/solr/SimpleTextCodecExample

Why does it have:



and then:

postingsFormat="SimpleText"

Shouldn't the postingsFormat match the codec factory name? For that
matter, how much of this is obsolete? Is there better doc elsewhere or
does this need help?




Re: [VOTE] Move trunk to Java 8

2014-09-21 Thread Benson Margulies
I wish that the API improvements Rob Muir and I made to the analysis
chain could be released in the foreseeable future, and I wish a little
that they could be released in a version that does not require Java 8.




Re: [VOTE] Move trunk to Java 8

2014-09-15 Thread Benson Margulies
On Mon, Sep 15, 2014 at 2:45 PM, david.w.smi...@gmail.com <
david.w.smi...@gmail.com> wrote:

> Ryan,
> I’m unclear on what makes a “procedural vote” as such.  This seems to me
> to be about code modifications — in a big way as it’s a large change to the
> codebase.
>

David, one way out of this is that a commit is a commit. A sufficiently
unhappy PMC member can veto the commit, and thus force more discussion. If
no PMC members are sufficiently unhappy, it stands.


>
> ~ David
>


Re: [VOTE] Move trunk to Java 8

2014-09-14 Thread Benson Margulies
On Sun, Sep 14, 2014 at 3:10 PM, Jack Krupansky 
wrote:

>   (Hmmm... I wonder what Heliosearch’s Java support policy is.)
>
> There was a Lucene wiki page for the Java 1.4 to 1.5 transition... is any
> of that thinking relevant to this 1.7 to 1.8 transition?
>
> See:
> https://wiki.apache.org/lucene-java/Java_1.5_Migration
>
> IOW, how much of this issue is specific to Java 8, as opposed to ANY Java
> migration with trunk and the stable branch?
>
> And, to what extent does the issue relate to how imminent a 5.0 release is
> expected to be, and Java 7 EOL? EOL relates to official Oracle support, but
> that says nothing about the actual usage at major Lucene user sites.
>

Another thought: what if there were three branches:

 -- 'The next generation' -- leading edge, radical, new stuff. Java 8.
 -- 'The next major release' -- Incompatible API changes, but Java version
constrained for now to 1.7.
 -- the currently-maintained stream of minor releases.





>
> -- Jack Krupansky
>
>  *From:* Ryan Ernst 
> *Sent:* Sunday, September 14, 2014 11:35 AM
> *To:* dev@lucene.apache.org
> *Subject:* Re: [VOTE] Move trunk to Java 8
>
>
> Nothing would force you to use java 8 features in anything you worked on.
> But they would be available. And if the branches are always "in sync", what
> is the point of having 2 branches? There would never be a reason for a
> major release.
> On Sep 14, 2014 9:33 AM, "Anshum Gupta"  wrote:
>
>> I don't have a really strong opinion on 'Should we move to Java 8' but at
>> the same time would not really be happy dealing with 2 continuously
>> diverging branches. I wouldn't want to rewrite the same implementation
>> twice over and get it to work.
>>
>> I am not sure about others but I think it will also make it tougher to
>> attract contributors knowing they might have to deal with 2 divergent
>> branches.
>>
>> At the same time, if there's a reason compelling enough that makes the
>> lives of everyone involved better (read, easier), I'll be in for it.
>>
>>
>> On Fri, Sep 12, 2014 at 8:41 AM, Ryan Ernst  wrote:
>>
>>> It has been 6 months since Java 8 was released.  It has proven to be
>>> both stable (no issues like with the initial release of java 7) and
>>> faster.  And there are a ton of features that would make our lives as
>>> developers easier (and that can improve the quality of Lucene 5 when
>>> it is eventually released).
>>>
>>> We should stay ahead of the curve, and move trunk to Java 8.
>>>
>>>
>>>
>>
>>
>> --
>>
>> Anshum Gupta
>> http://www.anshumgupta.net
>>
>


Re: [VOTE] Move trunk to Java 8

2014-09-12 Thread Benson Margulies
"Corporate overlords" isn't helpful. Lucene is what it is because of its
wide adoption. That includes big, small, smart, and stupid organizations. I
don't think that an infrastructure component like Lucene needs to be 'ahead
of the curve'. It should aim to be widely adoptable. To me, that means
moving to a new Java requirement after we observe it is semi-ubiquitous. If
1.8 offered some game-changing JVM feature that would allow a giant leap
forward in Lucene, then that would be different. So far, all I see are some
minor programming conveniences.

However, I'm just one very small scale committer, and I've consumed enough
oxygen on this topic.


Re: [VOTE] Move trunk to Java 8

2014-09-12 Thread Benson Margulies
On Fri, Sep 12, 2014 at 3:35 PM, Robert Muir  wrote:

> On Fri, Sep 12, 2014 at 3:31 PM, Chris Hostetter
>  wrote:
> >
> > b) that your argument against benson's claims seemed misleading: just
> > because Oracle is EOLing doesn't mean people won't be using OpenJDK; even
> > if they are using Oracle's JDK, if they are large commercial organizations
> > they might pay oracle to keep using it for a long time.
> >
>
> It's not misleading at all, it's being practical. If people want to use
> old jvm versions, good for them. But if they open a corruption bug
> with one of these "commercial" versions, then my only choice is to
> close as "won't fix". So they might as well just use an old lucene
> version, too.
>

Here's what I know. Over the last few years, the large entities my employer
sells to have been very slow to move to new Java versions. Why? I dunno,
maybe all of them have Mordac working there. Do they pay for security fixes
from Oracle? Or do they just stick their heads in the sand? I can't tell
you. One that is on my mind right now may just barely make it to 1.7 this
year.

We (meaning this project, not my employer) generally require that
'significant' changes go into major releases. So, that ties together the
JVM version and these changes. Thus my desire to see a way to get the
pending trunk work to people who are not moving to 1.8 any time soon. An
alternative would be to have a different policy for what can go into a 4.x.
I thought I saw a message go by about a 5x branch the other day, so perhaps
things are already exactly what I am asking for, and I apologize for the
noise. Given how long it is likely to be until 6.0, I am not here to argue
that 6.0 should not require 1.8. I like a nice lambda expression as well as
the next guy.




>
>
>


Re: [VOTE] Move trunk to Java 8

2014-09-12 Thread Benson Margulies
If we release the current contents of trunk first, I'm OK with this, not
that I have a veto. There are many large organizations of the sort that use
Lucene & Solr that will not be moving to 8 for years yet. If the current
trunk content is marooned until they move to 8, I will be sad.

On Fri, Sep 12, 2014 at 2:39 PM, Chris Hostetter 
wrote:

>
> : faster.  And there are a ton of features that would make our lives as
> : developers easier (and that can improve the quality of Lucene 5 when
> : it is eventually released).
>
> Examples please?
>
>
>
> -Hoss
> http://www.lucidworks.com/
>
>
>


RE: Hints on constructing/running Solr analyzer chains standalone

2014-07-13 Thread Benson Margulies
Uwe, the last time I looked, Solr was perfectly cheerful about using
analysis components that did not advertise themselves via the factory SPI
system.  So someone might want to go further than calling the available
methods.
On Jul 12, 2014 7:24 PM, "Uwe Schindler"  wrote:

> The factories are part of Lucene, Solr is just using them. To list the
> available factories (in classpath) use
> (Tokenizer|TokenFilter|CharFilter)Factory.availableX() methods (to
> list all their names). You can invoke them using the corresponding
> forName() method and build an Analyzer from them. The latter has to be done
> manually, there is no general simple thing like Solr's chains. But that is
> quite easy to implement (if you really need an Analyzer instance). To just
> build a TokenStream for analysis, the factories is all you need (in fact
> Solr's chain just calls the factories in order... and returns it as
> TokenStreamComponents).
> You don't need to deal with SPI, just make the factories available in
> classpath, Lucene finds them automatically.
>
> For loading resources, use Lucene's ResourceLoader, which gets passed to
> the factory's inform() method. You only *need* to pass one, if and
> only if the factory implements ResourceLoaderAware. There are several
> ResourceLoaders available, Solr has its own very complicated one, but the
> default Lucene ones are: ClasspathResourceLoader, FilesystemResourceLoader.
>
> Uwe
>
> -
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
>
>
> > -Original Message-
> > From: Alexandre Rafalovitch [mailto:arafa...@gmail.com]
> > Sent: Saturday, July 12, 2014 7:17 PM
> > To: dev@lucene.apache.org
> > Subject: Re: Hints on constructing/running Solr analyzer chains
> standalone
> >
> > I don't want to read the schema.xml, but I do want to create factories
> using
> > the same parameters they use in schema. So, it looks like I need to play
> > around with ResourceLoaders and maybe SPI loaders, so things like
> wordlists
> > get loaded.
> >
> > Starting from FieldAnalyzer turned out to be a dead-end because it was
> using
> > pre-initialized field definitions. But starting again from Test cases
> seem to be
> > somewhat more productive.
> >
> > The idea for the project is to give a web UI where a user can quickly
> put one
> > or more analyzer stacks together and see how it/they perform against text
> > (multiple texts). A bit similar to FieldAnalyzer but allow to have
> multiple
> > stacks side-by-side and NOT needing to reload the core to add new ones.
> > Then, generate the XML definition, ready for pasting in. That's the
> target
> > anyway.
> >
> > Regards,
> >Alex.
> > Personal: http://www.outerthoughts.com/ and @arafalov Solr resources:
> > http://www.solr-start.com/ and @solrstart Solr popularizers community:
> > https://www.linkedin.com/groups?gid=6713853
> >
> >
> > On Sat, Jul 12, 2014 at 11:34 PM, Uwe Schindler  wrote:
> > > Hi,
> > >
> > >
> > >> H, I think it's reasonably straightforward to construct what is
> > >> implied by a Solr analysis chain in Lucene, would that do? Or do you
> > >> want to read a schema.xml file outside Solr?
> > >>
> > >> If the former, then you can pretty much skip the Solr code entirely.
> > >
> > > Read this:
> > >
> > http://lucene.apache.org/core/4_9_0/core/org/apache/lucene/analysis/pa
> > > ckage-summary.html#package_description
> > >
> > > To do analysis, Solr is not needed at all, unless you want to read
> > schema.xml files. If you want to do this, that is quite easy using the
> > IndexSchema class. You can then get the analyzer from the field type or
> field
> > name. How to use the analyzer is described above and unrelated to Solr.
> > >
> > > Uwe
> > >
> > >> On Sat, Jul 12, 2014 at 6:59 AM, Alexandre Rafalovitch
> > >> 
> > >> wrote:
> > >> > Hello,
> > >> >
> > >> > I am interested in creating and running Solr analyzer chains
> > >> > outside of normal process (no live Solr). Just construct a chain,
> > >> > feed it tokens and see what happens.
> > >> >
> > >> > I would appreciate any hints on what that takes and whether there
> > >> > are any hidden/weird dependencies (e.g. for resource discoveries).
> > >> > I tried tracing through FieldAnalysis calls, but can't actually
> > >> > seem to find the point where the actual analysis is done. Just
> > >> > getting lost in sets of NamedList
> > >> >
> > >> > Regards,
> > >> >Alex.
> > >> > Personal: http://www.outerthoughts.com/ and @arafalov Solr
> > resources:
> > >> > http://www.solr-start.com/ and @solrstart Solr popularizers
> > community:
> > >> > https://www.linkedin.com/groups?gid=6713853
> > >> >
> > >> >
> > >>
> > >> 

Re: CJKBigramFilter - position bug with outputUnigrams?

2014-04-21 Thread Benson Margulies
Grateful as I am for the plugs (I'm the Basis CTO), I would point out
that both our product and Kuromoji can be configured as Tokenizers
rather than token filters. (Ironically, someone just asked us for a
token filter option). We apply our own implementation of ICU
tokenization to runs of ASCII, being careful to allow some hybrid
tokens. Have you tried Kuromoji as a tokenizer? If you want to try us,
drop me a line.


On Sun, Apr 20, 2014 at 10:40 PM, Walter Underwood
 wrote:
> I have used Basistech linguistics in two products at two companies and they
> make high-quality software. At one point, I met with our Japanese partner,
> in Japan, and was able to make them comfortable with using Basistech instead
> of their own morphological package.
>
> wunder
>
> On Apr 20, 2014, at 7:16 PM, Alexandre Rafalovitch 
> wrote:
>
> Have you looked at commercial offerings? At some point, it becomes an
> ROI issue. If it is becoming such a serious issue:
> http://www.basistech.com/text-analytics/rosette/base-linguistics/asian-languages/
>
> Regards,
>   Alex.
> P.s. This is a link, not a recommendation. I haven't tested either
> their quality or their pricing
> Personal website: http://www.outerthoughts.com/
> Current project: http://www.solr-start.com/ - Accelerating your Solr
> proficiency
>
>
> On Mon, Apr 21, 2014 at 8:50 AM, Shawn Heisey  wrote:
>
> On 4/20/2014 6:20 PM, Benson Margulies wrote:
>
> Could I perhaps wonder why your customer is so intent on indexing
> ngrams? Why not use Kuromoji and index words?
>
>
> The data is not just Japanese.  There is a mixture.  For text in the
> Latin character set, StandardTokenizer and other similar things do not
> work for us, mostly because of the way that they handle punctuation.
> ICUTokenizer with its default rule set wouldn't work either, but as
> you'll see below, I've got a modified ruleset for Latin.
>
> The following is what I currently have for my analysis.  A lot of this
> has evolved over the last few years on my other index that is primarily
> English:
>
> http://apaste.info/ypy
>
> We may need to have a major overhaul of our analysis chain for this
> customer.  Perhaps what we've learned in the past won't apply here.
>
> Right now we have outputUnigrams enabled for both index and query.  This
> solves the phrase query problem but causes things to match that the
> customer doesn't want to match.
>
> Thanks,
> Shawn
>
>
>
>

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: CJKBigramFilter - position bug with outputUnigrams?

2014-04-20 Thread Benson Margulies
Could I perhaps wonder why your customer is so intent on indexing
ngrams? Why not use Kuromoji and index words?


On Sun, Apr 20, 2014 at 2:00 PM, Robert Muir  wrote:
> On Sun, Apr 20, 2014 at 1:53 PM, Shawn Heisey  wrote:
>> On 4/20/2014 11:10 AM, Robert Muir wrote:
>>> I think you need to use 2 separate fields here? (one for n=1 and one for 
>>> n=2)
>>>
>>> You just can't really have "correct positions" for n=1 and n=2; it's not
>>> possible.
>>
>> There may be details to this that I do not understand.  I'm fairly
>> clueless about both CJK and writing Lucene code -- Solr does all of that
>> for me.
>>
>> What is "n" in what you wrote above?
>
> This is just the mathematics; it's the "n" of the n-gram. You should
> only really ever have a fixed value of this for a field, otherwise the
> positions are confusing.
>
> There is nothing this filter can do to change this mathematical fact.
>
>
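
To make the point about a fixed n concrete, here is a minimal sketch in plain Java. It is not CJKBigramFilter and uses no Lucene classes; the class name NGramPositions and the rule "position = start offset of the gram" are assumptions made only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: assigns each n-gram the position of its first character.
final class NGramPositions {
    /** Returns "term@position" for every n-gram of one fixed size n. */
    static List<String> grams(String text, int n) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + n <= text.length(); i++) {
            out.add(text.substring(i, i + n) + "@" + i);
        }
        return out;
    }
}
```

With the input "ABCD", n=1 gives A@0, B@1, C@2, D@3 and n=2 gives AB@0, BC@1, CD@2. In the unigram stream the text ends three positions after it starts; in the bigram stream it ends two positions after. Put both streams in one field and "adjacent position" no longer has a single meaning, which is the reason for suggesting one field per n.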




FilterDirectoryReader and close

2014-04-08 Thread Benson Margulies
I added a test case to TestDirectoryReaderReopen that tests an
experiment of mine with a FilterDirectoryReader.

It fails in

org.apache.lucene.index.TestDirectoryReaderReopen#performDefaultTests

at

  index2_refreshed.close();
  assertReaderClosed(index2, true);

So it seems that the reader returned by my filter's doOpenIfChanged is
not properly coupled to the original. Is this something I need to
code, something missing from FilterDirectoryReader, or something that
isn't supposed to work?




Re: Anticipating a benchmark for direct posting format

2014-04-07 Thread Benson Margulies
OK, I'm slow but I'm getting there. A funny wrapping Codec would
require messing with how codecs come into being, and it's too late to
do that without a lot of changes. On the other hand, the insides of
the D-P-F could be used to accomplish the same thing out on the filter
reader.




Re: Anticipating a benchmark for direct posting format

2014-04-07 Thread Benson Margulies
Interested / easily amused parties are welcome to observe the
proceedings in https://github.com/apache/lucene-solr/pull/44. It's a
PR _only_ to offer visibility! So far, I've got a 'delegating codec'
that interposes the direct posting idea atop any other codec. Next
comes the filtering.

I'm not sure that I ever concisely reported the situation that got me
started on this: a profile in which _time in the codec_ dominated my
application. So the RAMDirectory was useless, since that removes no
codec CPU time, but the D-P-F did the job, since it does.
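
The distinction between I/O time and decode time can be sketched in plain Java. This is a toy model under stated assumptions, not the code in the pull request and not Lucene's codec API: PostingsSource, DeltaEncodedSource and DirectCachingSource are hypothetical names, and delta encoding stands in for whatever work a real postings format does on each read.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for "whatever the on-disk format hands back". */
interface PostingsSource {
    int[] docIds(String term);
}

/** Holds delta-encoded doc ids (already in RAM) and pays the decode cost on every call. */
final class DeltaEncodedSource implements PostingsSource {
    private final Map<String, int[]> deltas = new HashMap<>();
    int decodeCalls = 0; // counts how often the decode loop runs

    void put(String term, int... sortedDocIds) {
        int[] d = new int[sortedDocIds.length];
        int prev = 0;
        for (int i = 0; i < sortedDocIds.length; i++) {
            d[i] = sortedDocIds[i] - prev;
            prev = sortedDocIds[i];
        }
        deltas.put(term, d);
    }

    @Override
    public int[] docIds(String term) {
        decodeCalls++;
        int[] d = deltas.getOrDefault(term, new int[0]);
        int[] ids = new int[d.length];
        int running = 0;
        for (int i = 0; i < d.length; i++) {
            running += d[i];
            ids[i] = running;
        }
        return ids;
    }
}

/** Wraps any source and keeps the decoded arrays; chosen when reading, not when indexing. */
final class DirectCachingSource implements PostingsSource {
    private final PostingsSource delegate;
    private final Map<String, int[]> decoded = new HashMap<>();

    DirectCachingSource(PostingsSource delegate) {
        this.delegate = delegate;
    }

    @Override
    public int[] docIds(String term) {
        // Decode once per term, then serve the plain array from memory.
        return decoded.computeIfAbsent(term, delegate::docIds);
    }
}
```

In this model DeltaEncodedSource is already entirely in memory, yet every lookup still runs the decode loop; only the wrapper removes that repeated CPU cost, at the price of holding the decoded arrays.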


On Mon, Apr 7, 2014 at 5:34 PM, Benson Margulies  wrote:
> On Mon, Apr 7, 2014 at 5:32 PM, Alan Woodward  wrote:
>> Does FilterDirectoryReader do what you want?
>> https://lucene.apache.org/core/4_7_1/core/org/apache/lucene/index/FilterDirectoryReader.html
>
> Yes, indeed, precisely what the doctor ordered.
>
>>
>> Alan Woodward
>> www.flax.co.uk
>>
>>
>> On 7 Apr 2014, at 22:19, Benson Margulies wrote:
>>
>> Typically, an app gets a directory reader, which is a composite
>> reader. To get a filter down there into the leaves of the composite
>> reader, does anyone have a suggestion about where to enter the
>> modularity?
>>
>> I sort of want to insert myself at
>> org.apache.lucene.index.StandardDirectoryReader#open(org.apache.lucene.store.Directory,
>> org.apache.lucene.index.IndexCommit) wrapping the segment readers, or
>> I could make a sort of filtering composite reader that wraps each of
>> the segment readers in a filter.
>>
>>
>> On Mon, Apr 7, 2014 at 1:02 PM, Shai Erera  wrote:
>>
>> Given that DPF delegates indexing to another PF anyway (currently Lucene41),
>> I think this might be the case. We would need to test of course. The key
>> point is that this FilterAtomicReader will be able to serve anything as
>> direct, even DV, so it might eliminate DVF too. We need to experiment and
>> benchmark!
>>
>> Shai
>>
>> On Apr 7, 2014 7:32 PM, "david.w.smi...@gmail.com"
>>  wrote:
>>
>> Aaaah, nice idea to simply use FilterAtomicReader — of course!  So this
>> would ultimately be a new IndexReaderFactory that creates
>> FilterAtomicReaders for a subset of the fields you want to do this on.
>> Cool!  With that, I don’t think there would be a need for
>> DirectPostingsFormat as a postings format, would there be?
>>
>> ~ David
>>
>> On Mon, Apr 7, 2014 at 10:58 AM, Shai Erera  wrote:
>>
>> The only problem is how the Codec makes a dynamic decision on whether to
>> use the wrapped Codec for reading vs pre-load data into in-memory
>> structures, because Codecs are loaded through reflection by the SPI loading
>> mechanism.
>>
>> There is also a TODO in DirectPF to allow wrapping arbitrary PFs, just
>> mentioning in case you want to tackle DPF.
>>
>> I think that if we allowed passing something like a CodecLookupService,
>> with an SPILookupService default impl, you could easily pass that to
>> DirectoryReader which will use your runtime logic to load the right PF (e.g.
>> DPF) instead of the one the index was created with.
>>
>> But it sounds like the core problem is that when we load a Codec/PF/DVF
>> for reading, we cannot pass it any arguments, and so we must make an
>> index-time decision about how we're going to read the data later on. If we
>> could somehow support that, I think that will help you to achieve what you
>> want too.
>>
>> E.g. currently it's an all-or-nothing decision, but if we could pass a
>> parameter like "50% available heap", the Codec/PF/DVF could cache the
>> frequently accessed postings instead of loading all of them into memory.
>> But, that can also be achieved at the IndexReader level, through a custom
>> FilterAtomicReader. And if you could reuse DPF's structures (like
>> DirectTermsEnum, DirectFields...), it should be easier to do this. So
>> perhaps we can think about a DirectAtomicReader which does that? I believe
>> it can share some code w/ DPF, as long as we don't make these APIs public,
>> or make them @super.experimental and @super.expert.
>>
>> Just throwing some ideas...
>>
>> Shai
>>
>> On Mon, Apr 7, 

Re: Anticipating a benchmark for direct posting format

2014-04-07 Thread Benson Margulies
On Mon, Apr 7, 2014 at 5:32 PM, Alan Woodward  wrote:
> Does FilterDirectoryReader do what you want?
> https://lucene.apache.org/core/4_7_1/core/org/apache/lucene/index/FilterDirectoryReader.html

Yes, indeed, precisely what the doctor ordered.

>
> Alan Woodward
> www.flax.co.uk
>
>
> On 7 Apr 2014, at 22:19, Benson Margulies wrote:
>
> Typically, an app gets a directory reader, which is a composite
> reader. To get a filter down there into the leaves of the composite
> reader, does anyone have a suggestion about where to enter the
> modularity?
>
> I sort of want to insert myself at
> org.apache.lucene.index.StandardDirectoryReader#open(org.apache.lucene.store.Directory,
> org.apache.lucene.index.IndexCommit) wrapping the segment readers, or
> I could make a sort of filtering composite reader that wraps each of
> the segment readers in a filter.
>
>
> On Mon, Apr 7, 2014 at 1:02 PM, Shai Erera  wrote:
>
> Given that DPF delegates indexing to another PF anyway (currently Lucene41),
> I think this might be the case. We would need to test of course. The key
> point is that this FilterAtomicReader will be able to serve anything as
> direct, even DV, so it might eliminate DVF too. We need to experiment and
> benchmark!
>
> Shai
>
> On Apr 7, 2014 7:32 PM, "david.w.smi...@gmail.com"
>  wrote:
>
> Aaaah, nice idea to simply use FilterAtomicReader — of course!  So this
> would ultimately be a new IndexReaderFactory that creates
> FilterAtomicReaders for a subset of the fields you want to do this on.
> Cool!  With that, I don’t think there would be a need for
> DirectPostingsFormat as a postings format, would there be?
>
> ~ David
>
> On Mon, Apr 7, 2014 at 10:58 AM, Shai Erera  wrote:
>
> The only problem is how the Codec makes a dynamic decision on whether to
> use the wrapped Codec for reading vs pre-load data into in-memory
> structures, because Codecs are loaded through reflection by the SPI loading
> mechanism.
>
> There is also a TODO in DirectPF to allow wrapping arbitrary PFs, just
> mentioning in case you want to tackle DPF.
>
> I think that if we allowed passing something like a CodecLookupService,
> with an SPILookupService default impl, you could easily pass that to
> DirectoryReader which will use your runtime logic to load the right PF (e.g.
> DPF) instead of the one the index was created with.
>
> But it sounds like the core problem is that when we load a Codec/PF/DVF
> for reading, we cannot pass it any arguments, and so we must make an
> index-time decision about how we're going to read the data later on. If we
> could somehow support that, I think that will help you to achieve what you
> want too.
>
> E.g. currently it's an all-or-nothing decision, but if we could pass a
> parameter like "50% available heap", the Codec/PF/DVF could cache the
> frequently accessed postings instead of loading all of them into memory.
> But, that can also be achieved at the IndexReader level, through a custom
> FilterAtomicReader. And if you could reuse DPF's structures (like
> DirectTermsEnum, DirectFields...), it should be easier to do this. So
> perhaps we can think about a DirectAtomicReader which does that? I believe
> it can share some code w/ DPF, as long as we don't make these APIs public,
> or make them @super.experimental and @super.expert.
>
> Just throwing some ideas...
>
> Shai
>
> On Mon, Apr 7, 2014 at 5:35 PM, david.w.smi...@gmail.com
>  wrote:
>
> Benson, I like your idea.
>
> I think your idea can be achieved as a codec, one that wraps another
> codec that establishes the on-disk format.  By default the wrapped codec can
> be Lucene’s default codec.  I think, if implemented, this would be a change
> to DPF instead of an additional DPF-variant codec.
>
> ~ David
>
> On Mon, Apr 7, 2014 at 9:22 AM, Benson Margulies 
> wrote:
>
> On Mon, Apr 7, 2014 at 9:14 AM, Robert Muir  wrote:
>
> On Thu, Apr 3, 2014 at 12:27 PM, Benson Margulies
>  wrote:
>
> My takeaway from the prior conversation was that various people didn't
> entirely believe that I'd seen a dramatic improvement in query performance
> using D-P-F, and so would not smile upon a patch intended to liberate
> D-P-F from codecs. It could be that the effect I saw has to do with
> the fact that our system depends on hitting and scoring 50% of the
> documents in an index wit

Re: Anticipating a benchmark for direct posting format

2014-04-07 Thread Benson Margulies
Typically, an app gets a directory reader, which is a composite
reader. To get a filter down there into the leaves of the composite
reader, does anyone have a suggestion about where to enter the
modularity?

I sort of want to insert myself at
org.apache.lucene.index.StandardDirectoryReader#open(org.apache.lucene.store.Directory,
org.apache.lucene.index.IndexCommit) wrapping the segment readers, or
I could make a sort of filtering composite reader that wraps each of
the segment readers in a filter.
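
The second option, a filtering composite that wraps each leaf, can be sketched with plain Java types. LeafView and CompositeView are hypothetical interfaces invented for this note; they only mimic the shape of a composite reader over segment readers and are not Lucene classes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.UnaryOperator;

/** Hypothetical minimal "segment reader": answers one kind of lookup. */
interface LeafView {
    int docFreq(String term);
}

/** Hypothetical composite: an ordered list of leaves, like a reader over segments. */
final class CompositeView {
    private final List<LeafView> leaves;

    CompositeView(List<LeafView> leaves) {
        this.leaves = Collections.unmodifiableList(new ArrayList<>(leaves));
    }

    List<LeafView> leaves() {
        return leaves;
    }

    /** Sums the per-leaf answers, the way a composite aggregates its segments. */
    int docFreq(String term) {
        int sum = 0;
        for (LeafView leaf : leaves) {
            sum += leaf.docFreq(term);
        }
        return sum;
    }

    /** Builds a new composite in which every leaf has been passed through the wrapper. */
    CompositeView wrapLeaves(UnaryOperator<LeafView> wrapper) {
        List<LeafView> wrapped = new ArrayList<>();
        for (LeafView leaf : leaves) {
            wrapped.add(wrapper.apply(leaf));
        }
        return new CompositeView(wrapped);
    }
}
```

The application keeps talking to a composite; the filter sees only leaves. The original composite is left untouched, so the same leaves can be opened with or without the filter.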


On Mon, Apr 7, 2014 at 1:02 PM, Shai Erera  wrote:
> Given that DPF delegates indexing to another PF anyway (currently Lucene41),
> I think this might be the case. We would need to test of course. The key
> point is that this FilterAtomicReader will be able to serve anything as
> direct, even DV, so it might eliminate DVF too. We need to experiment and
> benchmark!
>
> Shai
>
> On Apr 7, 2014 7:32 PM, "david.w.smi...@gmail.com"
>  wrote:
>>
>> Aaaah, nice idea to simply use FilterAtomicReader — of course!  So this
>> would ultimately be a new IndexReaderFactory that creates
>> FilterAtomicReaders for a subset of the fields you want to do this on.
>> Cool!  With that, I don’t think there would be a need for
>> DirectPostingsFormat as a postings format, would there be?
>>
>> ~ David
>>
>>
>> On Mon, Apr 7, 2014 at 10:58 AM, Shai Erera  wrote:
>>>
>>> The only problem is how the Codec makes a dynamic decision on whether to
>>> use the wrapped Codec for reading vs pre-load data into in-memory
>>> structures, because Codecs are loaded through reflection by the SPI loading
>>> mechanism.
>>>
>>> There is also a TODO in DirectPF to allow wrapping arbitrary PFs, just
>>> mentioning in case you want to tackle DPF.
>>>
>>> I think that if we allowed passing something like a CodecLookupService,
>>> with an SPILookupService default impl, you could easily pass that to
>>> DirectoryReader which will use your runtime logic to load the right PF (e.g.
>>> DPF) instead of the one the index was created with.
>>>
>>> But it sounds like the core problem is that when we load a Codec/PF/DVF
>>> for reading, we cannot pass it any arguments, and so we must make an
>>> index-time decision about how we're going to read the data later on. If we
>>> could somehow support that, I think that will help you to achieve what you
>>> want too.
>>>
>>> E.g. currently it's an all-or-nothing decision, but if we could pass a
>>> parameter like "50% available heap", the Codec/PF/DVF could cache the
>>> frequently accessed postings instead of loading all of them into memory.
>>> But, that can also be achieved at the IndexReader level, through a custom
>>> FilterAtomicReader. And if you could reuse DPF's structures (like
>>> DirectTermsEnum, DirectFields...), it should be easier to do this. So
>>> perhaps we can think about a DirectAtomicReader which does that? I believe
>>> it can share some code w/ DPF, as long as we don't make these APIs public,
>>> or make them @super.experimental and @super.expert.
>>>
>>> Just throwing some ideas...
>>>
>>> Shai
>>>
>>>
>>> On Mon, Apr 7, 2014 at 5:35 PM, david.w.smi...@gmail.com
>>>  wrote:
>>>>
>>>> Benson, I like your idea.
>>>>
>>>> I think your idea can be achieved as a codec, one that wraps another
>>>> codec that establishes the on-disk format.  By default the wrapped codec 
>>>> can
>>>> be Lucene’s default codec.  I think, if implemented, this would be a change
>>>> to DPF instead of an additional DPF-variant codec.
>>>>
>>>> ~ David
>>>>
>>>>
>>>> On Mon, Apr 7, 2014 at 9:22 AM, Benson Margulies 
>>>> wrote:
>>>>>
>>>>> On Mon, Apr 7, 2014 at 9:14 AM, Robert Muir  wrote:
>>>>> > On Thu, Apr 3, 2014 at 12:27 PM, Benson Margulies
>>>>> >  wrote:
>>>>> >
>>>>> >>
>>>>> >> My takeaway from the prior conversation was that various people
>>>>> >> didn't
>>>>> >> entirely believe that I'd seen a dramatic improvement in query performance
>>>>> >> using D-P-F, and so would not smile upon a patch intended to
>>>>> >> liberate
>>>>> >> D-P-F from codecs. It could be that the effect I saw has to do with
>>>>> >> the fact that our system d

Re: Anticipating a benchmark for direct posting format

2014-04-07 Thread Benson Margulies
Eventually, I'll care about how to set this up in Solr. For now I
think I'll see if I can figure out the luceneutils benchmark.



On Mon, Apr 7, 2014 at 1:02 PM, Shai Erera  wrote:
> Given that DPF delegates indexing to another PF anyway (currently Lucene41),
> I think this might be the case. We would need to test of course. The key
> point is that this FilterAtomicReader will be able to serve anything as
> direct, even DV, so it might eliminate DVF too. We need to experiment and
> benchmark!
>
> Shai
>
> On Apr 7, 2014 7:32 PM, "david.w.smi...@gmail.com"
>  wrote:
>>
>> Aaaah, nice idea to simply use FilterAtomicReader — of course!  So this
>> would ultimately be a new IndexReaderFactory that creates
>> FilterAtomicReaders for a subset of the fields you want to do this on.
>> Cool!  With that, I don’t think there would be a need for
>> DirectPostingsFormat as a postings format, would there be?
>>
>> ~ David
>>
>>
>> On Mon, Apr 7, 2014 at 10:58 AM, Shai Erera  wrote:
>>>
>>> The only problem is how the Codec makes a dynamic decision on whether to
>>> use the wrapped Codec for reading vs pre-load data into in-memory
>>> structures, because Codecs are loaded through reflection by the SPI loading
>>> mechanism.
>>>
>>> There is also a TODO in DirectPF to allow wrapping arbitrary PFs, just
>>> mentioning in case you want to tackle DPF.
>>>
>>> I think that if we allowed passing something like a CodecLookupService,
>>> with an SPILookupService default impl, you could easily pass that to
>>> DirectoryReader which will use your runtime logic to load the right PF (e.g.
>>> DPF) instead of the one the index was created with.
>>>
>>> But it sounds like the core problem is that when we load a Codec/PF/DVF
>>> for reading, we cannot pass it any arguments, and so we must make an
>>> index-time decision about how we're going to read the data later on. If we
>>> could somehow support that, I think that will help you to achieve what you
>>> want too.
>>>
>>> E.g. currently it's an all-or-nothing decision, but if we could pass a
>>> parameter like "50% available heap", the Codec/PF/DVF could cache the
>>> frequently accessed postings instead of loading all of them into memory.
>>> But, that can also be achieved at the IndexReader level, through a custom
>>> FilterAtomicReader. And if you could reuse DPF's structures (like
>>> DirectTermsEnum, DirectFields...), it should be easier to do this. So
>>> perhaps we can think about a DirectAtomicReader which does that? I believe
>>> it can share some code w/ DPF, as long as we don't make these APIs public,
>>> or make them @super.experimental and @super.expert.
>>>
>>> Just throwing some ideas...
>>>
>>> Shai
>>>
>>>
>>> On Mon, Apr 7, 2014 at 5:35 PM, david.w.smi...@gmail.com
>>>  wrote:
>>>>
>>>> Benson, I like your idea.
>>>>
>>>> I think your idea can be achieved as a codec, one that wraps another
>>>> codec that establishes the on-disk format.  By default the wrapped codec 
>>>> can
>>>> be Lucene’s default codec.  I think, if implemented, this would be a change
>>>> to DPF instead of an additional DPF-variant codec.
>>>>
>>>> ~ David
>>>>
>>>>
>>>> On Mon, Apr 7, 2014 at 9:22 AM, Benson Margulies 
>>>> wrote:
>>>>>
>>>>> On Mon, Apr 7, 2014 at 9:14 AM, Robert Muir  wrote:
>>>>> > On Thu, Apr 3, 2014 at 12:27 PM, Benson Margulies
>>>>> >  wrote:
>>>>> >
>>>>> >>
>>>>> >> My takeaway from the prior conversation was that various people
>>>>> >> didn't
>>>>> >> entirely believe that I'd seen a dramatic improvement in query performance
>>>>> >> using D-P-F, and so would not smile upon a patch intended to
>>>>> >> liberate
>>>>> >> D-P-F from codecs. It could be that the effect I saw has to do with
>>>>> >> the fact that our system depends on hitting and scoring 50% of the
>>>>> >> documents in an index with a lot of documents.
>>>>> >>
>>>>> >
>>>>> > I dont understand the word "liberate" here. why i

Re: Anticipating a benchmark for direct posting format

2014-04-07 Thread Benson Margulies
On Mon, Apr 7, 2014 at 9:30 AM, Robert Muir  wrote:
> On Mon, Apr 7, 2014 at 9:22 AM, Benson Margulies  
> wrote:
>> On Mon, Apr 7, 2014 at 9:14 AM, Robert Muir  wrote:
>>> On Thu, Apr 3, 2014 at 12:27 PM, Benson Margulies  
>>> wrote:
>>>
>>>>
>>>> My takeaway from the prior conversation was that various people didn't
>>>> entirely believe that I'd seen a dramatic improvement in query performance
>>>> using D-P-F, and so would not smile upon a patch intended to liberate
>>>> D-P-F from codecs. It could be that the effect I saw has to do with
>>>> the fact that our system depends on hitting and scoring 50% of the
>>>> documents in an index with a lot of documents.
>>>>
>>>
>>> I don't understand the word "liberate" here. Why is it such a problem
>>> that this is a codec?
>>
>>  I don't want to have to declare my intentions at the time I create
>> the index. I don't want to have to use D-P-F for all readers all the
>> time. Because I want to be able to decide to open up an index with an
>> arbitrary on-disk format and get the in-memory cache behavior of
>> D-P-F. Thus 'liberate' -- split the question of 'keep a copy in
>> memory' from the choice of the on-disk format.
>>
>>
>>>
>>> I do not think we should give it any more status than that; it wastes
>>> too much RAM.
>>
>> It didn't seem like 'waste' when it solved a big practical problem for us. We
>> had an application that was too slow, and had plenty of RAM available,
>> and we were able to trade space for time by applying D-P-F.
>>
>> Maybe I'm going about this backwards; if I can come up with a small,
>> inconspicuous proposed change that does what I want, there won't be
>> any disagreement.
>>
>>
>
> On the previous thread, I already mentioned that in such cases the
> Directory API should be used.

Fair enough. Could I ask you again to elaborate on that thought?

>
> Sorry, this DirectPostingsFormat is a huge trap. I don't think we need
> _yet another_ way to waste ram, when someone can already do this via
> directory (making the decision at any time).
>
>




Re: Anticipating a benchmark for direct posting format

2014-04-07 Thread Benson Margulies
On Mon, Apr 7, 2014 at 9:14 AM, Robert Muir  wrote:
> On Thu, Apr 3, 2014 at 12:27 PM, Benson Margulies  
> wrote:
>
>>
>> My takeaway from the prior conversation was that various people didn't
>> entirely believe that I'd seen a dramatic improvement in query performance
>> using D-P-F, and so would not smile upon a patch intended to liberate
>> D-P-F from codecs. It could be that the effect I saw has to do with
>> the fact that our system depends on hitting and scoring 50% of the
>> documents in an index with a lot of documents.
>>
>
> I don't understand the word "liberate" here. Why is it such a problem
> that this is a codec?

 I don't want to have to declare my intentions at the time I create
the index. I don't want to have to use D-P-F for all readers all the
time. Because I want to be able to decide to open up an index with an
arbitrary on-disk format and get the in-memory cache behavior of
D-P-F. Thus 'liberate' -- split the question of 'keep a copy in
memory' from the choice of the on-disk format.


>
> I do not think we should give it any more status than that; it wastes
> too much RAM.

It didn't seem like 'waste' when it solved a big practical problem for us. We
had an application that was too slow, and had plenty of RAM available,
and we were able to trade space for time by applying D-P-F.

Maybe I'm going about this backwards; if I can come up with a small,
inconspicuous proposed change that does what I want, there won't be
any disagreement.


>
>




Re: Anticipating a benchmark for direct posting format

2014-04-07 Thread Benson Margulies
If you look at people.apache.org:~bimargulies/dpf-bench.log
(http://people.apache.org/bimargulies/dpf-bench.log should also work),
you'll see the results of a luceneutil run that compares DPF to
'normal' on the 10M wikipedia case. Some things are better, some are
worse, some are the same.

The claim here was never that DPF was some sort of universal solvent;
it was that for certain applications it made a material speedup, and
so it was worth some API complexity to liberate it from the codecs.
I'm going to assert here that these results support the claim well
enough to justify taking a run at the API, and then we'll see if I can
come up with something that people find tolerable in proportion to the
benefit.


On Thu, Apr 3, 2014 at 12:27 PM, Benson Margulies  wrote:
> On Thu, Apr 3, 2014 at 11:37 AM, Michael McCandless
>  wrote:
>> Is the benchmark just trying to measure speedups by using DirectPF vs
>> the default PF?  You could do this today w/ luceneutil (using
>> Wikipedia as content).
>>
>> But if you have another content source / index, I'm happy to run the
>> benchmark.  It'd be easier to make the content available (CSV, or line
>> docs file format), than ship around big indices ...
>>
>> I have a box with 48 GB RAM.
>>
>> Mike McCandless
>
> My takeaway from the prior conversation was that various people didn't
> entirely believe that I'd seen a dramatic improvement in query performance
> using D-P-F, and so would not smile upon a patch intended to liberate
> D-P-F from codecs. It could be that the effect I saw has to do with
> the fact that our system depends on hitting and scoring 50% of the
> documents in an index with a lot of documents.
>
> If you can help me try to simulate this situation with luceneutil, I'd
> be happy to skip the work I was about to do to build another
> benchmark.
>
>
>
>>
>> http://blog.mikemccandless.com
>>
>>
>> On Thu, Apr 3, 2014 at 8:38 AM, Benson Margulies  
>> wrote:
>>> Some of you may recall that I started a thread some time ago about
>>> wishing for the benefits of the direct posting format without needing
>>> to use a codec. The thread landed as a challenge: show a benchmark of
>>> the benefit of D-P-F.
>>>
>>> After a lot of distraction, I'm now in a position to build it. The
>>> core is a rather large index, and to show the effect (always assuming
>>> that I succeed) will take a machine with a large amount of RAM.
>>>
>>> One approach is for me to simply build the index involved and make it
>>> available as an index. Another would be to side-step into a giant pile
>>> of  CSV or JSON and provide a do-it-yourself kit.
>>>
>>> Anyone have a preference?
>>>
>>> What have we got for hardware with, say, 40G of RAM? Anything, or will this
>>> be up to individuals to try out on dayjob hardware?
>>>
>>>
>>
>>




Re: I may have run into something interesting with luceneutil

2014-04-07 Thread Benson Margulies
Well, it never produced any benchmark results; it is not as if it ran
the benchmark and then got itself stuck. When I run the same thing
with the 10M wikipedia, it does not get stuck.

Does anyone else have a jumbo computer to try this on? I could try
adding print statements.

On Mon, Apr 7, 2014 at 6:27 AM, Dawid Weiss
 wrote:
> Looks like an orphaned ThreadPoolExecutor thread preventing JVM exit.
> Hard to tell where it came from based on just the name (generic
> factory).
>
> D.
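
For reference, the quoted dump shows "pool-1-thread-1" without the daemon marker, parked in ScheduledThreadPoolExecutor$DelayedWorkQueue.take, next to a "DestroyJavaVM" thread. That combination is what a JVM looks like when main has returned but a non-daemon pool thread is still alive. The sketch below reproduces the mechanism with the JDK only; ExecutorExitDemo and its method names are made up for illustration and say nothing about where the pool in luceneutil is created.

```java
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

final class ExecutorExitDemo {
    /** Runs one task on the pool and reports whether its worker thread is a daemon. */
    static boolean workerIsDaemon(ScheduledExecutorService pool) {
        AtomicBoolean daemon = new AtomicBoolean();
        try {
            pool.submit(() -> daemon.set(Thread.currentThread().isDaemon()))
                .get(5, TimeUnit.SECONDS);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            // Without shutdown(), an idle non-daemon worker keeps the JVM alive.
            pool.shutdown();
        }
        return daemon.get();
    }

    /** Thread factory whose workers are daemons and so cannot block JVM exit. */
    static ThreadFactory daemonFactory() {
        return runnable -> {
            Thread t = new Thread(runnable, "demo-daemon-worker");
            t.setDaemon(true);
            return t;
        };
    }
}
```

Calling workerIsDaemon(Executors.newSingleThreadScheduledExecutor()) reports a non-daemon worker, because the default thread factory creates non-daemon threads; passing daemonFactory() to the same factory method flips that. Either an explicit shutdown() or a daemon factory avoids the hang.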
>
> On Mon, Apr 7, 2014 at 12:21 PM, Benson Margulies  
> wrote:
>> ➜  util git:(trunk) ✗ jstack 75623
>> 2014-04-06 20:42:34
>> Full thread dump Java HotSpot(TM) 64-Bit Server VM (23.21-b01 mixed mode):
>>
>> "Attach Listener" daemon prio=10 tid=0x7f1760001000 nid=0x135cd
>> waiting on condition [0x]
>>java.lang.Thread.State: RUNNABLE
>>
>> "DestroyJavaVM" prio=10 tid=0x7f2cb8009800 nid=0x12768 waiting on
>> condition [0x]
>>java.lang.Thread.State: RUNNABLE
>>
>> "pool-1-thread-1" prio=10 tid=0x7f2cb81c1800 nid=0x12788 waiting
>> on condition [0x7f175f6cc000]
>>java.lang.Thread.State: WAITING (parking)
>> at sun.misc.Unsafe.park(Native Method)
>> - parking to wait for  <0x7f18ee497038> (a
>> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>> at 
>> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
>> at 
>> java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1079)
>> at 
>> java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:807)
>> at 
>> java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
>> at 
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
>> at 
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> at java.lang.Thread.run(Thread.java:722)
>>
>> "Service Thread" daemon prio=10 tid=0x7f2cb8119000 nid=0x12786
>> runnable [0x]
>>java.lang.Thread.State: RUNNABLE
>>
>> "C2 CompilerThread1" daemon prio=10 tid=0x7f2cb8117000 nid=0x12785
>> waiting on condition [0x]
>>java.lang.Thread.State: RUNNABLE
>>
>> "C2 CompilerThread0" daemon prio=10 tid=0x7f2cb8114000 nid=0x12784
>> waiting on condition [0x]
>>java.lang.Thread.State: RUNNABLE
>>
>> "Signal Dispatcher" daemon prio=10 tid=0x7f2cb8112000 nid=0x12783
>> runnable [0x]
>>java.lang.Thread.State: RUNNABLE
>>
>> "Finalizer" daemon prio=10 tid=0x7f2cb80c4800 nid=0x12782 in
>> Object.wait() [0x7f17746a]
>>java.lang.Thread.State: WAITING (on object monitor)
>> at java.lang.Object.wait(Native Method)
>> - waiting on <0x7f18b4bd0b50> (a java.lang.ref.ReferenceQueue$Lock)
>> at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:135)
>> - locked <0x7f18b4bd0b50> (a java.lang.ref.ReferenceQueue$Lock)
>> at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:151)
>> at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:189)
>>
>> "Reference Handler" daemon prio=10 tid=0x7f2cb80c2000 nid=0x12781
>> in Object.wait() [0x7f17747a1000]
>>java.lang.Thread.State: WAITING (on object monitor)
>> at java.lang.Object.wait(Native Method)
>> - waiting on <0x7f18b4bd7738> (a java.lang.ref.Reference$Lock)
>> at java.lang.Object.wait(Object.java:503)
>> at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:133)
>> - locked <0x7f18b4bd7738> (a java.lang.ref.Reference$Lock)
>>
>> "VM Thread" prio=10 tid=0x7f2cb80ba800 nid=0x12780 runnable
>>
>> "GC task thread#0 (ParallelGC)" prio=10 tid=0x7f2cb8017800
>> nid=0x12769 runnable
>>
>> "GC task thread#1 (ParallelGC)" prio=10 tid=0x7f2cb8019000
>> nid=0x1276a runnable
>>
>> "GC task thread#2 (ParallelGC)" prio=10 tid=0x7f2cb801b000
>> nid=0x1276b runnable
>>
>> "GC task thread#3 (ParallelGC)" prio=10 tid=0x7f2cb801d000
>> nid=0x1276c runnable
>>
>> "GC task thread#4 (ParallelGC)" prio=10 tid=0x7f2cb801e800
>> nid=0x1276d runnable
>>
>> "GC task thread

Re: I may have run into something interesting with luceneutil

2014-04-07 Thread Benson Margulies
 thread#19 (ParallelGC)" prio=10 tid=0x7f2cb803a800
nid=0x1277c runnable

"GC task thread#20 (ParallelGC)" prio=10 tid=0x7f2cb803c000
nid=0x1277d runnable

"GC task thread#21 (ParallelGC)" prio=10 tid=0x7f2cb803e000
nid=0x1277e runnable

"GC task thread#22 (ParallelGC)" prio=10 tid=0x7f2cb804
nid=0x1277f runnable

"VM Periodic Task Thread" prio=10 tid=0x7f2cb8124000 nid=0x12787
waiting on condition

JNI global references: 130

On Mon, Apr 7, 2014 at 2:50 AM, Uwe Schindler  wrote:
> Hi Benson,
>
> there must be another thread that sits on this lock:
> - parking to wait for  <0x7f18ee497038>
> But the stack trace you have shown has nothing to do with Lucene! This looks 
> like one of the normal threads always waiting for some external trigger (they 
> are used by the garbage collector). Could this be that one: 
> https://issues.apache.org/jira/browse/LUCENE-5573
>
> So it would be better to get the *full* stack trace of all threads.
>
> Uwe
>
> -
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
>
>
>> -Original Message-
>> From: Benson Margulies [mailto:bimargul...@gmail.com]
>> Sent: Monday, April 07, 2014 2:41 AM
>> To: dev@lucene.apache.org
>> Subject: I may have run into something interesting with luceneutil
>>
>> Or I may not.
>>
>> https://code.google.com/a/apache-extras.org/p/luceneutil/wiki/AddToBuildTree?ts=1396830970&updated=AddToBuildTree
>>
>> I'm trying to learn something about direct posting format using luceneutil.
>>
>> The above-linked page is what I'm trying on a 160G multicore machine.
>>
>> Using trunk, the SearchPerfTest process seems to be stuck.
>>
>> top shows a memory size of 60g -- not even the full 80 I gave it.
>>
>> No CPU is being consumed.
>>
>> No significant I/O from iostat.
>>
>> strace shows no activity.
>>
>> jstack is completely boring except for the one thread shown below.
>>
>> Anyone got any ideas?
>>
>>
>> "pool-1-thread-1" prio=10 tid=0x7f2cb81c1800 nid=0x12788 waiting on condition [0x7f175f6cc000]
>>    java.lang.Thread.State: WAITING (parking)
>> at sun.misc.Unsafe.park(Native Method)
>> - parking to wait for  <0x7f18ee497038> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>> at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
>> at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1079)
>> at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:807)
>> at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
>> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> at java.lang.Thread.run(Thread.java:722)
>>
>> -
>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional
>> commands, e-mail: dev-h...@lucene.apache.org
>
>
>




I may have run into something interesting with luceneutil

2014-04-06 Thread Benson Margulies
Or I may not.

https://code.google.com/a/apache-extras.org/p/luceneutil/wiki/AddToBuildTree?ts=1396830970&updated=AddToBuildTree

I'm trying to learn something about direct posting format using luceneutil.

The above-linked page is what I'm trying on a 160G multicore machine.

Using trunk, the SearchPerfTest process seems to be stuck.

top shows a memory size of 60g -- not even the full 80 I gave it.

No CPU is being consumed.

No significant I/O from iostat.

strace shows no activity.

jstack is completely boring except for the one thread shown below.

Anyone got any ideas?


"pool-1-thread-1" prio=10 tid=0x7f2cb81c1800 nid=0x12788 waiting on condition [0x7f175f6cc000]
   java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  <0x7f18ee497038> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1079)
at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:807)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:722)
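
For anyone reading the trace above: a worker parked in DelayedWorkQueue.take() is, as far as I can tell, just what an idle ScheduledThreadPoolExecutor thread looks like, so on its own it probably says little about where the process is stuck. Here is a minimal sketch that reproduces that state. It is not luceneutil code; the class name IdleSchedulerSketch, the method idleWorkerState and the thread name are made up for illustration.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class IdleSchedulerSketch {

    /** Runs one no-op task on a single-thread scheduler, then reports the worker's state once it is idle. */
    static Thread.State idleWorkerState() {
        final Thread[] worker = new Thread[1];
        ScheduledExecutorService pool = Executors.newScheduledThreadPool(1, r -> {
            Thread t = new Thread(r, "sketch-worker"); // hypothetical name
            t.setDaemon(true);
            worker[0] = t; // the factory is called on the submitting thread
            return t;
        });
        try {
            pool.submit(() -> { }).get(); // creates the worker and waits for the no-op task
            // Give the worker a moment to go back to the queue and park there.
            long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(5);
            Thread.State state = worker[0].getState();
            while (state != Thread.State.WAITING && System.nanoTime() < deadline) {
                Thread.sleep(10);
                state = worker[0].getState();
            }
            return state;
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(idleWorkerState());
    }
}
```

If the idle worker reports WAITING here too, the interesting threads in the stuck process are more likely the ones that are not in the pool, which is why a full dump of all threads is the more useful thing to look at.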




Re: Anticipating a benchmark for direct posting format

2014-04-03 Thread Benson Margulies
On Thu, Apr 3, 2014 at 11:37 AM, Michael McCandless
 wrote:
> Is the benchmark just trying to measure speedups by using DirectPF vs
> the default PF?  You could do this today w/ luceneutil (using
> Wikipedia as content).
>
> But if you have another content source / index, I'm happy to run the
> benchmark.  It'd be easier to make the content available (CSV, or line
> docs file format), then ship around big indices ...
>
> I have a box with 48 GB RAM.
>
> Mike McCandless

My takeaway from the prior conversation was that various people didn't
entirely believe that I'd seen a dramatic improvement in query performance
using D-P-F, and so would not smile upon a patch intended to liberate
D-P-F from codecs. It could be that the effect I saw has to do with
the fact that our system depends on hitting and scoring 50% of the
documents in an index with a lot of documents.

If you can help me try to simulate this situation with luceneutil, I'd
be happy to skip the work I was about to do to build another
benchmark.



>
> http://blog.mikemccandless.com
>
>
> On Thu, Apr 3, 2014 at 8:38 AM, Benson Margulies  
> wrote:
>> Some of you may recall that I started a thread some time ago about
>> wishing for the benefits of the direct posting format without needing
>> to use a codec. The thread landed as a challenge: show a benchmark of
>> the benefit of D-P-F.
>>
>> After a lot of distraction, I'm now in a position to build it. The
>> core is a rather large index, and to show the effect (always assuming
>> that I succeed) will take a machine with a large amount of RAM.
>>
>> One approach is for me to simply build the index involved and make it
>> available as an index. Another would be to side-step into a giant pile
>> of  CSV or JSON and provide a do-it-yourself kit.
>>
>> Anyone have a preference?
>>
>> What have we got for hardware with, 40G of RAM? Anything, or will this
>> be up to individuals to try out on dayjob hardware?
>>
>>
>
>




Anticipating a benchmark for direct posting format

2014-04-03 Thread Benson Margulies
Some of you may recall that I started a thread some time ago about
wishing for the benefits of the direct posting format without needing
to use a codec. The thread landed as a challenge: show a benchmark of
the benefit of D-P-F.

After a lot of distraction, I'm now in a position to build it. The
core is a rather large index, and to show the effect (always assuming
that I succeed) will take a machine with a large amount of RAM.

One approach is for me to simply build the index involved and make it
available as an index. Another would be to side-step into a giant pile
of  CSV or JSON and provide a do-it-yourself kit.

Anyone have a preference?

What have we got for hardware with, say, 40G of RAM? Anything, or will this
be up to individuals to try out on dayjob hardware?




[jira] [Commented] (SOLR-5228) Deprecate <fields> and <types> tags in schema.xml

2014-03-24 Thread Benson Margulies (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13944971#comment-13944971
 ] 

Benson Margulies commented on SOLR-5228:


DTDs are useless. We need to pick one of W3C XML Schema or RNG. RNG is a lot 
easier to work with. Schematron is another possibility, but I have no 
experience with it. See 
http://docs.oracle.com/javase/7/docs/api/javax/xml/validation/package-summary.html.

Choices are:

* validation is easy to disable; people who customize disable it
* customizers take the entire schema, add to it, and provide their added one. 
Not so good for multiples.
* customizers are constrained to use _namespaces_ -- you customize, you add an 
XML namespace, and you provide a schema for your namespace. 

Of course the first time we try this we'll find problems in the test schemas.

Has anyone done anything in this area that I could start from if I was inclined 
to try to work on this?
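
To make the validating-parse idea concrete, here is a minimal sketch using the javax.xml.validation package linked above. It uses W3C XML Schema only because, as far as I know, that is the one schema language the JDK is required to support out of the box; RNG would need a third-party library. The tiny inline schema is a toy written for this sketch, not a real schema for schema.xml, and the names SchemaCheckSketch and validate are made up.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

public class SchemaCheckSketch {

    // Toy schema (an assumption for illustration): <schema> holds exactly one
    // <fields> element, which holds any number of <field name="..."/> elements.
    static final String XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='schema'><xs:complexType><xs:sequence>"
        + "<xs:element name='fields'><xs:complexType><xs:sequence>"
        + "<xs:element name='field' minOccurs='0' maxOccurs='unbounded'>"
        + "<xs:complexType><xs:attribute name='name' type='xs:string' use='required'/></xs:complexType>"
        + "</xs:element>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:schema>";

    /** Returns null if the document validates, otherwise the validator's error message. */
    static String validate(String xml) {
        try {
            Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
            // With no ErrorHandler installed, validation errors are thrown as SAXException.
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return null;
        } catch (SAXException e) {
            return e.getMessage();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // A field inside the group validates; one that strays outside is reported.
        System.out.println(validate("<schema><fields><field name='id'/></fields></schema>"));
        System.out.println(validate("<schema><fields/><field name='id'/></schema>"));
    }
}
```

The second document is the misplaced-item case that started this issue: instead of being silently ignored, the stray element produces a diagnostic at load time.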


> Deprecate <fields> and <types> tags in schema.xml
> -
>
> Key: SOLR-5228
> URL: https://issues.apache.org/jira/browse/SOLR-5228
> Project: Solr
>  Issue Type: Improvement
>  Components: Schema and Analysis
>Reporter: Hoss Man
>Assignee: Erick Erickson
> Fix For: 4.8, 5.0
>
> Attachments: SOLR-5228.patch, SOLR-5228.patch
>
>
> On the solr-user mailing list, Nutan recently mentioned spending days trying 
> to track down a problem that turned out to be because he had attempted to add 
> a {{<dynamicField .. />}} that was outside of the {{<fields>}} block in his 
> schema.xml -- Solr was just silently ignoring it.
> We have made improvements in other areas of config validation by generating 
> startup errors when tags/attributes are found that are not expected -- but in 
> this case i think we should just stop expecting/requiring that the 
> {{<fields>}} and {{<types>}} tags will be used to group these sorts of 
> things.  I think schema.xml parsing should just start ignoring them and only 
> care about finding the {{<field>}}, {{<dynamicField>}}, and {{<fieldType>}} 
> tags wherever they may be.
> If people want to keep using them, fine.  If people want to mix fieldTypes 
> and fields side by side (perhaps specify a fieldType, then list all the 
> fields using it) fine.  I don't see any value in forcing people to use them, 
> but we definitely shouldn't leave things the way they are with otherwise 
> perfectly valid field/type declarations being silently ignored.
> ---
> I'll take this on unless i see any objections.



--
This message was sent by Atlassian JIRA
(v6.2#6252)




[jira] [Commented] (SOLR-5228) Deprecate <fields> and <types> tags in schema.xml

2014-03-24 Thread Benson Margulies (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13944920#comment-13944920
 ] 

Benson Margulies commented on SOLR-5228:


Allow the person extending the schema to provide a, well, extended schema.






[jira] [Commented] (SOLR-5228) Deprecate <fields> and <types> tags in schema.xml

2014-03-23 Thread Benson Margulies (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13944636#comment-13944636
 ] 

Benson Margulies commented on SOLR-5228:


I apologize for showing up so late with an opinion. I can't get over the 
feeling that this might be solving the wrong problem.

In XML, the structure of 

{code}
  <fields>
    <field .../>
    <field .../>
    ...
  </fields>
{code}

is ancient and honorable. Yea, some schemas dispense with the container for the 
group, but plenty do not. The source of this was someone who misplaced an item 
and didn't get a diagnosis. _Why don't we concentrate on diagnosis?_ Why not 
create a schema and, by default, check it? It's not like we're in a giant hurry 
at start-up compared to the extra time of enabling a validating parse.

Grouping these guys together is harmless at worst and slightly helpful at best.

If we are going to change the schema, I would beg that anyone changing it put 
forth an actual, well, _schema_ that is an accurate representation of what is 
allowed.

So I'm belatedly -1 on this change, for what tiny little bit it's worth.






Re: Threads in the LuceneTestCase system

2014-03-21 Thread Benson Margulies
Yes, but in a few hours. This test currently runs with Lucene 4.1, and
I'll need to move some furniture to use it with a trunk-y environment.

On Fri, Mar 21, 2014 at 2:54 PM, Robert Muir  wrote:
> Can you try http://svn.apache.org/r1580020 and tell me if it is better
> for catching/reproducing thread safety issues?
>
> On Fri, Mar 21, 2014 at 2:48 PM, Benson Margulies  
> wrote:
>> Yea, right now I have this failure that repros every time on a big
>> computer and never on my not-so-small MacBook Pro.
>>
>>
>> On Fri, Mar 21, 2014 at 2:43 PM, Robert Muir  wrote:
>>> I just reviewed the code thinking of how to make it easier to
>>> reproduce issues, we should give this test class a startingGun.
>>>
>>> On Fri, Mar 21, 2014 at 2:38 PM, Benson Margulies  
>>> wrote:
>>>> I could share it right here, but in any case I just found _another_
>>>> stupid mistake where I was doing something in which multiple analyzers
>>>> would end up sharing something unsharable.
>>>>
>>>>
>>>> build 21-Mar-2014 10:36:59
>>>> testRandomStressWithBasisTokenizer(com.basistech.rosette.lucene.BaseLinguisticsTokenFilterTest)
>>>>  Time elapsed: 9.249 sec  <<< FAILURE!
>>>> build 21-Mar-2014 10:36:59 org.junit.ComparisonFailure: term 5
>>>> expected:<[kntii?j?rjesstelm?]> but was:<[ssiikojenen]>
>>>> build 21-Mar-2014 10:36:59 at
>>>> __randomizedtesting.SeedInfo.seed([A574C0CEE3A9C8A3:47C87E8DE3A08C35]:0)
>>>> build 21-Mar-2014 10:36:59 at 
>>>> org.junit.Assert.assertEquals(Assert.java:115)
>>>> build 21-Mar-2014 10:36:59 at
>>>> org.apache.lucene.analysis.BaseTokenStreamTestCase.assertTokenStreamContents(BaseTokenStreamTestCase.java:169)
>>>> build 21-Mar-2014 10:36:59 at
>>>> org.apache.lucene.analysis.BaseTokenStreamTestCase.checkAnalysisConsistency(BaseTokenStreamTestCase.java:747)
>>>> build 21-Mar-2014 10:36:59 at
>>>> org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:546)
>>>> build 21-Mar-2014 10:36:59 at
>>>> org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:447)
>>>> build 21-Mar-2014 10:36:59 at
>>>> org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:375)
>>>>
>>>>
>>>> On Fri, Mar 21, 2014 at 2:33 PM, Dawid Weiss
>>>>  wrote:
>>>>>> I just fixed a thread-safety bug, but I just saw another failure, and
>>>>>> I'm pulling my hair because it refuses to repro.
>>>>>
>>>>> You can run with a single JVM by passing -Dtests.jvms=1 (or so I
>>>>> believe; try ant test-help). This shouldn't affect multi-threaded
>>>>> tests but if you have a problem with dependency between suites (test
>>>>> classes), such as some values from preinitialized static fields or the
>>>>> like then it may be the cause.
>>>>>
>>>>> Otherwise the  test framework makes it really simple: if a thread was
>>>>> created within a test class then it should die before the test class
>>>>> completes. What is the exception (stack?) you're getting? Can you
>>>>> share it privately?
>>>>>
>>>>> Dawid
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>
>




Re: Threads in the LuceneTestCase system

2014-03-21 Thread Benson Margulies
Yea, right now I have this failure that repros every time on a big
computer and never on my not-so-small MacBook Pro.
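
For readers who haven't met the "startingGun" idea Robert mentions below: the usual trick, as I understand it, is to park every test thread on a latch and release them all at once, so they really do hit the code under test at the same moment instead of trickling in as they are created. A minimal sketch follows, assuming a plain CountDownLatch. It is not the code from r1580020, and the names StartingGunSketch and runAll are made up.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

public class StartingGunSketch {

    /** Starts threadCount threads, releases them together, and returns how many finished the task. */
    static int runAll(int threadCount, Runnable task) {
        CountDownLatch startingGun = new CountDownLatch(1);
        AtomicInteger finished = new AtomicInteger();
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            threads[i] = new Thread(() -> {
                try {
                    startingGun.await(); // every thread blocks here until the gun fires
                    task.run();
                    finished.incrementAndGet();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            threads[i].start();
        }
        startingGun.countDown(); // fire: all waiting threads proceed at once
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return finished.get();
    }

    public static void main(String[] args) {
        System.out.println(runAll(8, () -> { }));
    }
}
```

This only increases the odds of contention; it does not make a race deterministic, so a failure that shows up on a big multicore box may still stay hidden on a laptop.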


On Fri, Mar 21, 2014 at 2:43 PM, Robert Muir  wrote:
> I just reviewed the code thinking of how to make it easier to
> reproduce issues, we should give this test class a startingGun.
>
> On Fri, Mar 21, 2014 at 2:38 PM, Benson Margulies  
> wrote:
>> I could share it right here, but in any case I just found _another_
>> stupid mistake where I was doing something in which multiple analyzers
>> would end up sharing something unsharable.
>>
>>
>> build 21-Mar-2014 10:36:59
>> testRandomStressWithBasisTokenizer(com.basistech.rosette.lucene.BaseLinguisticsTokenFilterTest)
>>  Time elapsed: 9.249 sec  <<< FAILURE!
>> build 21-Mar-2014 10:36:59 org.junit.ComparisonFailure: term 5
>> expected:<[kntii?j?rjesstelm?]> but was:<[ssiikojenen]>
>> build 21-Mar-2014 10:36:59 at
>> __randomizedtesting.SeedInfo.seed([A574C0CEE3A9C8A3:47C87E8DE3A08C35]:0)
>> build 21-Mar-2014 10:36:59 at org.junit.Assert.assertEquals(Assert.java:115)
>> build 21-Mar-2014 10:36:59 at
>> org.apache.lucene.analysis.BaseTokenStreamTestCase.assertTokenStreamContents(BaseTokenStreamTestCase.java:169)
>> build 21-Mar-2014 10:36:59 at
>> org.apache.lucene.analysis.BaseTokenStreamTestCase.checkAnalysisConsistency(BaseTokenStreamTestCase.java:747)
>> build 21-Mar-2014 10:36:59 at
>> org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:546)
>> build 21-Mar-2014 10:36:59 at
>> org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:447)
>> build 21-Mar-2014 10:36:59 at
>> org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:375)
>>
>>
>> On Fri, Mar 21, 2014 at 2:33 PM, Dawid Weiss
>>  wrote:
>>>> I just fixed a thread-safety bug, but I just saw another failure, and
>>>> I'm pulling my hair because it refuses to repro.
>>>
>>> You can run with a single JVM by passing -Dtests.jvms=1 (or so I
>>> believe; try ant test-help). This shouldn't affect multi-threaded
>>> tests but if you have a problem with dependency between suites (test
>>> classes), such as some values from preinitialized static fields or the
>>> like then it may be the cause.
>>>
>>> Otherwise the  test framework makes it really simple: if a thread was
>>> created within a test class then it should die before the test class
>>> completes. What is the exception (stack?) you're getting? Can you
>>> share it privately?
>>>
>>> Dawid
>>>
>>>
>>
>>
>
>




Re: Threads in the LuceneTestCase system

2014-03-21 Thread Benson Margulies
I could share it right here, but in any case I just found _another_
stupid mistake where I was doing something in which multiple analyzers
would end up sharing something unsharable.


build 21-Mar-2014 10:36:59
testRandomStressWithBasisTokenizer(com.basistech.rosette.lucene.BaseLinguisticsTokenFilterTest)
 Time elapsed: 9.249 sec  <<< FAILURE!
build 21-Mar-2014 10:36:59 org.junit.ComparisonFailure: term 5
expected:<[kntii?j?rjesstelm?]> but was:<[ssiikojenen]>
build 21-Mar-2014 10:36:59 at
__randomizedtesting.SeedInfo.seed([A574C0CEE3A9C8A3:47C87E8DE3A08C35]:0)
build 21-Mar-2014 10:36:59 at org.junit.Assert.assertEquals(Assert.java:115)
build 21-Mar-2014 10:36:59 at
org.apache.lucene.analysis.BaseTokenStreamTestCase.assertTokenStreamContents(BaseTokenStreamTestCase.java:169)
build 21-Mar-2014 10:36:59 at
org.apache.lucene.analysis.BaseTokenStreamTestCase.checkAnalysisConsistency(BaseTokenStreamTestCase.java:747)
build 21-Mar-2014 10:36:59 at
org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:546)
build 21-Mar-2014 10:36:59 at
org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:447)
build 21-Mar-2014 10:36:59 at
org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:375)


On Fri, Mar 21, 2014 at 2:33 PM, Dawid Weiss
 wrote:
>> I just fixed a thread-safety bug, but I just saw another failure, and
>> I'm pulling my hair because it refuses to repro.
>
> You can run with a single JVM by passing -Dtests.jvms=1 (or so I
> believe; try ant test-help). This shouldn't affect multi-threaded
> tests but if you have a problem with dependency between suites (test
> classes), such as some values from preinitialized static fields or the
> like then it may be the cause.
>
> Otherwise the  test framework makes it really simple: if a thread was
> created within a test class then it should die before the test class
> completes. What is the exception (stack?) you're getting? Can you
> share it privately?
>
> Dawid
>
>




Re: Threads in the LuceneTestCase system

2014-03-21 Thread Benson Margulies
To be clear, it's my own code that is amiss here, nothing in Lucene itself.

I just fixed a thread-safety bug, but I just saw another failure, and
I'm pulling my hair because it refuses to repro.


On Fri, Mar 21, 2014 at 2:03 PM, Robert Muir  wrote:
> Do you have a stacktrace of where in BaseTokenStreamTestCase that it hit?
>
> We refactored the asserting here, to always test with a single thread
> *first*, then with multiple threads.
>
> So if you are failing in the multithreaded part, it means you have a
> thread safety issue...
>
> (as long as this part of the base test class is working, and i hope it
> still is, i havent seen anything crazy to indicate otherwise, and i've
> been in TestRandomChains for a few hours this week)
>
> On Fri, Mar 21, 2014 at 1:57 PM, Benson Margulies  
> wrote:
>> I'm fighting with a test that uses the random analysis chain testing.
>> It does not repro when I pass in the usual collection of -D's. I think
>> that the reason is to do with threads; the failure is always on a big
>> multicore build machine.
>>
>> Are there any more of those Carrot control -D's that change how many
>> threads are in the act?
>>
>>
>
>




Threads in the LuceneTestCase system

2014-03-21 Thread Benson Margulies
I'm fighting with a test that uses the random analysis chain testing.
It does not repro when I pass in the usual collection of -D's. I think
that the reason is to do with threads; the failure is always on a big
multicore build machine.

Are there any more of those Carrot control -D's that change how many
threads are in the act?




Re: Reducing the number of warnings in the codebase

2014-03-16 Thread Benson Margulies
I think we avoid bikeshed by making incremental changes. If you offer
a commit to turn off serial version UID whining, I'll +1 it. And then
we iterate, in small doses, agreeing to either spike the warning or
change the code.


In passing, I will warn you that the IDEs can be very stubborn; in
some cases, there is no way to avoid some amount of whining. Eclipse
used to insist on warning on every @SuppressWarnings that it didn't
understand. It might still.

On Sun, Mar 16, 2014 at 5:29 PM, Shawn Heisey  wrote:
> A starting comment: We could bikeshed for *years*.
>
> General thought: The more I think about it, the more I like the notion
> of confining most of the cleanup to trunk.  Actual bug fixes and changes
> that are relatively non-invasive should be backported.
>
> On 3/16/2014 2:48 PM, Uwe Schindler wrote:
>>> Just because some tool expresses distaste, doesn't imply that everyone here
>>> agrees that it's a problem we should fix.
>>
>> Yes that is my biggest problem. Lots of warnings by Eclipse are just 
>> bullshit because of the code style in Lucene and for example the way we do 
>> things - e.g., it complains about missing close() all the time, just because 
>> we use IOUtils.closeWhileHandlingExceptions() for that.
>
> My original thought on this was that we should use a combination of
> SuppressWarnings and actual code changes to eliminate most of the
> warnings that show up in the well-supported IDEs when they are
> configured with *default* settings.
>
> Uwe brings up a really good point that there are a number of completely
> useless warnings, but I think there's still value in looking through
> EVERY default IDE warning and evaluating each one on a case-by-case
> basis to decide whether that specific warning should be fixed or
> ignored.  It could be a sort of background task with an open Jira for
> tracking commits.  It could also be something that we decide isn't worth
> the effort.
>
>>> In my experience, the default Sonar rulesets contain many things that people
>>> here are prone to disagree with. Start with serialVersionUID:
>>> do we care? Why would we care? In what cases do we really believe that a
>>> sane person would be using Java serialization with a Lucene/Solr class?
>>
>> We officially don't support serialization, so all warnings are useless. It's 
>> just Eclipse that complains for no reason.
>
> Project-specific IDE settings for errors/warnings (set by the ant build
> target) will go a long way towards making the whole situation better.
> For the current stable branch, we should include settings for anything
> that we want to ignore on trunk, but only a subset of those problems
> that get elevated to error status.
>
>>> Sonar can also be a bit cranky; it arranges for various tools to run via
>>> mechanisms that sometimes conflict with the ways you might run them
>>> yourself.
>>>
>>> So I'd suggest a process like:
>>>
>>> 1. Someone proposes a set of (e.g.) checkstyle rules to live by.
>>> 2. That ruleset is refined by experiment.
>>> 3. We make violations fail the build.
>>>
>>> Then lather, rinse, repeat for other tools.
>>
>> Yes I agree. I am strongly against PMD or CheckStyle without our own rules. 
>> Forbidden-apis was invented because of the brokenness of PMD and CheckStyle 
>> to detect default Locale/Charset/Timezone violations (and also because those 
>> tools are slow).
>> We should better fix our Eclipse project generation to hide the warnings that 
>> are just wrong.
>>
>> I would prefer: Before we fix warnings by 3rd party tools like Eclipse, we 
>> should first fix only the warnings emitted by Javac. The others are just 
>> unimportant to me and I don't want to fix those which are just wrong for our 
>> code style.
>>
>> We already have ECJ in our build (to lint javadocs), we can make some 
>> Eclipse warnings fatal through the ecj config file in our SVN, to fail build 
>> on some warnings. I disagree with using PMD or Checkstyle, those tools are 
>> incomplete and broken, sorry.
>
> +1 all around.  I want to eliminate all the noise.  If we had IDE
> warnings measured in dozens instead of thousands, it would be a useful
> data point that wouldn't get ignored.
>
> Thanks,
> Shawn
>
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Reducing the number of warnings in the codebase

2014-03-16 Thread Benson Margulies
Just because some tool expresses distaste doesn't imply that everyone
here agrees that it's a problem we should fix.

In my experience, the default Sonar rulesets contain many things that
people here are prone to disagree with. Start with serialVersionUID:
do we care? Why would we care? In what cases do we really believe that
a sane person would be using Java serialization with a Lucene/Solr
class?

Sonar can also be a bit cranky; it arranges for various tools to run
via mechanisms that sometimes conflict with the ways you might run
them yourself.

So I'd suggest a process like:

1. Someone proposes a set of (e.g.) checkstyle rules to live by.
2. That ruleset is refined by experiment.
3. We make violations fail the build.

Then lather, rinse, repeat for other tools.

Once we have rulesets we agree are worth enforcing, we can look to
Sonar for a pretty way to visualize their results if we like.




Re: [VOTE] Lucene / Solr 4.7.0 RC2

2014-02-20 Thread Benson Margulies
I get it. You're cherry-picking changes onto the release branch. No,
there's absolutely no reason to imagine grabbing LUCENE-5449.




Re: [VOTE] Lucene / Solr 4.7.0 RC2

2014-02-20 Thread Benson Margulies
Does this mean that 5449 moves to 4.7? If so, who fiddles with
CHANGES.txt? Me as the author or the RM?

On Thu, Feb 20, 2014 at 11:03 AM, Simon Willnauer
 wrote:
> I agree - I will respin
>
> On Thu, Feb 20, 2014 at 4:54 PM, Adrien Grand  wrote:
>> Sorry for bringing the bad news, but I think LUCENE-5462 is worth a
>> respin. I just committed to the 4.7 branch.
>>
>> On Thu, Feb 20, 2014 at 1:36 PM, Shai Erera  wrote:
>>> +1, smoke tester says: SUCCESS!
>>>
>>> Shai
>>>
>>>
>>> On Thu, Feb 20, 2014 at 10:44 AM, Simon Willnauer
>>>  wrote:

 Please vote for the first Release Candidate for Lucene/Solr 4.7.0

 you can download it here:

 http://people.apache.org/~simonw/staging_area/lucene-solr-4.7.0-RC2-rev1569857/

 or run the smoke tester directly with this commandline (don't forget
 to set JAVA6_HOME etc.):

 $ python3.2 -u dev-tools/scripts/smokeTestRelease.py

 http://people.apache.org/~simonw/staging_area/lucene-solr-4.7.0-RC2-rev1569857/
 1569857 4.7.0 /tmp/smoke_test_4_7

 Smoketester said: SUCCESS!

 here is my +1

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org

>>>
>>
>>
>>
>> --
>> Adrien
>>
>> -
>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: dev-h...@lucene.apache.org
>>
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>




[jira] [Resolved] (LUCENE-5449) Two ancient classes renamed to be less peculiar: _TestHelper and _TestUtil

2014-02-19 Thread Benson Margulies (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benson Margulies resolved LUCENE-5449.
--

Resolution: Fixed

> Two ancient classes renamed to be less peculiar: _TestHelper and _TestUtil
> --
>
> Key: LUCENE-5449
> URL: https://issues.apache.org/jira/browse/LUCENE-5449
> Project: Lucene - Core
>  Issue Type: Improvement
>Affects Versions: 4.6.1
>    Reporter: Benson Margulies
>Assignee: Benson Margulies
>Priority: Minor
> Fix For: 4.8, 5.0
>
>
> _TestUtil and _TestHelper begin with _ for historical reasons that don't 
> apply any longer. Let's eliminate those _'s.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)




[jira] [Updated] (LUCENE-5449) Two ancient classes renamed to be less peculiar: _TestHelper and _TestUtil

2014-02-19 Thread Benson Margulies (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benson Margulies updated LUCENE-5449:
-

Fix Version/s: 5.0
   4.8

> Two ancient classes renamed to be less peculiar: _TestHelper and _TestUtil
> --
>
> Key: LUCENE-5449
> URL: https://issues.apache.org/jira/browse/LUCENE-5449
> Project: Lucene - Core
>  Issue Type: Improvement
>Affects Versions: 4.6.1
>    Reporter: Benson Margulies
>Assignee: Benson Margulies
>Priority: Minor
> Fix For: 4.8, 5.0
>
>
> _TestUtil and _TestHelper begin with _ for historical reasons that don't 
> apply any longer. Let's eliminate those _'s.






[jira] [Commented] (LUCENE-5449) Two ancient classes renamed to be less peculiar: _TestHelper and _TestUtil

2014-02-19 Thread Benson Margulies (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13906174#comment-13906174
 ] 

Benson Margulies commented on LUCENE-5449:
--

I'm unable to reconstruct how I laid this egg. My only theory is that I had 
somehow cd'd back to the wrong tree before running ant precommit after thinking 
i've set up the merge. Rob's commit really just finishes my work on 'part 1': 
part 2 was always going to be the _TestHelper commit. Let's see if I can get 
that one right.


> Two ancient classes renamed to be less peculiar: _TestHelper and _TestUtil
> --
>
> Key: LUCENE-5449
> URL: https://issues.apache.org/jira/browse/LUCENE-5449
> Project: Lucene - Core
>  Issue Type: Improvement
>Affects Versions: 4.6.1
>    Reporter: Benson Margulies
>Assignee: Benson Margulies
>Priority: Minor
>
> _TestUtil and _TestHelper begin with _ for historical reasons that don't 
> apply any longer. Let's eliminate those _'s.






[jira] [Comment Edited] (LUCENE-5449) Two ancient classes renamed to be less peculiar: _TestHelper and _TestUtil

2014-02-19 Thread Benson Margulies (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13906174#comment-13906174
 ] 

Benson Margulies edited comment on LUCENE-5449 at 2/19/14 10:02 PM:


I'm unable to reconstruct how I laid this egg. My only theory is that I had 
somehow cd'd back to the wrong tree before running ant precommit after thinking 
I had made all the corrections after the merge. Rob's commit really just 
finishes my work on 'part 1': part 2 was always going to be the _TestHelper 
commit. Let's see if I can get that one right.



was (Author: bmargulies):
I'm unable to reconstruct how I laid this egg. My only theory is that I had 
somehow cd'd back to the wrong tree before running ant precommit after thinking 
i've set up the merge. Rob's commit really just finishes my work on 'part 1': 
part 2 was always going to be the _TestHelper commit. Let's see if I can get 
that one right.


> Two ancient classes renamed to be less peculiar: _TestHelper and _TestUtil
> --
>
> Key: LUCENE-5449
> URL: https://issues.apache.org/jira/browse/LUCENE-5449
> Project: Lucene - Core
>      Issue Type: Improvement
>Affects Versions: 4.6.1
>Reporter: Benson Margulies
>Assignee: Benson Margulies
>Priority: Minor
>
> _TestUtil and _TestHelper begin with _ for historical reasons that don't 
> apply any longer. Let's eliminate those _'s.






[jira] [Assigned] (LUCENE-5449) Two ancient classes renamed to be less peculiar: _TestHelper and _TestUtil

2014-02-19 Thread Benson Margulies (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benson Margulies reassigned LUCENE-5449:


Assignee: Benson Margulies

> Two ancient classes renamed to be less peculiar: _TestHelper and _TestUtil
> --
>
> Key: LUCENE-5449
> URL: https://issues.apache.org/jira/browse/LUCENE-5449
> Project: Lucene - Core
>  Issue Type: Improvement
>Affects Versions: 4.6.1
>    Reporter: Benson Margulies
>Assignee: Benson Margulies
>Priority: Minor
>
> _TestUtil and _TestHelper begin with _ for historical reasons that don't 
> apply any longer. Let's eliminate those _'s.






Re: I broke a build, but I can't find the email

2014-02-19 Thread Benson Margulies
OK, I'm puzzled.

'Part 1' was (supposedly) self-contained, and passed ant precommit.

Once I figure out how the explosion got past me, can anyone help me
with how to re-apply without remaking all the manual merge repairs?
It's a bit beyond my svn-fu.

Drat. Here I am in my workspace, cd'd to lucene, running ant compile
compile-test test, and I get no compile errors. So why does it fail on
Jenkins?



On Wed, Feb 19, 2014 at 3:31 PM, Benson Margulies  wrote:
> I see Rob's revert, but I can't find the email about it.
>
> The thing passed 'ant precommit' before I committed it.




I broke a build, but I can't find the email

2014-02-19 Thread Benson Margulies
I see Rob's revert, but I can't find the email about it.

The thing passed 'ant precommit' before I committed it.




Re: Refactoring code through Github pull requests

2014-02-18 Thread Benson Margulies
See https://wiki.apache.org/lucene-java/BensonMargulies/GitSvnWorkflow
for an experiment with git svn.

On Tue, Feb 18, 2014 at 1:47 PM, Benson Margulies  wrote:
> Once I have transported a change from branch to branch via diff\apply, git
> stops discussing a rename at all.
>
> On February 18, 2014 9:06:30 AM EST, Thomas Matthijs 
> wrote:
>>
>> Unfortunately i can't find a way to make it explicitly show it will do a
>> svn rename, but it does do it, so that makes this solution not very useful
>> either i guess.
>>
>>
>> --- git ---
>> [master svntest] % git status
>> On branch master
>> Changes to be committed:
>>   (use "git reset HEAD ..." to unstage)
>>
>>     renamed:    test -> moo
>>
>> [master svntest] % git commit -m "woof"
>> [master 6e2c0b3] woof
>>  1 file changed, 0 insertions(+), 0 deletions(-)
>>  rename test => moo (100%)
>> [master svntest] % git svn dcommit
>> Committing to https://.../trunk ...
>> R test => moo
>> Committed r3
>> D test
>> A moo
>> W: -empty_dir: trunk/test
>> r3 = 0ae41e170cf7d07ec3679eb85d55c068617e0a66 (refs/remotes/trunk)
>>
>>
>> - svn ---
>>
>> [trunk] % svn log --diff -v
>> --------
>> r3 | thomas | 2014-02-18 14:32:07 +0100 (Tue, 18 Feb 2014) | 1 line
>> Changed paths:
>>A /trunk/moo (from /trunk/test:2)
>>D /trunk/test
>>
>> woof
>>
>>
>> On Tue, Feb 18, 2014 at 2:22 PM, Benson Margulies 
>> wrote:
>>>
>>> Let me be specific. If I am sitting in a git clone that has been set
>>> up with git svn, and I use git apply to apply the output of git
>>> format-patch, if I dcommit, is the autodetection going to result in an
>>> svn mv?
>>>
>>>
>>> On Tue, Feb 18, 2014 at 8:20 AM, Thomas Matthijs 
>>> wrote:
>>> > Git does not track renames, but can show/detect it, the magic options
>>> > are -C
>>> > and -M  for diff/show etc
>>> >
>>> >
>>> >
>>> > On Tue, Feb 18, 2014 at 2:16 PM, Benson Margulies
>>> > 
>>> > wrote:
>>> >>
>>> >> I tried using git apply on a patch (from github's .patch URL)  that
>>> >> included a rename. no sign of a rename; just a delete and an add. I
>>> >> feel like I'm missing something.
>>> >>
>>> >> On Tue, Feb 18, 2014 at 7:36 AM, Shai Erera  wrote:
>>> >> > The problem I see is that if you generate a patch using 'git diff',
>>> >> > it
>>> >> > applies just fine to svn (if you generate it w/ --no-prefix) without
>>> >> > any
>>> >> > warnings about missing files due the rename. Wanted to warn the
>>> >> > community
>>> >> > about it, so that when committers assign themselves to PRs, they
>>> >> > review
>>> >> > the
>>> >> > patch closer and detect manually if a rename as happened.
>>> >> >
>>> >> > We could decide that renames are done in a separate commit, but it's
>>> >> > not
>>> >> > always possible.
>>> >> >
>>> >> > So mainly, FYI.
>>> >> >
>>> >> > And if someone has an idea for a script/ant-target we could write to
>>> >> > detect
>>> >> > this case, that would be awesome.
>>> >> >
>>> >> > Shai
>>> >> >
>>> >> >
>>> >> > On Tue, Feb 18, 2014 at 2:31 PM, Thomas Matthijs 
>>> >> > wrote:
>>> >> >>
>>> >> >> Github pull requests can be treated as individual cherry picked
>>> >> >> patch
>>> >> >> sets
>>> >> >> really, not branch merges ? (ie rebased) from there on out you're
>>> >> >> in
>>> >> >> svn
>>> >> >> land. No need to "merge".
>>> >> >>
>>> >> >> But indeed, it tries to detect it based on the file content, and
>>> >> >> doesn't
>>> >> >> work 100% as manual svn moves.
>>> >> >>
>>> >> >>
>>> >> >>
>>> >> >> On Tue, Feb 18, 2014 at 1:27 PM, Benson Margu

Re: Refactoring code through Github pull requests

2014-02-18 Thread Benson Margulies
Once I have transported a change from branch to branch via diff/apply, git
stops discussing a rename at all.

On February 18, 2014 9:06:30 AM EST, Thomas Matthijs  wrote:
>Unfortunately i can't find a way to make it explicitly show it will do
>a
>svn rename, but it does do it, so that makes this solution not very
>useful
>either i guess.
>
>
>--- git ---
>[master svntest] % git status
>On branch master
>Changes to be committed:
>  (use "git reset HEAD ..." to unstage)
>
>renamed:test -> moo
>
>[master svntest] % git commit -m "woof"
>[master 6e2c0b3] woof
> 1 file changed, 0 insertions(+), 0 deletions(-)
> rename test => moo (100%)
>[master svntest] % git svn dcommit
>Committing to https://.../trunk ...
>R test => moo
>Committed r3
>D test
>A moo
>W: -empty_dir: trunk/test
>r3 = 0ae41e170cf7d07ec3679eb85d55c068617e0a66 (refs/remotes/trunk)
>
>
>- svn ---
>
>[trunk] % svn log --diff -v
>
>r3 | thomas | 2014-02-18 14:32:07 +0100 (Tue, 18 Feb 2014) | 1 line
>Changed paths:
>   A /trunk/moo (from /trunk/test:2)
>   D /trunk/test
>
>woof
>
>
>On Tue, Feb 18, 2014 at 2:22 PM, Benson Margulies
>wrote:
>
>> Let me be specific. If I am sitting in a git clone that has been set
>> up with git svn, and I use git apply to apply the output of git
>> format-patch, if I dcommit, is the autodetection going to result in
>an
>> svn mv?
>>
>>
>> On Tue, Feb 18, 2014 at 8:20 AM, Thomas Matthijs 
>wrote:
>> > Git does not track renames, but can show/detect it, the magic
>options
>> are -C
>> > and -M  for diff/show etc
>> >
>> >
>> >
>> > On Tue, Feb 18, 2014 at 2:16 PM, Benson Margulies
>> >
>> > wrote:
>> >>
>> >> I tried using git apply on a patch (from github's .patch URL) 
>that
>> >> included a rename. no sign of a rename; just a delete and an add.
>I
>> >> feel like I'm missing something.
>> >>
>> >> On Tue, Feb 18, 2014 at 7:36 AM, Shai Erera 
>wrote:
>> >> > The problem I see is that if you generate a patch using 'git
>diff', it
>> >> > applies just fine to svn (if you generate it w/ --no-prefix)
>without
>> any
>> >> > warnings about missing files due the rename. Wanted to warn the
>> >> > community
>> >> > about it, so that when committers assign themselves to PRs, they
>> review
>> >> > the
>> >> > patch closer and detect manually if a rename as happened.
>> >> >
>> >> > We could decide that renames are done in a separate commit, but
>it's
>> not
>> >> > always possible.
>> >> >
>> >> > So mainly, FYI.
>> >> >
>> >> > And if someone has an idea for a script/ant-target we could
>write to
>> >> > detect
>> >> > this case, that would be awesome.
>> >> >
>> >> > Shai
>> >> >
>> >> >
>> >> > On Tue, Feb 18, 2014 at 2:31 PM, Thomas Matthijs
>
>> >> > wrote:
>> >> >>
>> >> >> Github pull requests can be treated as individual cherry picked
>patch
>> >> >> sets
>> >> >> really, not branch merges ? (ie rebased) from there on out
>you're in
>> >> >> svn
>> >> >> land. No need to "merge".
>> >> >>
>> >> >> But indeed, it tries to detect it based on the file content,
>and
>> >> >> doesn't
>> >> >> work 100% as manual svn moves.
>> >> >>
>> >> >>
>> >> >>
>> >> >> On Tue, Feb 18, 2014 at 1:27 PM, Benson Margulies
>> >> >> 
>> >> >> wrote:
>> >> >>>
>> >> >>> Well, git-svn has a heap of warnings against using it for
>merges;
>> it's
>> >> >>> also a really bad idea when renaming a whole package, as it
>does it
>> >> >>> one-file-at-a-time.
>> >> >>>
>> >> >>> If you have a workflow that works with the ASF mirror and svn,
>> please
>> >> >>> write it up on the Wiki!
>> >> >>>
>> >> >>>
>> >> >>> On Tue, Feb 18, 2014 at 7:23 AM, Thomas Matthijs
>
>> &

Re: Refactoring code through Github pull requests

2014-02-18 Thread Benson Margulies
On Tue, Feb 18, 2014 at 8:24 AM, Uwe Schindler  wrote:
> Hi Benson,
>
> The problem is that with this approach the rename becomes a delete and an add
> with full file content, versus a native SVN rename (which is an svn cp followed
> by a delete of the original file). The history is lost this way, because to SVN
> your patch looks like a complete removal of the original file and a later
> addition of a totally new file. With a native SVN rename, you would see
> changes to the old file also in the history of the new file. You would even
> see the file content changes of the commit renaming the file in svn's diff.
> Now you cannot see the differences between the old and new file, because it's
> just a big blob removed/added.


Uwe, that's precisely what I'm chasing. I thought, some years ago,
that I'd seen git svn do a real svn mv, but this morning's experiments
do not lead me to believe that this can be arranged with the tools at
hand.


>
> Uwe
>
> -
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
>
>
>> -Original Message-
>> From: Benson Margulies [mailto:bimargul...@gmail.com]
>> Sent: Tuesday, February 18, 2014 2:17 PM
>> To: dev@lucene.apache.org
>> Subject: Re: Refactoring code through Github pull requests
>>
>> I tried using git apply on a patch (from github's .patch URL)  that included 
>> a
>> rename. no sign of a rename; just a delete and an add. I feel like I'm 
>> missing
>> something.
>>
>> On Tue, Feb 18, 2014 at 7:36 AM, Shai Erera  wrote:
>> > The problem I see is that if you generate a patch using 'git diff', it
>> > applies just fine to svn (if you generate it w/ --no-prefix) without
>> > any warnings about missing files due the rename. Wanted to warn the
>> > community about it, so that when committers assign themselves to PRs,
>> > they review the patch closer and detect manually if a rename as happened.
>> >
>> > We could decide that renames are done in a separate commit, but it's
>> > not always possible.
>> >
>> > So mainly, FYI.
>> >
>> > And if someone has an idea for a script/ant-target we could write to
>> > detect this case, that would be awesome.
>> >
>> > Shai
>> >
>> >
>> > On Tue, Feb 18, 2014 at 2:31 PM, Thomas Matthijs 
>> wrote:
>> >>
>> >> Github pull requests can be treated as individual cherry picked patch
>> >> sets really, not branch merges ? (ie rebased) from there on out
>> >> you're in svn land. No need to "merge".
>> >>
>> >> But indeed, it tries to detect it based on the file content, and
>> >> doesn't work 100% as manual svn moves.
>> >>
>> >>
>> >>
>> >> On Tue, Feb 18, 2014 at 1:27 PM, Benson Margulies
>> >> 
>> >> wrote:
>> >>>
>> >>> Well, git-svn has a heap of warnings against using it for merges;
>> >>> it's also a really bad idea when renaming a whole package, as it
>> >>> does it one-file-at-a-time.
>> >>>
>> >>> If you have a workflow that works with the ASF mirror and svn,
>> >>> please write it up on the Wiki!
>> >>>
>> >>>
>> >>> On Tue, Feb 18, 2014 at 7:23 AM, Thomas Matthijs 
>> >>> wrote:
>> >>> >
>> >>> > On Tue, Feb 18, 2014 at 1:18 PM, Shai Erera 
>> wrote:
>> >>> >>
>> >>> >>
>> >>> >> Second, has anyone perhaps found a way to overcome that issue? I
>> >>> >> thought about maybe writing a script to detect that, looking at
>> >>> >> the patch file, but it seems hard to detect that the deleted Foo
>> >>> >> is the new Bar. If it's just rename, maybe, but if part of the
>> >>> >> rename the code changed a lot ... it becomes harder.
>> >>> >
>> >>> >
>> >>> > Probably not the answer you want but If you use the git-svn bridge
>> >>> > it should detect the rename and commit it in svn as a move/copy
>> >>> >
>> >>> > https://www.kernel.org/pub/software/scm/git/docs/git-svn.html
>> >>>
>> >>> 
>> >>> - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For
>> >>> additional commands, e-mail: dev-h...@lucene.apache.org
>> >>>
>> >>
>> >
>>
>> -
>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional
>> commands, e-mail: dev-h...@lucene.apache.org
>
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>




Re: Refactoring code through Github pull requests

2014-02-18 Thread Benson Margulies
Let me be specific. If I am sitting in a git clone that has been set
up with git svn, and I use git apply to apply the output of git
format-patch, if I dcommit, is the autodetection going to result in an
svn mv?
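
One way to see what git's rename detection reports before running dcommit is
to ask for a name-status listing with -M and read off the R entries. Below is
a minimal sketch in Python, assuming the input is the text printed by
`git diff -M --name-status` (tab-separated fields, renames shown as R plus a
similarity score). The function name and the sample listing are made up for
illustration, and the sketch says nothing about what git svn dcommit actually
sends to the server.

```python
def renames_from_name_status(output):
    """Return (old_path, new_path) pairs for entries git reports as renames.

    Assumes `output` is the text printed by `git diff -M --name-status`,
    where a rename line looks like: R<score><TAB>old<TAB>new.
    """
    moves = []
    for line in output.splitlines():
        fields = line.split("\t")
        # Rename entries have three fields and a status starting with "R".
        if len(fields) == 3 and fields[0].startswith("R"):
            moves.append((fields[1], fields[2]))
    return moves


# Hypothetical listing: one modified file, one rename, one added file.
SAMPLE_STATUS = "M\tbuild.xml\nR100\ttest\tmoo\nA\tnotes.txt\n"

for old, new in renames_from_name_status(SAMPLE_STATUS):
    print("rename detected:", old, "->", new)
```

If the list comes back empty for a change that should contain a rename, that
would at least confirm the rename information was lost before dcommit.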


On Tue, Feb 18, 2014 at 8:20 AM, Thomas Matthijs  wrote:
> Git does not track renames, but can show/detect it, the magic options are -C
> and -M  for diff/show etc
>
>
>
> On Tue, Feb 18, 2014 at 2:16 PM, Benson Margulies 
> wrote:
>>
>> I tried using git apply on a patch (from github's .patch URL)  that
>> included a rename. no sign of a rename; just a delete and an add. I
>> feel like I'm missing something.
>>
>> On Tue, Feb 18, 2014 at 7:36 AM, Shai Erera  wrote:
>> > The problem I see is that if you generate a patch using 'git diff', it
>> > applies just fine to svn (if you generate it w/ --no-prefix) without any
>> > warnings about missing files due the rename. Wanted to warn the
>> > community
>> > about it, so that when committers assign themselves to PRs, they review
>> > the
>> > patch closer and detect manually if a rename as happened.
>> >
>> > We could decide that renames are done in a separate commit, but it's not
>> > always possible.
>> >
>> > So mainly, FYI.
>> >
>> > And if someone has an idea for a script/ant-target we could write to
>> > detect
>> > this case, that would be awesome.
>> >
>> > Shai
>> >
>> >
>> > On Tue, Feb 18, 2014 at 2:31 PM, Thomas Matthijs 
>> > wrote:
>> >>
>> >> Github pull requests can be treated as individual cherry picked patch
>> >> sets
>> >> really, not branch merges ? (ie rebased) from there on out you're in
>> >> svn
>> >> land. No need to "merge".
>> >>
>> >> But indeed, it tries to detect it based on the file content, and
>> >> doesn't
>> >> work 100% as manual svn moves.
>> >>
>> >>
>> >>
>> >> On Tue, Feb 18, 2014 at 1:27 PM, Benson Margulies
>> >> 
>> >> wrote:
>> >>>
>> >>> Well, git-svn has a heap of warnings against using it for merges; it's
>> >>> also a really bad idea when renaming a whole package, as it does it
>> >>> one-file-at-a-time.
>> >>>
>> >>> If you have a workflow that works with the ASF mirror and svn, please
>> >>> write it up on the Wiki!
>> >>>
>> >>>
>> >>> On Tue, Feb 18, 2014 at 7:23 AM, Thomas Matthijs 
>> >>> wrote:
>> >>> >
>> >>> > On Tue, Feb 18, 2014 at 1:18 PM, Shai Erera 
>> >>> > wrote:
>> >>> >>
>> >>> >>
>> >>> >> Second, has anyone perhaps found a way to overcome that issue? I
>> >>> >> thought
>> >>> >> about maybe writing a script to detect that, looking at the patch
>> >>> >> file, but
>> >>> >> it seems hard to detect that the deleted Foo is the new Bar. If
>> >>> >> it's
>> >>> >> just
>> >>> >> rename, maybe, but if part of the rename the code changed a lot ...
>> >>> >> it
>> >>> >> becomes harder.
>> >>> >
>> >>> >
>> >>> > Probably not the answer you want but
>> >>> > If you use the git-svn bridge it should detect the rename and commit
>> >>> > it
>> >>> > in
>> >>> > svn as a move/copy
>> >>> >
>> >>> > https://www.kernel.org/pub/software/scm/git/docs/git-svn.html
>> >>>
>> >>> -
>> >>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>> >>> For additional commands, e-mail: dev-h...@lucene.apache.org
>> >>>
>> >>
>> >
>>
>> -
>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: dev-h...@lucene.apache.org
>>
>




Re: Refactoring code through Github pull requests

2014-02-18 Thread Benson Margulies
I tried using git apply on a patch (from GitHub's .patch URL) that
included a rename. No sign of a rename; just a delete and an add. I
feel like I'm missing something.
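
For the script idea raised in the quoted message, here is a rough sketch in
Python of one possible approach: scan a patch for files that are wholly
deleted and files that are wholly added, then flag pairs whose contents are
mostly the same. It assumes a patch produced by `git diff --no-prefix` without
rename detection, so a rename shows up as a delete plus an add. The function
names, the 0.6 threshold, and the sample patch are all assumptions for
illustration, not anything from the Lucene build.

```python
import difflib


def parse_sections(patch_text):
    """Split a git-style patch into per-file records of removed/added lines."""
    sections = []
    current = None
    in_hunk = False
    for line in patch_text.splitlines():
        if line.startswith("diff --git "):
            current = {"old": None, "new": None, "removed": [], "added": []}
            sections.append(current)
            in_hunk = False
        elif current is None:
            continue  # skip any preamble before the first file section
        elif line.startswith("@@"):
            in_hunk = True
        elif not in_hunk and line.startswith("--- "):
            current["old"] = line[4:].strip()
        elif not in_hunk and line.startswith("+++ "):
            current["new"] = line[4:].strip()
        elif in_hunk and line.startswith("-"):
            current["removed"].append(line[1:])
        elif in_hunk and line.startswith("+"):
            current["added"].append(line[1:])
    return sections


def likely_renames(patch_text, threshold=0.6):
    """Pair each deleted file with added files that have similar content."""
    sections = parse_sections(patch_text)
    deleted = [s for s in sections if s["new"] == "/dev/null"]
    added = [s for s in sections if s["old"] == "/dev/null"]
    pairs = []
    for d in deleted:
        for a in added:
            # Line-based similarity between the deleted and the added file.
            matcher = difflib.SequenceMatcher(None, d["removed"], a["added"])
            ratio = matcher.ratio()
            if ratio >= threshold:
                pairs.append((d["old"], a["new"], round(ratio, 2)))
    return pairs


# Hypothetical patch: Foo.java deleted, Bar.java added with one line changed.
SAMPLE_PATCH = """\
diff --git src/Foo.java src/Foo.java
deleted file mode 100644
--- src/Foo.java
+++ /dev/null
@@ -1,4 +0,0 @@
-package demo;
-public class Foo {
-  int x = 1;
-}
diff --git src/Bar.java src/Bar.java
new file mode 100644
--- /dev/null
+++ src/Bar.java
@@ -0,0 +1,4 @@
+package demo;
+public class Bar {
+  int x = 1;
+}
"""

for old, new, score in likely_renames(SAMPLE_PATCH):
    print("possible rename:", old, "->", new, "similarity", score)
```

A committer could run something like this on a PR patch and do the flagged
moves with svn mv by hand. As noted in the thread, the heuristic gets weaker
when the renamed file also changed a lot.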

On Tue, Feb 18, 2014 at 7:36 AM, Shai Erera  wrote:
> The problem I see is that if you generate a patch using 'git diff', it
> applies just fine to svn (if you generate it w/ --no-prefix) without any
> warnings about missing files due to the rename. Wanted to warn the community
> about it, so that when committers assign themselves to PRs, they review the
> patch more closely and detect manually if a rename has happened.
>
> We could decide that renames are done in a separate commit, but it's not
> always possible.
>
> So mainly, FYI.
>
> And if someone has an idea for a script/ant-target we could write to detect
> this case, that would be awesome.
>
> Shai
>
>
> On Tue, Feb 18, 2014 at 2:31 PM, Thomas Matthijs  wrote:
>>
>> Github pull requests can be treated as individual cherry picked patch sets
>> really, not branch merges ? (ie rebased) from there on out you're in svn
>> land. No need to "merge".
>>
>> But indeed, it tries to detect it based on the file content, and doesn't
>> work 100% as manual svn moves.
>>
>>
>>
>> On Tue, Feb 18, 2014 at 1:27 PM, Benson Margulies 
>> wrote:
>>>
>>> Well, git-svn has a heap of warnings against using it for merges; it's
>>> also a really bad idea when renaming a whole package, as it does it
>>> one-file-at-a-time.
>>>
>>> If you have a workflow that works with the ASF mirror and svn, please
>>> write it up on the Wiki!
>>>
>>>
>>> On Tue, Feb 18, 2014 at 7:23 AM, Thomas Matthijs 
>>> wrote:
>>> >
>>> > On Tue, Feb 18, 2014 at 1:18 PM, Shai Erera  wrote:
>>> >>
>>> >>
>>> >> Second, has anyone perhaps found a way to overcome that issue? I
>>> >> thought
>>> >> about maybe writing a script to detect that, looking at the patch
>>> >> file, but
>>> >> it seems hard to detect that the deleted Foo is the new Bar. If it's
>>> >> just
>>> >> rename, maybe, but if part of the rename the code changed a lot ... it
>>> >> becomes harder.
>>> >
>>> >
>>> > Probably not the answer you want but
>>> > If you use the git-svn bridge it should detect the rename and commit it
>>> > in
>>> > svn as a move/copy
>>> >
>>> > https://www.kernel.org/pub/software/scm/git/docs/git-svn.html
>>>
>>> -
>>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>>> For additional commands, e-mail: dev-h...@lucene.apache.org
>>>
>>
>




[jira] [Commented] (LUCENE-5449) Two ancient classes renamed to be less peculiar: _TestHelper and _TestUtil

2014-02-18 Thread Benson Margulies (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13904004#comment-13904004
 ] 

Benson Margulies commented on LUCENE-5449:
--

OK, then this is good to go. (I did include one example of switching to a 
static import, even though I agree with [~mikemccand] in general.)

> Two ancient classes renamed to be less peculiar: _TestHelper and _TestUtil
> --
>
> Key: LUCENE-5449
> URL: https://issues.apache.org/jira/browse/LUCENE-5449
> Project: Lucene - Core
>  Issue Type: Improvement
>Affects Versions: 4.6.1
>    Reporter: Benson Margulies
>Priority: Minor
>
> _TestUtil and _TestHelper begin with _ for historical reasons that don't 
> apply any longer. Let's eliminate those _'s.






Re: Refactoring code through Github pull requests

2014-02-18 Thread Benson Margulies
Thomas, please help us comparative git ignoramuses, and detail the
sequence of commands you would use with git-svn to apply a PR,
starting from your fork/clone setup.

On Tue, Feb 18, 2014 at 7:31 AM, Thomas Matthijs  wrote:
> Github pull requests can be treated as individual cherry picked patch sets
> really, not branch merges ? (ie rebased) from there on out you're in svn
> land. No need to "merge".
>
> But indeed, it tries to detect it based on the file content, and doesn't
> work 100% as manual svn moves.
>
>
> On Tue, Feb 18, 2014 at 1:27 PM, Benson Margulies 
> wrote:
>>
>> Well, git-svn has a heap of warnings against using it for merges; it's
>> also a really bad idea when renaming a whole package, as it does it
>> one-file-at-a-time.
>>
>> If you have a workflow that works with the ASF mirror and svn, please
>> write it up on the Wiki!
>>
>>
>> On Tue, Feb 18, 2014 at 7:23 AM, Thomas Matthijs  wrote:
>> >
>> > On Tue, Feb 18, 2014 at 1:18 PM, Shai Erera  wrote:
>> >>
>> >>
>> >> Second, has anyone perhaps found a way to overcome that issue? I
>> >> thought
>> >> about maybe writing a script to detect that, looking at the patch file,
>> >> but
>> >> it seems hard to detect that the deleted Foo is the new Bar. If it's
>> >> just
>> >> rename, maybe, but if part of the rename the code changed a lot ... it
>> >> becomes harder.
>> >
>> >
>> > Probably not the answer you want but
>> > If you use the git-svn bridge it should detect the rename and commit it
>> > in
>> > svn as a move/copy
>> >
>> > https://www.kernel.org/pub/software/scm/git/docs/git-svn.html
>>
>> -
>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: dev-h...@lucene.apache.org
>>
>




Re: Refactoring code through Github pull requests

2014-02-18 Thread Benson Margulies
Well, git-svn has a heap of warnings against using it for merges; it's
also a really bad idea when renaming a whole package, as it does it
one-file-at-a-time.

If you have a workflow that works with the ASF mirror and svn, please
write it up on the Wiki!


On Tue, Feb 18, 2014 at 7:23 AM, Thomas Matthijs  wrote:
>
> On Tue, Feb 18, 2014 at 1:18 PM, Shai Erera  wrote:
>>
>>
>> Second, has anyone perhaps found a way to overcome that issue? I thought
>> about maybe writing a script to detect that, looking at the patch file, but
>> it seems hard to detect that the deleted Foo is the new Bar. If it's just
>> rename, maybe, but if part of the rename the code changed a lot ... it
>> becomes harder.
>
>
> Probably not the answer you want but
> If you use the git-svn bridge it should detect the rename and commit it in
> svn as a move/copy
>
> https://www.kernel.org/pub/software/scm/git/docs/git-svn.html




[jira] [Commented] (LUCENE-5449) Two ancient classes renamed to be less peculiar: _TestHelper and _TestUtil

2014-02-18 Thread Benson Margulies (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13903983#comment-13903983
 ] 

Benson Margulies commented on LUCENE-5449:
--

[~thetaphi], I am not enthusiastic about > 1000 edits to change from importing 
the class to static importing the methods. Do you see this as a requirement, or 
just a desirable practice going forward?


> Two ancient classes renamed to be less peculiar: _TestHelper and _TestUtil
> --
>
> Key: LUCENE-5449
> URL: https://issues.apache.org/jira/browse/LUCENE-5449
> Project: Lucene - Core
>  Issue Type: Improvement
>Affects Versions: 4.6.1
>Reporter: Benson Margulies
>Priority: Minor
>
> _TestUtil and _TestHelper begin with _ for historical reasons that don't 
> apply any longer. Let's eliminate those _'s.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)




Re: Edit permission to the wiki

2014-02-17 Thread Benson Margulies
Thanks

On February 17, 2014 7:14:13 PM EST, Erick Erickson  
wrote:
>You should be set now.
>
>Erick
>
>
>On Mon, Feb 17, 2014 at 3:17 PM, Benson Margulies
>wrote:
>
>> BensonMargulies
>>
>> On February 17, 2014 5:27:16 PM EST, Uwe Schindler 
>> wrote:
>>
>>> Hi Benson,
>>>
>>> what is your username?
>>>
>>> -
>>> Uwe Schindler
>>> H.-H.-Meier-Allee 63, D-28213 Bremen
>>> http://www.thetaphi.de
>>> eMail: u...@thetaphi.de
>>>
>>>
>>>  -Original Message-
>>>>  From: Benson Margulies [mailto:bimargul...@gmail.com]
>>>>  Sent: Monday, February 17, 2014 11:16 PM
>>>>  To: dev@lucene.apache.org
>>>>  Subject: Edit permission to the wiki
>>>>
>>>>  Looks like I lack access to modify
>>>>  https://wiki.apache.org/lucene-java. Could someone fix that.
>>>>
>>>> --
>>>>
>>>>  To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For
>additional
>>>>  commands, e-mail: dev-h...@lucene.apache.org
>>>>
>>>
>>>
>>> --
>>>
>>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>>> For additional commands, e-mail: dev-h...@lucene.apache.org
>>>
>>>
>> --
>> Sent from my Android device with K-9 Mail. Please excuse my brevity.
>>

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.

RE: Edit permission to the wiki

2014-02-17 Thread Benson Margulies
BensonMargulies

On February 17, 2014 5:27:16 PM EST, Uwe Schindler  wrote:
>Hi Benson,
>
>what is your username?
>
>-
>Uwe Schindler
>H.-H.-Meier-Allee 63, D-28213 Bremen
>http://www.thetaphi.de
>eMail: u...@thetaphi.de
>
>
>> -----Original Message-
>> From: Benson Margulies [mailto:bimargul...@gmail.com]
>> Sent: Monday, February 17, 2014 11:16 PM
>> To: dev@lucene.apache.org
>> Subject: Edit permission to the wiki
>> 
>> Looks like I lack access to modify
>> https://wiki.apache.org/lucene-java. Could someone fix that.
>> 
>> -
>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For
>additional
>> commands, e-mail: dev-h...@lucene.apache.org
>
>
>-
>To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>For additional commands, e-mail: dev-h...@lucene.apache.org


[jira] [Created] (LUCENE-5449) Two ancient classes renamed to be less peculiar: _TestHelper and _TestUtil

2014-02-17 Thread Benson Margulies (JIRA)
Benson Margulies created LUCENE-5449:


 Summary: Two ancient classes renamed to be less peculiar: 
_TestHelper and _TestUtil
 Key: LUCENE-5449
 URL: https://issues.apache.org/jira/browse/LUCENE-5449
 Project: Lucene - Core
  Issue Type: Improvement
Affects Versions: 4.6.1
Reporter: Benson Margulies
Priority: Minor


_TestUtil and _TestHelper begin with _ for historical reasons that don't apply 
any longer. Let's eliminate those _'s.







Edit permission to the wiki

2014-02-17 Thread Benson Margulies
Looks like I lack access to modify
https://wiki.apache.org/lucene-java. Could someone fix that.




RE: Exposing random string generation

2014-02-17 Thread Benson Margulies
OK. Yes, refactor is our friend here.

On February 17, 2014 8:25:25 AM EST, Uwe Schindler  wrote:
>Hi Benson,
>
>Open an issue to change _TestUtil and _TestHelper's name! The crazy
>name is no longer needed. This is mostly an Eclipse->Refactor->Rename
>task :-)
>I would be happy to change it. And maybe use static imports in the
>future.
>
>Uwe
>
>-
>Uwe Schindler
>H.-H.-Meier-Allee 63, D-28213 Bremen
>http://www.thetaphi.de
>eMail: u...@thetaphi.de
>
>
>> -Original Message-
>> From: Benson Margulies [mailto:bimargul...@gmail.com]
>> Sent: Monday, February 17, 2014 2:22 PM
>> To: dev@lucene.apache.org
>> Subject: Re: Exposing random string generation
>> 
>> I'm not on a campaign of noisiness. If people prefer to leave _
>alone, alone it
>> is.
>> 
>> On Mon, Feb 17, 2014 at 8:20 AM, Uwe Schindler 
>wrote:
>> > Hi,
>> >
>> > the crazy name for this class is based on the fact that in earlier days
>> > this class was part of the main test cases. So the class name needed to
>> > not match the filename pattern "**/Test*.java **/*Test.java", otherwise
>> > JUnit would have run it as a test case. In Lucene 4 we have a separate
>> > module for the "test-framework", and we never run tests inside the
>> > "test-framework" module, so there is no issue with file names. Everything
>> > in test-framework is just "utility" classes to be extended by tests
>> > outside of the module. The classes in "test-framework/src/java" are never
>> > run as tests, so file names no longer matter. It is just verbose to rename
>> > the class. Ideally you would refactor the import statements to something
>> > like "import static ..._TestUtil.*" and use them as simple static external
>> > methods in affected tests.
>> >
>> > Uwe
>> >
>> > -
>> > Uwe Schindler
>> > H.-H.-Meier-Allee 63, D-28213 Bremen
>> > http://www.thetaphi.de
>> > eMail: u...@thetaphi.de
>> >
>> >
>> >> -Original Message-
>> >> From: Benson Margulies [mailto:bimargul...@gmail.com]
>> >> Sent: Monday, February 17, 2014 1:49 PM
>> >> To: dev@lucene.apache.org
>> >> Subject: Re: Exposing random string generation
>> >>
>> >> Right, that's my target.
>> >>
>> >> Might I rename _TestUtil for 5.0 :-?
>> >>
>> >> On Mon, Feb 17, 2014 at 7:44 AM, Robert Muir 
>> wrote:
>> >> > There are, but the stuff inside BaseTokenStreamTestCase is
>"better"
>> >> > at finding bugs than any of those methods. it should be moved
>there
>> too.
>> >> >
>> >> >
>> >> > On Mon, Feb 17, 2014 at 7:29 AM, Uwe Schindler 
>> >> wrote:
>> >> >>
>> >> >> Hi Benson,
>> >> >>
>> >> >> See
>> >> >>
>> >> >> http://lucene.apache.org/core/4_6_0/test-
>> >> framework/org/apache/lucene/
>> >> >> util/_TestUtil.html
>> >> >>
>> >> >> There are methods like:
>> >> >> randomRealisticUnicodeString(Random r, int minLength, int
>> >> >> maxLength)
>> >> >>
>> >> >> or more fancy:
>> >> >> randomRegexpishString(Random r, int maxLength)
>> >> >>
>> >> >> Those methods are all static, so you can use from anywhere. The
>> >> >> name _TestUtil goes back to older days. But it is part of test-
>> framework.
>> >> >>
>> >> >> -
>> >> >> Uwe Schindler
>> >> >> H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de
>> >> >> eMail: u...@thetaphi.de
>> >> >>
>> >> >> > -Original Message-
>> >> >> > From: Benson Margulies [mailto:bimargul...@gmail.com]
>> >> >> > Sent: Monday, February 17, 2014 1:19 PM
>> >> >> > To: dev@lucene.apache.org
>> >> >> > Subject: Exposing random string generation
>> >> >> >
>> >> >> > Down in the bottom of the randomized testing apparatus is
>some
>> >> >> > code for generating random stress data. The only
>> >> >> > public/protected API for it is to push it into an analysis
>> >> >> > chain. Would

Re: Exposing random string generation

2014-02-17 Thread Benson Margulies
I'm not on a campaign of noisiness. If people prefer to leave _ alone,
alone it is.

On Mon, Feb 17, 2014 at 8:20 AM, Uwe Schindler  wrote:
> Hi,
>
> the crazy name for this class is based on the fact that in earlier days this 
> class was part of the main test cases. So the class name needed to not match 
> the filename pattern "**/Test*.java **/*Test.java", otherwise JUnit would 
> have run it as a test case. In Lucene 4 we have a separate module for the 
> "test-framework", and we never run tests inside the "test-framework" module, 
> so there is no issue with file names. Everything in test-framework is just 
> "utility" classes to be extended by tests outside of the module. The classes 
> in "test-framework/src/java" are never run as tests, so file names no longer 
> matter. It is just verbose to rename the class. Ideally you would refactor 
> the import statements to something like "import static ..._TestUtil.*" and 
> use them as simple static external methods in affected tests.
>
> Uwe
>
> -
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
>
>
>> -Original Message-
>> From: Benson Margulies [mailto:bimargul...@gmail.com]
>> Sent: Monday, February 17, 2014 1:49 PM
>> To: dev@lucene.apache.org
>> Subject: Re: Exposing random string generation
>>
>> Right, that's my target.
>>
>> Might I rename _TestUtil for 5.0 :-?
>>
>> On Mon, Feb 17, 2014 at 7:44 AM, Robert Muir  wrote:
>> > There are, but the stuff inside BaseTokenStreamTestCase is "better" at
>> > finding bugs than any of those methods. it should be moved there too.
>> >
>> >
>> > On Mon, Feb 17, 2014 at 7:29 AM, Uwe Schindler 
>> wrote:
>> >>
>> >> Hi Benson,
>> >>
>> >> See
>> >>
>> >> http://lucene.apache.org/core/4_6_0/test-
>> framework/org/apache/lucene/
>> >> util/_TestUtil.html
>> >>
>> >> There are methods like:
>> >> randomRealisticUnicodeString(Random r, int minLength, int maxLength)
>> >>
>> >> or more fancy:
>> >> randomRegexpishString(Random r, int maxLength)
>> >>
>> >> Those methods are all static, so you can use from anywhere. The name
>> >> _TestUtil goes back to older days. But it is part of test-framework.
>> >>
>> >> -
>> >> Uwe Schindler
>> >> H.-H.-Meier-Allee 63, D-28213 Bremen
>> >> http://www.thetaphi.de
>> >> eMail: u...@thetaphi.de
>> >>
>> >> > -Original Message-
>> >> > From: Benson Margulies [mailto:bimargul...@gmail.com]
>> >> > Sent: Monday, February 17, 2014 1:19 PM
>> >> > To: dev@lucene.apache.org
>> >> > Subject: Exposing random string generation
>> >> >
>> >> > Down in the bottom of the randomized testing apparatus is some code
>> >> > for generating random stress data. The only public/protected API
>> >> > for it is to push it into an analysis chain. Would anyone object to
>> >> > a patch to allow direct access to methods that just deliver the
>> >> > randomized text? I'd like some random strings for code below the
>> >> > level of the analysis components.
>> >> >
>> >> > ---
>> >> > -- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For
>> >> > additional commands, e-mail: dev-h...@lucene.apache.org
>> >>
>> >>
>> >> -
>> >> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For
>> >> additional commands, e-mail: dev-h...@lucene.apache.org
>> >>
>> >
>>
>> -
>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional
>> commands, e-mail: dev-h...@lucene.apache.org
>
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
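
The naming rule Uwe describes in the quoted text can be sketched roughly as
follows. This is only an illustration: the class and method names are
hypothetical, and the check is a simplified stand-in for the Ant-style
patterns "**/Test*.java" and "**/*Test.java", not the build's actual matcher.

```java
public class TestNamePatternDemo {

    // Simplified stand-in (an assumption, not the real Ant matcher):
    // "**/Test*.java" selects class names starting with "Test",
    // "**/*Test.java" selects class names ending with "Test".
    static boolean matchesTestPattern(String simpleClassName) {
        return simpleClassName.startsWith("Test")
            || simpleClassName.endsWith("Test");
    }

    public static void main(String[] args) {
        String[] names = {"TestUtil", "_TestUtil", "IndexWriterTest"};
        for (String name : names) {
            // "_TestUtil" neither starts nor ends with "Test", so a runner
            // using these patterns would not pick it up as a test case.
            System.out.println(name + " -> " + matchesTestPattern(name));
        }
    }
}
```

Under this reading, a helper named TestUtil that lives next to the tests would
be picked up as a test, while _TestUtil would not. Once the helper sits in a
module whose classes are never run as tests, the underscore no longer serves
that purpose.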




[jira] [Updated] (LUCENE-5448) Random string generation centralized in _TestUtil

2014-02-17 Thread Benson Margulies (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benson Margulies updated LUCENE-5448:
-

Fix Version/s: 4.7

> Random string generation centralized in _TestUtil
> -
>
> Key: LUCENE-5448
> URL: https://issues.apache.org/jira/browse/LUCENE-5448
> Project: Lucene - Core
>  Issue Type: Improvement
>Affects Versions: 4.6.1
>    Reporter: Benson Margulies
>Assignee: Benson Margulies
> Fix For: 5.0, 4.7
>
>
> The random string generators in BaseTokenStreamTestCase have wider 
> applicability and should move in with their cousins.






[jira] [Resolved] (LUCENE-5448) Random string generation centralized in _TestUtil

2014-02-17 Thread Benson Margulies (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benson Margulies resolved LUCENE-5448.
--

   Resolution: Fixed
Fix Version/s: 5.0
 Assignee: Benson Margulies

> Random string generation centralized in _TestUtil
> -
>
> Key: LUCENE-5448
> URL: https://issues.apache.org/jira/browse/LUCENE-5448
> Project: Lucene - Core
>  Issue Type: Improvement
>Affects Versions: 4.6.1
>    Reporter: Benson Margulies
>Assignee: Benson Margulies
> Fix For: 5.0
>
>
> The random string generators in BaseTokenStreamTestCase have wider 
> applicability and should move in with their cousins.






Re: Exposing random string generation

2014-02-17 Thread Benson Margulies
Yes, that's for sure. I've set up a JIRA and PR for the simple
function move, let's do that and backmerge before making the big
noise.
Pity I missed 4.7 with this.

On Mon, Feb 17, 2014 at 7:52 AM, Robert Muir  wrote:
> Maybe good to separate the two items. renaming that class should be very
> noisy!
>
>
> On Mon, Feb 17, 2014 at 7:48 AM, Benson Margulies 
> wrote:
>>
>> Right, that's my target.
>>
>> Might I rename _TestUtil for 5.0 :-?
>>
>> On Mon, Feb 17, 2014 at 7:44 AM, Robert Muir  wrote:
>> > There are, but the stuff inside BaseTokenStreamTestCase is "better" at
>> > finding bugs than any of those methods. it should be moved there too.
>> >
>> >
>> > On Mon, Feb 17, 2014 at 7:29 AM, Uwe Schindler  wrote:
>> >>
>> >> Hi Benson,
>> >>
>> >> See
>> >>
>> >>
>> >> http://lucene.apache.org/core/4_6_0/test-framework/org/apache/lucene/util/_TestUtil.html
>> >>
>> >> There are methods like:
>> >> randomRealisticUnicodeString(Random r, int minLength, int maxLength)
>> >>
>> >> or more fancy:
>> >> randomRegexpishString(Random r, int maxLength)
>> >>
>> >> Those methods are all static, so you can use from anywhere. The name
>> >> _TestUtil goes back to older days. But it is part of test-framework.
>> >>
>> >> -
>> >> Uwe Schindler
>> >> H.-H.-Meier-Allee 63, D-28213 Bremen
>> >> http://www.thetaphi.de
>> >> eMail: u...@thetaphi.de
>> >>
>> >> > -Original Message-
>> >> > From: Benson Margulies [mailto:bimargul...@gmail.com]
>> >> > Sent: Monday, February 17, 2014 1:19 PM
>> >> > To: dev@lucene.apache.org
>> >> > Subject: Exposing random string generation
>> >> >
>> >> > Down in the bottom of the randomized testing apparatus is some code
>> >> > for
>> >> > generating random stress data. The only public/protected API for it
>> >> > is
>> >> > to push
>> >> > it into an analysis chain. Would anyone object to a patch to allow
>> >> > direct access
>> >> > to methods that just deliver the randomized text? I'd like some
>> >> > random
>> >> > strings for code below the level of the analysis components.
>> >> >
>> >> > -
>> >> > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For
>> >> > additional
>> >> > commands, e-mail: dev-h...@lucene.apache.org
>> >>
>> >>
>> >> -
>> >> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>> >> For additional commands, e-mail: dev-h...@lucene.apache.org
>> >>
>> >
>>
>> -
>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: dev-h...@lucene.apache.org
>>
>




[jira] [Created] (LUCENE-5448) Random string generation centralized in _TestUtil

2014-02-17 Thread Benson Margulies (JIRA)
Benson Margulies created LUCENE-5448:


 Summary: Random string generation centralized in _TestUtil
 Key: LUCENE-5448
 URL: https://issues.apache.org/jira/browse/LUCENE-5448
 Project: Lucene - Core
  Issue Type: Improvement
Affects Versions: 4.6.1
Reporter: Benson Margulies


The random string generators in BaseTokenStreamTestCase have wider 
applicability and should move in with their cousins.







Re: Exposing random string generation

2014-02-17 Thread Benson Margulies
Right, that's my target.

Might I rename _TestUtil for 5.0 :-?

On Mon, Feb 17, 2014 at 7:44 AM, Robert Muir  wrote:
> There are, but the stuff inside BaseTokenStreamTestCase is "better" at
> finding bugs than any of those methods. it should be moved there too.
>
>
> On Mon, Feb 17, 2014 at 7:29 AM, Uwe Schindler  wrote:
>>
>> Hi Benson,
>>
>> See
>>
>> http://lucene.apache.org/core/4_6_0/test-framework/org/apache/lucene/util/_TestUtil.html
>>
>> There are methods like:
>> randomRealisticUnicodeString(Random r, int minLength, int maxLength)
>>
>> or more fancy:
>> randomRegexpishString(Random r, int maxLength)
>>
>> Those methods are all static, so you can use from anywhere. The name
>> _TestUtil goes back to older days. But it is part of test-framework.
>>
>> -
>> Uwe Schindler
>> H.-H.-Meier-Allee 63, D-28213 Bremen
>> http://www.thetaphi.de
>> eMail: u...@thetaphi.de
>>
>> > -Original Message-
>> > From: Benson Margulies [mailto:bimargul...@gmail.com]
>> > Sent: Monday, February 17, 2014 1:19 PM
>> > To: dev@lucene.apache.org
>> > Subject: Exposing random string generation
>> >
>> > Down in the bottom of the randomized testing apparatus is some code for
>> > generating random stress data. The only public/protected API for it is
>> > to push
>> > it into an analysis chain. Would anyone object to a patch to allow
>> > direct access
>> > to methods that just deliver the randomized text? I'd like some random
>> > strings for code below the level of the analysis components.
>> >
>> > -
>> > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional
>> > commands, e-mail: dev-h...@lucene.apache.org
>>
>>
>> -
>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: dev-h...@lucene.apache.org
>>
>




Re: Exposing random string generation

2014-02-17 Thread Benson Margulies
I'll have a look and see if there is code to move from the test case
to the _TestUtil.


On Mon, Feb 17, 2014 at 7:29 AM, Uwe Schindler  wrote:
> Hi Benson,
>
> See
> http://lucene.apache.org/core/4_6_0/test-framework/org/apache/lucene/util/_TestUtil.html
>
> There are methods like:
> randomRealisticUnicodeString(Random r, int minLength, int maxLength)
>
> or more fancy:
> randomRegexpishString(Random r, int maxLength)
>
> Those methods are all static, so you can use from anywhere. The name 
> _TestUtil goes back to older days. But it is part of test-framework.
>
> -
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
>
>> -Original Message-
>> From: Benson Margulies [mailto:bimargul...@gmail.com]
>> Sent: Monday, February 17, 2014 1:19 PM
>> To: dev@lucene.apache.org
>> Subject: Exposing random string generation
>>
>> Down in the bottom of the randomized testing apparatus is some code for
>> generating random stress data. The only public/protected API for it is to 
>> push
>> it into an analysis chain. Would anyone object to a patch to allow direct 
>> access
>> to methods that just deliver the randomized text? I'd like some random
>> strings for code below the level of the analysis components.
>>
>> -
>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional
>> commands, e-mail: dev-h...@lucene.apache.org
>
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
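
For readers without the test-framework on hand, the shape of such a static
helper can be sketched as below. This is NOT Lucene's implementation of
randomRealisticUnicodeString; the class name, the method name and the
generation strategy (uniform BMP characters, surrogates skipped) are
assumptions made purely for illustration. Only the parameter style
(Random r, int minLength, int maxLength) mirrors the signature quoted above.

```java
import java.util.Random;

public class RandomTextSketch {

    // Hypothetical helper: returns between minLength and maxLength
    // (inclusive) random BMP characters, skipping surrogate code units
    // so the result is always well-formed UTF-16.
    static String randomBmpString(Random r, int minLength, int maxLength) {
        int length = minLength + r.nextInt(maxLength - minLength + 1);
        StringBuilder sb = new StringBuilder(length);
        while (sb.length() < length) {
            char c = (char) r.nextInt(0x10000);
            if (!Character.isSurrogate(c)) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A fixed seed makes a failing input reproducible.
        Random r = new Random(42L);
        String s = randomBmpString(r, 3, 10);
        System.out.println("length = " + s.length());
    }
}
```

Taking the Random as a parameter, rather than creating one inside the helper,
is what lets a test replay the same data from a recorded seed.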




Exposing random string generation

2014-02-17 Thread Benson Margulies
Down in the bottom of the randomized testing apparatus is some code
for generating random stress data. The only public/protected API for
it is to push it into an analysis chain. Would anyone object to a
patch to allow direct access to methods that just deliver the
randomized text? I'd like some random strings for code below the level
of the analysis components.




[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries

2014-02-08 Thread Benson Margulies (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13895706#comment-13895706
 ] 

Benson Margulies commented on LUCENE-4956:
--

This is a patch, not an accepted component of Apache Lucene. There's no 
guarantee that anyone will work on it.

> the korean analyzer that has a korean morphological analyzer and dictionaries
> -
>
> Key: LUCENE-4956
> URL: https://issues.apache.org/jira/browse/LUCENE-4956
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/analysis
>Affects Versions: 4.2
>Reporter: SooMyung Lee
>Assignee: Christian Moen
>  Labels: newbie
> Attachments: LUCENE-4956.patch, eval.patch, kr.analyzer.4x.tar, 
> lucene-4956.patch, lucene4956.patch
>
>
> The Korean language has specific characteristics. When developing a search 
> service with Lucene & Solr in Korean, there are some problems in searching 
> and indexing. The Korean analyzer solves these problems with a Korean 
> morphological analyzer. It consists of a Korean morphological analyzer, 
> dictionaries, a Korean tokenizer and a Korean filter. The Korean analyzer is 
> made for Lucene and Solr. If you develop a search service with Lucene in 
> Korean, choosing the Korean analyzer is the best idea.






Re: Why is String.format[] forbidden?

2014-02-05 Thread Benson Margulies
OK, I get it.

On Wed, Feb 5, 2014 at 7:53 AM, Uwe Schindler  wrote:
> Hi Benson,
>
> If you use String#format without an explicit Locale, it's forbidden because 
> the result is not platform independent, and platform independence is a 
> requirement for a library like Lucene. If you really know for sure that you 
> want to use the default locale, pass Locale.getDefault() explicitly. The 
> forbidden-API checker mentions that fact when it reports the error.
> Lucene 5 (trunk) is based on Java 7, Lucene 4 is based on Java 6.
>
> Uwe
>
> -
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
>
>
>> -Original Message-
>> From: Benson Margulies [mailto:bimargul...@gmail.com]
>> Sent: Wednesday, February 05, 2014 1:46 PM
>> To: dev@lucene.apache.org
>> Subject: Why is String.format[] forbidden?
>>
>> Or, more specifically, what's the minimum JVM for 4.x versus 5.x? I had the
>> idea that even 4.x required 1.6.
>>
>> -
>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional
>> commands, e-mail: dev-h...@lucene.apache.org
>
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
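
A small sketch of the point made in the quoted reply. The class and method
names here are made up for illustration; String.format(Locale, String,
Object...) and Locale.ROOT are standard JDK APIs. The choice of Locale.ROOT
for machine-readable output is an assumption about what a caller wants, not
something the thread prescribes.

```java
import java.util.Locale;

public class LocaleFormatDemo {

    // Explicit locale: the output does not depend on the platform default.
    static String formatFixed(double value) {
        return String.format(Locale.ROOT, "%.2f", value);
    }

    public static void main(String[] args) {
        Locale saved = Locale.getDefault();
        try {
            // Simulate running on a machine with a German default locale.
            Locale.setDefault(Locale.GERMANY);

            // Implicit default locale: the decimal separator follows the
            // platform setting, which is what the checker objects to.
            System.out.println(String.format("%.2f", 1234.5));

            // Explicit locale: same output on every machine.
            System.out.println(formatFixed(1234.5));
        } finally {
            Locale.setDefault(saved);
        }
    }
}
```

If the default locale really is what you want, passing Locale.getDefault()
explicitly documents that decision at the call site.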




Why is String.format[] forbidden?

2014-02-05 Thread Benson Margulies
Or, more specifically, what's the minimum JVM for 4.x versus 5.x? I
had the idea that even 4.x required 1.6.




[jira] [Commented] (SOLR-5623) Better diagnosis of RuntimeExceptions in analysis

2014-02-05 Thread Benson Margulies (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13892049#comment-13892049
 ] 

Benson Margulies commented on SOLR-5623:


[~shalinmangar] Apparently I haven't learned to read the output of ant test 
very well, and fooled myself into believing that all was well. Thanks for 
cleaning up after me.


> Better diagnosis of RuntimeExceptions in analysis
> -
>
> Key: SOLR-5623
> URL: https://issues.apache.org/jira/browse/SOLR-5623
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 4.6.1
>    Reporter: Benson Margulies
>Assignee: Benson Margulies
> Fix For: 5.0, 4.7
>
> Attachments: SOLR-5623-nowrap.patch, SOLR-5623-nowrap.patch
>
>
> If an analysis component (tokenizer, filter, etc) gets really into a hissy 
> fit and throws a RuntimeException, the resulting log traffic is less than 
> informative, lacking any pointer to the doc under discussion (in the doc 
> case). It would be more better if there was a catch/try shortstop that logged 
> this more informatively.
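
The idea in the issue description could look roughly like the sketch below.
It is not the SOLR-5623 patch: the interface, class and method names are
hypothetical, and the sketch rethrows with the document id attached instead
of calling a logger, which is an assumption chosen to keep it self-contained.

```java
public class AnalysisErrorContextSketch {

    // Hypothetical stand-in for an analysis component (tokenizer, filter...).
    interface AnalysisStep {
        String apply(String text);
    }

    // Runs the step and, if it throws a RuntimeException, rethrows it with
    // the id of the document being processed, keeping the original cause.
    static String analyzeWithContext(String docId, String text, AnalysisStep step) {
        try {
            return step.apply(text);
        } catch (RuntimeException e) {
            throw new RuntimeException(
                "Exception while analyzing document id=" + docId, e);
        }
    }

    public static void main(String[] args) {
        AnalysisStep broken = new AnalysisStep() {
            @Override
            public String apply(String text) {
                throw new IllegalStateException("hissy fit");
            }
        };
        try {
            analyzeWithContext("doc-17", "some text", broken);
        } catch (RuntimeException e) {
            // The message now points at the document; the cause is preserved.
            System.out.println(e.getMessage() + " / cause: " + e.getCause());
        }
    }
}
```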






[jira] [Resolved] (SOLR-5623) Better diagnosis of RuntimeExceptions in analysis

2014-02-04 Thread Benson Margulies (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benson Margulies resolved SOLR-5623.


Resolution: Fixed

> Better diagnosis of RuntimeExceptions in analysis
> -
>
> Key: SOLR-5623
> URL: https://issues.apache.org/jira/browse/SOLR-5623
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 4.6.1
>    Reporter: Benson Margulies
>Assignee: Benson Margulies
> Fix For: 5.0, 4.7
>
>
> If an analysis component (tokenizer, filter, etc) gets really into a hissy 
> fit and throws a RuntimeException, the resulting log traffic is less than 
> informative, lacking any pointer to the doc under discussion (in the doc 
> case). It would be more better if there was a catch/try shortstop that logged 
> this more informatively.






[jira] [Commented] (SOLR-5623) Better diagnosis of RuntimeExceptions in analysis

2014-02-04 Thread Benson Margulies (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891538#comment-13891538
 ] 

Benson Margulies commented on SOLR-5623:


backported in 1564592.

> Better diagnosis of RuntimeExceptions in analysis
> -
>
> Key: SOLR-5623
> URL: https://issues.apache.org/jira/browse/SOLR-5623
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 4.6.1
>    Reporter: Benson Margulies
>Assignee: Benson Margulies
> Fix For: 5.0, 4.7
>
>
> If an analysis component (tokenizer, filter, etc) gets really into a hissy 
> fit and throws a RuntimeException, the resulting log traffic is less than 
> informative, lacking any pointer to the doc under discussion (in the doc 
> case). It would be more better if there was a catch/try shortstop that logged 
> this more informatively.






[jira] [Updated] (SOLR-5623) Better diagnosis of RuntimeExceptions in analysis

2014-02-04 Thread Benson Margulies (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benson Margulies updated SOLR-5623:
---

Affects Version/s: 4.6.1
Fix Version/s: 4.7
   5.0

> Better diagnosis of RuntimeExceptions in analysis
> -
>
> Key: SOLR-5623
> URL: https://issues.apache.org/jira/browse/SOLR-5623
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 4.6.1
>    Reporter: Benson Margulies
>Assignee: Benson Margulies
> Fix For: 5.0, 4.7
>
>
> If an analysis component (tokenizer, filter, etc) gets really into a hissy 
> fit and throws a RuntimeException, the resulting log traffic is less than 
> informative, lacking any pointer to the doc under discussion (in the doc 
> case). It would be more better if there was a catch/try shortstop that logged 
> this more informatively.






[jira] [Commented] (SOLR-5623) Better diagnosis of RuntimeExceptions in analysis

2014-02-04 Thread Benson Margulies (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891512#comment-13891512
 ] 

Benson Margulies commented on SOLR-5623:


trunk patch 1564584.

> Better diagnosis of RuntimeExceptions in analysis
> -
>
> Key: SOLR-5623
> URL: https://issues.apache.org/jira/browse/SOLR-5623
> Project: Solr
>  Issue Type: Bug
>    Reporter: Benson Margulies
>    Assignee: Benson Margulies
>
> If an analysis component (tokenizer, filter, etc) gets really into a hissy 
> fit and throws a RuntimeException, the resulting log traffic is less than 
> informative, lacking any pointer to the doc under discussion (in the doc 
> case). It would be more better if there was a catch/try shortstop that logged 
> this more informatively.






Having trouble applying a PR to svn

2014-02-04 Thread Benson Margulies
So, I did:

wget https://github.com/apache/lucene-solr/pull/18.diff

and then:

svn patch --dry-run --strip 1 18.diff

and some of the pathnames have 2 components stripped -- but not all.

Has anyone got another approach? I hesitate to use git-svn, but I'll
follow someone else's lead.


svn patch --dry-run --strip 1 18.diff
U solr/core/src/java/org/apache/solr/update/DirectUpdateHandler2.java
A core
A core/src
A core/src/test-files
A core/src/test-files/solr
A core/src/test-files/solr/analysisconfs
A core/src/test-files/solr/analysisconfs/analysis-err-schema.xml
A solr/core/src/test-files/solr/analysisconfs
A solr/core/src/test-files/solr/analysisconfs/analysis-err-schema.xml
A core
A core/src
A core/src/test
A core/src/test/org
A core/src/test/org/apache
A core/src/test/org/apache/solr
A core/src/test/org/apache/solr/analysis
A core/src/test/org/apache/solr/analysis/ThrowingMockTokenFilterFactory.java
A solr/core/src/test/org/apache/solr/analysis/ThrowingMockTokenFilterFactory.java
A core
A core/src
A core/src/test
A core/src/test/org
A core/src/test/org/apache
A core/src/test/org/apache/solr
A core/src/test/org/apache/solr/update
A core/src/test/org/apache/solr/update/AnalysisErrorHandlingTest.java
A solr/core/src/test/org/apache/solr/update/AnalysisErrorHandlingTest.java
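
For reference, a minimal sketch of what stripping leading path components is expected to do to a single path. It assumes the GitHub diff uses the usual git-style a/ and b/ prefixes; the class and method names are made up for illustration and are not part of svn. With a count of 1 the solr/ component should survive, which is why the bare core/... entries in the output above look like two components were removed.

```java
import java.util.Arrays;

public class StripComponentsSketch {

    // Hypothetical helper: drop the first `count` '/'-separated components of a path.
    static String strip(String path, int count) {
        String[] parts = path.split("/");
        if (count >= parts.length) {
            return ""; // nothing left once every component is stripped
        }
        return String.join("/", Arrays.copyOfRange(parts, count, parts.length));
    }

    public static void main(String[] args) {
        // Path as it would appear in a git-style diff header (assumed b/ prefix).
        String p = "b/solr/core/src/java/org/apache/solr/update/DirectUpdateHandler2.java";
        System.out.println(strip(p, 1)); // keeps the leading solr/ component
        System.out.println(strip(p, 2)); // starts at core/, matching the odd entries
    }
}
```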




Re: svn commit: r1563855 - /lucene/dev/trunk/lucene/CHANGES.txt

2014-02-03 Thread Benson Margulies
OK, I see the idea. Can Do.

On Mon, Feb 3, 2014 at 7:11 AM, Michael McCandless
 wrote:
> Hmm, I think you need to move trunk's LUCENE-4505's entry down under
> 4.7's section?
>
> Ie, it should be in the same position that it is in on the 4.x branch.
>
> Hmm, LUCENE-5406 should be down in 4.7 as well; it looks like Grant
> back-ported to 4.x.
>
> Basically, very few entries should be under 5.0 :)  We try to backport
> most things except major changes ...
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Mon, Feb 3, 2014 at 7:04 AM,   wrote:
>> Author: bimargulies
>> Date: Mon Feb  3 12:04:33 2014
>> New Revision: 1563855
>>
>> URL: http://svn.apache.org/r1563855
>> Log:
>> LUCENE-5405: changes.txt; and fix a typo of Grant's for LUCENE-5406.
>>
>> Modified:
>> lucene/dev/trunk/lucene/CHANGES.txt
>>
>> Modified: lucene/dev/trunk/lucene/CHANGES.txt
>> URL: 
>> http://svn.apache.org/viewvc/lucene/dev/trunk/lucene/CHANGES.txt?rev=1563855&r1=1563854&r2=1563855&view=diff
>> ==
>> --- lucene/dev/trunk/lucene/CHANGES.txt (original)
>> +++ lucene/dev/trunk/lucene/CHANGES.txt Mon Feb  3 12:04:33 2014
>> @@ -48,10 +48,17 @@ API Changes
>>this term index, pass it directly in your codec, where it can also be 
>> configured
>>per-field. (Robert Muir)
>>
>> -* LUCENE-5388: Remove Reader from Tokenizer's constructor.
>> +* LUCENE-5388: Remove Reader from Tokenizer's constructor and from
>> +  Analyzer's createComponents. TokenStreams now always get their input
>> +  via setReader.
>>(Benson Margulies via Robert Muir - pull request #16)
>>
>> -* LUCENE-5405: Make ShingleAnalzyerWrapper.getWrappedAnalyzer() public 
>> final (gsingers)
>> +* LUCENE-5405: If an analysis component throws an exception, Lucene
>> +  logs the field name to the info stream to assist in
>> +  diagnosis. (Benson Margulies)
>> +
>> +* LUCENE-5406: Make ShingleAnalzyerWrapper.getWrappedAnalyzer() public
>> +  final (gsingers)
>>
>>  Documentation
>>
>>
>>
>




[jira] [Resolved] (LUCENE-5405) Exception strategy for analysis improved

2014-02-03 Thread Benson Margulies (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benson Margulies resolved LUCENE-5405.
--

Resolution: Fixed

backported, CHANGES.txt filled in. 'this time for sure'

> Exception strategy for analysis improved
> 
>
> Key: LUCENE-5405
> URL: https://issues.apache.org/jira/browse/LUCENE-5405
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Benson Margulies
>    Assignee: Benson Margulies
> Fix For: 5.0, 4.7
>
> Attachments: LUCENE-5405-4.x.patch
>
>
> SOLR-5623 included some conversation about the dilemmas of exception 
> management and reporting in the analysis chain. 
> I've belatedly become educated about the infostream, and this situation is a 
> job for it. The DocInverterPerField can note exceptions in the analysis 
> chain, log out to the infostream, and then rethrow them as before. No 
> wrapping, no muss, no fuss.
> There are comments on this JIRA from a more complex prior idea that readers 
> might want to ignore.
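
A rough sketch of the note, log, rethrow idea described above, for readers who want to see the shape of it. This is not the committed patch: DiagnosticSink and FieldAnalysis are hypothetical stand-ins (not Lucene's InfoStream or DocInverterPerField), and the message text is invented. The only point is that the caller sees the very same exception object, with the field name recorded on the side.

```java
import java.util.ArrayList;
import java.util.List;

public class InfoStreamRethrowSketch {

    /** Hypothetical diagnostic sink, standing in for an info stream. */
    interface DiagnosticSink {
        void message(String component, String text);
    }

    /** Hypothetical stand-in for running the analysis chain over one field. */
    interface FieldAnalysis {
        void run() throws Exception;
    }

    // Run the analysis; on failure, record the field name and rethrow unchanged.
    static void analyzeField(String fieldName, FieldAnalysis analysis, DiagnosticSink sink)
            throws Exception {
        try {
            analysis.run();
        } catch (Exception e) {
            sink.message("analysis", "exception while analyzing field '" + fieldName + "': " + e);
            throw e; // same exception object: no wrapping
        }
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        RuntimeException boom = new RuntimeException("tokenizer failed");
        try {
            analyzeField("body", () -> { throw boom; }, (c, t) -> log.add(c + ": " + t));
        } catch (Exception e) {
            System.out.println("same exception rethrown: " + (e == boom));
            System.out.println(log.get(0));
        }
    }
}
```

Because nothing is wrapped, existing callers that catch specific exception types keep working; the extra context only shows up where the sink's output is being watched.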






[jira] [Commented] (LUCENE-5405) Exception strategy for analysis improved

2014-02-03 Thread Benson Margulies (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13889424#comment-13889424
 ] 

Benson Margulies commented on LUCENE-5405:
--

rev 1563850 provides the backport.

> Exception strategy for analysis improved
> 
>
> Key: LUCENE-5405
> URL: https://issues.apache.org/jira/browse/LUCENE-5405
> Project: Lucene - Core
>  Issue Type: Improvement
>    Reporter: Benson Margulies
>    Assignee: Benson Margulies
> Fix For: 5.0, 4.7
>
> Attachments: LUCENE-5405-4.x.patch
>
>
> SOLR-5623 included some conversation about the dilemmas of exception 
> management and reporting in the analysis chain. 
> I've belatedly become educated about the infostream, and this situation is a 
> job for it. The DocInverterPerField can note exceptions in the analysis 
> chain, log out to the infostream, and then rethrow them as before. No 
> wrapping, no muss, no fuss.
> There are comments on this JIRA from a more complex prior idea that readers 
> might want to ignore.






[jira] [Updated] (LUCENE-5405) Exception strategy for analysis improved

2014-02-03 Thread Benson Margulies (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benson Margulies updated LUCENE-5405:
-

Fix Version/s: 4.7

> Exception strategy for analysis improved
> 
>
> Key: LUCENE-5405
> URL: https://issues.apache.org/jira/browse/LUCENE-5405
> Project: Lucene - Core
>  Issue Type: Improvement
>    Reporter: Benson Margulies
>    Assignee: Benson Margulies
> Fix For: 5.0, 4.7
>
> Attachments: LUCENE-5405-4.x.patch
>
>
> SOLR-5623 included some conversation about the dilemmas of exception 
> management and reporting in the analysis chain. 
> I've belatedly become educated about the infostream, and this situation is a 
> job for it. The DocInverterPerField can note exceptions in the analysis 
> chain, log out to the infostream, and then rethrow them as before. No 
> wrapping, no muss, no fuss.
> There are comments on this JIRA from a more complex prior idea that readers 
> might want to ignore.






[jira] [Commented] (LUCENE-5405) Exception strategy for analysis improved

2014-02-03 Thread Benson Margulies (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13889406#comment-13889406
 ] 

Benson Margulies commented on LUCENE-5405:
--

Will do. Thanks, this is exactly the sort of feedback I was looking for.

> Exception strategy for analysis improved
> 
>
> Key: LUCENE-5405
> URL: https://issues.apache.org/jira/browse/LUCENE-5405
> Project: Lucene - Core
>  Issue Type: Improvement
>    Reporter: Benson Margulies
>    Assignee: Benson Margulies
> Fix For: 5.0
>
> Attachments: LUCENE-5405-4.x.patch
>
>
> SOLR-5623 included some conversation about the dilemmas of exception 
> management and reporting in the analysis chain. 
> I've belatedly become educated about the infostream, and this situation is a 
> job for it. The DocInverterPerField can note exceptions in the analysis 
> chain, log out to the infostream, and then rethrow them as before. No 
> wrapping, no muss, no fuss.
> There are comments on this JIRA from a more complex prior idea that readers 
> might want to ignore.






[jira] [Commented] (LUCENE-5405) Exception strategy for analysis improved

2014-02-02 Thread Benson Margulies (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13889101#comment-13889101
 ] 

Benson Margulies commented on LUCENE-5405:
--

[~mikemccand] and [~rcmuir]: The code in the 4.x branch is more complex. I 
_think_ I've managed to carry the strategy across, but I'd be grateful for some 
skeptical eyeballs before I commit the attached patch that does the backport.

> Exception strategy for analysis improved
> 
>
> Key: LUCENE-5405
> URL: https://issues.apache.org/jira/browse/LUCENE-5405
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Benson Margulies
>Assignee: Benson Margulies
> Fix For: 5.0
>
> Attachments: LUCENE-5405-4.x.patch
>
>
> SOLR-5623 included some conversation about the dilemmas of exception 
> management and reporting in the analysis chain. 
> I've belatedly become educated about the infostream, and this situation is a 
> job for it. The DocInverterPerField can note exceptions in the analysis 
> chain, log out to the infostream, and then rethrow them as before. No 
> wrapping, no muss, no fuss.
> There are comments on this JIRA from a more complex prior idea that readers 
> might want to ignore.






[jira] [Updated] (LUCENE-5405) Exception strategy for analysis improved

2014-02-02 Thread Benson Margulies (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benson Margulies updated LUCENE-5405:
-

Attachment: LUCENE-5405-4.x.patch

Reviewable port.

> Exception strategy for analysis improved
> 
>
> Key: LUCENE-5405
> URL: https://issues.apache.org/jira/browse/LUCENE-5405
> Project: Lucene - Core
>  Issue Type: Improvement
>    Reporter: Benson Margulies
>    Assignee: Benson Margulies
> Fix For: 5.0
>
> Attachments: LUCENE-5405-4.x.patch
>
>
> SOLR-5623 included some conversation about the dilemmas of exception 
> management and reporting in the analysis chain. 
> I've belatedly become educated about the infostream, and this situation is a 
> job for it. The DocInverterPerField can note exceptions in the analysis 
> chain, log out to the infostream, and then rethrow them as before. No 
> wrapping, no muss, no fuss.
> There are comments on this JIRA from a more complex prior idea that readers 
> might want to ignore.






[jira] [Commented] (LUCENE-5405) Exception strategy for analysis improved

2014-02-02 Thread Benson Margulies (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13889095#comment-13889095
 ] 

Benson Margulies commented on LUCENE-5405:
--

Well, svn merge did something I can't make heads or tails of, so I'm going to 
merge by hand.  

> Exception strategy for analysis improved
> 
>
> Key: LUCENE-5405
> URL: https://issues.apache.org/jira/browse/LUCENE-5405
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Benson Margulies
>Assignee: Benson Margulies
> Fix For: 5.0
>
>
> SOLR-5623 included some conversation about the dilemmas of exception 
> management and reporting in the analysis chain. 
> I've belatedly become educated about the infostream, and this situation is a 
> job for it. The DocInverterPerField can note exceptions in the analysis 
> chain, log out to the infostream, and then rethrow them as before. No 
> wrapping, no muss, no fuss.
> There are comments on this JIRA from a more complex prior idea that readers 
> might want to ignore.






[jira] [Commented] (LUCENE-5405) Exception strategy for analysis improved

2014-02-02 Thread Benson Margulies (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13889051#comment-13889051
 ] 

Benson Margulies commented on LUCENE-5405:
--

Somehow the unit test escaped the prior commit. 1563711 fills it in.

> Exception strategy for analysis improved
> 
>
> Key: LUCENE-5405
> URL: https://issues.apache.org/jira/browse/LUCENE-5405
> Project: Lucene - Core
>  Issue Type: Improvement
>    Reporter: Benson Margulies
>    Assignee: Benson Margulies
> Fix For: 5.0
>
>
> SOLR-5623 included some conversation about the dilemmas of exception 
> management and reporting in the analysis chain. 
> I've belatedly become educated about the infostream, and this situation is a 
> job for it. The DocInverterPerField can note exceptions in the analysis 
> chain, log out to the infostream, and then rethrow them as before. No 
> wrapping, no muss, no fuss.
> There are comments on this JIRA from a more complex prior idea that readers 
> might want to ignore.






[jira] [Commented] (LUCENE-5405) Exception strategy for analysis improved

2014-02-02 Thread Benson Margulies (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13888958#comment-13888958
 ] 

Benson Margulies commented on LUCENE-5405:
--

I can backport, [~mikemccand]. Is there any doc on how the project manages 
branches? If not, I can add some to the web site to help guide patch-offerers.

> Exception strategy for analysis improved
> 
>
> Key: LUCENE-5405
> URL: https://issues.apache.org/jira/browse/LUCENE-5405
> Project: Lucene - Core
>  Issue Type: Improvement
>    Reporter: Benson Margulies
>    Assignee: Benson Margulies
> Fix For: 5.0
>
>
> SOLR-5623 included some conversation about the dilemmas of exception 
> management and reporting in the analysis chain. 
> I've belatedly become educated about the infostream, and this situation is a 
> job for it. The DocInverterPerField can note exceptions in the analysis 
> chain, log out to the infostream, and then rethrow them as before. No 
> wrapping, no muss, no fuss.
> There are comments on this JIRA from a more complex prior idea that readers 
> might want to ignore.






Re: Getting Apache on your GitHub

2014-01-31 Thread Benson Margulies
Mark, is there any possibility of a 1-many mapping from apache ID to
github ID? I have two.

On Fri, Jan 31, 2014 at 10:51 AM, Mark Miller  wrote:
> If you're a committer, you might be interested in some stuff I learned about 
> Apache->GitHub recently:
>
>
> If you want to show off your Apache affiliation and show up as part of the 
> Apache org, add your info here: 
> https://svn.apache.org/repos/private/committers/docs/github_team.txt
>
> If you want to see your Apache contributions show up on GitHub as 
> contributions to the GitHub Apache mirrors, add your @apache.org email 
> address to your GitHub profile.
>
> If you want to see past Apache contributions affect your profile stats and 
> history, send an email to GitHub support and ask for a rebuild to pick up 
> past stats. They did it for me about 2 hours later on a Sunday.
>
> - Mark
>
> http://about.me/markrmiller
>
>




[jira] [Resolved] (LUCENE-5405) Exception strategy for analysis improved

2014-01-29 Thread Benson Margulies (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benson Margulies resolved LUCENE-5405.
--

   Resolution: Fixed
Fix Version/s: 5.0

Fixed in rev 1562657.

> Exception strategy for analysis improved
> 
>
> Key: LUCENE-5405
> URL: https://issues.apache.org/jira/browse/LUCENE-5405
> Project: Lucene - Core
>  Issue Type: Improvement
>    Reporter: Benson Margulies
>    Assignee: Benson Margulies
> Fix For: 5.0
>
>
> SOLR-5623 included some conversation about the dilemmas of exception 
> management and reporting in the analysis chain. 
> I've belatedly become educated about the infostream, and this situation is a 
> job for it. The DocInverterPerField can note exceptions in the analysis 
> chain, log out to the infostream, and then rethrow them as before. No 
> wrapping, no muss, no fuss.
> There are comments on this JIRA from a more complex prior idea that readers 
> might want to ignore.






Re: Closing a PR

2014-01-29 Thread Benson Margulies
Was that supposed to be my svn commit message :-)

On Wed, Jan 29, 2014 at 7:42 PM, Robert Muir  wrote:
> the way to close is to add "closes #x" to your commit message
>
>
> On Wed, Jan 29, 2014 at 4:36 PM, Benson Margulies 
> wrote:
>>
>> As a committer can I get my github ID added so that I can close PR's?
>>
>> bimargulies would be the relevant ID.
>>
>




Closing a PR

2014-01-29 Thread Benson Margulies
As a committer, can I get my GitHub ID added so that I can close PRs?

bimargulies would be the relevant ID.




[jira] [Resolved] (SOLR-5677) HaversineConstFunction ignores one of its two values, is this on purpose?

2014-01-29 Thread Benson Margulies (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benson Margulies resolved SOLR-5677.


   Resolution: Fixed
Fix Version/s: 5.0
 Assignee: Benson Margulies

Well, the trunk code no longer has this problem.

> HaversineConstFunction ignores one of its two values, is this on purpose?
> -
>
> Key: SOLR-5677
> URL: https://issues.apache.org/jira/browse/SOLR-5677
> Project: Solr
>  Issue Type: Bug
>  Components: Schema and Analysis
>Affects Versions: 4.0-ALPHA
>Reporter: Benson Margulies
>    Assignee: Benson Margulies
> Fix For: 5.0
>
>
> org.apache.solr.search.function.distance.HaversineConstFunction.parser.new 
> ValueSourceParser() {...}.parse(FunctionQParser)
> has an unused variable warning for 'vs2', and uses vs1 to initialize mv2. 
> Maybe vs2 should just be deleted?






[jira] [Moved] (SOLR-5677) HaversineConstFunction ignores one of its two values, is this on purpose?

2014-01-29 Thread Benson Margulies (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benson Margulies moved LUCENE-4036 to SOLR-5677:


  Component/s: (was: core/other)
   Schema and Analysis
Lucene Fields:   (was: New)
Affects Version/s: (was: 4.0-ALPHA)
   4.0-ALPHA
  Key: SOLR-5677  (was: LUCENE-4036)
  Project: Solr  (was: Lucene - Core)

> HaversineConstFunction ignores one of its two values, is this on purpose?
> -
>
> Key: SOLR-5677
> URL: https://issues.apache.org/jira/browse/SOLR-5677
> Project: Solr
>  Issue Type: Bug
>  Components: Schema and Analysis
>Affects Versions: 4.0-ALPHA
>Reporter: Benson Margulies
>
> org.apache.solr.search.function.distance.HaversineConstFunction.parser.new 
> ValueSourceParser() {...}.parse(FunctionQParser)
> has an unused variable warning for 'vs2', and uses vs1 to initialize mv2. 
> Maybe vs2 should just be deleted?






[jira] [Commented] (SOLR-5623) Better diagnosis of RuntimeExceptions in analysis

2014-01-29 Thread Benson Margulies (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13886068#comment-13886068
 ] 

Benson Margulies commented on SOLR-5623:


[~hossman_luc...@fucit.org] have you looked at my revs?

> Better diagnosis of RuntimeExceptions in analysis
> -
>
> Key: SOLR-5623
> URL: https://issues.apache.org/jira/browse/SOLR-5623
> Project: Solr
>  Issue Type: Bug
>    Reporter: Benson Margulies
>    Assignee: Benson Margulies
>
> If an analysis component (tokenizer, filter, etc) gets really into a hissy 
> fit and throws a RuntimeException, the resulting log traffic is less than 
> informative, lacking any pointer to the doc under discussion (in the doc 
> case). It would be better if there were a try/catch shortstop that logged 
> this more informatively.






[jira] [Commented] (LUCENE-5405) Exception strategy for analysis improved

2014-01-29 Thread Benson Margulies (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13886067#comment-13886067
 ] 

Benson Margulies commented on LUCENE-5405:
--

Am I good to commit here?

> Exception strategy for analysis improved
> 
>
> Key: LUCENE-5405
> URL: https://issues.apache.org/jira/browse/LUCENE-5405
> Project: Lucene - Core
>  Issue Type: Improvement
>    Reporter: Benson Margulies
>    Assignee: Benson Margulies
>
> SOLR-5623 included some conversation about the dilemmas of exception 
> management and reporting in the analysis chain. 
> I've belatedly become educated about the infostream, and this situation is a 
> job for it. The DocInverterPerField can note exceptions in the analysis 
> chain, log out to the infostream, and then rethrow them as before. No 
> wrapping, no muss, no fuss.
> There are comments on this JIRA from a more complex prior idea that readers 
> might want to ignore.





