[jira] [Commented] (LUCENE-5327) Expose getNumericDocValues and getBinaryDocValues at toplevel reader and searcher levels
[ https://issues.apache.org/jira/browse/LUCENE-5327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13818392#comment-13818392 ]

John Wang commented on LUCENE-5327:
-----------------------------------

Perhaps I should have explained our use case, which is building up the search results. After a search, you essentially get a set of internal docids; they are not useful to the application. On the IndexSearcher API, the method you use to build up the search result is, as you said, the document() call. Calling document() to extract stored fields is simply too expensive for us. Instead, we keep an application-level per-document UID in a numeric docvalue. Our search result is basically a list of these UIDs, and further result decoration is done higher up in the application logic. I have seen this pattern in numerous Lucene applications; it is essentially the motivation behind this ticket.

Currently, to do this I am essentially writing the following code: get the underlying IndexReader from the IndexSearcher, and for each ScoreDoc:
* find the atomicReader for ScoreDoc.doc
* return atomicReader.getNumericDocValues(ScoreDoc.doc - base)

This is a little cumbersome; it would be nice to let the IndexSearcher return the UID in the likeness of the document() call. I am happy to close this ticket if you guys don't think this API is useful.

Thanks
-John

Expose getNumericDocValues and getBinaryDocValues at toplevel reader and searcher levels
Key: LUCENE-5327
URL: https://issues.apache.org/jira/browse/LUCENE-5327
Project: Lucene - Core
Issue Type: Improvement
Components: core/search
Affects Versions: 4.5
Reporter: John Wang
Attachments: patch.diff

Expose NumericDocValues and BinaryDocValues in both IndexReader and IndexSearcher apis.
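For concreteness, a minimal sketch of the lookup John describes, assuming Lucene 4.x APIs; "uid" is a hypothetical NumericDocValues field name, and this is an illustration of the manual pattern, not code from the attached patch:

{code}
import java.io.IOException;
import java.util.List;
import org.apache.lucene.index.AtomicReaderContext;
import org.apache.lucene.index.NumericDocValues;
import org.apache.lucene.index.ReaderUtil;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TopDocs;

// Sketch only: "uid" is a hypothetical per-document NumericDocValues field.
static long[] collectUids(IndexSearcher searcher, TopDocs topDocs) throws IOException {
  List<AtomicReaderContext> leaves = searcher.getIndexReader().leaves();
  long[] uids = new long[topDocs.scoreDocs.length];
  for (int i = 0; i < topDocs.scoreDocs.length; i++) {
    int doc = topDocs.scoreDocs[i].doc;
    // binary-search for the leaf containing this top-level docid
    AtomicReaderContext leaf = leaves.get(ReaderUtil.subIndex(doc, leaves));
    NumericDocValues ndv = leaf.reader().getNumericDocValues("uid");
    uids[i] = ndv.get(doc - leaf.docBase); // rebase to the leaf's doc space
  }
  return uids;
}
{code}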
[jira] [Commented] (LUCENE-5327) Expose getNumericDocValues and getBinaryDocValues at toplevel reader and searcher levels
[ https://issues.apache.org/jira/browse/LUCENE-5327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13818394#comment-13818394 ]

Shai Erera commented on LUCENE-5327:
------------------------------------

It's not that I think the API isn't useful, just that it's wrong to have it on IndexSearcher where there's no matching API on IndexReader. If you use MultiDocValues, you don't need to write that complicated code; you could instead do:

{code}
NumericDocValues uid = MultiDocValues.getNumericValues(searcher.getIndexReader(), "uid");
for (ScoreDoc sd : topDocs.scoreDocs) {
  long uidValue = uid.get(sd.doc);
}
{code}

That's not so bad, I think? I mean, IndexSearcher.getNumericDocValues() would essentially save you just the first call, so I don't see any great benefit in having the API there.

If you want to avoid the binary search, you should re-sort the topDocs by increasing doc ID, then iterate on reader.leaves(), obtain the NDV from each AtomicReader and pull the right values. First, I don't think you should do that unless you're asking for thousands of hits. Second, this won't be solved by adding IndexSearcher.getNDV().

I agree w/ Uwe; I think we should close this issue as Won't Fix.
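For reference, the leaf-walking alternative Shai mentions might look roughly like this (a hedged sketch, again using the hypothetical "uid" field; imports as in the earlier sketch, plus java.util.Arrays and java.util.Comparator; null checks omitted):

{code}
// Sort hits by increasing docid so each leaf's values are pulled only once.
ScoreDoc[] hits = topDocs.scoreDocs.clone();
Arrays.sort(hits, new Comparator<ScoreDoc>() {
  @Override
  public int compare(ScoreDoc a, ScoreDoc b) {
    return a.doc - b.doc;
  }
});
List<AtomicReaderContext> leaves = searcher.getIndexReader().leaves();
int leafIdx = 0;
AtomicReaderContext leaf = leaves.get(leafIdx);
NumericDocValues ndv = leaf.reader().getNumericDocValues("uid");
for (ScoreDoc sd : hits) {
  // advance to the leaf that contains this hit (no per-hit binary search)
  while (leafIdx + 1 < leaves.size() && sd.doc >= leaves.get(leafIdx + 1).docBase) {
    leaf = leaves.get(++leafIdx);
    ndv = leaf.reader().getNumericDocValues("uid");
  }
  long uidValue = ndv.get(sd.doc - leaf.docBase);
}
{code}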
[jira] [Commented] (LUCENE-5327) Expose getNumericDocValues and getBinaryDocValues at toplevel reader and searcher levels
[ https://issues.apache.org/jira/browse/LUCENE-5327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13818402#comment-13818402 ]

John Wang commented on LUCENE-5327:
-----------------------------------

Done, closed.
[jira] [Closed] (LUCENE-5327) Expose getNumericDocValues and getBinaryDocValues at toplevel reader and searcher levels
[ https://issues.apache.org/jira/browse/LUCENE-5327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Wang closed LUCENE-5327.
-----------------------------
Resolution: Won't Fix
[jira] [Commented] (SOLR-5374) Support user configured doc-centric versioning rules
[ https://issues.apache.org/jira/browse/SOLR-5374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13818410#comment-13818410 ]

Anshum Gupta commented on SOLR-5374:
------------------------------------

Just a thought: should we change the commented-out logging to log.debug? I'm assuming that's the intention behind leaving it in there.

Support user configured doc-centric versioning rules
Key: SOLR-5374
URL: https://issues.apache.org/jira/browse/SOLR-5374
Project: Solr
Issue Type: Improvement
Reporter: Hoss Man
Assignee: Hoss Man
Fix For: 4.6, 5.0
Attachments: SOLR-5374.patch, SOLR-5374.patch, SOLR-5374.patch, SOLR-5374.patch, SOLR-5374.patch, SOLR-5374.patch

The existing optimistic concurrency features of Solr can be very handy for ensuring that you are only updating/replacing the version of the doc you think you are updating/replacing, w/o the risk of someone else adding/removing the doc in the meantime -- but I've recently encountered some situations where I really wanted to be able to let the client specify an arbitrary version, on a per document basis (ie: generated by an external system, or perhaps a timestamp of when a file was last modified) and ensure that the corresponding document update was processed only if the new version is greater than the old version -- w/o needing to check exactly which version is currently in Solr. (ie: If a client wants to index version 101 of a doc, that update should fail if version 102 is already in the index, but succeed if the currently indexed version is 99 -- w/o the client needing to ask Solr what the current version is.)

The idea Yonik brought up in SOLR-5298 (letting the client specify a {{\_new\_version\_}} that would be used by the existing optimistic concurrency code to control the assignment of the {{\_version\_}} field for documents) looked like a good direction to go -- but after digging into the way {{\_version\_}} is used internally I realized it requires a uniqueness constraint across all update commands, which would make it impossible to allow multiple independent documents to have the same {{\_version\_}}. So instead I've tackled the problem in a different way, using an UpdateProcessor that is configured with a user-defined field to track a doc-based version, and uses the RTG logic to figure out if the update is allowed.
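To make the suggestion concrete, a hedged illustration of the change Anshum proposes (the class, logger, and message here are invented for the example; the actual UpdateProcessor code may differ):

{code}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class DocBasedVersionSketch {
  private static final Logger log = LoggerFactory.getLogger(DocBasedVersionSketch.class);

  void onVersionCheck(String id, long newVersion, long oldVersion) {
    // before: // System.out.println("doc=" + id + " new=" + newVersion + " old=" + oldVersion);
    // after: stays compiled in, but silent unless debug logging is enabled;
    // parameterized form avoids string concatenation when debug is off:
    log.debug("doc={} newVersion={} oldVersion={}", id, newVersion, oldVersion);
  }
}
{code}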
[jira] [Commented] (SOLR-2894) Implement distributed pivot faceting
[ https://issues.apache.org/jira/browse/SOLR-2894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13818422#comment-13818422 ]

Elran Dvir commented on SOLR-2894:
----------------------------------

I didn't manage to make distributed pivot on a date field blow up with toObject. Can you please attach an example query that blows Solr up, and I'll adjust it to my environment? Thanks.

Implement distributed pivot faceting
Key: SOLR-2894
URL: https://issues.apache.org/jira/browse/SOLR-2894
Project: Solr
Issue Type: Improvement
Reporter: Erik Hatcher
Fix For: 4.6
Attachments: SOLR-2894-reworked.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch

Following up on SOLR-792, pivot faceting currently only supports undistributed mode. Distributed pivot faceting needs to be implemented.
[jira] [Commented] (LUCENE-5336) Add a simple QueryParser to parse human-entered queries.
[ https://issues.apache.org/jira/browse/LUCENE-5336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13818431#comment-13818431 ]

Paul Elschot commented on LUCENE-5336:
--------------------------------------

A realistic query parser is not likely to be any simpler than this, so why not call it simple?

Add a simple QueryParser to parse human-entered queries.
Key: LUCENE-5336
URL: https://issues.apache.org/jira/browse/LUCENE-5336
Project: Lucene - Core
Issue Type: Improvement
Reporter: Jack Conradson
Attachments: LUCENE-5336.patch

I would like to add a new simple QueryParser to Lucene that is designed to parse human-entered queries. This parser will operate on an entire entered query using a specified single field or a set of weighted fields (using term boost). All features/operations in this parser can be enabled or disabled depending on what is necessary for the user. A default operator may be specified as either 'MUST' representing 'and' or 'SHOULD' representing 'or'.

The features/operations that this parser will include are the following:
* AND specified as '+'
* OR specified as '|'
* NOT specified as '-'
* PHRASE surrounded by double quotes
* PREFIX specified as '*'
* PRECEDENCE surrounded by '(' and ')'
* WHITESPACE specified as ' ' '\n' '\r' and '\t' will cause the default operator to be used
* ESCAPE specified as '\' will allow operators to be used in terms

The key differences between this parser and other existing parsers will be the following:
* No exceptions will be thrown, and errors in syntax will be ignored. The parser will do a best-effort interpretation of any query entered.
* It uses minimal syntax to express queries. All available operators are single characters or pairs of single characters.
* The parser is hand-written and in a single Java file, making it easy to modify.
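To illustrate the proposed syntax, a hedged usage sketch; the class name and constructor are assumed from the issue's description and attached patch, not a committed API, and `analyzer` is assumed to be in scope:

{code}
// Assumed API from the patch: SimpleQueryParser(Analyzer, String field)
SimpleQueryParser parser = new SimpleQueryParser(analyzer, "body");
Query q1 = parser.parse("wifi +router -refurbished"); // '+' = AND, '-' = NOT
Query q2 = parser.parse("\"new york\" hot*");         // phrase and prefix
Query q3 = parser.parse("(cats | dogs) +adoption");   // '|' = OR, parens = precedence
Query q4 = parser.parse("1\\+1");                     // '\' escapes an operator character
{code}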
[JENKINS] Lucene-Solr-4.x-MacOSX (64bit/jdk1.6.0) - Build # 978 - Failure!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-4.x-MacOSX/978/
Java: 64bit/jdk1.6.0 -XX:+UseCompressedOops -XX:+UseParallelGC

All tests passed

Build Log:
[...truncated 22541 lines...]
BUILD FAILED
/Users/jenkins/workspace/Lucene-Solr-4.x-MacOSX/build.xml:428: The following error occurred while executing this line:
/Users/jenkins/workspace/Lucene-Solr-4.x-MacOSX/build.xml:67: The following error occurred while executing this line:
/Users/jenkins/workspace/Lucene-Solr-4.x-MacOSX/lucene/build.xml:188: java.lang.OutOfMemoryError: Java heap space
        at java.util.Arrays.copyOfRange(Arrays.java:3209)
        at java.lang.String.<init>(String.java:215)
        at java.lang.StringBuffer.toString(StringBuffer.java:585)
        at de.thetaphi.forbiddenapis.asm.commons.Method.toString(Unknown Source)
        at java.lang.String.valueOf(String.java:2826)
        at java.lang.StringBuilder.append(StringBuilder.java:115)
        at de.thetaphi.forbiddenapis.Checker$1$1.checkMethodAccess(Checker.java:475)
        at de.thetaphi.forbiddenapis.Checker$1$1.visitMethodInsn(Checker.java:527)
        at de.thetaphi.forbiddenapis.asm.ClassReader.a(Unknown Source)
        at de.thetaphi.forbiddenapis.asm.ClassReader.b(Unknown Source)
        at de.thetaphi.forbiddenapis.asm.ClassReader.accept(Unknown Source)
        at de.thetaphi.forbiddenapis.asm.ClassReader.accept(Unknown Source)
        at de.thetaphi.forbiddenapis.Checker.checkClass(Checker.java:378)
        at de.thetaphi.forbiddenapis.Checker.run(Checker.java:563)
        at de.thetaphi.forbiddenapis.AntTask.execute(AntTask.java:166)
        at org.apache.tools.ant.UnknownElement.execute(UnknownElement.java:291)
        at sun.reflect.GeneratedMethodAccessor459.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.tools.ant.dispatch.DispatchUtils.execute(DispatchUtils.java:106)
        at org.apache.tools.ant.Task.perform(Task.java:348)
        at org.apache.tools.ant.Target.execute(Target.java:390)
        at org.apache.tools.ant.Target.performTasks(Target.java:411)
        at org.apache.tools.ant.Project.executeSortedTargets(Project.java:1399)
        at org.apache.tools.ant.helper.SingleCheckExecutor.executeTargets(SingleCheckExecutor.java:38)
        at org.apache.tools.ant.Project.executeTargets(Project.java:1251)
        at org.apache.tools.ant.taskdefs.Ant.execute(Ant.java:442)
        at org.apache.tools.ant.taskdefs.SubAnt.execute(SubAnt.java:302)
        at org.apache.tools.ant.taskdefs.SubAnt.execute(SubAnt.java:221)
        at org.apache.tools.ant.UnknownElement.execute(UnknownElement.java:291)
        at sun.reflect.GeneratedMethodAccessor459.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)

Total time: 95 minutes 39 seconds
Build step 'Invoke Ant' marked build as failure
Description set: Java: 64bit/jdk1.6.0 -XX:+UseCompressedOops -XX:+UseParallelGC
Archiving artifacts
Recording test results
Email was triggered for: Failure
Sending email for trigger: Failure
[jira] [Commented] (LUCENE-5316) Taxonomy tree traversing improvement
[ https://issues.apache.org/jira/browse/LUCENE-5316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13818443#comment-13818443 ]

Shai Erera commented on LUCENE-5316:
------------------------------------

Mike, I looked at the results of the different runs, and the QPS column (e.g. QPS Base) varies dramatically between runs. I'm not talking about base vs comp, but base vs itself across all runs. E.g. in the last run, when you compared ALL_BUT_DIM and NO_PARENTS, AndHighLow of ALL_BUT_DIM was 70.5 vs 41.14 respectively. In the ALL_BUT_DIM run after Gilad's patch with getChildren() returning null, it's 103.19. Can you explain the great differences? Note that I don't compare that to the easy run (w/ 7 dims only), as it does not run the same thing. But I wonder if the changes in absolute QPS may hint at some instability (maybe temporary) with the machine or the test? Still, comp is slower than base, so ultimately I think it shows the abstraction hurts us, but I'd feel better if the test was more stable across runs.

Separately, I'm torn about what we should do here. On one hand the abstraction hurts us, but on the other hand, dropping it eliminates any chance of doing anything smart in the taxonomy in-memory representation. For example, if a dimension is flat and some taxonomy implementation manages to assign successive ordinals to its children, we don't even need to materialize all children in an int[], and can rather hold a start/end range (a'la SortedSetDocValuesReaderState.OrdRange) and implement ChildrenIterator on top. If we commit to an int[] on the API, it immediately kills any chance to further optimize that in the future (e.g. PackedInts even). I know Gilad is making progress w/ returning an int[] per ord, so I wonder what the performance will be with it. I really wish we could make that API abstraction without losing much -- it feels like the right thing to do ... and I'd hate to do it knowing that we lose :).

Taxonomy tree traversing improvement
Key: LUCENE-5316
URL: https://issues.apache.org/jira/browse/LUCENE-5316
Project: Lucene - Core
Issue Type: Improvement
Components: modules/facet
Reporter: Gilad Barkai
Priority: Minor
Attachments: LUCENE-5316.patch, LUCENE-5316.patch, LUCENE-5316.patch

The taxonomy traversing is done today utilizing the {{ParallelTaxonomyArrays}} -- in particular, two taxonomy-size {{int}} arrays which hold, for each ordinal, its youngest child (array #1) and older sibling (array #2). This is a compact way of holding the tree information in memory, but it's not perfect:
* Large (8 bytes per ordinal in memory)
* Exposes internal implementation
* Utilizing these arrays for tree traversing is not straightforward
* Loses reference locality while traversing (the array is accessed at increasing-only entries, but they may be distant from one another)
* In NRT, a reopen is always (not worst-case) done at O(taxonomy-size)

This issue is about making the traversing easier, the code more readable, and opening it up for future improvements (i.e. memory footprint and NRT cost) -- without changing any of the internals. A later issue(s?) could be opened to address the gaps once this one is done.
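As a sketch of the range-backed idea Shai describes above; ChildrenIterator is the abstraction under discussion, but the exact interface here is hypothetical, not from the attached patches:

{code}
// Hypothetical iterator contract: next() returns child ordinals, -1 when exhausted.
interface ChildrenIterator {
  int next();
}

// For a flat dimension whose children were assigned successive ordinals, the
// taxonomy only needs a [start, end] range (a'la OrdRange), never an int[]:
class RangeChildrenIterator implements ChildrenIterator {
  private int ord;
  private final int end; // inclusive

  RangeChildrenIterator(int start, int end) {
    this.ord = start;
    this.end = end;
  }

  @Override
  public int next() {
    return ord <= end ? ord++ : -1;
  }
}
{code}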
[jira] [Commented] (LUCENE-5316) Taxonomy tree traversing improvement
[ https://issues.apache.org/jira/browse/LUCENE-5316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13818457#comment-13818457 ]

Michael McCandless commented on LUCENE-5316:
--------------------------------------------

I'm not sure why there's such a difference; I do fix the static seed in the test, so it's running the same queries every time (otherwise it would pick a different set of queries out of each category). Let me go re-run ... maybe I messed something up.

It would be best if others could run too, to avoid a stupid mistake on my part causing us to abandon what would have been a good change!
[jira] [Commented] (LUCENE-5333) Support sparse faceting for heterogeneous indices
[ https://issues.apache.org/jira/browse/LUCENE-5333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13818458#comment-13818458 ]

Michael McCandless commented on LUCENE-5333:
--------------------------------------------

bq. Why is it an overkill?

Well, I think the facet module already has too many classes / abstractions: aggregators, accumulators, ordinal policies, search params, indexing params, cat paths, encoders, decoders, etc. I think this huge API surface area is a big impediment to users adopting it and devs contributing to it. So I really don't want to make this worse by adding yet another Accumulator that has static factory methods to create yet other Accumulators that are subclasses of existing Accumulators. I think it's too much.

I also don't like separating concerns: I think that's a sign that something is wrong. I don't think a single class (AllFA) should be expected to handle both the taxonomy-based and SSDV-based cases. We already have classes that count facets using those two methods, so I think we should just add this capability to each of those classes. And if we add the enum facet method (and others), then the natural place to add sparse handling for it would be its own class, I think.

bq. So I'm curious - did you try a dedicated class and run into trouble?

No, I haven't tried: I just didn't really like that approach ... so I focused on the impl instead ...

bq. Is there a reason to not allocate the CFRs up front and set them on the FSP?

I really don't like the approach of creating a CFR for every possible dim. I realize this is a simple way to implement it, but it seems wrong. And I especially don't want the API to expose that we are somehow doing this: it's an impl detail. So I wanted to get closer to not creating all CFRs up front, and doing it transiently seemed at least a bit better than bringing the entire list into existence. But I think I can improve on the patch so that we don't even make a CFR until we see that any labels had a non-zero count ... I'll work towards that.

bq. You sort the FacetResult based on the FResNode.value (their root). Does SortedSet always assign a value to the root of a FacetResult.node?

Yes, it does, in the sparse case (I ignore the ord policy).

Support sparse faceting for heterogeneous indices
Key: LUCENE-5333
URL: https://issues.apache.org/jira/browse/LUCENE-5333
Project: Lucene - Core
Issue Type: New Feature
Components: modules/facet
Reporter: Michael McCandless
Attachments: LUCENE-5333.patch

In some search apps, e.g. a large e-commerce site, the index can have a mix of wildly different product categories and facet dimensions, and the number of dimensions could be huge. E.g. maybe the index has shirts, computer memory, hard drives, etc., and each of these many categories has different attributes. In such an index, when someone searches for "so dimm", which should match a bunch of laptop memory modules, you can't (easily) know up front which facet dimensions will be important.

But I think this is very easy for the facet module: since ords are stored row-stride (each doc lists all facet labels it has), we could simply count all facets that the hits actually saw, and then in the end see which ones got traction and return facet results for those top dims. I'm not sure what the API would look like, but conceptually this should work very well, because of how the facet module works. You shouldn't have to state up front exactly which facet dimensions to count ...
[jira] [Commented] (LUCENE-5333) Support sparse faceting for heterogeneous indices
[ https://issues.apache.org/jira/browse/LUCENE-5333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13818466#comment-13818466 ]

Shai Erera commented on LUCENE-5333:
------------------------------------

bq. Well, I think the facet module already has too many classes

That's unrelated. It's like saying Lucene has many APIs: IndexWriter, IndexWriterConfig, Document, Field, MergePolicy, Query, QueryParser, Collector, IndexReader, IndexSearcher ... just to name a few :). What's important here is FacetAccumulator and FacetRequest .. that's it. The rest are *totally* unrelated.

This scenario fits into another accumulator. Or else we'll end up with facet code diverging left and right. Even now, for really no good reason, if you choose to index facets using SortedSetDV, you can only count them. Why? What prevents these ords from being weighted by SumScore or a ValueSource? Nothing, I think? So I'm worried that if you add this to only SortedSetDV, it will increase the difference between the two.

Rather, I prefer to pick the right API. We say that FacetsAccumulator is your entry point to accumulating facets. So far we've made FacetsAccumulator.create adhere to all existing FacetRequests and accumulators and return the proper one. I think that's a good API? And if all an AllFA needs to do is create dummy requests and filter out the uninteresting ones, why complicate the code of all other accumulators (existing and future ones)? Won't it be simpler to add EnumFacetsAccumulator support to AllFA?

Look, this is not a rocket-science feature. Besides the fact that I don't think it's such an important or common feature, I think the app doesn't really need to go out of its way to support it -- it can easily create all possible FRs using a very simple API, and filter out FacetResults whose FRN.subResults are empty. Can we make a simple utility for these apps? I'm all for it! But I prefer that we don't complicate the code of existing FAs.
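For reference, the app-side approach Shai describes might look roughly like this; a hedged sketch against the 4.x facet API of the era, where `allDimensions`, `searcher`, `query`, and `taxoReader` are assumed to be supplied by the application:

{code}
// Create a counting request per dimension the app knows about ...
List<FacetRequest> requests = new ArrayList<FacetRequest>();
for (String dim : allDimensions) {
  requests.add(new CountFacetRequest(new CategoryPath(dim), 10));
}
FacetsCollector fc = FacetsCollector.create(
    new FacetSearchParams(requests), searcher.getIndexReader(), taxoReader);
searcher.search(query, fc);

// ... then keep only the dimensions that actually saw hits.
List<FacetResult> nonEmpty = new ArrayList<FacetResult>();
for (FacetResult fres : fc.getFacetResults()) {
  if (!fres.getFacetResultNode().subResults.isEmpty()) {
    nonEmpty.add(fres);
  }
}
{code}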
[jira] [Commented] (LUCENE-5316) Taxonomy tree traversing improvement
[ https://issues.apache.org/jira/browse/LUCENE-5316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13818469#comment-13818469 ]

Michael McCandless commented on LUCENE-5316:
--------------------------------------------

I re-ran ALL_BUT_DIM and NO_PARENTS on the last patch:

ALL_BUT_DIM:
{noformat}
Task              QPS base  StdDev  QPS comp  StdDev  Pct diff
LowSloppyPhrase     195.79  (6.4%)    160.76  (6.5%)  -17.9% ( -28% -  -5%)
MedSpanNear         189.11  (6.1%)    155.88  (6.5%)  -17.6% ( -28% -  -5%)
AndHighLow          171.05  (5.4%)    142.46  (5.9%)  -16.7% ( -26% -  -5%)
HighPhrase          165.56  (5.6%)    140.32  (6.0%)  -15.2% ( -25% -  -3%)
HighSloppyPhrase    135.86  (4.7%)    117.90  (5.3%)  -13.2% ( -22% -  -3%)
HighSpanNear         98.69  (4.1%)     88.28  (4.5%)  -10.5% ( -18% -  -2%)
MedPhrase            89.68  (4.3%)     81.23  (3.7%)   -9.4% ( -16% -  -1%)
OrNotHighLow         93.45  (5.5%)     85.07  (4.9%)   -9.0% ( -18% -   1%)
LowTerm              87.06  (3.4%)     79.50  (3.8%)   -8.7% ( -15% -  -1%)
Fuzzy1               63.87  (2.5%)     59.39  (2.9%)   -7.0% ( -12% -  -1%)
AndHighMed           53.60  (1.9%)     50.49  (2.6%)   -5.8% ( -10% -  -1%)
OrHighLow            54.32  (2.2%)     51.18  (2.4%)   -5.8% ( -10% -  -1%)
OrNotHighHigh        62.71  (5.5%)     59.11  (5.0%)   -5.7% ( -15% -   5%)
OrNotHighMed         47.72  (3.4%)     45.35  (3.1%)   -5.0% ( -11% -   1%)
Fuzzy2               48.40  (2.2%)     46.07  (2.4%)   -4.8% (  -9% -   0%)
AndHighHigh          31.48  (1.6%)     30.33  (1.5%)   -3.7% (  -6% -   0%)
MedTerm              35.33  (2.0%)     34.06  (1.9%)   -3.6% (  -7% -   0%)
MedSloppyPhrase      17.17  (4.4%)     16.67  (4.3%)   -2.9% ( -11% -   6%)
Prefix3              27.73  (1.6%)     26.93  (1.2%)   -2.9% (  -5% -   0%)
OrHighNotMed         24.31  (2.4%)     23.79  (1.1%)   -2.1% (  -5% -   1%)
LowPhrase            14.56  (4.2%)     14.28  (4.0%)   -1.9% (  -9% -   6%)
LowSpanNear          11.25  (2.4%)     11.04  (1.7%)   -1.9% (  -5% -   2%)
OrHighHigh           17.63  (1.6%)     17.38  (1.1%)   -1.4% (  -4% -   1%)
OrHighNotLow         18.97  (1.8%)     18.69  (0.9%)   -1.4% (  -4% -   1%)
Wildcard             13.21  (1.4%)     13.03  (0.9%)   -1.4% (  -3% -   0%)
HighTerm             16.34  (1.8%)     16.14  (1.9%)   -1.3% (  -4% -   2%)
OrHighMed            18.11  (1.6%)     17.93  (1.4%)   -1.0% (  -3% -   2%)
Respell              89.31  (2.8%)     88.78  (2.2%)   -0.6% (  -5% -   4%)
OrHighNotHigh         9.09  (2.0%)      9.08  (1.4%)   -0.1% (  -3% -   3%)
IntNRQ                4.87  (1.2%)      4.90  (1.2%)    0.7% (  -1% -   3%)
{noformat}

NO_PARENTS:
{noformat}
Task              QPS base  StdDev  QPS comp  StdDev  Pct diff
LowSloppyPhrase      98.63  (4.7%)     28.73  (2.9%)  -70.9% ( -74% - -66%)
MedSpanNear          97.31  (4.7%)     28.54  (2.9%)  -70.7% ( -74% - -66%)
AndHighLow           91.63  (3.9%)     28.04  (2.9%)  -69.4% ( -73% - -65%)
HighPhrase           90.81  (3.6%)     27.94  (2.9%)  -69.2% ( -73% - -65%)
HighSloppyPhrase     80.24  (3.2%)     26.90  (3.1%)  -66.5% ( -70% - -62%)
HighSpanNear         65.93  (2.7%)     24.97  (3.3%)  -62.1% ( -66% - -57%)
OrNotHighLow         64.00  (3.3%)     24.74  (3.2%)  -61.3% ( -65% - -56%)
MedPhrase            62.06  (4.1%)     24.52  (3.3%)  -60.5% ( -65% - -55%)
LowTerm              61.33  (2.6%)     24.40  (3.3%)  -60.2% ( -64% - -55%)
OrNotHighHigh        48.27  (2.8%)     21.97  (3.4%)  -54.5% ( -58% - -49%)
Fuzzy1               47.61  (2.2%)     21.90  (3.5%)  -54.0% ( -58% - -49%)
OrHighLow            43.63  (2.6%)     21.07  (3.4%)  -51.7% ( -56% - -46%)
AndHighMed           42.86  (2.6%)     20.75  (3.4%)  -51.6% ( -56% - -46%)
OrNotHighMed         39.23  (2.0%)     19.93  (3.3%)  -49.2% ( -53% - -44%)
Fuzzy2               38.49  (2.3%)     19.76  (3.3%)  -48.6% ( -53% - -44%)
{noformat}
[jira] [Commented] (LUCENE-5212) java 7u40 causes sigsegv and corrupt term vectors
[ https://issues.apache.org/jira/browse/LUCENE-5212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13818482#comment-13818482 ]

Dawid Weiss commented on LUCENE-5212:
-------------------------------------

I confirm that this patch fixes the problem. I've tested svn rev. 1523179 (trunk) against jdk8-b114 with and without Vladimir's patch. Without the patch, the test sequence ends in a sigsegv about 50% of the time. With the patch, all executions ended without any errors.

Note that the problem only affects CPUs with the AVX extension. A workaround for affected VMs is to disable vectorization with -XX:-UseSuperWord.

java 7u40 causes sigsegv and corrupt term vectors
Key: LUCENE-5212
URL: https://issues.apache.org/jira/browse/LUCENE-5212
Project: Lucene - Core
Issue Type: Bug
Reporter: Robert Muir
Attachments: crashFaster.patch, crashFaster2.0.patch, hs_err_pid32714.log, jenkins.txt
[jira] [Commented] (LUCENE-5316) Taxonomy tree traversing improvement
[ https://issues.apache.org/jira/browse/LUCENE-5316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13818494#comment-13818494 ]

Shai Erera commented on LUCENE-5316:
------------------------------------

Gilad still hasn't uploaded a new patch w/ the bugfix. About the results: again, the absolute QPS differs a lot? I compared that run to the one before.
RE: [JENKINS] Lucene-Solr-4.x-MacOSX (64bit/jdk1.6.0) - Build # 978 - Failure!
It looks like the whole set of all class files, taken together, is too much on MacOSX with the default heap size ... Does anybody know the default heap size on MacOSX? Is this anywhere documented?

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

-----Original Message-----
From: Policeman Jenkins Server [mailto:jenk...@thetaphi.de]
Sent: Sunday, November 10, 2013 2:13 PM
To: dev@lucene.apache.org
Subject: [JENKINS] Lucene-Solr-4.x-MacOSX (64bit/jdk1.6.0) - Build # 978 - Failure!

[...quoted build log truncated; see the failure message above...]
[jira] [Updated] (LUCENE-4753) Make forbidden API checks per-module
[ https://issues.apache.org/jira/browse/LUCENE-4753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-4753:
----------------------------------
Fix Version/s: 4.6

Make forbidden API checks per-module
Key: LUCENE-4753
URL: https://issues.apache.org/jira/browse/LUCENE-4753
Project: Lucene - Core
Issue Type: Improvement
Components: general/build
Reporter: Uwe Schindler
Assignee: Uwe Schindler
Priority: Critical
Fix For: 4.6

After the forbidden API checker was released separately from Lucene as a Google Code project (forked and improved), including Maven support, the checks on Lucene should be changed to work per-module. The reason for this: the improved checker is more picky about e.g. extending classes that are forbidden, or overriding methods and calling super.method() if they are on the forbidden signatures list. For these checks it is not enough to have the class files and the rt.jar; you need the whole classpath. Forbidden APIs 1.0 now by default complains if classes are missing from the classpath. With the module architecture of Lucene/Solr it is very hard to make an uber-classpath; instead, the checks should be done per module, so the default compile/test classpath of the module can be used and no crazy path statements with **/*.jar are needed.

This needs some refactoring in the exclusion lists, but the Lucene checks could be done by a macro in common-build that allows custom exclusion lists for specific modules. Currently, the strict checking is disabled for Solr, so the checker only complains about missing classes but does not fail the build:

{noformat}
-check-forbidden-java-apis:
[forbidden-apis] Reading bundled API signatures: jdk-unsafe-1.6
[forbidden-apis] Reading bundled API signatures: jdk-deprecated-1.6
[forbidden-apis] Reading bundled API signatures: commons-io-unsafe-2.1
[forbidden-apis] Reading API signatures: C:\Users\Uwe Schindler\Projects\lucene\trunk-lusolr3\lucene\tools\forbiddenApis\executors.txt
[forbidden-apis] Reading API signatures: C:\Users\Uwe Schindler\Projects\lucene\trunk-lusolr3\lucene\tools\forbiddenApis\servlet-api.txt
[forbidden-apis] Loading classes to check...
[forbidden-apis] Scanning for API signatures and dependencies...
[forbidden-apis] WARNING: The referenced class 'org.apache.lucene.analysis.uima.ae.AEProviderFactory' cannot be loaded. Please fix the classpath!
[forbidden-apis] WARNING: The referenced class 'org.apache.lucene.analysis.uima.ae.AEProviderFactory' cannot be loaded. Please fix the classpath!
[forbidden-apis] WARNING: The referenced class 'org.apache.lucene.analysis.uima.ae.AEProvider' cannot be loaded. Please fix the classpath!
[forbidden-apis] WARNING: The referenced class 'org.apache.lucene.collation.ICUCollationKeyAnalyzer' cannot be loaded. Please fix the classpath!
[forbidden-apis] Scanned 2177 (and 1222 related) class file(s) for forbidden API invocations (in 1.80s), 0 error(s).
{noformat}

I added almost all missing jars, but those do not seem to be in the solr part of the source tree (I think they are only copied when building artifacts). With making the whole thing per module, we can use the default classpath of the module, which makes it much easier.
[jira] [Updated] (LUCENE-4753) Make forbidden API checks per-module
[ https://issues.apache.org/jira/browse/LUCENE-4753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-4753:
----------------------------------
Priority: Critical (was: Major)

Recently on MacOSX, with the default heap size, we get OOMs while running the forbidden checker. So we should really do this now. My proposal: move the forbidden targets into common-build.xml in Lucene and Solr. Inside common-build, also define some properties for excludes, so we can define per module which patterns/filesets should be checked.
[jira] [Commented] (LUCENE-4753) Make forbidden API checks per-module
[ https://issues.apache.org/jira/browse/LUCENE-4753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13818499#comment-13818499 ]

Uwe Schindler commented on LUCENE-4753:
---------------------------------------

The Maven builds are already per-module! So we should get the file patterns and targets also synchronized with the definitions in the Maven POMs. I have to say: in this case, the Maven build is better than our Ant build :-( Thanks [~steve_rowe]!!!
Re: [JENKINS] Lucene-Solr-4.x-MacOSX (64bit/jdk1.6.0) - Build # 978 - Failure!
> Does anybody know the default heap size on MacOSX? Is this anywhere documented?

I think it's a heuristic that depends on the environment (didn't inspect the OpenJDK sources)? We could just dump memory limits via the MX bean -- it'd provide an interesting insight into the defaults on different systems ...

Dawid
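For reference, a minimal sketch of the MX-bean dump Dawid suggests (all values printed in bytes):

{code}
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Minimal sketch: print the heap limits the JVM picked on this machine.
public class HeapDefaults {
  public static void main(String[] args) {
    MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
    System.out.println("init heap: " + heap.getInit());
    System.out.println("max heap:  " + heap.getMax());
    System.out.println("Runtime.maxMemory(): " + Runtime.getRuntime().maxMemory());
  }
}
{code}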
[jira] [Updated] (LUCENE-5333) Support sparse faceting for heterogeneous indices
[ https://issues.apache.org/jira/browse/LUCENE-5333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shai Erera updated LUCENE-5333:
-------------------------------
Attachment: LUCENE-5333.patch

Here's the simple way I thought about: AllFacetsAccumulator takes no requests and has two ctors, one for SSDV and another for TaxoReader, and initializes the proper FA underneath, to which it delegates .accumulate(), later filtering out any FacetResult with no children. It's just a means of showing how I think it should be done. We still need to integrate it into FA.create if we want to simplify an app's life even more, though I'd prefer to wait for feedback from anyone who actually uses it first.
[jira] [Commented] (LUCENE-5333) Support sparse faceting for heterogeneous indices
[ https://issues.apache.org/jira/browse/LUCENE-5333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13818525#comment-13818525 ]

Shai Erera commented on LUCENE-5333:
------------------------------------

There's also a third option:
* We add the getDimensions API to SSDVReaderState
* We put an example under demo/ExploreFacetsExample (or a better name)
** We basically demonstrate how to create a ListFR for all available dimensions using either TaxoReader or SSDVReaderState
** And we show how to filter out the empty ones

If one day someone asks how to do it and the example isn't enough, we can think about porting it to an FA or inlining it into the other FAs. But until then, it's really a simple example.
[JENKINS] Lucene-Solr-4.x-MacOSX (64bit/jdk1.6.0) - Build # 979 - Still Failing!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-4.x-MacOSX/979/
Java: 64bit/jdk1.6.0 -XX:+UseCompressedOops -XX:+UseConcMarkSweepGC

All tests passed

Build Log:
[...truncated 22550 lines...]
BUILD FAILED
/Users/jenkins/workspace/Lucene-Solr-4.x-MacOSX/build.xml:428: The following error occurred while executing this line:
/Users/jenkins/workspace/Lucene-Solr-4.x-MacOSX/build.xml:67: The following error occurred while executing this line:
/Users/jenkins/workspace/Lucene-Solr-4.x-MacOSX/lucene/build.xml:188: java.lang.OutOfMemoryError: Java heap space
        at java.util.Arrays.copyOf(Arrays.java:2882)
        at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:100)
        at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:572)
        at java.lang.StringBuilder.append(StringBuilder.java:203)
        at de.thetaphi.forbiddenapis.Checker$1$1.checkMethodAccess(Checker.java:475)
        at de.thetaphi.forbiddenapis.Checker$1$1.visitMethodInsn(Checker.java:527)
        at de.thetaphi.forbiddenapis.asm.ClassReader.a(Unknown Source)
        at de.thetaphi.forbiddenapis.asm.ClassReader.b(Unknown Source)
        at de.thetaphi.forbiddenapis.asm.ClassReader.accept(Unknown Source)
        at de.thetaphi.forbiddenapis.asm.ClassReader.accept(Unknown Source)
        at de.thetaphi.forbiddenapis.Checker.checkClass(Checker.java:378)
        at de.thetaphi.forbiddenapis.Checker.run(Checker.java:563)
        at de.thetaphi.forbiddenapis.AntTask.execute(AntTask.java:166)
        at org.apache.tools.ant.UnknownElement.execute(UnknownElement.java:291)
        at sun.reflect.GeneratedMethodAccessor462.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.tools.ant.dispatch.DispatchUtils.execute(DispatchUtils.java:106)
        at org.apache.tools.ant.Task.perform(Task.java:348)
        at org.apache.tools.ant.Target.execute(Target.java:390)
        at org.apache.tools.ant.Target.performTasks(Target.java:411)
        at org.apache.tools.ant.Project.executeSortedTargets(Project.java:1399)
        at org.apache.tools.ant.helper.SingleCheckExecutor.executeTargets(SingleCheckExecutor.java:38)
        at org.apache.tools.ant.Project.executeTargets(Project.java:1251)
        at org.apache.tools.ant.taskdefs.Ant.execute(Ant.java:442)
        at org.apache.tools.ant.taskdefs.SubAnt.execute(SubAnt.java:302)
        at org.apache.tools.ant.taskdefs.SubAnt.execute(SubAnt.java:221)
        at org.apache.tools.ant.UnknownElement.execute(UnknownElement.java:291)
        at sun.reflect.GeneratedMethodAccessor462.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.tools.ant.dispatch.DispatchUtils.execute(DispatchUtils.java:106)

Total time: 108 minutes 2 seconds
Build step 'Invoke Ant' marked build as failure
Description set: Java: 64bit/jdk1.6.0 -XX:+UseCompressedOops -XX:+UseConcMarkSweepGC
Archiving artifacts
Recording test results
Email was triggered for: Failure
Sending email for trigger: Failure
[jira] [Created] (SOLR-5434) Create minimal solrcloud example directory
Alan Woodward created SOLR-5434:
-----------------------------------
Summary: Create minimal solrcloud example directory
Key: SOLR-5434
URL: https://issues.apache.org/jira/browse/SOLR-5434
Project: Solr
Issue Type: Improvement
Reporter: Alan Woodward
Assignee: Alan Woodward
Priority: Minor
Fix For: 4.6, 5.0

The various intro-to-SolrCloud pages (for example https://cwiki.apache.org/confluence/display/solr/Getting+Started+with+SolrCloud) currently tell new users to use the example/ directory as a basis for setting up new cloud instances. These directories contain, under the default solr/ solr home directory, a single core, defined to point to the collection1 collection. It's not at all obvious that, to change the name of your collection, you have to go and edit the core.properties file underneath the solr/ directory. A lot of users on the mailing list also seem to get confused by having to include bootstrap_confdir and numShards the first time they run Solr, but not afterwards.

So here's a suggestion:
* Have a new solrcloud/ directory in the example webapp that just contains a solr.xml file
* Change the startup example code to just include -Dsolr.solr.home and -DzkRun
* Tell the user to then run zkcli to bootstrap their configuration (Solr startup and configuration loading are kept separate)
* Tell the user to use the collections API to create a new collection, naming it however they want (config names, collection names and core names are all kept separate)

This way, there's a lot less 'magic' and hidden defaults involved, and all the steps to get a cloud up and running (start processes, upload configuration, create collection) are made distinguishable.
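To make the proposed flow concrete, a hedged sketch of the three steps against a Solr 4.x example layout; directory, config and collection names are illustrative, and the embedded ZK started by -DzkRun conventionally listens on the Solr port + 1000:

{noformat}
# 1. start Solr with a bare solr home (no pre-defined core) and embedded ZK:
java -Dsolr.solr.home=solrcloud -DzkRun -jar start.jar

# 2. upload a named configuration, decoupled from startup:
cloud-scripts/zkcli.sh -zkhost localhost:9983 -cmd upconfig \
  -confdir my_conf_dir -confname my_conf

# 3. create the collection explicitly via the collections API:
curl 'http://localhost:8983/solr/admin/collections?action=CREATE&name=my_collection&numShards=2&collection.configName=my_conf'
{noformat}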
[jira] [Updated] (LUCENE-4753) Make forbidden API checks per-module
[ https://issues.apache.org/jira/browse/LUCENE-4753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-4753:
----------------------------------
Attachment: LUCENE-4753.patch

Patch. I will commit this soon if nobody objects. There is still room for improvement (e.g. we can now enable servlet-api checks in some Lucene modules that use servlets, or enable commons-io checks for Lucene modules that use commons-io).

Make forbidden API checks per-module
------------------------------------
Key: LUCENE-4753
URL: https://issues.apache.org/jira/browse/LUCENE-4753
Project: Lucene - Core
Issue Type: Improvement
Components: general/build
Reporter: Uwe Schindler
Assignee: Uwe Schindler
Priority: Critical
Fix For: 4.6
Attachments: LUCENE-4753.patch

After the forbidden API checker was released separately from Lucene as a Google Code project (forked and improved), including Maven support, the checks on Lucene should be changed to work per module. The reason: the improved checker is more picky about e.g. extending classes that are forbidden, or overriding methods and calling super.method() when they are on the forbidden signatures list. For these checks it is not enough to have the class files and rt.jar; you need the whole classpath. forbidden-apis 1.0 now by default complains if classes are missing from the classpath. With the module architecture of Lucene/Solr it is very hard to build an uber-classpath; instead the checks should be done per module, so the default compile/test classpath of the module can be used and no crazy path statements with **/*.jar are needed. This needs some refactoring of the exclusion lists, but the Lucene checks could be done by a macro in common-build that allows custom exclusion lists for specific modules. Currently, strict checking is disabled for Solr, so the checker only complains about missing classes but does not fail the build:
{noformat}
-check-forbidden-java-apis:
[forbidden-apis] Reading bundled API signatures: jdk-unsafe-1.6
[forbidden-apis] Reading bundled API signatures: jdk-deprecated-1.6
[forbidden-apis] Reading bundled API signatures: commons-io-unsafe-2.1
[forbidden-apis] Reading API signatures: C:\Users\Uwe Schindler\Projects\lucene\trunk-lusolr3\lucene\tools\forbiddenApis\executors.txt
[forbidden-apis] Reading API signatures: C:\Users\Uwe Schindler\Projects\lucene\trunk-lusolr3\lucene\tools\forbiddenApis\servlet-api.txt
[forbidden-apis] Loading classes to check...
[forbidden-apis] Scanning for API signatures and dependencies...
[forbidden-apis] WARNING: The referenced class 'org.apache.lucene.analysis.uima.ae.AEProviderFactory' cannot be loaded. Please fix the classpath!
[forbidden-apis] WARNING: The referenced class 'org.apache.lucene.analysis.uima.ae.AEProviderFactory' cannot be loaded. Please fix the classpath!
[forbidden-apis] WARNING: The referenced class 'org.apache.lucene.analysis.uima.ae.AEProvider' cannot be loaded. Please fix the classpath!
[forbidden-apis] WARNING: The referenced class 'org.apache.lucene.collation.ICUCollationKeyAnalyzer' cannot be loaded. Please fix the classpath!
[forbidden-apis] Scanned 2177 (and 1222 related) class file(s) for forbidden API invocations (in 1.80s), 0 error(s).
{noformat}
I added almost all missing jars, but those do not seem to be in the Solr part of the source tree (I think they are only copied when building artifacts). With making the whole thing per module, we can use the default classpath of the module, which makes everything much easier.
[jira] [Updated] (SOLR-5152) EdgeNGramFilterFactory deletes token
[ https://issues.apache.org/jira/browse/SOLR-5152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Furkan KAMACI updated SOLR-5152:
--------------------------------
Attachment: SOLR-5152.patch

EdgeNGramFilterFactory deletes token
------------------------------------
Key: SOLR-5152
URL: https://issues.apache.org/jira/browse/SOLR-5152
Project: Solr
Issue Type: Improvement
Affects Versions: 4.4
Reporter: Christoph Lingg
Attachments: SOLR-5152.patch

I am using EdgeNGramFilterFactory in my schema.xml:
{code:xml}
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <!-- ... -->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="10" side="front" />
  </analyzer>
</fieldType>
{code}
Some tokens in my index consist of only one character, let's say {{R}}. minGramSize is set to 2, which is bigger than the length of the token. I expected the NGramFilter to leave {{R}} unchanged, but in fact it deletes the token. For my use case this interpretation is undesirable, and probably for most use cases too.
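The reported behavior can be reproduced directly against the underlying Lucene filter. Below is a minimal sketch, assuming the Lucene 4.x analysis APIs (EdgeNGramTokenFilter is what the factory wraps); the class name is made up for illustration.

{code}
// Minimal sketch of the reported behavior: a token shorter than minGramSize
// produces no output grams at all, i.e. "R" is silently dropped.
import java.io.StringReader;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.core.KeywordTokenizer;
import org.apache.lucene.analysis.ngram.EdgeNGramTokenFilter;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

public class EdgeNGramDropDemo {
  public static void main(String[] args) throws Exception {
    TokenStream ts = new EdgeNGramTokenFilter(Version.LUCENE_44,
        new KeywordTokenizer(new StringReader("R")), 2, 10);
    CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
    ts.reset();
    while (ts.incrementToken()) {
      System.out.println(term); // never reached: no grams are emitted for "R"
    }
    ts.end();
    ts.close();
  }
}
{code}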
[jira] [Commented] (SOLR-5152) EdgeNGramFilterFactory deletes token
[ https://issues.apache.org/jira/browse/SOLR-5152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13818583#comment-13818583 ]

Furkan KAMACI commented on SOLR-5152:
-------------------------------------

I've added a preserveOriginal capability to EdgeNGramFilterFactory and attached a patch.
[jira] [Updated] (LUCENE-4753) Make forbidden API checks per-module
[ https://issues.apache.org/jira/browse/LUCENE-4753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-4753:
----------------------------------
Attachment: LUCENE-4753.patch

New patch, removed useless dependency.
[jira] [Commented] (SOLR-5332) Add preserve original setting to the EdgeNGramFilterFactory
[ https://issues.apache.org/jira/browse/SOLR-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13818596#comment-13818596 ]

Furkan KAMACI commented on SOLR-5332:
-------------------------------------

This issue can be marked as a duplicate of SOLR-5152: https://issues.apache.org/jira/browse/SOLR-5152

Add preserve original setting to the EdgeNGramFilterFactory
-----------------------------------------------------------
Key: SOLR-5332
URL: https://issues.apache.org/jira/browse/SOLR-5332
Project: Solr
Issue Type: Wish
Reporter: Alexander S.

Hi, as described here: http://lucene.472066.n3.nabble.com/Help-to-figure-out-why-query-does-not-match-td4086967.html the problem is that if you have these two strings to index:
1. facebook.com/someuser.1
2. facebook.com/someveryandverylongusername
and the edge ngram filter factory with min and max gram size settings of 2 and 25, search requests for these URLs will fail. But search requests for:
1. facebook.com/someuser
2. facebook.com/someveryandverylonguserna
will work properly. That's because the first URL ends with 1, which is lower than the allowed min gram size, and in the second URL the user name is longer than the max gram size (27 characters). It would be good to have a preserve original option that adds the original string to the index when it does not fit the allowed gram sizes, so that the 1 and someveryandverylongusername tokens will also be added to the index. Best, Alex
[jira] [Commented] (SOLR-5332) Add preserve original setting to the EdgeNGramFilterFactory
[ https://issues.apache.org/jira/browse/SOLR-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13818593#comment-13818593 ]

Furkan KAMACI commented on SOLR-5332:
-------------------------------------

I've added a preserveOriginal capability to EdgeNGramFilterFactory and attached a patch to SOLR-5152. I also want to clarify something about the problem described in this issue. The schema described at http://lucene.472066.n3.nabble.com/Help-to-figure-out-why-query-does-not-match-td4086967.html uses LowerCaseFilterFactory before EdgeNGramFilterFactory. The documentation at http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.LowerCaseTokenizerFactory says: "Creates tokens by lowercasing all letters and dropping non-letters." So non-letters will be dropped before tokens reach EdgeNGramFilterFactory. My patch preserves the original token if preserveOriginal is set to true and the token length is less than minGramSize or greater than maxGramSize.
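For illustration, the preserve condition could be wired into a token filter roughly as follows. This is a hypothetical sketch of the idea, not the attached SOLR-5152 patch, and the actual gram generation is elided.

{code}
// Hypothetical sketch of the preserveOriginal condition described above, not
// the actual patch: a term whose length falls outside [minGramSize, maxGramSize]
// is passed through unchanged instead of being dropped; in-range terms would be
// expanded into edge n-grams as before.
import java.io.IOException;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

final class PreserveOriginalSketch extends TokenFilter {
  private final int minGram, maxGram;
  private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);

  PreserveOriginalSketch(TokenStream input, int minGram, int maxGram) {
    super(input);
    this.minGram = minGram;
    this.maxGram = maxGram;
  }

  @Override
  public boolean incrementToken() throws IOException {
    if (!input.incrementToken()) {
      return false;
    }
    int len = termAtt.length();
    if (len < minGram || len > maxGram) {
      return true; // preserve the original token rather than deleting it
    }
    // ... here the real filter would emit the edge n-grams of the term ...
    return true;
  }
}
{code}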
[jira] [Updated] (SOLR-5152) EdgeNGramFilterFactory deletes token
[ https://issues.apache.org/jira/browse/SOLR-5152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Furkan KAMACI updated SOLR-5152:
--------------------------------
Attachment: (was: SOLR-5152.patch)
[jira] [Updated] (SOLR-5152) EdgeNGramFilterFactory deletes token
[ https://issues.apache.org/jira/browse/SOLR-5152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Furkan KAMACI updated SOLR-5152:
--------------------------------
Attachment: SOLR-5152.patch
[jira] [Updated] (LUCENE-4753) Make forbidden API checks per-module
[ https://issues.apache.org/jira/browse/LUCENE-4753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-4753:
----------------------------------
Attachment: LUCENE-4753.patch

Final patch. Will commit in a moment.
Adding preserveOriginal Capability to EdgeNGramFilterFactory
Hi;

There were two issues about adding a preserveOriginal capability to EdgeNGramFilterFactory, and I've made a patch for it. You can check and test it here: https://issues.apache.org/jira/browse/SOLR-5152

This is the related issue that can be marked as a duplicate: https://issues.apache.org/jira/browse/SOLR-5332

Thanks;
Furkan KAMACI
[jira] [Commented] (LUCENE-4753) Make forbidden API checks per-module
[ https://issues.apache.org/jira/browse/LUCENE-4753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13818603#comment-13818603 ]

ASF subversion and git services commented on LUCENE-4753:
----------------------------------------------------------

Commit 1540573 from [~thetaphi] in branch 'dev/trunk' [ https://svn.apache.org/r1540573 ]
LUCENE-4753: Run forbidden-apis Ant task per module. This allows more improvements and prevents OOMs after the number of class files raised recently
[jira] [Resolved] (LUCENE-4753) Make forbidden API checks per-module
[ https://issues.apache.org/jira/browse/LUCENE-4753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler resolved LUCENE-4753.
-----------------------------------
Resolution: Fixed
[jira] [Commented] (LUCENE-4753) Make forbidden API checks per-module
[ https://issues.apache.org/jira/browse/LUCENE-4753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13818609#comment-13818609 ]

ASF subversion and git services commented on LUCENE-4753:
----------------------------------------------------------

Commit 1540575 from [~thetaphi] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1540575 ]
Merged revision(s) 1540573 from lucene/dev/trunk: LUCENE-4753: Run forbidden-apis Ant task per module. This allows more improvements and prevents OOMs after the number of class files raised recently
[jira] [Commented] (LUCENE-4753) Make forbidden API checks per-module
[ https://issues.apache.org/jira/browse/LUCENE-4753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13818616#comment-13818616 ]

Uwe Schindler commented on LUCENE-4753:
---------------------------------------

FYI: I opened [https://code.google.com/p/forbidden-apis/issues/detail?id=20] to improve the memory usage of forbidden-apis.
[jira] [Commented] (SOLR-5374) Support user configured doc-centric versioning rules
[ https://issues.apache.org/jira/browse/SOLR-5374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13818637#comment-13818637 ]

Yonik Seeley commented on SOLR-5374:
------------------------------------

bq. should we change the commented logging to log.debug?

I only left them there (commented out) in case I needed to try and debug again in the short term. They are not of the quality one would want for the long term; I'd rather they be deleted than changed to logs.

Support user configured doc-centric versioning rules
-----------------------------------------------------
Key: SOLR-5374
URL: https://issues.apache.org/jira/browse/SOLR-5374
Project: Solr
Issue Type: Improvement
Reporter: Hoss Man
Assignee: Hoss Man
Fix For: 4.6, 5.0
Attachments: SOLR-5374.patch, SOLR-5374.patch, SOLR-5374.patch, SOLR-5374.patch, SOLR-5374.patch, SOLR-5374.patch

The existing optimistic concurrency features of Solr can be very handy for ensuring that you are only updating/replacing the version of the doc you think you are updating/replacing, w/o the risk of someone else adding/removing the doc in the mean time -- but I've recently encountered some situations where I really wanted to be able to let the client specify an arbitrary version on a per-document basis (ie: generated by an external system, or perhaps a timestamp of when a file was last modified) and ensure that the corresponding document update was processed only if the new version is greater than the old version -- w/o needing to check exactly which version is currently in Solr. (ie: if a client wants to index version 101 of a doc, that update should fail if version 102 is already in the index, but succeed if the currently indexed version is 99 -- w/o the client needing to ask Solr what the current version is.)

The idea Yonik brought up in SOLR-5298 (letting the client specify a {{\_new\_version\_}} that would be used by the existing optimistic concurrency code to control the assignment of the {{\_version\_}} field for documents) looked like a good direction to go -- but after digging into the way {{\_version\_}} is used internally, I realized it requires a uniqueness constraint across all update commands, which would make it impossible to allow multiple independent documents to have the same {{\_version\_}}. So instead I've tackled the problem in a different way, using an UpdateProcessor that is configured with a user-defined field to track a DocBasedVersion and uses the RTG logic to figure out if the update is allowed.
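The acceptance rule the description spells out boils down to a single comparison. Here is a minimal sketch with a hypothetical helper, not the actual update processor from the patch.

{code}
// Hypothetical helper illustrating the doc-centric rule described above: an
// update is applied only when the externally supplied version is greater than
// the version currently indexed for that document.
public class DocVersionRule {
  static boolean acceptUpdate(Long indexedVersion, long newVersion) {
    if (indexedVersion == null) {
      return true; // no existing document: any version is accepted
    }
    return newVersion > indexedVersion;
  }

  public static void main(String[] args) {
    System.out.println(acceptUpdate(99L, 101));  // true: 101 supersedes 99
    System.out.println(acceptUpdate(102L, 101)); // false: 102 is already indexed
  }
}
{code}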
[jira] [Created] (LUCENE-5337) Add Payload support to FileDictionary (Suggest) and make it more configurable
Areek Zillur created LUCENE-5337:
------------------------------------
Summary: Add Payload support to FileDictionary (Suggest) and make it more configurable
Key: LUCENE-5337
URL: https://issues.apache.org/jira/browse/LUCENE-5337
Project: Lucene - Core
Issue Type: Improvement
Components: core/search
Reporter: Areek Zillur

It would be nice to add payload support to FileDictionary, so users can pass in an associated payload with suggestion entries. Currently FileDictionary has a hard-coded field delimiter (TAB); it would be nice to let users configure the field delimiter as well.
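For illustration, here is a parsing sketch of what a configurable, payload-carrying line format might look like. The format, class and method names are assumptions for this sketch only; the ticket merely proposes the feature.

{code}
// Hypothetical sketch, not the current FileDictionary: each line carries
// term<delim>weight<delim>payload with a configurable delimiter; the weight
// and payload columns are optional.
import java.util.regex.Pattern;

public class SuggestLineParser {
  static String[] parse(String line, String delimiter) {
    // limit 3: the payload itself may contain the delimiter
    return line.split(Pattern.quote(delimiter), 3);
  }

  public static void main(String[] args) {
    String[] fields = parse("lucene|42|{\"id\":7}", "|");
    String term = fields[0];                                          // "lucene"
    long weight = fields.length > 1 ? Long.parseLong(fields[1]) : 1;  // 42
    String payload = fields.length > 2 ? fields[2] : null;            // {"id":7}
    System.out.println(term + " " + weight + " " + payload);
  }
}
{code}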