[JENKINS-EA] Lucene-Solr-5.x-Linux (64bit/jdk1.9.0-ea-b85) - Build # 14480 - Failure!

2015-11-03 Thread Policeman Jenkins Server
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-5.x-Linux/14480/
Java: 64bit/jdk1.9.0-ea-b85 -XX:+UseCompressedOops -XX:+UseConcMarkSweepGC

3 tests failed.
FAILED:  
junit.framework.TestSuite.org.apache.solr.cloud.TestAuthenticationFramework

Error Message:
16 threads leaked from SUITE scope at org.apache.solr.cloud.TestAuthenticationFramework:
   1) Thread[id=2692, name=qtp1203989803-2692, state=TIMED_WAITING, group=TGRP-TestAuthenticationFramework]
        at sun.misc.Unsafe.park(Native Method)
        at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
        at org.eclipse.jetty.util.BlockingArrayQueue.poll(BlockingArrayQueue.java:389)
        at org.eclipse.jetty.util.thread.QueuedThreadPool.idleJobPoll(QueuedThreadPool.java:531)
        at org.eclipse.jetty.util.thread.QueuedThreadPool.access$700(QueuedThreadPool.java:47)
        at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:590)
        at java.lang.Thread.run(Thread.java:747)
   2) Thread[id=2781, name=OverseerStateUpdate-94800266310123529-127.0.0.1:57611_solr-n_04, state=TIMED_WAITING, group=Overseer state updater.]
        at java.lang.Thread.sleep(Native Method)
        at org.apache.solr.common.cloud.ZkCmdExecutor.retryDelay(ZkCmdExecutor.java:108)
        at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:76)
        at org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:342)
        at org.apache.solr.cloud.Overseer$ClusterStateUpdater.amILeader(Overseer.java:411)
        at org.apache.solr.cloud.Overseer$ClusterStateUpdater.run(Overseer.java:143)
        at java.lang.Thread.run(Thread.java:747)
   3) Thread[id=2782, name=OverseerCollectionConfigSetProcessor-94800266310123529-127.0.0.1:57611_solr-n_04, state=TIMED_WAITING, group=Overseer collection creation process.]
        at java.lang.Thread.sleep(Native Method)
        at org.apache.solr.common.cloud.ZkCmdExecutor.retryDelay(ZkCmdExecutor.java:108)
        at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:76)
        at org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:342)
        at org.apache.solr.cloud.OverseerTaskProcessor.amILeader(OverseerTaskProcessor.java:355)
        at org.apache.solr.cloud.OverseerTaskProcessor.run(OverseerTaskProcessor.java:172)
        at java.lang.Thread.run(Thread.java:747)
   4) Thread[id=2695, name=org.eclipse.jetty.server.session.HashSessionManager@68167ed7Timer, state=TIMED_WAITING, group=TGRP-TestAuthenticationFramework]
        at sun.misc.Unsafe.park(Native Method)
        at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
        at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1093)
        at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
        at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:747)
   5) Thread[id=2691, name=qtp1203989803-2691, state=TIMED_WAITING, group=TGRP-TestAuthenticationFramework]
        at sun.misc.Unsafe.park(Native Method)
        at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
        at org.eclipse.jetty.util.BlockingArrayQueue.poll(BlockingArrayQueue.java:389)
        at org.eclipse.jetty.util.thread.QueuedThreadPool.idleJobPoll(QueuedThreadPool.java:531)
        at org.eclipse.jetty.util.thread.QueuedThreadPool.access$700(QueuedThreadPool.java:47)
        at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:590)
        at java.lang.Thread.run(Thread.java:747)
   6) Thread[id=2762, name=Thread-1022, state=WAITING, group=TGRP-TestAuthenticationFramework]
        at java.lang.Object.wait(Native Method)
        at java.lang.Object.wait(Object.java:516)
        at org.apache.solr.core.CloserThread.run(CoreContainer.java:1155)
   7) Thread[id=2783, name=OverseerHdfsCoreFailoverThread-94800266310123529-127.0.0.1:57611_solr-n_04, state=TIMED_WAITING, group=Overseer Hdfs SolrCore Failover Thread.]
        at java.lang.Thread.sleep(Native Method)
        at org.apache.solr.cloud.OverseerAutoReplicaFailoverThread.run(OverseerAutoReplicaFailover

[jira] [Commented] (LUCENE-6879) Allow to define custom CharTokenizer using Java 8 Lambdas/Method references

2015-11-03 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14986883#comment-14986883
 ] 

Uwe Schindler commented on LUCENE-6879:
---

bq. I always thought hell was about slow and endless suffering?

Um, yes :-)

But this video tells you different: https://www.youtube.com/watch?v=Uqa8MFSXZHM
If you need to burn fat, fast as hell: 
http://www.amazon.com/ULTIMATE-CUTS-SECRETS-English-Edition-ebook/dp/B00HMQS8TA

> Allow to define custom CharTokenizer using Java 8 Lambdas/Method references
> ---
>
> Key: LUCENE-6879
> URL: https://issues.apache.org/jira/browse/LUCENE-6879
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/analysis
>Affects Versions: Trunk
>Reporter: Uwe Schindler
> Fix For: Trunk
>
> Attachments: LUCENE-6879.patch
>
>
> As a follow-up to LUCENE-6874, I thought about how to generate custom 
> CharTokenizers without subclassing. I ran into this quite often and was a bit 
> annoyed that you had to create a subclass every time.
> This issue uses the same pattern as ThreadLocal and many collection methods 
> in Java 8: you have the (abstract) base class and you define a factory method 
> named {{fromXxxPredicate}} (like {{ThreadLocal.fromInitial(() -> value)}}).
> {code:java}
> public static CharTokenizer fromPredicate(java.util.function.IntPredicate 
> predicate)
> {code}
> This would allow defining a new CharTokenizer in a single statement, using 
> any predicate:
> {code:java}
> // long variant with lambda:
> Tokenizer tok = CharTokenizer.fromTokenCharPredicate(c -> 
> !UCharacter.isUWhiteSpace(c));
> // method reference for separator char predicate + normalization by 
> uppercasing:
> Tokenizer tok = 
> CharTokenizer.fromSeparatorCharPredicate(UCharacter::isUWhiteSpace, 
> Character::toUpperCase);
> // method reference to custom function:
> private boolean myTestFunction(int c) {
>  return (crazy condition);
> }
> Tokenizer tok = CharTokenizer.fromTokenCharPredicate(this::myTestFunction);
> {code}
> I know this would not help Solr users who want to define the Tokenizer in a 
> config file, but for real Lucene users the Java 8-like way would be the 
> static method on CharTokenizer shown above, with no subclassing. It is fast as 
> hell, as it is just a method reference, and Java 8 is optimized for that.
> The inverted factories, {{fromSeparatorCharPredicate()}}, are provided to allow 
> quick definitions using method references instead of lambdas. In lots of cases, 
> like WhitespaceTokenizer, the predicate is on the separator chars 
> ({{isWhitespace(int)}}), so with the 2nd set of factories you can define them 
> without the counter-intuitive negation. Internally it just uses 
> {{IntPredicate#negate()}}.
> The factories also allow passing a normalization function; e.g. to lowercase, 
> you may just pass {{Character::toLowerCase}} as an {{IntUnaryOperator}} 
> reference.
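
A minimal sketch of how such factories can be implemented inside CharTokenizer 
(illustrative only; the committed implementation is in the attached 
LUCENE-6879.patch):

{code:java}
// Sketch, not the actual patch: delegate isTokenChar(int) to a captured
// predicate. Assumes: import java.util.function.IntPredicate;
public static CharTokenizer fromTokenCharPredicate(final IntPredicate predicate) {
  return new CharTokenizer() {
    @Override
    protected boolean isTokenChar(int c) {
      return predicate.test(c);  // the lambda/method reference decides token chars
    }
  };
}

// Inverted variant: a predicate over separator chars is simply negated.
public static CharTokenizer fromSeparatorCharPredicate(IntPredicate predicate) {
  return fromTokenCharPredicate(predicate.negate());
}
{code}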






[jira] [Updated] (LUCENE-6879) Allow to define custom CharTokenizer using Java 8 Lambdas/Method references

2015-11-03 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-6879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-6879:
--
Description: 
As a follow-up to LUCENE-6874, I thought about how to generate custom 
CharTokenizers without subclassing. I ran into this quite often and was a bit 
annoyed that you had to create a subclass every time.

This issue uses the same pattern as ThreadLocal and many collection methods in 
Java 8: you have the (abstract) base class and you define a factory method 
named {{fromXxxPredicate}} (like {{ThreadLocal.withInitial(() -> value)}}).

{code:java}
public static CharTokenizer 
fromTokenCharPredicate(java.util.function.IntPredicate predicate)
{code}

This would allow defining a new CharTokenizer in a single statement, using 
any predicate:

{code:java}
// long variant with lambda:
Tokenizer tok = CharTokenizer.fromTokenCharPredicate(c -> 
!UCharacter.isUWhiteSpace(c));

// method reference for separator char predicate + normalization by uppercasing:
Tokenizer tok = 
CharTokenizer.fromSeparatorCharPredicate(UCharacter::isUWhiteSpace, 
Character::toUpperCase);

// method reference to custom function:
private boolean myTestFunction(int c) {
 return (crazy condition);
}
Tokenizer tok = CharTokenizer.fromTokenCharPredicate(this::myTestFunction);
{code}

I know this would not help Solr users who want to define the Tokenizer in a 
config file, but for real Lucene users this Java 8-like way would be easy and 
elegant to use. It is fast as hell, as it is just a method reference, and 
Java 8 is optimized for that.

The inverted factories, {{fromSeparatorCharPredicate()}}, are provided to allow 
quick definitions using method references instead of lambdas. In lots of cases, 
like WhitespaceTokenizer, the predicate is on the separator chars 
({{isWhitespace(int)}}), so with the 2nd set of factories you can define them 
without the counter-intuitive negation. Internally it just uses 
{{IntPredicate#negate()}}.

The factories also allow passing a normalization function; e.g. to lowercase, 
you may just pass {{Character::toLowerCase}} as an {{IntUnaryOperator}} 
reference.

  was:
As a follow-up to LUCENE-6874, I thought about how to generate custom 
CharTokenizers without subclassing. I ran into this quite often and was a bit 
annoyed that you had to create a subclass every time.

This issue uses the same pattern as ThreadLocal and many collection methods in 
Java 8: you have the (abstract) base class and you define a factory method 
named {{fromXxxPredicate}} (like {{ThreadLocal.fromInitial(() -> value)}}).

{code:java}
public static CharTokenizer fromPredicate(java.util.function.IntPredicate 
predicate)
{code}

This would allow defining a new CharTokenizer in a single statement, using 
any predicate:

{code:java}
// long variant with lambda:
Tokenizer tok = CharTokenizer.fromTokenCharPredicate(c -> 
!UCharacter.isUWhiteSpace(c));

// method reference for separator char predicate + normalization by uppercasing:
Tokenizer tok = 
CharTokenizer.fromSeparatorCharPredicate(UCharacter::isUWhiteSpace, 
Character::toUpperCase);

// method reference to custom function:
private boolean myTestFunction(int c) {
 return (crazy condition);
}
Tokenizer tok = CharTokenizer.fromTokenCharPredicate(this::myTestFunction);
{code}

I know this would not help Solr users who want to define the Tokenizer in a 
config file, but for real Lucene users the Java 8-like way would be the 
static method on CharTokenizer shown above, with no subclassing. It is fast as 
hell, as it is just a method reference, and Java 8 is optimized for that.

The inverted factories, {{fromSeparatorCharPredicate()}}, are provided to allow 
quick definitions using method references instead of lambdas. In lots of cases, 
like WhitespaceTokenizer, the predicate is on the separator chars 
({{isWhitespace(int)}}), so with the 2nd set of factories you can define them 
without the counter-intuitive negation. Internally it just uses 
{{IntPredicate#negate()}}.

The factories also allow passing a normalization function; e.g. to lowercase, 
you may just pass {{Character::toLowerCase}} as an {{IntUnaryOperator}} 
reference.


> Allow to define custom CharTokenizer using Java 8 Lambdas/Method references
> ---
>
> Key: LUCENE-6879
> URL: https://issues.apache.org/jira/browse/LUCENE-6879
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/analysis
>Affects Versions: Trunk
>Reporter: Uwe Schindler
> Fix For: Trunk
>
> Attachments: LUCENE-6879.patch
>
>
> As a follow-up to LUCENE-6874, I thought about how to generate custom 
> CharTokenizers without subclassing. I ran into this quite often and was a bit 
> annoyed that you had to create a subclass every time.
> This issue uses the same pattern as Thre

Re: TopDocs.merge PriorityQueue usage

2015-11-03 Thread Adrien Grand
Hi Daniel,

I saw the JIRA issue, thanks! Unfortunately Mike's benchmarks won't help, as
they track the performance of indexing/searching a single index, while the
code that you modified is about merging TopDocs from several indices. When I
said micro benchmarks, I don't think we need anything fancy like JMH; I would
be fine with just some Java code that creates random TopDocs instances and
measures, with System.nanoTime, how long TopDocs.merge takes on them. If we
want to avoid the pitfalls of benchmarks, something else that would be
interesting to know is how many times the PriorityQueue.lessThan method is
called on trunk vs. your patch; I suspect it will be lower with your patch.
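
A minimal sketch of such a benchmark (assuming Lucene 5.x's
TopDocs.merge(int, TopDocs[]); the shard count, hit counts, and iteration
counts below are arbitrary — only the random TopDocs and the System.nanoTime
timing come from the suggestion above):

{code:java}
import java.util.Arrays;
import java.util.Random;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;

public class TopDocsMergeBench {
  public static void main(String[] args) throws Exception {
    Random r = new Random(42);
    // pre-generate per-shard top hits, each sorted by descending score
    TopDocs[] shards = new TopDocs[20];
    for (int s = 0; s < shards.length; s++) {
      ScoreDoc[] hits = new ScoreDoc[1000];
      for (int i = 0; i < hits.length; i++) {
        hits[i] = new ScoreDoc(r.nextInt(1 << 20), r.nextFloat());
      }
      Arrays.sort(hits, (a, b) -> Float.compare(b.score, a.score));
      shards[s] = new TopDocs(hits.length, hits, hits[0].score);
    }
    for (int i = 0; i < 1_000; i++) {
      TopDocs.merge(100, shards);           // warm up the JIT before timing
    }
    long start = System.nanoTime();
    int iters = 10_000;
    for (int i = 0; i < iters; i++) {
      TopDocs.merge(100, shards);           // the call under measurement
    }
    System.out.println("avg merge: " + (System.nanoTime() - start) / iters + " ns");
  }
}
{code}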

On Tue, Nov 3, 2015 at 00:25, Daniel Jeliński  wrote:

> Hi Adrien,
> LUCENE-6878 created.
> This method is called by some of IndexSearcher's search overrides. I'm
> going to try out Mike's benchmark
>  first, and learn to write
> micro benchmarks in the meantime.
> Regards,
> Daniel
>
> 2015-11-02 0:08 GMT+01:00 Adrien Grand :
>
>> Hi Daniel,
>>
>> Your patch could indeed make things more efficient when merging top hits
>> from many shards, and the code is still easy to read, so +1 to create a
>> JIRA issue. I'm not surprised that ant test did not get faster as we rarely
>> call this method when running tests, maybe you can try to write a simple
>> micro benchmark from randomly generated TopDocs instances?
>>
>>
>> On Sun, Nov 1, 2015 at 23:39, Daniel Jeliński  wrote:
>>
>>> Hello all,
>>> The function TopDocs.merge uses PriorityQueue in a pattern: pop, update
>>> value (ref.hitIndex++), add. JavaDocs for PriorityQueue.updateTop
>>> 
>>> say that using this function instead should be at least twice as fast.
>>> Would a patch like the one attached be acceptable? Should I create a JIRA
>>> issue for it?
>>> I tried comparing the time taken to run ant test before and after the
>>> patch was applied, but apparently it was affected by random factors more
>>> than it was affected by the patch, so I don't have any performance numbers
>>> to show if / how much it changed. Is there any standard way of benchmarking?
>>> Regards,
>>> Daniel
>>>


[jira] [Created] (LUCENE-6881) Cutover all BKD tree implementations to the codec

2015-11-03 Thread Michael McCandless (JIRA)
Michael McCandless created LUCENE-6881:
--

 Summary: Cutover all BKD tree implementations to the codec
 Key: LUCENE-6881
 URL: https://issues.apache.org/jira/browse/LUCENE-6881
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: Trunk


This is phase 4 for enabling indexing dimensional values in Lucene
... follow-on from LUCENE-6861.

This issue removes the 3 pre-existing specialized experimental BKD
implementations (BKD* in sandbox module for 2D lat/lon geo, BKD3D* in
spatial3d module for 3D x/y/z geo, and range tree in sandbox module)
and instead switches over to having the codec index the dimensional
values.







[jira] [Commented] (LUCENE-6871) Move SpanQueries out of .spans package

2015-11-03 Thread Alan Woodward (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14987031#comment-14987031
 ] 

Alan Woodward commented on LUCENE-6871:
---

Right, but SpanNot and SpanOr will still be separate queries from BooleanQuery. 
 Moving packages just means that some queries (like Term and Phrase) that can 
conceptually be both a standard query and a span query can actually be that.

> Move SpanQueries out of .spans package
> --
>
> Key: LUCENE-6871
> URL: https://issues.apache.org/jira/browse/LUCENE-6871
> Project: Lucene - Core
>  Issue Type: Improvement
>Affects Versions: Trunk, 5.4
>Reporter: Alan Woodward
> Attachments: LUCENE-6871.patch
>
>
> SpanQueries are now essentially the same as a standard query, restricted to a 
> single field and with an extra scorer type returned by getSpans().  There are 
> a number of existing queries that fit this contract, including TermQuery and 
> PhraseQuery, and it should be possible to make them SpanQueries as well 
> without impacting their existing performance.  However, we can't do this 
> while SpanQuery and its associated Weight and Spans classes are in their own 
> package.
> I'd like to remove the o.a.l.search.spans package entirely, in a few stages:
> 1) Move SpanQuery, SpanWeight, Spans, SpanCollector and FilterSpans to 
> o.a.l.search
> 2) Remove SpanTermQuery and merge its functionality into TermQuery
> 3) Move SpanNear, SpanNot, SpanOr and SpanMultiTermQueryWrapper to 
> o.a.l.search
> 4) Move the remaining SpanQueries to the queries package
> Then we can look at, eg, making PhraseQuery a SpanQuery, removing 
> SpanMTQWrapper and making MultiTermQuery a SpanQuery, etc.






[jira] [Updated] (LUCENE-6881) Cutover all BKD tree implementations to the codec

2015-11-03 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-6881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-6881:
---
Attachment: LUCENE-6881.patch

Initial patch, plenty of nocommits too, and some tests fail, but I think it's 
close ...

> Cutover all BKD tree implementations to the codec
> -
>
> Key: LUCENE-6881
> URL: https://issues.apache.org/jira/browse/LUCENE-6881
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: Trunk
>
> Attachments: LUCENE-6881.patch
>
>
> This is phase 4 for enabling indexing dimensional values in Lucene
> ... follow-on from LUCENE-6861.
> This issue removes the 3 pre-existing specialized experimental BKD
> implementations (BKD* in sandbox module for 2D lat/lon geo, BKD3D* in
> spatial3d module for 3D x/y/z geo, and range tree in sandbox module)
> and instead switches over to having the codec index the dimensional
> values.






[jira] [Commented] (LUCENE-6878) TopDocs.merge should use updateTop instead of pop / add

2015-11-03 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14987082#comment-14987082
 ] 

Michael McCandless commented on LUCENE-6878:


+1

I know we are discussing how to benchmark this change but I don't think that's 
needed before committing ... this is a good change ... it's only needed to 
satisfy curiosity :)
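
Schematically, the change swaps two heap adjustments for one (ShardRef and
hitIndex are the bookkeeping used inside TopDocs.merge; this is a sketch of
the pattern, not the exact patch):

{code:java}
// Before: pop the top ShardRef, advance it, push it back --
// one downHeap on pop() plus one upHeap on add().
ShardRef ref = queue.pop();
ref.hitIndex++;
queue.add(ref);

// After: mutate the top element in place and restore heap order once.
ref = queue.top();
ref.hitIndex++;
queue.updateTop();
{code}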

> TopDocs.merge should use updateTop instead of pop / add
> ---
>
> Key: LUCENE-6878
> URL: https://issues.apache.org/jira/browse/LUCENE-6878
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/search
>Affects Versions: Trunk
>Reporter: Daniel Jelinski
>Assignee: Adrien Grand
>Priority: Trivial
> Attachments: LUCENE-6878.patch
>
>
> The function TopDocs.merge uses PriorityQueue in a pattern: pop, update value 
> (ref.hitIndex++), add. JavaDocs for PriorityQueue.updateTop say that using 
> this function instead should be at least twice as fast.






[jira] [Commented] (SOLR-8035) Move solr/webapp to solr/server/solr-webapp

2015-11-03 Thread Upayavira (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14987086#comment-14987086
 ] 

Upayavira commented on SOLR-8035:
-

Just to note, I'm mulling on whether we might actually *need* a build process.

We have two conflicting needs:
 * development time: change a file - commit it
 * build time: minify/merge/etc

The latter currently isn't being done at all. There are many JS tools out there 
that do the equivalent of Ivy dependency management, and others that minify 
CSS/JS/HTML and then merge those files into a smaller number of files, such 
that the overall HTTP payload of the UI could be substantially reduced. For 
local loads this has never been an issue, but I have visited hosted Solr 
servers where the UI fails to load due to poor network connectivity.

Ideally, "ant run" would run the UI in place, whereas "ant server" would 
execute a build process and produce minified files, etc.

> Move solr/webapp to solr/server/solr-webapp
> ---
>
> Key: SOLR-8035
> URL: https://issues.apache.org/jira/browse/SOLR-8035
> Project: Solr
>  Issue Type: Bug
>  Components: UI
>Reporter: Erik Hatcher
>Assignee: Erik Hatcher
>Priority: Critical
> Fix For: 5.4, Trunk
>
> Attachments: SOLR-8035.patch
>
>
> Let's move solr/webapp *source* files to their final actual distro 
> destination.  This facilitates folks editing the UI in real-time (save 
> change, refresh in browser) rather than having to "stop, ant server, restart" 
> to see a change.






[jira] [Commented] (LUCENE-6659) Remove IndexWriterConfig.get/setMaxThreadStates

2015-11-03 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14987090#comment-14987090
 ] 

Michael McCandless commented on LUCENE-6659:


bq. Going forward, how can we limit the number of threads that can write to 
an index?

You should fix your application to limit the threads that are allowed to be 
inside IndexWriter concurrently ... e.g. use a Semaphore.
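
A minimal sketch of that approach (the wrapper class and the limit of 4 are
illustrative choices, not an API):

{code:java}
import java.util.concurrent.Semaphore;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexWriter;

public class ThrottledIndexer {
  private final IndexWriter writer;
  private final Semaphore permits = new Semaphore(4); // application-chosen cap

  public ThrottledIndexer(IndexWriter writer) {
    this.writer = writer;
  }

  public void add(Document doc) throws Exception {
    permits.acquire();            // block until one of the 4 slots is free
    try {
      writer.addDocument(doc);    // at most 4 threads inside IndexWriter
    } finally {
      permits.release();
    }
  }
}
{code}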

> Remove IndexWriterConfig.get/setMaxThreadStates
> ---
>
> Key: LUCENE-6659
> URL: https://issues.apache.org/jira/browse/LUCENE-6659
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 5.3, Trunk
>
> Attachments: LUCENE-6659.patch
>
>
> Ever since LUCENE-5644, IndexWriter will aggressively reuse its internal 
> thread states across threads, whenever one is free.
> I think this means we can safely remove the sneaky maxThreadStates limit 
> (default 8) that we have today: IW will only ever allocate as many thread 
> states as there are actual concurrent threads running through it.






[jira] [Commented] (LUCENE-6847) Test3DPointField and TestBKDTree failures: .bkd file can't be deleted

2015-11-03 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14987091#comment-14987091
 ] 

Michael McCandless commented on LUCENE-6847:


Thanks [~steve_rowe], I'll look.

> Test3DPointField and TestBKDTree failures: .bkd file can't be deleted
> -
>
> Key: LUCENE-6847
> URL: https://issues.apache.org/jira/browse/LUCENE-6847
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: modules/sandbox, modules/spatial3d
>Reporter: Steve Rowe
>Assignee: Michael McCandless
> Fix For: Trunk
>
>
> My Jenkins found seeds for {{TestGeo3DPointField}} and {{TestBKDTree}} tests 
> that cause them to fail reliably, because a {{.bkd}} file can't be deleted 
> because "a virus scanner has it open".
> {noformat}
>[junit4] Suite: org.apache.lucene.bkdtree3d.TestGeo3DPointField
> [...] 
>[junit4]   2> thg 10 19, 2015 9:18:51 CH 
> com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler
>  uncaughtException
>[junit4]   2> WARNING: Uncaught exception in thread: Thread[Lucene Merge 
> Thread #0,5,TGRP-TestGeo3DPointField]
>[junit4]   2> org.apache.lucene.index.MergePolicy$MergeException: 
> java.io.IOException: cannot delete file: _41767748444.sort, a virus scanner 
> has it open
>[junit4]   2>at 
> __randomizedtesting.SeedInfo.seed([61ED8059BBF9CF1D]:0)
>[junit4]   2>at 
> org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:668)
>[junit4]   2>at 
> org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:648)
>[junit4]   2> Caused by: java.io.IOException: cannot delete file: 
> _41767748444.sort, a virus scanner has it open
>[junit4]   2>at 
> org.apache.lucene.store.MockDirectoryWrapper.deleteFile(MockDirectoryWrapper.java:523)
>[junit4]   2>at 
> org.apache.lucene.store.MockDirectoryWrapper.deleteFile(MockDirectoryWrapper.java:471)
>[junit4]   2>at 
> org.apache.lucene.store.LockValidatingDirectoryWrapper.deleteFile(LockValidatingDirectoryWrapper.java:38)
>[junit4]   2>at 
> org.apache.lucene.store.FilterDirectory.deleteFile(FilterDirectory.java:62)
>[junit4]   2>at 
> org.apache.lucene.store.TrackingDirectoryWrapper.deleteFile(TrackingDirectoryWrapper.java:37)
>[junit4]   2>at 
> org.apache.lucene.bkdtree3d.BKD3DTreeWriter.sort(BKD3DTreeWriter.java:344)
>[junit4]   2>at 
> org.apache.lucene.bkdtree3d.BKD3DTreeWriter.finish(BKD3DTreeWriter.java:398)
>[junit4]   2>at 
> org.apache.lucene.bkdtree3d.Geo3DDocValuesConsumer.addBinaryField(Geo3DDocValuesConsumer.java:131)
>[junit4]   2>at 
> org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat$FieldsWriter.addBinaryField(PerFieldDocValuesFormat.java:116)
>[junit4]   2>at 
> org.apache.lucene.codecs.DocValuesConsumer.mergeBinaryField(DocValuesConsumer.java:333)
>[junit4]   2>at 
> org.apache.lucene.codecs.DocValuesConsumer.merge(DocValuesConsumer.java:185)
>[junit4]   2>at 
> org.apache.lucene.index.SegmentMerger.mergeDocValues(SegmentMerger.java:150)
>[junit4]   2>at 
> org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:105)
>[junit4]   2>at 
> org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4054)
>[junit4]   2>at 
> org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3634)
>[junit4]   2>at 
> org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:588)
>[junit4]   2>at 
> org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:626)
> [...]
>[junit4]   2> NOTE: reproduce with: ant test  
> -Dtestcase=TestGeo3DPointField -Dtests.method=testRandomBig 
> -Dtests.seed=61ED8059BBF9CF1D -Dtests.multiplier=2 -Dtests.nightly=true 
> -Dtests.slow=true 
> -Dtests.linedocsfile=/home/jenkins/lucene-data/enwiki.random.lines.txt 
> -Dtests.locale=vi_VN -Dtests.timezone=US/Arizona -Dtests.asserts=true 
> -Dtests.file.encoding=US-ASCII
>[junit4] ERROR   20.6s | TestGeo3DPointField.testRandomBig <<<
>[junit4]> Throwable #1: java.io.IOException: background merge hit 
> exception: _0(6.0.0):c262144/2626:delGen=1 _1(6.0.0):c262144/773:delGen=1 
> _2(6.0.0):c262144/492:delGen=1 _3(6.0.0):c142742/106:delGen=1 into _4 
> [maxNumSegments=1]
>[junit4]>at 
> __randomizedtesting.SeedInfo.seed([61ED8059BBF9CF1D:E6BAFDD62AA0B39D]:0)
>[junit4]>at 
> org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1765)
>[junit4]>at 
> org.apache.lucene.index.IndexWriter.forceMerge(Index

[jira] [Commented] (LUCENE-6847) Test3DPointField and TestBKDTree failures: .bkd file can't be deleted

2015-11-03 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14987095#comment-14987095
 ] 

ASF subversion and git services commented on LUCENE-6847:
-

Commit 1712253 from [~mikemccand] in branch 'dev/trunk'
[ https://svn.apache.org/r1712253 ]

LUCENE-6847: no virus checker

> Test3DPointField and TestBKDTree failures: .bkd file can't be deleted
> -
>
> Key: LUCENE-6847
> URL: https://issues.apache.org/jira/browse/LUCENE-6847
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: modules/sandbox, modules/spatial3d
>Reporter: Steve Rowe
>Assignee: Michael McCandless
> Fix For: Trunk
>
>
> My Jenkins found seeds for {{TestGeo3DPointField}} and {{TestBKDTree}} tests 
> that cause them to fail reliably, because a {{.bkd}} file can't be deleted 
> because "a virus scanner has it open".

[jira] [Commented] (LUCENE-6847) Test3DPointField and TestBKDTree failures: .bkd file can't be deleted

2015-11-03 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14987105#comment-14987105
 ] 

Michael McCandless commented on LUCENE-6847:


I committed a fix.

> Test3DPointField and TestBKDTree failures: .bkd file can't be deleted
> -
>
> Key: LUCENE-6847
> URL: https://issues.apache.org/jira/browse/LUCENE-6847
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: modules/sandbox, modules/spatial3d
>Reporter: Steve Rowe
>Assignee: Michael McCandless
> Fix For: Trunk
>
>
> My Jenkins found seeds for {{TestGeo3DPointField}} and {{TestBKDTree}} tests 
> that cause them to fail reliably, because a {{.bkd}} file can't be deleted 
> because "a virus scanner has it open".

[jira] [Commented] (SOLR-8224) Wild card query do not return result

2015-11-03 Thread Mridul Srvastava (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14987114#comment-14987114
 ] 

Mridul Srvastava commented on SOLR-8224:


Thank you Erick Erickson for your valuable response.

I got to know the behaviour of "PorterStemFilter" after analysing some tokens 
via admin/analysis. After removing "PorterStemFilter" it works fine.

Please let me know whether filters are not applied to wildcard queries in Solr.
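
For the record, the likely mechanics (assuming stock Porter stemming
behaviour; easy to verify on the admin/analysis page):

{noformat}
index time:  "reserve"  --PorterStemFilter-->  indexed term "reserv"
query time:  reserve*   (wildcards bypass analysis)  -->  prefix "reserve"
             no indexed term starts with "reserve"   -->  numFound = 0
query time:  reserve    (analyzed)  -->  "reserv"    -->  matches
{noformat}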

> Wild card query do not return result 
> -
>
> Key: SOLR-8224
> URL: https://issues.apache.org/jira/browse/SOLR-8224
> Project: Solr
>  Issue Type: Bug
>  Components: search
>Affects Versions: 4.9
>Reporter: Mridul Srvastava
>
> Hi, 
> My search queries return the results below:
> fulladdress:(cypress* reserve) - "numFound": 217
> but
> fulladdress:(cypress* reserve*) - "numFound": 0
> fulladdress:(reserve*) - "numFound": 0
> Configuration in Schema.xml is like below (the XML element tags were stripped 
> by the mail archiver; only their attributes survive). Both the index and query 
> analyzer chains contain a stop filter with words="stopwords.txt", a 
> word-delimiter filter with generateWordParts="1", generateNumberParts="1", 
> catenateWords="0", catenateNumbers="0", catenateAll="0", 
> splitOnCaseChange="0", splitOnNumerics="0", stemEnglishPossessive="1", and 
> preserveOriginal="1" at index time / "0" at query time, plus a stem-related 
> filter with protected="protwords.txt".






[jira] [Created] (LUCENE-6882) java.lang.NoClassDefFoundError: org/apache/lucene/codecs/lucene54/Lucene54Codec

2015-11-03 Thread Martin Gainty (JIRA)
Martin Gainty created LUCENE-6882:
-

 Summary: java.lang.NoClassDefFoundError: 
org/apache/lucene/codecs/lucene54/Lucene54Codec
 Key: LUCENE-6882
 URL: https://issues.apache.org/jira/browse/LUCENE-6882
 Project: Lucene - Core
  Issue Type: Bug
  Components: -tools
Affects Versions: 5.3
 Environment: maven 3.2.5
JDK 1.8
Reporter: Martin Gainty


---
Test set: org.apache.lucene.analysis.ar.TestArabicAnalyzer
---
Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.159 sec <<< FAILURE! - in org.apache.lucene.analysis.ar.TestArabicAnalyzer
org.apache.lucene.analysis.ar.TestArabicAnalyzer  Time elapsed: 0.156 sec  <<< ERROR!
java.lang.NoClassDefFoundError: org/apache/lucene/codecs/lucene54/Lucene54Codec
        at org.apache.lucene.util.LuceneTestCase.<clinit>(LuceneTestCase.java:606)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Unknown Source)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$2.run(RandomizedRunner.java:581)
Caused by: java.lang.ClassNotFoundException: org.apache.lucene.codecs.lucene54.Lucene54Codec
        at java.net.URLClassLoader.findClass(Unknown Source)
        at java.lang.ClassLoader.loadClass(Unknown Source)
        at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
        at java.lang.ClassLoader.loadClass(Unknown Source)
        at org.apache.lucene.util.LuceneTestCase.<clinit>(LuceneTestCase.java:606)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Unknown Source)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$2.run(RandomizedRunner.java:581)








[jira] [Commented] (SOLR-8199) Text specifying which UI a user is looking at is incorrect

2015-11-03 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14987122#comment-14987122
 ] 

ASF subversion and git services commented on SOLR-8199:
---

Commit 1712258 from [~upayavira] in branch 'dev/trunk'
[ https://svn.apache.org/r1712258 ]

SOLR-8199 add word 'try' to new UI link

> Text specifying which UI a user is looking at is incorrect
> --
>
> Key: SOLR-8199
> URL: https://issues.apache.org/jira/browse/SOLR-8199
> Project: Solr
>  Issue Type: Bug
>  Components: UI
>Reporter: Youssef Chaker
>Assignee: Upayavira
>Priority: Trivial
> Attachments: Screen_Shot_2015-10-24_at_10_21_08_AM.png, 
> Screen_Shot_2015-10-24_at_10_21_41_AM.png
>
>
> In the top right corner of the admin UI, a text is available to indicate 
> whether the user is looking at the original UI or the new one.
> But it currently says "New UI" for http://localhost:8983/solr/#/ and 
> "Original UI" for http://localhost:8983/solr/index.html#/ when it should be 
> the other way around.
> This issue is tied to #SOLR-7666






[jira] [Updated] (SOLR-7666) Umbrella ticket for Angular JS post-5.2.1 issues

2015-11-03 Thread Upayavira (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-7666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Upayavira updated SOLR-7666:

Attachment: SOLR-7666-5.patch

Small patch that:
 * analysis: makes verbose checkbox update query string
 * query: executes query if request param contains query text

> Umbrella ticket for Angular JS post-5.2.1 issues
> 
>
> Key: SOLR-7666
> URL: https://issues.apache.org/jira/browse/SOLR-7666
> Project: Solr
>  Issue Type: New Feature
>  Components: UI
>Affects Versions: 5.3, Trunk
>Reporter: Erick Erickson
>Assignee: Upayavira
> Attachments: SOLR-7666-3.patch, SOLR-7666-4.patch, SOLR-7666-5.patch, 
> SOLR-7666-part2.patch, SOLR-7666-part2.patch, SOLR-7666.patch, 
> admin-ui-7666.zip
>
>
> As of Solr 5.2.1, there's a new admin UI available that has been written 
> almost entirely by Upayavira (thanks!) over the last several months. It's 
> written in Angular JS with an eye towards enhancement/maintainability. The 
> default UI is still the old version, but you can access the new version by 
> going to http://<host:port>/solr/index.html. There are a couple of fixes 
> between 5.2.0 and 5.2.1, so please use either a fresh 5x checkout, trunk, or 
> 5.2.1.
> The expectation is that in Solr 5.3, the new code will become the default 
> with the old UI deprecated and eventually removed.
> So it would be a great help if volunteers could give the new UI a spin. It 
> should look much the same as the current one at the start, but evolve into 
> something much more interesting and more cloud-friendly. Of course all the 
> new UI code will always be available on trunk/6.0 too, and the up-to-date 
> code will always be on both the trunk and 5x branches.
> Please provide feedback on the user's (or dev) lists about anything you find 
> that doesn't work, or enhancements you'd like to see (or, even better, 
> contribute). If you raise a JIRA, please link it to this one so I can keep 
> track of what needs to be committed. If linking JIRAs is a mystery just add a 
> comment to this JIRA referencing the new JIRA and we can take care of it.
> Please do _not_ attach patches to this JIRA, it'll be much easier to keep 
> track of everything if the patches are attached to sub-JIRAs.
> And a big thanks to Upayavira for this work!






[jira] [Commented] (SOLR-7858) Make Angular UI default

2015-11-03 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14987137#comment-14987137
 ] 

ASF subversion and git services commented on SOLR-7858:
---

Commit 1712260 from [~upayavira] in branch 'dev/trunk'
[ https://svn.apache.org/r1712260 ]

SOLR-7858 Make default URL=/

> Make Angular UI default
> ---
>
> Key: SOLR-7858
> URL: https://issues.apache.org/jira/browse/SOLR-7858
> Project: Solr
>  Issue Type: Bug
>  Components: web gui
>Reporter: Upayavira
>Assignee: Upayavira
>Priority: Minor
> Attachments: SOLR-7858-2.patch, SOLR-7858-3.patch, SOLR-7858-4.patch, 
> SOLR-7858-fix.patch, SOLR-7858.patch, new ui link.png, original UI link.png
>
>
> Angular UI is very close to feature complete. Once SOLR-7856 is dealt with, 
> it should function well in most cases. I propose that, as soon as 5.3 has 
> been released, we make the Angular UI default, ready for the 5.4 release. We 
> can then fix any more bugs as they are found, but more importantly start 
> working on the features that were the reason for doing this work in the first 
> place.






[jira] [Commented] (SOLR-8210) Admin UI menu does not scroll

2015-11-03 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14987143#comment-14987143
 ] 

ASF subversion and git services commented on SOLR-8210:
---

Commit 1712263 from [~upayavira] in branch 'dev/branches/branch_5x'
[ https://svn.apache.org/r1712263 ]

SOLR-8210 Scroll menu when browser window is small

> Admin UI menu does not scroll
> -
>
> Key: SOLR-8210
> URL: https://issues.apache.org/jira/browse/SOLR-8210
> Project: Solr
>  Issue Type: Improvement
>  Components: web gui
>Affects Versions: 5.3
>Reporter: Upayavira
>Assignee: Upayavira
>Priority: Minor
>
> When you view the UI on a projector - or a small screen (e.g. with dev tools 
> open), some menu options might be obscured at the bottom of the screen. The 
> menu doesn't scroll though, meaning the only way to get to these entries is 
> to use another screen, or change the text size in the browser temporarily.






[jira] [Commented] (SOLR-8210) Admin UI menu does not scroll

2015-11-03 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14987140#comment-14987140
 ] 

ASF subversion and git services commented on SOLR-8210:
---

Commit 1712262 from [~upayavira] in branch 'dev/trunk'
[ https://svn.apache.org/r1712262 ]

SOLR-8210 Scroll menu when browser window is small

> Admin UI menu does not scroll
> -
>
> Key: SOLR-8210
> URL: https://issues.apache.org/jira/browse/SOLR-8210
> Project: Solr
>  Issue Type: Improvement
>  Components: web gui
>Affects Versions: 5.3
>Reporter: Upayavira
>Assignee: Upayavira
>Priority: Minor
>
> When you view the UI on a projector - or a small screen (e.g. with dev tools 
> open), some menu options might be obscured at the bottom of the screen. The 
> menu doesn't scroll though, meaning the only way to get to these entries is 
> to use another screen, or change the text size in the browser temporarily.






[jira] [Commented] (LUCENE-6878) TopDocs.merge should use updateTop instead of pop / add

2015-11-03 Thread Toke Eskildsen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14987149#comment-14987149
 ] 

Toke Eskildsen commented on LUCENE-6878:


In light of my own recent experiments with PriorityQueue (SOLR-6828), I'll note 
that microbenchmarks are exceedingly easy to screw up, especially in Java. I 
ended up doing comparative testing with pre-generated test inputs, multiple 
runs, discarding the first runs, alternating between the implementations 
multiple times, and removing outliers. And the results are still not very stable.

> TopDocs.merge should use updateTop instead of pop / add
> ---
>
> Key: LUCENE-6878
> URL: https://issues.apache.org/jira/browse/LUCENE-6878
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/search
>Affects Versions: Trunk
>Reporter: Daniel Jelinski
>Assignee: Adrien Grand
>Priority: Trivial
> Attachments: LUCENE-6878.patch
>
>
> The function TopDocs.merge uses PriorityQueue in a pattern: pop, update value 
> (ref.hitIndex++), add. JavaDocs for PriorityQueue.updateTop say that using 
> this function instead should be at least twice as fast.






[jira] [Updated] (SOLR-8139) Provide a way for the admin UI to utilize managed schema functionality

2015-11-03 Thread Upayavira (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-8139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Upayavira updated SOLR-8139:

Attachment: SOLR-8139.patch

Managed schema patch ready to apply - only trivial tweaks since last patch. 

This diff has been generated correctly via SVN to clearly show changes on top 
of file moves.

> Provide a way for the admin UI to utilize managed schema functionality
> --
>
> Key: SOLR-8139
> URL: https://issues.apache.org/jira/browse/SOLR-8139
> Project: Solr
>  Issue Type: Improvement
>  Components: UI
>Reporter: Erick Erickson
>Assignee: Upayavira
> Attachments: SOLR-8139.patch, SOLR-8139.patch, 
> add-field-with-errors.png, add-field-with-omit-open.png, add-field.png
>
>
> See the discussion at the related SOLR-8131. The suggestion there is to make 
> managed schema the default in 6.0. To make the new-user experience much 
> smoother in that setup, it would be great if the admin UI had a simple 
> wrapper around the managed schema API.
> It would be a fine thing to have a way of bypassing the whole "find the magic 
> config set, edit it in your favorite editor, figure out how to upload it via 
> zkcli then reload the collection" current paradigm and instead be able to 
> update the schema via the admin UI.
> This should bypass the issues with uploading arbitrary XML to the server that 
> shot down one of the other attempts to edit the schema from the admin UI.
> This is mostly a marker. This could be a significant differentiator between 
> the old and new admin UIs.






[jira] [Commented] (LUCENE-6875) New Serbian Filter

2015-11-03 Thread Nikola Smolenski (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14987157#comment-14987157
 ] 

Nikola Smolenski commented on LUCENE-6875:
--

This is so ubiquitous that I can't find a reference. The official orthography 
of Serbian lists the two alphabets, but doesn't explicitly specify how to 
convert between them. You can see that various other software projects use the 
same conversion, for example GNU GetText 
http://cvs.savannah.gnu.org/viewvc/gettext/gettext-tools/src/filter-sr-latin.c?revision=1.4&root=gettext&view=markup
 or MediaWiki 
https://phabricator.wikimedia.org/diffusion/MW/browse/master/languages/classes/LanguageSr.php

I have never seen ISO 9 used in practice, and it wouldn't be useful here 
anyway, since no one would enter the queries in ISO 9.

> New Serbian Filter
> --
>
> Key: LUCENE-6875
> URL: https://issues.apache.org/jira/browse/LUCENE-6875
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/analysis
>Reporter: Nikola Smolenski
>Priority: Minor
> Attachments: Lucene-Serbian-Regular.patch
>
>
> This is a new Serbian filter that works with regular Latin text (the current 
> filter works with "bald" Latin). I described in detail what it does and why 
> it is necessary on the wiki.






[jira] [Commented] (SOLR-8139) Provide a way for the admin UI to utilize managed schema functionality

2015-11-03 Thread Alexandre Rafalovitch (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14987167#comment-14987167
 ] 

Alexandre Rafalovitch commented on SOLR-8139:
-

Please note that we are hiding the managed-file schema in the file browser as 
per SOLR-6992. This may cause some confusion for people looking at the 
directory structure via the browser. Unless the whole file tab UI is going away, 
of course.

> Provide a way for the admin UI to utilize managed schema functionality
> --
>
> Key: SOLR-8139
> URL: https://issues.apache.org/jira/browse/SOLR-8139
> Project: Solr
>  Issue Type: Improvement
>  Components: UI
>Reporter: Erick Erickson
>Assignee: Upayavira
> Attachments: SOLR-8139.patch, SOLR-8139.patch, 
> add-field-with-errors.png, add-field-with-omit-open.png, add-field.png
>
>
> See the discussion at the related SOLR-8131. The suggestion there is to make 
> managed schema the default in 6.0. To make the new-user experience much 
> smoother in that setup, it would be great if the admin UI had a simple 
> wrapper around the managed schema API.
> It would be a fine thing to have a way of bypassing the whole "find the magic 
> config set, edit it in your favorite editor, figure out how to upload it via 
> zkcli then reload the collection" current paradigm and instead be able to 
> update the schema via the admin UI.
> This should bypass the issues with uploading arbitrary XML to the server that 
> shot down one of the other attempts to edit the schema from the admin UI.
> This is mostly a marker. This could be a significant differentiator between 
> the old and new admin UIs.






[jira] [Comment Edited] (SOLR-8139) Provide a way for the admin UI to utilize managed schema functionality

2015-11-03 Thread Alexandre Rafalovitch (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14987167#comment-14987167
 ] 

Alexandre Rafalovitch edited comment on SOLR-8139 at 11/3/15 12:12 PM:
---

Please note that we are hiding the managed-schema file in the *file* browser as 
per SOLR-6992, which may cause some confusion for people looking at the 
directory structure via the browser. Unless the whole file tab UI is going away, 
of course.


was (Author: arafalov):
Please note that we are hiding the managed-file schema in the file browser as 
per SOLR-6992. Which may cause some confusion to people looking at the 
directory structure via the browser. Unless the whole file tab UI is going away 
of course.

> Provide a way for the admin UI to utilize managed schema functionality
> --
>
> Key: SOLR-8139
> URL: https://issues.apache.org/jira/browse/SOLR-8139
> Project: Solr
>  Issue Type: Improvement
>  Components: UI
>Reporter: Erick Erickson
>Assignee: Upayavira
> Attachments: SOLR-8139.patch, SOLR-8139.patch, 
> add-field-with-errors.png, add-field-with-omit-open.png, add-field.png
>
>
> See the discussion at the related SOLR-8131. The suggestion there is to make 
> managed schema the default in 6.0. To make the new-user experience much 
> smoother in that setup, it would be great if the admin UI had a simple 
> wrapper around the managed schema API.
> It would be a fine thing to have a way of bypassing the whole "find the magic 
> config set, edit it in your favorite editor, figure out how to upload it via 
> zkcli then reload the collection" current paradigm and instead be able to 
> update the schema via the admin UI.
> This should bypass the issues with uploading arbitrary XML to the server that 
> shot down one of the other attempts to edit the schema from the admin UI.
> This is mostly a marker. This could be a significant differentiator between 
> the old and new admin UIs.






[jira] [Comment Edited] (SOLR-8224) Wild card query do not return result

2015-11-03 Thread Mridul Srvastava (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14987114#comment-14987114
 ] 

Mridul Srvastava edited comment on SOLR-8224 at 11/3/15 12:24 PM:
--

Thank you, Erick Erickson, for your valuable response.

I got to know the behaviour of "PorterStemFilter" after analysing some tokens 
via admin/analysis. After removing "PorterStemFilter" it works fine.

Please let me know whether filters are not applied to wildcard queries in Solr. 
As per one answer at 
http://stackoverflow.com/questions/8240329/solr-case-insensitive-search, 
wildcard queries do not undergo analysis.


was (Author: srimridul):
Thank you Erick Erickson for your valuable response,

I got to know the behaviour of "PorterStemFilter" after analysing some token 
via  admin/analysis. After removing  "PorterStemFilter" it is working fine.

Please let me know are filters  not apply on wild card query in solr.

> Wild card query do not return result 
> -
>
> Key: SOLR-8224
> URL: https://issues.apache.org/jira/browse/SOLR-8224
> Project: Solr
>  Issue Type: Bug
>  Components: search
>Affects Versions: 4.9
>Reporter: Mridul Srvastava
>
> Hi, 
> My search query returns the following results:
> fulladdress:(cypress* reserve) - "numFound": 217
> but
> fulladdress:(cypress* reserve*) - "numFound": 0
> fulladdress:(reserve*) - "numFound": 0
> The configuration in Schema.xml is as follows:
> 
>   
> 
>   
>   
>
>words="stopwords.txt" />
>  generateWordParts="1" 
>   generateNumberParts="1" 
>   catenateWords="0" 
>   catenateNumbers="0" 
>   catenateAll="0" 
>   splitOnCaseChange="0"
>   splitOnNumerics="0"
>   stemEnglishPossessive="1"
>   preserveOriginal="1"
>   />
>protected="protwords.txt"/>
> 
>   
>   
> 
>   
>   
>
>words="stopwords.txt" />
>  generateWordParts="1" 
>   generateNumberParts="1" 
>   catenateWords="0" 
>   catenateNumbers="0" 
>   catenateAll="0" 
>   splitOnCaseChange="0"
>   splitOnNumerics="0"
>   stemEnglishPossessive="1"
>   preserveOriginal="0"
>   />
>protected="protwords.txt"/>
> 
>   
> 






[jira] [Commented] (SOLR-7858) Make Angular UI default

2015-11-03 Thread Upayavira (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14987183#comment-14987183
 ] 

Upayavira commented on SOLR-7858:
-

Notes for reference on tasks to make UI default in 5x:

 * move admin.html to old.html
 * fix links in index.html to point to old.html
 * remove "warning" message with a 'go to original UI' link
 * set the ROOT_URL="/" in app.js
 * update web.xml (ref to admin.html, and welcome-file)
 * probably more...

> Make Angular UI default
> ---
>
> Key: SOLR-7858
> URL: https://issues.apache.org/jira/browse/SOLR-7858
> Project: Solr
>  Issue Type: Bug
>  Components: web gui
>Reporter: Upayavira
>Assignee: Upayavira
>Priority: Minor
> Attachments: SOLR-7858-2.patch, SOLR-7858-3.patch, SOLR-7858-4.patch, 
> SOLR-7858-fix.patch, SOLR-7858.patch, new ui link.png, original UI link.png
>
>
> Angular UI is very close to feature complete. Once SOLR-7856 is dealt with, 
> it should function well in most cases. I propose that, as soon as 5.3 has 
> been released, we make the Angular UI default, ready for the 5.4 release. We 
> can then fix any more bugs as they are found, but more importantly start 
> working on the features that were the reason for doing this work in the first 
> place.






[jira] [Created] (SOLR-8231) Make UI popups more modal

2015-11-03 Thread Upayavira (JIRA)
Upayavira created SOLR-8231:
---

 Summary: Make UI popups more modal
 Key: SOLR-8231
 URL: https://issues.apache.org/jira/browse/SOLR-8231
 Project: Solr
  Issue Type: Improvement
  Components: web gui
Affects Versions: 5.4
Reporter: Upayavira
Assignee: Upayavira
Priority: Minor


SOLR-8139 adds managed schema buttons to the schema tab in the UI.

One small tweak included in this ticket is that, if you open a modal dialog 
then press escape, the modal will disappear.

This feature should be added to other modals in the UI, notably collections UI 
and core admin UI. Doing so is pretty trivial (adding two HTML attributes for 
each modal).






[jira] [Commented] (SOLR-8199) Text specifying which UI a user is looking at is incorrect

2015-11-03 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14987214#comment-14987214
 ] 

ASF subversion and git services commented on SOLR-8199:
---

Commit 1712282 from [~upayavira] in branch 'dev/branches/branch_5x'
[ https://svn.apache.org/r1712282 ]

SOLR-8199 Add 'try' to new UI link

> Text specifying which UI a user is looking at is incorrect
> --
>
> Key: SOLR-8199
> URL: https://issues.apache.org/jira/browse/SOLR-8199
> Project: Solr
>  Issue Type: Bug
>  Components: UI
>Reporter: Youssef Chaker
>Assignee: Upayavira
>Priority: Trivial
> Attachments: Screen_Shot_2015-10-24_at_10_21_08_AM.png, 
> Screen_Shot_2015-10-24_at_10_21_41_AM.png
>
>
> In the top right corner of the admin UI, a text is available to indicate 
> whether the user is looking at the original UI or the new one.
> But it currently says "New UI" for http://localhost:8983/solr/#/ and 
> "Original UI" for http://localhost:8983/solr/index.html#/ when it should be 
> the other way around.
> This issue is tied to #SOLR-7666






RE: [JENKINS-EA] Lucene-Solr-trunk-Linux (32bit/jdk1.9.0-ea-b85) - Build # 14764 - Failure!

2015-11-03 Thread Uwe Schindler
Hi,

It is updated to 8u66 and 9-ea-b90.
I have seen a second failure tonight, so it seems to be caused by a recent 
commit. If it still happens with b90, we should report a bug.

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


> -Original Message-
> From: Dawid Weiss [mailto:dawid.we...@gmail.com]
> Sent: Tuesday, November 03, 2015 8:49 AM
> To: dev@lucene.apache.org
> Subject: Re: [JENKINS-EA] Lucene-Solr-trunk-Linux (32bit/jdk1.9.0-ea-b85) -
> Build # 14764 - Failure!
> 
> Thanks Uwe.
> 
> D.
> 
> On Tue, Nov 3, 2015 at 12:37 AM, Uwe Schindler  wrote:
> > Hi,
> >
> > I will update the Java versions tomorrow, so we should maybe wait before
> reporting. Maybe it is already fixed.
> > I will also update to latest Java 8 bugfix builds (8u66).
> >
> > Uwe
> >
> > -
> > Uwe Schindler
> > H.-H.-Meier-Allee 63, D-28213 Bremen
> > http://www.thetaphi.de
> > eMail: u...@thetaphi.de
> >
> >
> >> -Original Message-
> >> From: Policeman Jenkins Server [mailto:jenk...@thetaphi.de]
> >> Sent: Monday, November 02, 2015 11:18 PM
> >> To: yo...@apache.org; dev@lucene.apache.org
> >> Subject: [JENKINS-EA] Lucene-Solr-trunk-Linux (32bit/jdk1.9.0-ea-b85)
> >> - Build # 14764 - Failure!
> >> Importance: Low
> >>
> >> Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Linux/14764/
> >> Java: 32bit/jdk1.9.0-ea-b85 -server -XX:+UseParallelGC
> >>
> >> All tests passed
> >>
> >> Build Log:
> >> [...truncated 5937 lines...]
> >>[junit4] JVM J2: stdout was not empty, see:
> >> /home/jenkins/workspace/Lucene-Solr-trunk-
> >> Linux/lucene/build/codecs/test/temp/junit4-J2-
> >> 20151102_201735_933.sysout
> >>[junit4] >>> JVM J2: stdout (verbatim) 
> >>[junit4] #
> >>[junit4] # A fatal error has been detected by the Java Runtime
> >> Environment:
> >>[junit4] #
> >>[junit4] #  SIGSEGV (0xb) at pc=0xf6da511f, pid=16709, tid=16779
> >>[junit4] #
> >>[junit4] # JRE version: Java(TM) SE Runtime Environment (9.0-b85)
> >> (build
> >> 1.9.0-ea-b85)
> >>[junit4] # Java VM: Java HotSpot(TM) Server VM (1.9.0-ea-b85,
> >> mixed mode, tiered, parallel gc, linux-x86)
> >>[junit4] # Problematic frame:
> >>[junit4] # V  [libjvm.so+0x8c211f][thread 16873 also had an error]
> >>[junit4]   SuperWord::get_pre_loop_end(CountedLoopNode*)+0xaf
> >>[junit4] #
> >>[junit4] # No core dump will be written. Core dumps have been
> >> disabled. To enable core dumping, try "ulimit -c unlimited" before starting
> Java again
> >>[junit4] #
> >>[junit4] # An error report file with more information is saved as:
> >>[junit4] # /home/jenkins/workspace/Lucene-Solr-trunk-
> >> Linux/lucene/build/codecs/test/J2/hs_err_pid16709.log
> >>[junit4] [thread 16876 also had an error]
> >>[junit4] # [ timer expired, abort... ]
> >>[junit4] <<< JVM J2: EOF 
> >>
> >> [...truncated 3 lines...]
> >>[junit4] ERROR: JVM J2 ended with an exception, command line:
> >> /home/jenkins/tools/java/32bit/jdk1.9.0-ea-b85/bin/java -server -
> >> XX:+UseParallelGC -XX:+HeapDumpOnOutOfMemoryError -
> >> XX:HeapDumpPath=/home/jenkins/workspace/Lucene-Solr-trunk-
> >> Linux/heapdumps -ea -esa -Dtests.prefix=tests -
> >> Dtests.seed=3C2AF9D3347D8042 -Xmx512M -Dtests.iters= -
> >> Dtests.verbose=false -Dtests.infostream=false -Dtests.codec=random -
> >> Dtests.postingsformat=random -Dtests.docvaluesformat=random -
> >> Dtests.locale=random -Dtests.timezone=random -
> Dtests.directory=random
> >> - Dtests.linedocsfile=europarl.lines.txt.gz
> >> -Dtests.luceneMatchVersion=6.0.0 - Dtests.cleanthreads=perMethod -
> >> Djava.util.logging.config.file=/home/jenkins/workspace/Lucene-Solr-tr
> >> unk- Linux/lucene/tools/junit4/logging.properties
> >> -Dtests.nightly=false - Dtests.weekly=false -Dtests.monster=false
> >> -Dtests.slow=true - Dtests.asserts=true -Dtests.multiplier=3
> >> -DtempDir=./temp - Djava.io.tmpdir=./temp -
> >> Djunit4.tempDir=/home/jenkins/workspace/Lucene-Solr-trunk-
> >> Linux/lucene/build/codecs/test/temp -
> >> Dcommon.dir=/home/jenkins/workspace/Lucene-Solr-trunk-
> Linux/lucene -
> >> Dclover.db.dir=/home/jenkins/workspace/Lucene-Solr-trunk-
> >> Linux/lucene/build/clover/db -
> >> Djava.security.policy=/home/jenkins/workspace/Lucene-Solr-trunk-
> >> Linux/lucene/tools/junit4/tests.policy -Dtests.LUCENE_VERSION=6.0.0 -
> >> Djetty.testMode=1 -Djetty.insecurerandom=1 -
> >> Dsolr.directoryFactory=org.apache.solr.core.MockDirectoryFactory -
> >> Djava.awt.headless=true -Djdk.map.althashing.threshold=0 -
> >> Djunit4.childvm.cwd=/home/jenkins/workspace/Lucene-Solr-trunk-
> >> Linux/lucene/build/codecs/test/J2 -Djunit4.childvm.id=2 -
> >> Djunit4.childvm.count=3 -Dtests.leaveTemporary=false -
> >> Dtests.filterstacks=true -Dtests.disableHdfs=true -
> >> Djava.security.manager=org.apache.lucene.util.TestSecurityManager -
> >> Dfile.encoding=UTF-8 -classpath /home/jenkins/workspace/Lucene-Solr-
> >> trunk-
> >>
>

[JENKINS] Lucene-Solr-NightlyTests-trunk - Build # 841 - Still Failing

2015-11-03 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-NightlyTests-trunk/841/

1 tests failed.
FAILED:  
org.apache.solr.cloud.LeaderInitiatedRecoveryOnShardRestartTest.testRestartWithAllInLIR

Error Message:
Captured an uncaught exception in thread: Thread[id=63513, 
name=coreZkRegister-4791-thread-1, state=RUNNABLE, 
group=TGRP-LeaderInitiatedRecoveryOnShardRestartTest]

Stack Trace:
com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught 
exception in thread: Thread[id=63513, name=coreZkRegister-4791-thread-1, 
state=RUNNABLE, group=TGRP-LeaderInitiatedRecoveryOnShardRestartTest]
Caused by: java.lang.AssertionError
at __randomizedtesting.SeedInfo.seed([EAFA91E638D68228]:0)
at 
org.apache.solr.cloud.ZkController.updateLeaderInitiatedRecoveryState(ZkController.java:2126)
at 
org.apache.solr.cloud.ShardLeaderElectionContext.runLeaderProcess(ElectionContext.java:433)
at 
org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:197)
at 
org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:157)
at 
org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:346)
at 
org.apache.solr.cloud.ZkController.joinElection(ZkController.java:1113)
at org.apache.solr.cloud.ZkController.register(ZkController.java:926)
at org.apache.solr.cloud.ZkController.register(ZkController.java:881)
at org.apache.solr.core.ZkContainer$2.run(ZkContainer.java:183)
at 
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$1.run(ExecutorUtil.java:231)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)




Build Log:
[...truncated 11075 lines...]
   [junit4] Suite: 
org.apache.solr.cloud.LeaderInitiatedRecoveryOnShardRestartTest
   [junit4]   2> Creating dataDir: 
/x1/jenkins/jenkins-slave/workspace/Lucene-Solr-NightlyTests-trunk/solr/build/solr-core/test/J0/temp/solr.cloud.LeaderInitiatedRecoveryOnShardRestartTest_EAFA91E638D68228-001/init-core-data-001
   [junit4]   2> 2564342 INFO  
(SUITE-LeaderInitiatedRecoveryOnShardRestartTest-seed#[EAFA91E638D68228]-worker)
 [] o.a.s.BaseDistributedSearchTestCase Setting hostContext system 
property: /m_ovg/ey
   [junit4]   2> 2564344 INFO  
(TEST-LeaderInitiatedRecoveryOnShardRestartTest.testRestartWithAllInLIR-seed#[EAFA91E638D68228])
 [] o.a.s.c.ZkTestServer STARTING ZK TEST SERVER
   [junit4]   2> 2564344 INFO  (Thread-54620) [] o.a.s.c.ZkTestServer 
client port:0.0.0.0/0.0.0.0:0
   [junit4]   2> 2564344 INFO  (Thread-54620) [] o.a.s.c.ZkTestServer 
Starting server
   [junit4]   2> 256 INFO  
(TEST-LeaderInitiatedRecoveryOnShardRestartTest.testRestartWithAllInLIR-seed#[EAFA91E638D68228])
 [] o.a.s.c.ZkTestServer start zk server on port:58570
   [junit4]   2> 256 INFO  
(TEST-LeaderInitiatedRecoveryOnShardRestartTest.testRestartWithAllInLIR-seed#[EAFA91E638D68228])
 [] o.a.s.c.c.SolrZkClient Using default ZkCredentialsProvider
   [junit4]   2> 2564445 INFO  
(TEST-LeaderInitiatedRecoveryOnShardRestartTest.testRestartWithAllInLIR-seed#[EAFA91E638D68228])
 [] o.a.s.c.c.ConnectionManager Waiting for client to connect to ZooKeeper
   [junit4]   2> 2564447 INFO  (zkCallback-2554-thread-1) [] 
o.a.s.c.c.ConnectionManager Watcher 
org.apache.solr.common.cloud.ConnectionManager@3b1539b4 
name:ZooKeeperConnection Watcher:127.0.0.1:58570 got event WatchedEvent 
state:SyncConnected type:None path:null path:null type:None
   [junit4]   2> 2564447 INFO  
(TEST-LeaderInitiatedRecoveryOnShardRestartTest.testRestartWithAllInLIR-seed#[EAFA91E638D68228])
 [] o.a.s.c.c.ConnectionManager Client is connected to ZooKeeper
   [junit4]   2> 2564447 INFO  
(TEST-LeaderInitiatedRecoveryOnShardRestartTest.testRestartWithAllInLIR-seed#[EAFA91E638D68228])
 [] o.a.s.c.c.SolrZkClient Using default ZkACLProvider
   [junit4]   2> 2564447 INFO  
(TEST-LeaderInitiatedRecoveryOnShardRestartTest.testRestartWithAllInLIR-seed#[EAFA91E638D68228])
 [] o.a.s.c.c.SolrZkClient makePath: /solr
   [junit4]   2> 2564450 INFO  
(TEST-LeaderInitiatedRecoveryOnShardRestartTest.testRestartWithAllInLIR-seed#[EAFA91E638D68228])
 [] o.a.s.c.c.SolrZkClient Using default ZkCredentialsProvider
   [junit4]   2> 2564450 INFO  
(TEST-LeaderInitiatedRecoveryOnShardRestartTest.testRestartWithAllInLIR-seed#[EAFA91E638D68228])
 [] o.a.s.c.c.ConnectionManager Waiting for client to connect to ZooKeeper
   [junit4]   2> 2564451 INFO  (zkCallback-2555-thread-1) [] 
o.a.s.c.c.ConnectionManager Watcher 
org.apache.solr.common.cloud.ConnectionManager@2d4756df 
name:ZooKeeperConnection Watcher:127.0.0.1:58570/solr got event WatchedEvent 
state:SyncConnected type:None path:null path:null type:None
   [junit4]   2> 2564451 INFO  
(TEST-LeaderInitiatedRecoveryOnSha

[jira] [Commented] (LUCENE-6880) Add document oriented collector for NRTSuggester

2015-11-03 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14987327#comment-14987327
 ] 

Michael McCandless commented on LUCENE-6880:


The javadoc for TopSuggestDocs.SuggestScoreDoc needs to be fixed (it implies 
there's only one key now).

When a SuggestScoreDocs has multiple keys/contexts/scores, is there any 
implication to order?  Is it always sorted "best to worst" score?

I wonder if instead of 3 parallel lists, we should just have a list of 
SuggestScoreDoc (as it is in trunk today) for each doc hit?  In fact, this is 
really like grouping?  Maybe it should be a TopGroups?

It should be "fewer" not "less" in here :) : {{// This can still lead to 
collecting less paths then needed...}}


> Add document oriented collector for NRTSuggester
> 
>
> Key: LUCENE-6880
> URL: https://issues.apache.org/jira/browse/LUCENE-6880
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Areek Zillur
>Assignee: Areek Zillur
> Fix For: Trunk, 5.4
>
> Attachments: LUCENE-6880.patch
>
>
> Currently NRTSuggester collects completions iteratively as they are accepted 
> by the TopNSearcher, implying that a document can be collected more than 
> once. In case of indexing a completion with multiple context values, the 
> completion leads to {{num_context}} paths in the underlying FST for the same 
> docId and gets collected {{num_context}} times, when a query matches all its 
> contexts. 
> Ideally, a document-oriented collector will collect top N documents instead 
> of top N completions by handling the docId deduplication while collecting the 
> completions. This could be used to collect n unique documents that matched a 
> completion query. 






[jira] [Commented] (LUCENE-6878) TopDocs.merge should use updateTop instead of pop / add

2015-11-03 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14987328#comment-14987328
 ] 

ASF subversion and git services commented on LUCENE-6878:
-

Commit 1712298 from [~jpountz] in branch 'dev/trunk'
[ https://svn.apache.org/r1712298 ]

LUCENE-6878: Speed up TopDocs.merge.

> TopDocs.merge should use updateTop instead of pop / add
> ---
>
> Key: LUCENE-6878
> URL: https://issues.apache.org/jira/browse/LUCENE-6878
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/search
>Affects Versions: Trunk
>Reporter: Daniel Jelinski
>Assignee: Adrien Grand
>Priority: Trivial
> Attachments: LUCENE-6878.patch
>
>
> The function TopDocs.merge uses PriorityQueue in a pattern: pop, update value 
> (ref.hitIndex++), add. JavaDocs for PriorityQueue.updateTop say that using 
> this function instead should be at least twice as fast.
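
For context, a minimal sketch of the two patterns on Lucene's {{PriorityQueue}}, 
using a made-up {{Cursor}} element type rather than the actual {{ShardRef}} 
inside {{TopDocs.merge}}:

{code:java}
import org.apache.lucene.util.PriorityQueue;

public class UpdateTopSketch {
  static class Cursor {
    int hitIndex; // position of the next hit to consume
  }

  static class CursorQueue extends PriorityQueue<Cursor> {
    CursorQueue(int maxSize) { super(maxSize); }
    @Override
    protected boolean lessThan(Cursor a, Cursor b) {
      return a.hitIndex < b.hitIndex;
    }
  }

  public static void main(String[] args) {
    CursorQueue q = new CursorQueue(8);
    q.add(new Cursor());

    // Old pattern: remove the top, mutate it, re-insert (two heap operations).
    Cursor c = q.pop();
    c.hitIndex++;
    q.add(c);

    // New pattern: mutate the top in place, then restore heap order once.
    Cursor t = q.top();
    t.hitIndex++;
    q.updateTop();
  }
}
{code}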






[jira] [Commented] (LUCENE-6882) java.lang.NoClassDefFoundError: org/apache/lucene/codecs/lucene54/Lucene54Codec

2015-11-03 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14987329#comment-14987329
 ] 

Michael McCandless commented on LUCENE-6882:


Maybe something is wrong with your classpath: Lucene 5.3 doesn't even have a 
Lucene54Codec?
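
A quick way to check (plain Java; the class name here is made up): the codec 
class only ships with lucene-core 5.4+/trunk jars, so on a pure 5.3 classpath 
the lookup fails.

{code:java}
public class CodecCheck {
  public static void main(String[] args) {
    try {
      Class.forName("org.apache.lucene.codecs.lucene54.Lucene54Codec");
      System.out.println("Lucene54Codec is on the classpath");
    } catch (ClassNotFoundException e) {
      System.out.println("not on the classpath: " + e.getMessage());
    }
  }
}
{code}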

> java.lang.NoClassDefFoundError: 
> org/apache/lucene/codecs/lucene54/Lucene54Codec
> ---
>
> Key: LUCENE-6882
> URL: https://issues.apache.org/jira/browse/LUCENE-6882
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: -tools
>Affects Versions: 5.3
> Environment: maven 3.2.5
> JDK 1.8
>Reporter: Martin Gainty
>   Original Estimate: 8h
>  Remaining Estimate: 8h
>
> ---
> Test set: org.apache.lucene.analysis.ar.TestArabicAnalyzer
> ---
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.159 sec <<< 
> FAILURE! - in org.apache.lucene.analysis.ar.TestArabicAnalyzer
> org.apache.lucene.analysis.ar.TestArabicAnalyzer  Time elapsed: 0.156 sec  
> <<< ERROR!
> java.lang.NoClassDefFoundError: 
> org/apache/lucene/codecs/lucene54/Lucene54Codec
>   at 
> org.apache.lucene.util.LuceneTestCase.(LuceneTestCase.java:606)
>   at java.lang.Class.forName0(Native Method)
>   at java.lang.Class.forName(Unknown Source)
>   at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$2.run(RandomizedRunner.java:581)
> Caused by: java.lang.ClassNotFoundException: 
> org.apache.lucene.codecs.lucene54.Lucene54Codec
>   at java.net.URLClassLoader.findClass(Unknown Source)
>   at java.lang.ClassLoader.loadClass(Unknown Source)
>   at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
>   at java.lang.ClassLoader.loadClass(Unknown Source)
>   at 
> org.apache.lucene.util.LuceneTestCase.(LuceneTestCase.java:606)
>   at java.lang.Class.forName0(Native Method)
>   at java.lang.Class.forName(Unknown Source)
>   at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$2.run(RandomizedRunner.java:581)






[jira] [Commented] (LUCENE-6878) TopDocs.merge should use updateTop instead of pop / add

2015-11-03 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14987350#comment-14987350
 ] 

ASF subversion and git services commented on LUCENE-6878:
-

Commit 1712299 from [~jpountz] in branch 'dev/branches/branch_5x'
[ https://svn.apache.org/r1712299 ]

LUCENE-6878: Speed up TopDocs.merge.

> TopDocs.merge should use updateTop instead of pop / add
> ---
>
> Key: LUCENE-6878
> URL: https://issues.apache.org/jira/browse/LUCENE-6878
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/search
>Affects Versions: Trunk
>Reporter: Daniel Jelinski
>Assignee: Adrien Grand
>Priority: Trivial
> Attachments: LUCENE-6878.patch
>
>
> The function TopDocs.merge uses PriorityQueue in a pattern: pop, update value 
> (ref.hitIndex++), add. JavaDocs for PriorityQueue.updateTop say that using 
> this function instead should be at least twice as fast.






[jira] [Resolved] (LUCENE-6878) TopDocs.merge should use updateTop instead of pop / add

2015-11-03 Thread Adrien Grand (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-6878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand resolved LUCENE-6878.
--
   Resolution: Fixed
Fix Version/s: 5.4
   6.0

bq. I know we are discussing how to benchmark this change but I don't think 
that's needed before committing

Agreed, I just committed the change.

Daniel: I am marking the issue resolved since the change was committed, but feel 
free to comment with any findings about potential performance improvements.

> TopDocs.merge should use updateTop instead of pop / add
> ---
>
> Key: LUCENE-6878
> URL: https://issues.apache.org/jira/browse/LUCENE-6878
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/search
>Affects Versions: Trunk
>Reporter: Daniel Jelinski
>Assignee: Adrien Grand
>Priority: Trivial
> Fix For: 6.0, 5.4
>
> Attachments: LUCENE-6878.patch
>
>
> The function TopDocs.merge uses PriorityQueue in a pattern: pop, update value 
> (ref.hitIndex++), add. JavaDocs for PriorityQueue.updateTop say that using 
> this function instead should be at least twice as fast.






[jira] [Commented] (SOLR-8224) Wild card query do not return result

2015-11-03 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14987380#comment-14987380
 ] 

Erick Erickson commented on SOLR-8224:
--

Please ask the question on the user's list. For signup information, see the 
"mailing list" section here:
http://lucene.apache.org/solr/resources.html
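
For the archives, the reported behaviour is easy to reproduce outside Solr. A 
minimal sketch with Lucene's analysis API (the field name is borrowed from the 
report; the analyzer chain is illustrative, not the reporter's exact schema):

{code:java}
import java.io.IOException;
import java.io.StringReader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.en.PorterStemFilter;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class StemDemo {
  public static void main(String[] args) throws IOException {
    Analyzer a = new Analyzer() {
      @Override
      protected TokenStreamComponents createComponents(String field) {
        Tokenizer t = new WhitespaceTokenizer();
        return new TokenStreamComponents(t, new PorterStemFilter(t));
      }
    };
    try (TokenStream ts = a.tokenStream("fulladdress", new StringReader("reserve"))) {
      CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
      ts.reset();
      while (ts.incrementToken()) {
        // Prints "reserv": the stemmed term stored in the index. A wildcard
        // query like reserve* skips analysis, so it never matches this term.
        System.out.println(term);
      }
      ts.end();
    }
  }
}
{code}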

> Wild card query do not return result 
> -
>
> Key: SOLR-8224
> URL: https://issues.apache.org/jira/browse/SOLR-8224
> Project: Solr
>  Issue Type: Bug
>  Components: search
>Affects Versions: 4.9
>Reporter: Mridul Srvastava
>
> Hi, 
> My search query returns the following results:
> fulladdress:(cypress* reserve) - "numFound": 217
> but
> fulladdress:(cypress* reserve*) - "numFound": 0
> fulladdress:(reserve*) - "numFound": 0
> The configuration in Schema.xml is as follows:
> 
>   
> 
>   
>   
>
>words="stopwords.txt" />
>  generateWordParts="1" 
>   generateNumberParts="1" 
>   catenateWords="0" 
>   catenateNumbers="0" 
>   catenateAll="0" 
>   splitOnCaseChange="0"
>   splitOnNumerics="0"
>   stemEnglishPossessive="1"
>   preserveOriginal="1"
>   />
>protected="protwords.txt"/>
> 
>   
>   
> 
>   
>   
>
>words="stopwords.txt" />
>  generateWordParts="1" 
>   generateNumberParts="1" 
>   catenateWords="0" 
>   catenateNumbers="0" 
>   catenateAll="0" 
>   splitOnCaseChange="0"
>   splitOnNumerics="0"
>   stemEnglishPossessive="1"
>   preserveOriginal="0"
>   />
>protected="protwords.txt"/>
> 
>   
> 






RE: [JENKINS-EA] Lucene-Solr-trunk-Linux (32bit/jdk1.9.0-ea-b85) - Build # 14767 - Failure!

2015-11-03 Thread Uwe Schindler
Hi,

Same error as before (also Java 9 build 85, 32 bits), so it seems reproducible.
But the recent run with build 90 seems to pass on 32 bits: 
http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Linux/14773/console

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


> -Original Message-
> From: Policeman Jenkins Server [mailto:jenk...@thetaphi.de]
> Sent: Tuesday, November 03, 2015 4:36 AM
> To: dev@lucene.apache.org
> Subject: [JENKINS-EA] Lucene-Solr-trunk-Linux (32bit/jdk1.9.0-ea-b85) - Build
> # 14767 - Failure!
> Importance: Low
> 
> Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Linux/14767/
> Java: 32bit/jdk1.9.0-ea-b85 -server -XX:+UseSerialGC
> 
> All tests passed
> 
> Build Log:
> [...truncated 5713 lines...]
>[junit4] JVM J2: stdout was not empty, see:
> /home/jenkins/workspace/Lucene-Solr-trunk-
> Linux/lucene/build/codecs/test/temp/junit4-J2-
> 20151103_033410_732.sysout
>[junit4] >>> JVM J2: stdout (verbatim) 
>[junit4] #
>[junit4] # A fatal error has been detected by the Java Runtime
> Environment:
>[junit4] #
>[junit4] #  SIGSEGV (0xb) at pc=0xf6dec11f, pid=20378, tid=20412
>[junit4] #
>[junit4] # JRE version: Java(TM) SE Runtime Environment (9.0-b85) (build
> 1.9.0-ea-b85)
>[junit4] # Java VM: Java HotSpot(TM) Server VM (1.9.0-ea-b85, mixed
> mode, tiered, serial gc, linux-x86)
>[junit4] # Problematic frame:
>[junit4] # V  [libjvm.so+0x8c211f]
> SuperWord::get_pre_loop_end(CountedLoopNode*)+0xaf
>[junit4] #
>[junit4] # No core dump will be written. Core dumps have been disabled. To
> enable core dumping, try "ulimit -c unlimited" before starting Java again
>[junit4] #
>[junit4] # An error report file with more information is saved as:
>[junit4] # /home/jenkins/workspace/Lucene-Solr-trunk-
> Linux/lucene/build/codecs/test/J2/hs_err_pid20378.log
>[junit4] #
>[junit4] # Compiler replay data is saved as:
>[junit4] # /home/jenkins/workspace/Lucene-Solr-trunk-
> Linux/lucene/build/codecs/test/J2/replay_pid20378.log
>[junit4] #
>[junit4] # If you would like to submit a bug report, please visit:
>[junit4] #   http://bugreport.java.com/bugreport/crash.jsp
>[junit4] #
>[junit4] <<< JVM J2: EOF 
> 
> [...truncated 58 lines...]
>[junit4] ERROR: JVM J2 ended with an exception, command line:
> /home/jenkins/tools/java/32bit/jdk1.9.0-ea-b85/bin/java -server -
> XX:+UseSerialGC -XX:+HeapDumpOnOutOfMemoryError -
> XX:HeapDumpPath=/home/jenkins/workspace/Lucene-Solr-trunk-
> Linux/heapdumps -ea -esa -Dtests.prefix=tests -
> Dtests.seed=FC6CEA168FFAC75D -Xmx512M -Dtests.iters= -
> Dtests.verbose=false -Dtests.infostream=false -Dtests.codec=random -
> Dtests.postingsformat=random -Dtests.docvaluesformat=random -
> Dtests.locale=random -Dtests.timezone=random -Dtests.directory=random -
> Dtests.linedocsfile=europarl.lines.txt.gz -Dtests.luceneMatchVersion=6.0.0 -
> Dtests.cleanthreads=perMethod -
> Djava.util.logging.config.file=/home/jenkins/workspace/Lucene-Solr-trunk-
> Linux/lucene/tools/junit4/logging.properties -Dtests.nightly=false -
> Dtests.weekly=false -Dtests.monster=false -Dtests.slow=true -
> Dtests.asserts=true -Dtests.multiplier=3 -DtempDir=./temp -
> Djava.io.tmpdir=./temp -
> Djunit4.tempDir=/home/jenkins/workspace/Lucene-Solr-trunk-
> Linux/lucene/build/codecs/test/temp -
> Dcommon.dir=/home/jenkins/workspace/Lucene-Solr-trunk-Linux/lucene -
> Dclover.db.dir=/home/jenkins/workspace/Lucene-Solr-trunk-
> Linux/lucene/build/clover/db -
> Djava.security.policy=/home/jenkins/workspace/Lucene-Solr-trunk-
> Linux/lucene/tools/junit4/tests.policy -Dtests.LUCENE_VERSION=6.0.0 -
> Djetty.testMode=1 -Djetty.insecurerandom=1 -
> Dsolr.directoryFactory=org.apache.solr.core.MockDirectoryFactory -
> Djava.awt.headless=true -Djdk.map.althashing.threshold=0 -
> Djunit4.childvm.cwd=/home/jenkins/workspace/Lucene-Solr-trunk-
> Linux/lucene/build/codecs/test/J2 -Djunit4.childvm.id=2 -
> Djunit4.childvm.count=3 -Dtests.leaveTemporary=false -
> Dtests.filterstacks=true -Dtests.disableHdfs=true -
> Djava.security.manager=org.apache.lucene.util.TestSecurityManager -
> Dfile.encoding=UTF-8 -classpath /home/jenkins/workspace/Lucene-Solr-
> trunk-
> Linux/lucene/build/codecs/classes/test:/home/jenkins/workspace/Lucene-
> Solr-trunk-Linux/lucene/build/test-
> framework/classes/java:/home/jenkins/workspace/Lucene-Solr-trunk-
> Linux/lucene/build/codecs/classes/java:/home/jenkins/workspace/Lucene-
> Solr-trunk-
> Linux/lucene/build/core/classes/java:/home/jenkins/workspace/Lucene-
> Solr-trunk-Linux/lucene/test-framework/lib/junit-
> 4.10.jar:/home/jenkins/workspace/Lucene-Solr-trunk-Linux/lucene/test-
> framework/lib/randomizedtesting-runner-
> 2.2.0.jar:/var/lib/jenkins/tools/hudson.tasks.Ant_AntInstallation/ANT_1.8.2/
> lib/ant-launcher.jar:/var/lib/jenkins/.ant/lib/ivy-
> 2.3.0.jar:/var/lib/jenkins/tools/hudson.tasks.Ant_AntInstallation/A

[jira] [Commented] (LUCENE-6875) New Serbian Filter

2015-11-03 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14987385#comment-14987385
 ] 

Robert Muir commented on LUCENE-6875:
-

I think the scheme is fine.

In the patch, the "regular" filter actually documents that it goes to "bald". I 
think this is just an accident?


> New Serbian Filter
> --
>
> Key: LUCENE-6875
> URL: https://issues.apache.org/jira/browse/LUCENE-6875
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/analysis
>Reporter: Nikola Smolenski
>Priority: Minor
> Attachments: Lucene-Serbian-Regular.patch
>
>
> This is a new Serbian filter that works with regular Latin text (the current 
> filter works with "bald" Latin). I described in detail what it does and 
> why it is necessary on the wiki.






[jira] [Updated] (LUCENE-6874) WhitespaceTokenizer should tokenize on NBSP

2015-11-03 Thread Steve Rowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-6874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Rowe updated LUCENE-6874:
---
Attachment: LUCENE-6874-jflex.patch

Patch adding a JFlex-based UnicodeWhitespaceTokenizer(Factory), along with some 
performance testing bits, some of which aren't committable (hard-coded paths).  
Also includes the SPI entry missing from Uwe's patch for 
{{ICUWhitespaceTokenizerFactory}} in 
{{lucene/icu/src/resources/META-INF/services/o.a.l.analysis.util.TokenizerFactory}},
 as well as a couple bugfixes for {{lucene/benchmark}} (which I'll commit under 
a separate JIRA).  The patch also includes all of Uwe's patch.

I did three performance comparisons on my Macbook Pro with Oracle Java 1.8.0_20 
of the {{Character.isWhitespace()}}-based {{WhitespaceTokenizer}}, Uwe's 
{{ICUWhitespaceTokenizer}}, and the JFlex {{UnicodeWhitespaceTokenizer}}:

1. Using the {{wstok.alg}} in the patch, I ran {{lucene/benchmark}} over 20k 
(English news) Reuters docs: dropping the lowest throughput of 5 rounds and 
averaging the other 4:

||Tokenizer||Avg tok/sec||Throughput compared to {{WhitespaceTokenizer}}||
|{{WhitespaceTokenizer}}|1.515M|N/A|
|{{ICUWhitespaceTokenizer}}|1.447M|{color:red}-5.5%{color}|
|{{UnicodeWhitespaceTokenizer}}|1.514M|{color:red}-0.1%{color}|

2. I concatenated all ~20k Reuters docs into one file, loaded it into memory 
and then ran each tokenizer over it 11 times, discarding info from the first 
and averaging the other 10 (this is {{testReuters()}} in the {{Test*}} files in 
the patch):

||Tokenizer||Avg tok/sec||Throughput compared to {{WhitespaceTokenizer}}||
|{{WhitespaceTokenizer}}|14.47M|N/A|
|{{ICUWhitespaceTokenizer}}|9.26M|{color:red}-36%{color}|
|{{UnicodeWhitespaceTokenizer}}|11.60M|{color:red}-20%{color}|

3. I used a fixed random seed and generated 10k random Unicode strings of at 
most 10k chars using {{TestUtil.randomUnicodeString()}}.  Note that this is 
non-realistic data for tokenization, not least because the average whitespace 
density is very low compared to natural language.  In running this test I 
noticed that {{WhitespaceTokenizer}} was returning many more tokens than the 
other two, and I tracked it down to differences in the definition of whitespace:

* {{Character.isWhitespace()}} returns true for the following while Unicode 
6.3.0 (Lucene's current Unicode version) does not: U+001C, U+001D, U+001E, 
U+001F, U+180E.  (U+180E was removed from Unicode's whitespace definition in 
Unicode 6.3.0. Java 8 uses Unicode 6.2.0.)
* Unicode 6.3.0 says the following are whitespace while 
{{Character.isWhitespace()}} does not: U+0085, U+00A0, U+2007, U+202F.  The 
last 3 are documented, but U+0085 {{NEXT LINE (NEL)}} isn't documented anywhere 
I can see; it was added to Unicode's whitespace definition in Unicode 3.0 
(released 2001).  (A tiny check of both divergences follows below.) 
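
Plain Java 8, nothing Lucene-specific:

{code:java}
public class WhitespaceDivergence {
  public static void main(String[] args) {
    // false: Java excludes non-breaking space, Unicode 6.3.0 includes it
    System.out.println(Character.isWhitespace('\u00A0'));
    // true: Java counts FILE SEPARATOR, Unicode 6.3.0 does not
    System.out.println(Character.isWhitespace('\u001C'));
  }
}
{code}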

So in order to be able to directly compare the performance of three tokenizers 
over this data, I replaced all non-consensus whitespace characters with a space 
before running the test.

||Tokenizer||Avg tok/sec||Throughput compared to {{WhitespaceTokenizer}}||
|{{WhitespaceTokenizer}}|897k|N/A|
|{{ICUWhitespaceTokenizer}}|880k|{color:red}-2%{color}|
|{{UnicodeWhitespaceTokenizer}}|1,605k|{color:green}+79%{color}|

One other thing I noticed for this test when I compared 
{{ICUWhitespaceTokenizer}}'s output with that of 
{{UnicodeWhitespaceTokenizer}}'s: they don't always find the same break points. 
 This is because although both forcibly break at the max token length (255 
chars, fixed for {{CharTokenizer}} and the default for Lucene's JFlex 
scanners), {{CharTokenizer}} allows tokens to exceed its max token char length 
of 255 by one char when a surrogate pair would otherwise be broken, while 
Lucene's JFlex scanners break at 254 chars in this case. 

-

Conclusion: for throughput over realistic ASCII data, the original 
{{WhitespaceTokenizer}} performs best, followed by the JFlex-based tokenizer in 
this patch ({{UnicodeWhitespaceTokenizer}}), followed by the ICU-based 
{{ICUWhitespaceTokenizer}} in Uwe's patch.

> WhitespaceTokenizer should tokenize on NBSP
> ---
>
> Key: LUCENE-6874
> URL: https://issues.apache.org/jira/browse/LUCENE-6874
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/analysis
>Reporter: David Smiley
>Priority: Minor
> Attachments: LUCENE-6874-jflex.patch, LUCENE-6874.patch
>
>
> WhitespaceTokenizer uses [Character.isWhitespace 
> |http://docs.oracle.com/javase/8/docs/api/java/lang/Character.html#isWhitespace-int-]
>  to decide what is whitespace.  Here's a pertinent excerpt:
> bq. It is a Unicode space character (SPACE_SEPARATOR, LINE_SEPARATOR, or 
> PARAGRAPH_SEPARATOR) but is not also a non-breaking space ('\u00A0', 
> '\u2007', '\u202F')
> Perhaps Character.isWhitespace should have been called 
> isLineBreakableWhitespace?
> I think WhitespaceTokenizer should tokenize on this.  I am aware it's easy to 
> work around but why leave this trap in by default?

[jira] [Commented] (LUCENE-6875) New Serbian Filter

2015-11-03 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14987403#comment-14987403
 ] 

Dawid Weiss commented on LUCENE-6875:
-

It's fine with me as well, I was just curious. I am definitely not the 
authority to tell whether it's good or bad :)

> New Serbian Filter
> --
>
> Key: LUCENE-6875
> URL: https://issues.apache.org/jira/browse/LUCENE-6875
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/analysis
>Reporter: Nikola Smolenski
>Priority: Minor
> Attachments: Lucene-Serbian-Regular.patch
>
>
> This is a new Serbian filter that works with regular Latin text (the current 
> filter works with "bald" Latin). I described in detail what does it do and 
> why is it necessary at the wiki.






[jira] [Commented] (LUCENE-6874) WhitespaceTokenizer should tokenize on NBSP

2015-11-03 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14987410#comment-14987410
 ] 

Uwe Schindler commented on LUCENE-6874:
---

Thanks Steve! I just noticed that your patch contains hardcoded filenames from 
your local system; I think those are just leftovers from your testing with 
Reuters. Otherwise I am not happy about the size of the generated files, but 
that's how JFlex works...

Sorry for forgetting to add the analyzer factory, I was just too fast in 
copy-pasting code yesterday :-) Thanks for adding it.

> WhitespaceTokenizer should tokenize on NBSP
> ---
>
> Key: LUCENE-6874
> URL: https://issues.apache.org/jira/browse/LUCENE-6874
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/analysis
>Reporter: David Smiley
>Priority: Minor
> Attachments: LUCENE-6874-jflex.patch, LUCENE-6874.patch
>
>
> WhitespaceTokenizer uses [Character.isWhitespace 
> |http://docs.oracle.com/javase/8/docs/api/java/lang/Character.html#isWhitespace-int-]
>  to decide what is whitespace.  Here's a pertinent excerpt:
> bq. It is a Unicode space character (SPACE_SEPARATOR, LINE_SEPARATOR, or 
> PARAGRAPH_SEPARATOR) but is not also a non-breaking space ('\u00A0', 
> '\u2007', '\u202F')
> Perhaps Character.isWhitespace should have been called 
> isLineBreakableWhitespace?
> I think WhitespaceTokenizer should tokenize on this.  I am aware it's easy to 
> work around but why leave this trap in by default?






[jira] [Commented] (LUCENE-6874) WhitespaceTokenizer should tokenize on NBSP

2015-11-03 Thread Steve Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14987423#comment-14987423
 ] 

Steve Rowe commented on LUCENE-6874:


bq. I just noticed that your patch contains hardcoded filenames of your local 
system, I think those are just leftovers from your testing with reuters.

Yup, patch needs cleanup before it can be committed.  I figured the decision 
about what to do hasn't been made yet, so I'll wait on doing that work until 
then.

bq. Otherwise I am not happy about the size of the generated files, but thats 
how jflex works...

I don't think the generated Java source makes much difference - the thing 
people will deal with is the JFlex source, and it's fairly compact.  I looked 
at the .class file sizes on my system, and I see 13k for the JFlex version and 
2k for the ICU version. 

> WhitespaceTokenizer should tokenize on NBSP
> ---
>
> Key: LUCENE-6874
> URL: https://issues.apache.org/jira/browse/LUCENE-6874
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/analysis
>Reporter: David Smiley
>Priority: Minor
> Attachments: LUCENE-6874-jflex.patch, LUCENE-6874.patch
>
>
> WhitespaceTokenizer uses [Character.isWhitespace 
> |http://docs.oracle.com/javase/8/docs/api/java/lang/Character.html#isWhitespace-int-]
>  to decide what is whitespace.  Here's a pertinent excerpt:
> bq. It is a Unicode space character (SPACE_SEPARATOR, LINE_SEPARATOR, or 
> PARAGRAPH_SEPARATOR) but is not also a non-breaking space ('\u00A0', 
> '\u2007', '\u202F')
> Perhaps Character.isWhitespace should have been called 
> isLineBreakableWhitespace?
> I think WhitespaceTokenizer should tokenize on this.  I am aware it's easy to 
> work around but why leave this trap in by default?






[jira] [Commented] (LUCENE-6872) IndexWriter OOM handling should be any VirtualMachineError

2015-11-03 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14987433#comment-14987433
 ] 

Uwe Schindler commented on LUCENE-6872:
---

+1

> IndexWriter OOM handling should be any VirtualMachineError
> --
>
> Key: LUCENE-6872
> URL: https://issues.apache.org/jira/browse/LUCENE-6872
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Robert Muir
> Attachments: LUCENE-6872.patch
>
>
> IndexWriter is defensive in this case: this error could come from any 
> unexpected place.
> But its superclass VirtualMachineError is the correct one: "Thrown to 
> indicate that the Java Virtual Machine is broken or has run out of resources 
> necessary for it to continue operating."
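
For context, the shape of the change under discussion (a sketch of the idea with 
a made-up {{Writer}} interface, not IndexWriter's actual code):

{code:java}
public class TragedySketch {
  interface Writer {
    void indexDocument();
    void markTragic(Throwable t);
  }

  static void addDocument(Writer w) {
    try {
      w.indexDocument();
    } catch (VirtualMachineError tragedy) {
      // Catching the superclass covers OutOfMemoryError, StackOverflowError,
      // InternalError and UnknownError alike: the JVM itself is broken.
      w.markTragic(tragedy);
      throw tragedy;
    }
  }
}
{code}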






[jira] [Commented] (LUCENE-6874) WhitespaceTokenizer should tokenize on NBSP

2015-11-03 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14987455#comment-14987455
 ] 

David Smiley commented on LUCENE-6874:
--

Nice thorough job Steve!

I propose that we consolidate the TokenizerFactories here into one -- the 
existing WhitespaceTokenizerFactory.  I think this is more user friendly.  An 
attribute could select *which* whitespace definition the user wants:  "java" or 
"unicode".  What do you think?

> WhitespaceTokenizer should tokenize on NBSP
> ---
>
> Key: LUCENE-6874
> URL: https://issues.apache.org/jira/browse/LUCENE-6874
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/analysis
>Reporter: David Smiley
>Priority: Minor
> Attachments: LUCENE-6874-jflex.patch, LUCENE-6874.patch
>
>
> WhitespaceTokenizer uses [Character.isWhitespace 
> |http://docs.oracle.com/javase/8/docs/api/java/lang/Character.html#isWhitespace-int-]
>  to decide what is whitespace.  Here's a pertinent excerpt:
> bq. It is a Unicode space character (SPACE_SEPARATOR, LINE_SEPARATOR, or 
> PARAGRAPH_SEPARATOR) but is not also a non-breaking space ('\u00A0', 
> '\u2007', '\u202F')
> Perhaps Character.isWhitespace should have been called 
> isLineBreakableWhitespace?
> I think WhitespaceTokenizer should tokenize on this.  I am aware it's easy to 
> work around but why leave this trap in by default?






[jira] [Commented] (LUCENE-6872) IndexWriter OOM handling should be any VirtualMachineError

2015-11-03 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14987457#comment-14987457
 ] 

ASF subversion and git services commented on LUCENE-6872:
-

Commit 1712310 from [~rcmuir] in branch 'dev/trunk'
[ https://svn.apache.org/r1712310 ]

LUCENE-6872: IndexWriter OOM handling should be any VirtualMachineError

> IndexWriter OOM handling should be any VirtualMachineError
> --
>
> Key: LUCENE-6872
> URL: https://issues.apache.org/jira/browse/LUCENE-6872
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Robert Muir
> Attachments: LUCENE-6872.patch
>
>
> IndexWriter is defensive in this case: this error could come from any 
> unexpected place.
> But its superclass VirtualMachineError is the correct one: "Thrown to 
> indicate that the Java Virtual Machine is broken or has run out of resources 
> necessary for it to continue operating."






[jira] [Commented] (LUCENE-6879) Allow to define custom CharTokenizer using Java 8 Lambdas/Method references

2015-11-03 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14987491#comment-14987491
 ] 

David Smiley commented on LUCENE-6879:
--

+1 Nice Uwe.

> Allow to define custom CharTokenizer using Java 8 Lambdas/Method references
> ---
>
> Key: LUCENE-6879
> URL: https://issues.apache.org/jira/browse/LUCENE-6879
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/analysis
>Affects Versions: Trunk
>Reporter: Uwe Schindler
> Fix For: Trunk
>
> Attachments: LUCENE-6879.patch
>
>
> As a followup from LUCENE-6874, I thought about how to generate custom 
> CharTokenizers without subclassing. I had this need quite often and I was a bit 
> annoyed that you had to create a subclass every time.
> This issue is using the pattern like ThreadLocal or many collection methods 
> in Java 8: You have the (abstract) base class and you define a factory method 
> named {{fromXxxPredicate}} (like {{ThreadLocal.withInitial(() -> value}}).
> {code:java}
> public static CharTokenizer 
> fromTokenCharPredicate(java.util.function.IntPredicate predicate)
> {code}
> This would allow defining a new CharTokenizer with a single-line statement 
> using any predicate:
> {code:java}
> // long variant with lambda:
> Tokenizer tok = CharTokenizer.fromTokenCharPredicate(c -> 
> !UCharacter.isUWhiteSpace(c));
> // method reference for separator char predicate + normalization by 
> uppercasing:
> Tokenizer tok = 
> CharTokenizer.fromSeparatorCharPredicate(UCharacter::isUWhiteSpace, 
> Character::toUpperCase);
> // method reference to custom function:
> private boolean myTestFunction(int c) {
>  return (crazy condition);
> }
> Tokenizer tok = CharTokenizer.fromTokenCharPredicate(this::myTestFunction);
> {code}
> I know this would not help Solr users who want to define the Tokenizer in a 
> config file, but for real Lucene users this Java 8-like way would be easy and 
> elegant to use. It is fast as hell, as it is just a reference to a method and 
> Java 8 is optimized for that.
> The inverted factories {{fromSeparatorCharPredicate()}} are provided to allow 
> quick definition without lambdas, using method references. In lots of cases, 
> like WhitespaceTokenizer, predicates are on the separator chars 
> ({{isWhitespace(int)}}), so using the 2nd set of factories you can define them 
> without the counter-intuitive negation. Internally it just uses 
> {{Predicate#negate()}}.
> The factories also allow giving the normalization function; e.g. to 
> lowercase, you may just give {{Character::toLowerCase}} as the 
> {{IntUnaryOperator}} reference.






[jira] [Commented] (LUCENE-6872) IndexWriter OOM handling should be any VirtualMachineError

2015-11-03 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14987504#comment-14987504
 ] 

ASF subversion and git services commented on LUCENE-6872:
-

Commit 1712314 from [~rcmuir] in branch 'dev/branches/branch_5x'
[ https://svn.apache.org/r1712314 ]

LUCENE-6872: IndexWriter OOM handling should be any VirtualMachineError

> IndexWriter OOM handling should be any VirtualMachineError
> --
>
> Key: LUCENE-6872
> URL: https://issues.apache.org/jira/browse/LUCENE-6872
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Robert Muir
> Attachments: LUCENE-6872.patch
>
>
> IndexWriter is defensive in this case: this error could come from any 
> unexpected place.
> But its superclass VirtualMachineError is the correct one: "Thrown to 
> indicate that the Java Virtual Machine is broken or has run out of resources 
> necessary for it to continue operating."






[jira] [Resolved] (LUCENE-6872) IndexWriter OOM handling should be any VirtualMachineError

2015-11-03 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-6872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved LUCENE-6872.
-
   Resolution: Fixed
Fix Version/s: 5.4
   Trunk

> IndexWriter OOM handling should be any VirtualMachineError
> --
>
> Key: LUCENE-6872
> URL: https://issues.apache.org/jira/browse/LUCENE-6872
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Robert Muir
> Fix For: Trunk, 5.4
>
> Attachments: LUCENE-6872.patch
>
>
> IndexWriter is defensive in this case: this error could come from any 
> unexpected place.
> But its superclass VirtualMachineError is the correct one: "Thrown to 
> indicate that the Java Virtual Machine is broken or has run out of resources 
> necessary for it to continue operating."






[jira] [Commented] (LUCENE-6877) eclipse generated try/catch discards exception

2015-11-03 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14987516#comment-14987516
 ] 

ASF subversion and git services commented on LUCENE-6877:
-

Commit 1712316 from [~rcmuir] in branch 'dev/trunk'
[ https://svn.apache.org/r1712316 ]

LUCENE-6877: eclipse generated try/catch discards exception

> eclipse generated try/catch discards exception
> --
>
> Key: LUCENE-6877
> URL: https://issues.apache.org/jira/browse/LUCENE-6877
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Robert Muir
> Attachments: LUCENE-6877.patch
>
>
> The (horrible) eclipse default is something like this:
> {noformat}
> // TODO: autogenerated stub
> e.printStackTrace();
> {noformat}
> The current eclipse config does this, which is better, but loses the original 
> exc:
> {noformat}
> throw new RuntimeException()
> {noformat}
> But it should be:
> {noformat}
> throw new RuntimeException(${exception_var})
> {noformat}






[jira] [Resolved] (LUCENE-6877) eclipse generated try/catch discards exception

2015-11-03 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-6877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved LUCENE-6877.
-
   Resolution: Fixed
Fix Version/s: 5.4
   Trunk

> eclipse generated try/catch discards exception
> --
>
> Key: LUCENE-6877
> URL: https://issues.apache.org/jira/browse/LUCENE-6877
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Robert Muir
> Fix For: Trunk, 5.4
>
> Attachments: LUCENE-6877.patch
>
>
> The (horrible) eclipse default is something like this:
> {noformat}
> // TODO: autogenerated stub
> e.printStackTrace();
> {noformat}
> The current eclipse config does this, which is better, but loses the original 
> exc:
> {noformat}
> throw new RuntimeException()
> {noformat}
> But it should be:
> {noformat}
> throw new RuntimeException(${exception_var})
> {noformat}






[jira] [Commented] (LUCENE-6877) eclipse generated try/catch discards exception

2015-11-03 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14987517#comment-14987517
 ] 

ASF subversion and git services commented on LUCENE-6877:
-

Commit 1712317 from [~rcmuir] in branch 'dev/branches/branch_5x'
[ https://svn.apache.org/r1712317 ]

LUCENE-6877: eclipse generated try/catch discards exception

> eclipse generated try/catch discards exception
> --
>
> Key: LUCENE-6877
> URL: https://issues.apache.org/jira/browse/LUCENE-6877
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Robert Muir
> Fix For: Trunk, 5.4
>
> Attachments: LUCENE-6877.patch
>
>
> The (horrible) eclipse default is something like this:
> {noformat}
> // TODO: autogenerated stub
> e.printStackTrace();
> {noformat}
> The current eclipse config does this, which is better, but loses the original 
> exc:
> {noformat}
> throw new RuntimeException()
> {noformat}
> But it should be:
> {noformat}
> throw new RuntimeException(${exception_var})
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8029) Modernize and standardize Solr APIs

2015-11-03 Thread Jeffrey Stylos (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14987532#comment-14987532
 ] 

Jeffrey Stylos commented on SOLR-8029:
--

Hi, IBM employee here — we use Solr in our Retrieve and Rank service and are 
excited about a Solr v2 API to improve the usability of our service.

Some thoughts on the proposed changes:

/v2/ is more standard than /solr2/ (looks like others agree)

Having a path parameter (/v2/{collection}) at the top level makes it difficult 
to add new resources or other paths. A more REST-standard approach would be to 
preface path parameters with a static path value 
(/v2/collections/{collection}). This would also allow the removal of the _ 
prefix on _node and _cluster.

There is some inconsistency around naming style, with a mixture of snake_case, 
hyphen-case, camelCase, unseparatedtext, and abbreviations. A v2 API would be a 
good opportunity to make all of the identifiers use a consistent naming 
convention.

On naming, we’ve found in API usability studies that acronyms and abbreviations 
(like “wt”) make APIs harder to understand.

HOCON is an interesting suggestion, although I have some concerns about it from 
a usability standpoint. In our API usability studies one of the most common 
mistakes using JSON has been attempting to use single quotes instead of double 
quotes — HOCON doesn’t fix this, and in fact can make things worse by resulting 
in unexpected behavior ( https://github.com/akka/akka-meta/issues/2 ). By 
attempting to be more permissive in its parsing, HOCON makes it more difficult 
for a parser to generate helpful error messages (such as in the common 
single-quote scenario). Breaking support for existing pretty printers and 
syntax highlighters is also a concern.
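
A hedged illustration of the single-quote pitfall (behavior as described in the 
linked issue; the key name is made up):

{noformat}
JSON:   {"name": 'solr'}   ->  parse error; the message points at the quote
HOCON:  name = 'solr'      ->  accepted silently as the unquoted string 'solr',
                               single quotes included, so no error surfaces
{noformat}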

One suggestion for an additional feature: a version date parameter, à la 
FourSquare and Stripe (?version=2015-11-03), would offer greater flexibility to 
evolve the API without breaking users.

And finally, a v2 would be a good time to question the core object model and 
abstractions of the service.

(For reference, the API guidelines we use for IBM Watson APIs are public at: 
https://github.com/watson-developer-cloud/api-guidelines.)

> Modernize and standardize Solr APIs
> ---
>
> Key: SOLR-8029
> URL: https://issues.apache.org/jira/browse/SOLR-8029
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: Trunk
>Reporter: Noble Paul
>Assignee: Noble Paul
>  Labels: API, EaseOfUse
> Fix For: Trunk
>
>
> Solr APIs have organically evolved and they are sometimes inconsistent with 
> each other or not in sync with the widely followed conventions of HTTP 
> protocol. Trying to make incremental changes to make them modern is like 
> applying band-aid. So, we have done a complete rethink of what the APIs 
> should be. The most notable aspects of the API are as follows:
> The new set of APIs will be placed under a new path {{/solr2}}. The legacy 
> APIs will continue to work under the {{/solr}} path as they used to and they 
> will be eventually deprecated.
> There are 3 types of requests in the new API 
> * {{/solr2//*}} : Operations on specific collections 
> * {{/solr2/_cluster/*}} : Cluster-wide operations which are not specific to 
> any collections. 
> * {{/solr2/_node/*}} : Operations on the node receiving the request. This is 
> the counter part of the core admin API
> This will be released as part of a major release. Check the link given below 
> for the full specification.  Your comments are welcome
> [Solr API version 2 Specification | http://bit.ly/1JYsBMQ]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-8029) Modernize and standardize Solr APIs

2015-11-03 Thread Jeffrey Stylos (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14987532#comment-14987532
 ] 

Jeffrey Stylos edited comment on SOLR-8029 at 11/3/15 4:12 PM:
---

Hi, IBM employee here — we use Solr in our Retrieve and Rank service and are 
excited about a Solr v2 API to improve the usability of our service.

Some thoughts on the proposed changes:

/v2/ is more standard than /solr2/ (looks like others agree)

Having a path parameter (/v2/[collection]) at the top level makes it difficult 
to add new resources or other paths. A more REST-standard approach would be to 
preface path parameters with a static path value 
(/v2/collections/[collection]). This would also allow the removal of the _ 
prefix on _node and _cluster.

There is some inconsistency around naming style, with a mixture of snake_case, 
hyphen-case, camelCase, unseparatedtext, and abbreviations. A v2 API would be a 
good opportunity to make all of the identifiers use a consistent naming 
convention.

On naming, we’ve found in API usability studies that acronyms and abbreviations 
(like “wt”) make APIs harder to understand.

HOCON is an interesting suggestion, although I have some concerns about it from 
a usability standpoint. In our API usability studies one of the most common 
mistakes using JSON has been attempting to use single quotes instead of double 
quotes — HOCON doesn’t fix this, and in fact can make things worse by resulting 
in unexpected behavior ( https://github.com/akka/akka-meta/issues/2 ). By 
attempting to be more permissive in its parsing, HOCON makes it more difficult 
for a parser to generate helpful error messages (such as in the common 
single-quote scenario). Breaking support for existing pretty printers and 
syntax highlighters is also a concern.

One suggestion for an additional feature: a version date parameter, à la 
FourSquare and Stripe (?version=2015-11-03), would offer greater flexibility to 
evolve the API without breaking users.

And finally, a v2 would be a good time to question the core object model and 
abstractions of the service.

(For reference, the API guidelines we use for IBM Watson APIs are public at: 
https://github.com/watson-developer-cloud/api-guidelines.)


was (Author: jsstylos):
Hi, IBM employee here — we use Solr in our Retrieve and Rank service and are 
excited about a Solr v2 API to improve the usability of our service.

Some thoughts on the proposed changes:

/v2/ is more standard than /solr2/ (looks like others agree)

Having a path parameter (/v2/{collection}) at the top level makes it difficult 
to add new resources or other paths. A more REST-standard approach would be to 
preface path parameters with a static path value 
(/v2/collections/{collection}). This would also allow the removal of the _ 
prefix on _node and _cluster.

There is some inconsistency around naming style, with a mixture of snake_case, 
hyphen-case, camelCase, unseparatedtext, and abbreviations. A v2 API would be a 
good opportunity to make all of the identifiers use a consistent naming 
convention.

On naming, we’ve found in API usability studies that acronyms and abbreviations 
(like “wt”) make APIs harder to understand.

HOCON is an interesting suggestion, although I have some concerns about it from 
a usability standpoint. In our API usability studies one of the most common 
mistakes using JSON has been attempting to use single quotes instead of double 
quotes — HOCON doesn’t fix this, and in fact can make things worse by resulting 
in unexpected behavior ( https://github.com/akka/akka-meta/issues/2 ). By 
attempting to be more permissive in its parsing, HOCON makes it more difficult 
for a parser to generate helpful error messages (such as in the common 
single-quote scenario). Breaking support for existing pretty printers and 
syntax highlighters is also a concern.

One suggestion for an additional feature: a version date parameter, à la 
FourSquare and Stripe (?version=2015-11-03), would offer greater flexibility to 
evolve the API without breaking users.

And finally, a v2 would be a good time to question the core object model and 
abstractions of the service.

(For reference, the API guidelines we use for IBM Watson APIs are public at: 
https://github.com/watson-developer-cloud/api-guidelines.)

> Modernize and standardize Solr APIs
> ---
>
> Key: SOLR-8029
> URL: https://issues.apache.org/jira/browse/SOLR-8029
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: Trunk
>Reporter: Noble Paul
>Assignee: Noble Paul
>  Labels: API, EaseOfUse
> Fix For: Trunk
>
>
> Solr APIs have organically evolved and they are sometimes inconsistent with 
> each other or not in sync with the widely followed conventions of HTTP 

[jira] [Commented] (SOLR-8029) Modernize and standardize Solr APIs

2015-11-03 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14987545#comment-14987545
 ] 

Mark Miller commented on SOLR-8029:
---

bq. Should the Collection APIs have an explicit "_collection" path component

It really should. Though the leading underscore stuff looks silly to me. Can we 
throw in ` too? :)

Not having something like this now for cores or collections was a big mistake 
IMO.

> Modernize and standardize Solr APIs
> ---
>
> Key: SOLR-8029
> URL: https://issues.apache.org/jira/browse/SOLR-8029
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: Trunk
>Reporter: Noble Paul
>Assignee: Noble Paul
>  Labels: API, EaseOfUse
> Fix For: Trunk
>
>
> Solr APIs have organically evolved and they are sometimes inconsistent with 
> each other or not in sync with the widely followed conventions of HTTP 
> protocol. Trying to make incremental changes to make them modern is like 
> applying band-aid. So, we have done a complete rethink of what the APIs 
> should be. The most notable aspects of the API are as follows:
> The new set of APIs will be placed under a new path {{/solr2}}. The legacy 
> APIs will continue to work under the {{/solr}} path as they used to and they 
> will be eventually deprecated.
> There are 3 types of requests in the new API 
> * {{/solr2//*}} : Operations on specific collections 
> * {{/solr2/_cluster/*}} : Cluster-wide operations which are not specific to 
> any collections. 
> * {{/solr2/_node/*}} : Operations on the node receiving the request. This is 
> the counter part of the core admin API
> This will be released as part of a major release. Check the link given below 
> for the full specification.  Your comments are welcome
> [Solr API version 2 Specification | http://bit.ly/1JYsBMQ]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-7928) Improve CheckIndex to work against HdfsDirectory

2015-11-03 Thread Mike Drob (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14987553#comment-14987553
 ] 

Mike Drob commented on SOLR-7928:
-

The rebase looks good to me, thanks for taking care of that.

> Improve CheckIndex to work against HdfsDirectory
> 
>
> Key: SOLR-7928
> URL: https://issues.apache.org/jira/browse/SOLR-7928
> Project: Solr
>  Issue Type: New Feature
>  Components: hdfs
>Reporter: Mike Drob
>Assignee: Gregory Chanan
> Fix For: 5.4, Trunk
>
> Attachments: SOLR-7928.patch, SOLR-7928.patch, SOLR-7928.patch, 
> SOLR-7928.patch, SOLR-7928.patch
>
>
> CheckIndex is very useful for testing an index for corruption. However, it 
> can only work with an index on an FSDirectory, meaning that if you need to 
> check an Hdfs Index, then you have to download it to local disk (which can be 
> very large).
> We should have a way to natively check index on hdfs for corruption.
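
For context, a hedged sketch of the usage this enables (Lucene's CheckIndex 
already accepts any Directory; the local path here is illustrative, and the 
patch's job is to let an HdfsDirectory be supplied instead):

{code}
import java.nio.file.Paths;
import org.apache.lucene.index.CheckIndex;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

// Illustrative only: swap FSDirectory for an HdfsDirectory to check in place.
try (Directory dir = FSDirectory.open(Paths.get("/var/solr/index"))) {
  CheckIndex.Status status = new CheckIndex(dir).checkIndex();
  System.out.println(status.clean ? "index is clean" : "index is corrupt");
}
{code}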



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8029) Modernize and standardize Solr APIs

2015-11-03 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14987566#comment-14987566
 ] 

Noble Paul commented on SOLR-8029:
--

bq. A more REST-standard approach would be to preface path parameters with a 
static path value (/v2/collections/[collection]).

The argument against this suggestion was that the most common operations on 
Solr are read and update. These are collection specific; the cluster- and 
node-specific operations are rare. So the idea was to make the commonly used 
operations shorter,

e.g.

{{/v2/mycollection/update}}
{{/v2/mycollection/select}}

instead of 

{{/v2/collections/mycollection/update}}
{{/v2/collections/mycollection/select}}

If we are willing to accept that {{_node}} and {{_cluster}} are special 
keywords, then we make the common operations, which are performed hundreds of 
times every second, simpler.

bq. at the top-level makes it difficult to add new resources or other paths

All new resources will be added under the {{/v2/_cluster}} and {{/v2/_node}} 
paths 
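
To make the trade-off concrete, a hedged routing sketch (illustrative Java, not 
Solr code): with a collection name at the top level, every reserved word needs 
a special case, which is exactly what the {{_}} prefix is buying.

{code}
// Illustrative only: a dispatcher for /v2/{first}/... must reserve keywords,
// while /v2/collections/{name} would have no such clash.
class V2Router {
  static String route(String first) {
    if ("_cluster".equals(first)) return "cluster operation";
    if ("_node".equals(first))    return "node-local operation";
    return "operation on collection '" + first + "'"; // anything else
  }
}
{code}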




> Modernize and standardize Solr APIs
> ---
>
> Key: SOLR-8029
> URL: https://issues.apache.org/jira/browse/SOLR-8029
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: Trunk
>Reporter: Noble Paul
>Assignee: Noble Paul
>  Labels: API, EaseOfUse
> Fix For: Trunk
>
>
> Solr APIs have organically evolved and they are sometimes inconsistent with 
> each other or not in sync with the widely followed conventions of HTTP 
> protocol. Trying to make incremental changes to make them modern is like 
> applying band-aid. So, we have done a complete rethink of what the APIs 
> should be. The most notable aspects of the API are as follows:
> The new set of APIs will be placed under a new path {{/solr2}}. The legacy 
> APIs will continue to work under the {{/solr}} path as they used to and they 
> will be eventually deprecated.
> There are 3 types of requests in the new API 
> * {{/solr2//*}} : Operations on specific collections 
> * {{/solr2/_cluster/*}} : Cluster-wide operations which are not specific to 
> any collections. 
> * {{/solr2/_node/*}} : Operations on the node receiving the request. This is 
> the counter part of the core admin API
> This will be released as part of a major release. Check the link given below 
> for the full specification.  Your comments are welcome
> [Solr API version 2 Specification | http://bit.ly/1JYsBMQ]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8029) Modernize and standardize Solr APIs

2015-11-03 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14987577#comment-14987577
 ] 

Mark Miller commented on SOLR-8029:
---

To me, these are the same silly shortcut arguments that got us the current API 
mess. If we are redoing it, why make the exact same mistakes?

bq. If we are willing to accept that _node and _cluster are special keywords

And if we are willing to accept that future keywords will keep coming, then we 
are using this '_' prefix as an alternative to the widely accepted URL scoping 
that we should be doing.


> Modernize and standardize Solr APIs
> ---
>
> Key: SOLR-8029
> URL: https://issues.apache.org/jira/browse/SOLR-8029
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: Trunk
>Reporter: Noble Paul
>Assignee: Noble Paul
>  Labels: API, EaseOfUse
> Fix For: Trunk
>
>
> Solr APIs have organically evolved and they are sometimes inconsistent with 
> each other or not in sync with the widely followed conventions of HTTP 
> protocol. Trying to make incremental changes to make them modern is like 
> applying band-aid. So, we have done a complete rethink of what the APIs 
> should be. The most notable aspects of the API are as follows:
> The new set of APIs will be placed under a new path {{/solr2}}. The legacy 
> APIs will continue to work under the {{/solr}} path as they used to and they 
> will be eventually deprecated.
> There are 3 types of requests in the new API 
> * {{/solr2//*}} : Operations on specific collections 
> * {{/solr2/_cluster/*}} : Cluster-wide operations which are not specific to 
> any collections. 
> * {{/solr2/_node/*}} : Operations on the node receiving the request. This is 
> the counter part of the core admin API
> This will be released as part of a major release. Check the link given below 
> for the full specification.  Your comments are welcome
> [Solr API version 2 Specification | http://bit.ly/1JYsBMQ]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-8232) bin/solr does not rotate console log file

2015-11-03 Thread Upayavira (JIRA)
Upayavira created SOLR-8232:
---

 Summary: bin/solr does not rotate console log file
 Key: SOLR-8232
 URL: https://issues.apache.org/jira/browse/SOLR-8232
 Project: Solr
  Issue Type: Bug
  Components: scripts and tools
Affects Versions: 5.3
Reporter: Upayavira
Priority: Minor


The bin/solr script, when started with bin/solr start, uses this command to 
start Solr:

{code}
nohup "$JAVA" "${SOLR_START_OPTS[@]}" $SOLR_ADDL_ARGS -jar start.jar \
    "-XX:OnOutOfMemoryError=$SOLR_TIP/bin/oom_solr.sh $SOLR_PORT $SOLR_LOGS_DIR" \
    "${SOLR_JETTY_CONFIG[@]}" \
    1>"$SOLR_LOGS_DIR/solr-$SOLR_PORT-console.log" 2>&1 &
echo $! > "$SOLR_PID_DIR/solr-$SOLR_PORT.pid"
{code}

This redirects all console output to a single log file with no means of 
rotation, meaning it will eventually fill the drive unless Solr is restarted.

I would propose that stdout be redirected to /dev/null and that we rely on the 
logging framework, which can do proper log rotation as configured by the user.
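
For reference, rotation of the proposed kind is expressible in the log4j 1.2 
configuration that Solr 5.x already ships with; a hedged sketch (the appender 
name and sizes are illustrative, not the shipped defaults):

{code}
# Size-based rolling appender: caps disk usage at ~10 files x 10 MB.
log4j.rootLogger=INFO, file
log4j.appender.file=org.apache.log4j.RollingFileAppender
log4j.appender.file.File=${solr.log}/solr.log
log4j.appender.file.MaxFileSize=10MB
log4j.appender.file.MaxBackupIndex=9
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{ISO8601} %-5p (%t) [%c] %m%n
{code}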



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8029) Modernize and standardize Solr APIs

2015-11-03 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14987585#comment-14987585
 ] 

Mark Miller commented on SOLR-8029:
---

bq. All new resources will be added under the /v2/_cluster and /v2/_node paths

It's great that someone thinks that now, but it really won't have any meaning 
to Joe Committer next year. And if it's required over the long haul, it's a 
real weakness in the design. We have a much more flexible, widely used, and 
understood alternative.

> Modernize and standardize Solr APIs
> ---
>
> Key: SOLR-8029
> URL: https://issues.apache.org/jira/browse/SOLR-8029
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: Trunk
>Reporter: Noble Paul
>Assignee: Noble Paul
>  Labels: API, EaseOfUse
> Fix For: Trunk
>
>
> Solr APIs have organically evolved and they are sometimes inconsistent with 
> each other or not in sync with the widely followed conventions of HTTP 
> protocol. Trying to make incremental changes to make them modern is like 
> applying band-aid. So, we have done a complete rethink of what the APIs 
> should be. The most notable aspects of the API are as follows:
> The new set of APIs will be placed under a new path {{/solr2}}. The legacy 
> APIs will continue to work under the {{/solr}} path as they used to and they 
> will be eventually deprecated.
> There are 3 types of requests in the new API 
> * {{/solr2//*}} : Operations on specific collections 
> * {{/solr2/_cluster/*}} : Cluster-wide operations which are not specific to 
> any collections. 
> * {{/solr2/_node/*}} : Operations on the node receiving the request. This is 
> the counter part of the core admin API
> This will be released as part of a major release. Check the link given below 
> for the full specification.  Your comments are welcome
> [Solr API version 2 Specification | http://bit.ly/1JYsBMQ]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8029) Modernize and standardize Solr APIs

2015-11-03 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14987600#comment-14987600
 ] 

Noble Paul commented on SOLR-8029:
--

The argument was that all operations fall under three categories

* cluster specific
* node specific
* collection specific

If a new resource comes up, it has to be one of these, so it will be a 
sub-path under one of them.

> Modernize and standardize Solr APIs
> ---
>
> Key: SOLR-8029
> URL: https://issues.apache.org/jira/browse/SOLR-8029
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: Trunk
>Reporter: Noble Paul
>Assignee: Noble Paul
>  Labels: API, EaseOfUse
> Fix For: Trunk
>
>
> Solr APIs have organically evolved and they are sometimes inconsistent with 
> each other or not in sync with the widely followed conventions of HTTP 
> protocol. Trying to make incremental changes to make them modern is like 
> applying band-aid. So, we have done a complete rethink of what the APIs 
> should be. The most notable aspects of the API are as follows:
> The new set of APIs will be placed under a new path {{/solr2}}. The legacy 
> APIs will continue to work under the {{/solr}} path as they used to and they 
> will be eventually deprecated.
> There are 3 types of requests in the new API 
> * {{/solr2//*}} : Operations on specific collections 
> * {{/solr2/_cluster/*}} : Cluster-wide operations which are not specific to 
> any collections. 
> * {{/solr2/_node/*}} : Operations on the node receiving the request. This is 
> the counter part of the core admin API
> This will be released as part of a major release. Check the link given below 
> for the full specification.  Your comments are welcome
> [Solr API version 2 Specification | http://bit.ly/1JYsBMQ]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8029) Modernize and standardize Solr APIs

2015-11-03 Thread Jason Gerlowski (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14987631#comment-14987631
 ] 

Jason Gerlowski commented on SOLR-8029:
---

"The argument against this suggestion was that the most common operation on 
solr are read and update . These are collection specific. The cluster and node 
specific operations are rare . So the idea was to make the commonly used 
operations shorter"

What does making the commonly used paths shorter actually get us?

Are we trying to keep things shorter to increase readability?  If so, I'd argue 
that it would have the opposite effect.  I can only really speak for myself, 
but I think that the consistency gained by having a static path value 
(collections, node, cluster) used everywhere outweighs any negatives of having 
to read an extra path segment.

> Modernize and standardize Solr APIs
> ---
>
> Key: SOLR-8029
> URL: https://issues.apache.org/jira/browse/SOLR-8029
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: Trunk
>Reporter: Noble Paul
>Assignee: Noble Paul
>  Labels: API, EaseOfUse
> Fix For: Trunk
>
>
> Solr APIs have organically evolved and they are sometimes inconsistent with 
> each other or not in sync with the widely followed conventions of HTTP 
> protocol. Trying to make incremental changes to make them modern is like 
> applying band-aid. So, we have done a complete rethink of what the APIs 
> should be. The most notable aspects of the API are as follows:
> The new set of APIs will be placed under a new path {{/solr2}}. The legacy 
> APIs will continue to work under the {{/solr}} path as they used to and they 
> will be eventually deprecated.
> There are 3 types of requests in the new API 
> * {{/solr2//*}} : Operations on specific collections 
> * {{/solr2/_cluster/*}} : Cluster-wide operations which are not specific to 
> any collections. 
> * {{/solr2/_node/*}} : Operations on the node receiving the request. This is 
> the counter part of the core admin API
> This will be released as part of a major release. Check the link given below 
> for the full specification.  Your comments are welcome
> [Solr API version 2 Specification | http://bit.ly/1JYsBMQ]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-8029) Modernize and standardize Solr APIs

2015-11-03 Thread Jason Gerlowski (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14987631#comment-14987631
 ] 

Jason Gerlowski edited comment on SOLR-8029 at 11/3/15 5:11 PM:


bq. "The argument against this suggestion was that the most common operation on 
solr are read and update . These are collection specific. The cluster and node 
specific operations are rare . So the idea was to make the commonly used 
operations shorter"

What does making the commonly used paths shorter actually get us?

Are we trying to keep things shorter to increase readability?  If so, I'd argue 
that it would have the opposite effect.  I can only really speak for myself, 
but I think that the consistency gained by having a static path value 
(collections, node, cluster) used everywhere outweighs any negatives of having 
to read an extra path segment.


was (Author: gerlowskija):
"The argument against this suggestion was that the most common operation on 
solr are read and update . These are collection specific. The cluster and node 
specific operations are rare . So the idea was to make the commonly used 
operations shorter"

What does making the commonly used paths shorter actually get us?

Are we trying to keep things shorter to increase readability?  If so, I'd argue 
that it would have the opposite effect.  I can only really speak for myself, 
but I think that the consistency gained by having a static path value 
(collections, node, cluster) used everywhere outweighs any negatives of having 
to read an extra path segment.

> Modernize and standardize Solr APIs
> ---
>
> Key: SOLR-8029
> URL: https://issues.apache.org/jira/browse/SOLR-8029
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: Trunk
>Reporter: Noble Paul
>Assignee: Noble Paul
>  Labels: API, EaseOfUse
> Fix For: Trunk
>
>
> Solr APIs have organically evolved and they are sometimes inconsistent with 
> each other or not in sync with the widely followed conventions of HTTP 
> protocol. Trying to make incremental changes to make them modern is like 
> applying band-aid. So, we have done a complete rethink of what the APIs 
> should be. The most notable aspects of the API are as follows:
> The new set of APIs will be placed under a new path {{/solr2}}. The legacy 
> APIs will continue to work under the {{/solr}} path as they used to and they 
> will be eventually deprecated.
> There are 3 types of requests in the new API 
> * {{/solr2//*}} : Operations on specific collections 
> * {{/solr2/_cluster/*}} : Cluster-wide operations which are not specific to 
> any collections. 
> * {{/solr2/_node/*}} : Operations on the node receiving the request. This is 
> the counter part of the core admin API
> This will be released as part of a major release. Check the link given below 
> for the full specification.  Your comments are welcome
> [Solr API version 2 Specification | http://bit.ly/1JYsBMQ]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-6872) IndexWriter OOM handling should be any VirtualMachineError

2015-11-03 Thread Mike Drob (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14987643#comment-14987643
 ] 

Mike Drob commented on LUCENE-6872:
---

Should we modify the checks in SolrIndexWriter, DirectUpdateHandler2, and 
other Solr pieces to also replace OOME with VME? We can do that in a Solr 
JIRA, I suppose.

> IndexWriter OOM handling should be any VirtualMachineError
> --
>
> Key: LUCENE-6872
> URL: https://issues.apache.org/jira/browse/LUCENE-6872
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Robert Muir
> Fix For: Trunk, 5.4
>
> Attachments: LUCENE-6872.patch
>
>
> IndexWriter is defensive in this case: this error could come from any 
> unexpected place.
> But its superclass VirtualMachineError is the correct one: "Thrown to 
> indicate that the Java Virtual Machine is broken or has run out of resources 
> necessary for it to continue operating."
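
A hedged sketch of the widened handling this describes (illustrative only, not 
the actual IndexWriter code; both method names are made up):

{code}
// Catching the superclass covers OutOfMemoryError, StackOverflowError,
// InternalError, and UnknownError alike.
try {
  doIndexingWork();
} catch (VirtualMachineError tragedy) {
  markWriterBroken(tragedy); // made-up helper: poison the writer
  throw tragedy;             // always rethrow; the JVM cannot be trusted now
}
{code}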



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8029) Modernize and standardize Solr APIs

2015-11-03 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14987651#comment-14987651
 ] 

Noble Paul commented on SOLR-8029:
--

bq. Are we trying to keep things shorter to increase readability?

Yes.
I'm not strongly for or against either. I would like to get others' opinions 
on this.

> Modernize and standardize Solr APIs
> ---
>
> Key: SOLR-8029
> URL: https://issues.apache.org/jira/browse/SOLR-8029
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: Trunk
>Reporter: Noble Paul
>Assignee: Noble Paul
>  Labels: API, EaseOfUse
> Fix For: Trunk
>
>
> Solr APIs have organically evolved and they are sometimes inconsistent with 
> each other or not in sync with the widely followed conventions of HTTP 
> protocol. Trying to make incremental changes to make them modern is like 
> applying band-aid. So, we have done a complete rethink of what the APIs 
> should be. The most notable aspects of the API are as follows:
> The new set of APIs will be placed under a new path {{/solr2}}. The legacy 
> APIs will continue to work under the {{/solr}} path as they used to and they 
> will be eventually deprecated.
> There are 3 types of requests in the new API 
> * {{/solr2//*}} : Operations on specific collections 
> * {{/solr2/_cluster/*}} : Cluster-wide operations which are not specific to 
> any collections. 
> * {{/solr2/_node/*}} : Operations on the node receiving the request. This is 
> the counter part of the core admin API
> This will be released as part of a major release. Check the link given below 
> for the full specification.  Your comments are welcome
> [Solr API version 2 Specification | http://bit.ly/1JYsBMQ]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-7495) Unexpected docvalues type NUMERIC when grouping by a int facet

2015-11-03 Thread Anton Khoff (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14987661#comment-14987661
 ] 

Anton Khoff commented on SOLR-7495:
---

I can confirm we have the same issue starting from 5.1, and it still exists in 
5.3.1. Sounds like a major one to me; I hope it can be fixed soon.

> Unexpected docvalues type NUMERIC when grouping by a int facet
> --
>
> Key: SOLR-7495
> URL: https://issues.apache.org/jira/browse/SOLR-7495
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 5.0, 5.1, 5.2, 5.3
>Reporter: Fabio Batista da Silva
> Attachments: SOLR-7495.patch
>
>
> Hey All,
> After upgrading from Solr 4.10 to 5.1 with SolrCloud,
> I'm getting an IllegalStateException when I try to facet an int field.
> IllegalStateException: unexpected docvalues type NUMERIC for field 'year' 
> (expected=SORTED). Use UninvertingReader or index with docvalues.
> schema.xml
> {code}
> <!-- schema.xml excerpt omitted: the XML element tags were stripped by the
> mail archiver, leaving only attribute fragments (field definitions with
> multiValued/required/stored flags, text fieldTypes with stopword, synonym,
> and edge-ngram filters, a solr.SpatialRecursivePrefixTreeFieldType, the
> uniqueKey "id", and the default search field "name"). -->
> {code}
> query :
> {code}
> http://solr.dev:8983/solr/my_collection/select?wt=json&fl=id&fq=index_type:foobar&group=true&group.field=year_make_model&group.facet=true&facet=true&facet.field=year
> {code}
> Exception :
> {code}
> null:org.apache.solr.common.SolrException: Exception during facet.field: year
> at org.apache.solr.request.SimpleFacets$3.call(SimpleFacets.java:627)
> at org.apache.solr.request.SimpleFacets$3.call(SimpleFacets.java:612)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at org.apache.solr.request.SimpleFacets$2.execute(SimpleFacets.java:566)
> at 
> org.apache.solr.request.SimpleFacets.getFacetFieldCounts(SimpleFacets.java:637)
> at 
> org.apache.solr.request.SimpleFacets.getFacetCounts(SimpleFacets.java:280)
> at 
> org.apache.solr.handler.component.FacetComponent.process(FacetComponent.java:106)
> at 
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:222)
> at 
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1984)
> at 
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:829)
> at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:446)
> at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:220)
> at 
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
> at 
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
> at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
> at 
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
> at 
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
> at 
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
> at 
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
> at 
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
> at 
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
> at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
> at 
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
> at 
> o

[jira] [Commented] (LUCENE-6872) IndexWriter OOM handling should be any VirtualMachineError

2015-11-03 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14987659#comment-14987659
 ] 

Mark Miller commented on LUCENE-6872:
-

They're really different cases, I think. This is about IndexWriter handling 
unrecoverable errors properly. Solr's interest in OutOfMemoryError is about it 
bubbling up to the JVM so that you can run a script on the first one that 
does. That only works with OutOfMemoryError, not with other 
VirtualMachineErrors.
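
The mechanism in question, as it already appears in the bin/solr start command 
quoted in SOLR-8232 (the flag is real; the paths are the script's variables):

{code}
# The JVM invokes the handler only for OutOfMemoryError; a StackOverflowError
# or InternalError never triggers it, hence Solr's OOM-specific interest.
java -XX:OnOutOfMemoryError="$SOLR_TIP/bin/oom_solr.sh $SOLR_PORT $SOLR_LOGS_DIR" \
     -jar start.jar
{code}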

> IndexWriter OOM handling should be any VirtualMachineError
> --
>
> Key: LUCENE-6872
> URL: https://issues.apache.org/jira/browse/LUCENE-6872
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Robert Muir
> Fix For: Trunk, 5.4
>
> Attachments: LUCENE-6872.patch
>
>
> IndexWriter is defensive in this case: this error could come from any 
> unexpected place.
> But its superclass VirtualMachineError is the correct one: "Thrown to 
> indicate that the Java Virtual Machine is broken or has run out of resources 
> necessary for it to continue operating."



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-8029) Modernize and standardize Solr APIs

2015-11-03 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14987651#comment-14987651
 ] 

Noble Paul edited comment on SOLR-8029 at 11/3/15 5:28 PM:
---

bq. Are we trying to keep things shorter to increase readability?

Yes, readability as well as writability.
I'm not strongly for or against either. I would like to get others' opinions 
on this.


was (Author: noble.paul):
bq. Are we trying to keep things shorter to increase readability?

Yes.
I'm not strongly for or against either. I would like to get others' opinions 
on this.

> Modernize and standardize Solr APIs
> ---
>
> Key: SOLR-8029
> URL: https://issues.apache.org/jira/browse/SOLR-8029
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: Trunk
>Reporter: Noble Paul
>Assignee: Noble Paul
>  Labels: API, EaseOfUse
> Fix For: Trunk
>
>
> Solr APIs have organically evolved and they are sometimes inconsistent with 
> each other or not in sync with the widely followed conventions of HTTP 
> protocol. Trying to make incremental changes to make them modern is like 
> applying band-aid. So, we have done a complete rethink of what the APIs 
> should be. The most notable aspects of the API are as follows:
> The new set of APIs will be placed under a new path {{/solr2}}. The legacy 
> APIs will continue to work under the {{/solr}} path as they used to and they 
> will be eventually deprecated.
> There are 3 types of requests in the new API 
> * {{/solr2//*}} : Operations on specific collections 
> * {{/solr2/_cluster/*}} : Cluster-wide operations which are not specific to 
> any collections. 
> * {{/solr2/_node/*}} : Operations on the node receiving the request. This is 
> the counter part of the core admin API
> This will be released as part of a major release. Check the link given below 
> for the full specification.  Your comments are welcome
> [Solr API version 2 Specification | http://bit.ly/1JYsBMQ]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8029) Modernize and standardize Solr APIs

2015-11-03 Thread Steve Molloy (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14987762#comment-14987762
 ] 

Steve Molloy commented on SOLR-8029:


I'm +1 for dedicated paths for each resource; in other words, longer paths 
with collections in them and no special keywords. I personally agree that not 
having keywords will make things easier to read than having shorter URLs with 
special keywords.

> Modernize and standardize Solr APIs
> ---
>
> Key: SOLR-8029
> URL: https://issues.apache.org/jira/browse/SOLR-8029
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: Trunk
>Reporter: Noble Paul
>Assignee: Noble Paul
>  Labels: API, EaseOfUse
> Fix For: Trunk
>
>
> Solr APIs have organically evolved and they are sometimes inconsistent with 
> each other or not in sync with the widely followed conventions of HTTP 
> protocol. Trying to make incremental changes to make them modern is like 
> applying band-aid. So, we have done a complete rethink of what the APIs 
> should be. The most notable aspects of the API are as follows:
> The new set of APIs will be placed under a new path {{/solr2}}. The legacy 
> APIs will continue to work under the {{/solr}} path as they used to and they 
> will be eventually deprecated.
> There are 3 types of requests in the new API 
> * {{/solr2//*}} : Operations on specific collections 
> * {{/solr2/_cluster/*}} : Cluster-wide operations which are not specific to 
> any collections. 
> * {{/solr2/_node/*}} : Operations on the node receiving the request. This is 
> the counter part of the core admin API
> This will be released as part of a major release. Check the link given below 
> for the full specification.  Your comments are welcome
> [Solr API version 2 Specification | http://bit.ly/1JYsBMQ]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-trunk-Solaris (64bit/jdk1.8.0) - Build # 160 - Failure!

2015-11-03 Thread Policeman Jenkins Server
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Solaris/160/
Java: 64bit/jdk1.8.0 -XX:+UseCompressedOops -XX:+UseSerialGC

1 tests failed.
FAILED:  org.apache.solr.cloud.HttpPartitionTest.test

Error Message:
Error from server at http://127.0.0.1:38405/txt/x/c8n_1x2_leader_session_loss: 
Expected mime type application/octet-stream but got text/html.   
 
Error 404HTTP ERROR: 404 Problem 
accessing /txt/x/c8n_1x2_leader_session_loss/update. Reason: Not 
Found Powered by Jetty://   

Stack Trace:
org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error 
from server at http://127.0.0.1:38405/txt/x/c8n_1x2_leader_session_loss: 
Expected mime type application/octet-stream but got text/html. 


Error 404 


HTTP ERROR: 404
Problem accessing /txt/x/c8n_1x2_leader_session_loss/update. Reason:
Not Found
Powered by Jetty://



at 
__randomizedtesting.SeedInfo.seed([BC7F01540B04305E:342B3E8EA5F85DA6]:0)
at 
org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:543)
at 
org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:241)
at 
org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:230)
at 
org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:150)
at org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:174)
at org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:139)
at org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:153)
at 
org.apache.solr.cloud.HttpPartitionTest.testLeaderZkSessionLoss(HttpPartitionTest.java:516)
at 
org.apache.solr.cloud.HttpPartitionTest.test(HttpPartitionTest.java:121)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1660)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:866)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:902)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:916)
at 
org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsFixedStatement.callStatement(BaseDistributedSearchTestCase.java:963)
at 
org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsStatement.evaluate(BaseDistributedSearchTestCase.java:938)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:367)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:809)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:460)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:875)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:777)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:811)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:822)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
at 
com.carrotsearch.randomizedtesting

[jira] [Commented] (SOLR-8029) Modernize and standardize Solr APIs

2015-11-03 Thread Alexandre Rafalovitch (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14987866#comment-14987866
 ] 

Alexandre Rafalovitch commented on SOLR-8029:
-

+1 for consistency (/collection, /cluster) and for things that will make 
third-party tools play more easily with Solr. 

The integration story is important, so sticking to more standard REST 
guidelines would be more beneficial in the long run.

> Modernize and standardize Solr APIs
> ---
>
> Key: SOLR-8029
> URL: https://issues.apache.org/jira/browse/SOLR-8029
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: Trunk
>Reporter: Noble Paul
>Assignee: Noble Paul
>  Labels: API, EaseOfUse
> Fix For: Trunk
>
>
> Solr APIs have organically evolved and they are sometimes inconsistent with 
> each other or not in sync with the widely followed conventions of HTTP 
> protocol. Trying to make incremental changes to make them modern is like 
> applying band-aid. So, we have done a complete rethink of what the APIs 
> should be. The most notable aspects of the API are as follows:
> The new set of APIs will be placed under a new path {{/solr2}}. The legacy 
> APIs will continue to work under the {{/solr}} path as they used to and they 
> will be eventually deprecated.
> There are 3 types of requests in the new API 
> * {{/solr2//*}} : Operations on specific collections 
> * {{/solr2/_cluster/*}} : Cluster-wide operations which are not specific to 
> any collections. 
> * {{/solr2/_node/*}} : Operations on the node receiving the request. This is 
> the counter part of the core admin API
> This will be released as part of a major release. Check the link given below 
> for the full specification.  Your comments are welcome
> [Solr API version 2 Specification | http://bit.ly/1JYsBMQ]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8029) Modernize and standardize Solr APIs

2015-11-03 Thread Shawn Heisey (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14987885#comment-14987885
 ] 

Shawn Heisey commented on SOLR-8029:


Digging around in my pocket for a couple more pennies...

Consistency is probably the number one goal when an API is redesigned.  There 
better be a REALLY good reason for any deviations that result in special cases, 
special syntax, etc.  I think that /select and /update should NOT have special 
shorter endpoints, and that identifiers should not have leading underscores (or 
any other kind of unusual marking) unless they really are some kind of special 
case that will only be used in highly unusual situations.

Related tangent: Using "/select" as the default query handler has always seemed 
like a strange choice to me.  Is this an opportunity to rename the default 
query handler to /query for the v2 api?

Getting more detailed: It is probably a good idea to have a specific list of 
legal values for the first URL path component after /v2 ... so only a list like 
this is valid:

{code}
/v2/collection
/v2/node
/v2/cluster
/v2/core (might need this, if the implementation needs separation from 
collection)
{code}

Standards for the next path component might be different for cluster than they 
are for collection/node/core ... unless we implement a further abstraction 
where SolrCloud can be a cluster of clusters.
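
A hedged sketch of such a whitelist (illustrative Java; the segment names are 
only the ones proposed above):

{code}
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Illustrative only: reject anything but the agreed top-level segments.
class V2PathCheck {
  static final Set<String> LEGAL_TOP = new HashSet<>(
      Arrays.asList("collection", "node", "cluster", "core"));

  static void checkTopLevel(String segment) {
    if (!LEGAL_TOP.contains(segment)) {
      throw new IllegalArgumentException("Unknown /v2 path: /" + segment);
    }
  }
}
{code}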

> Modernize and standardize Solr APIs
> ---
>
> Key: SOLR-8029
> URL: https://issues.apache.org/jira/browse/SOLR-8029
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: Trunk
>Reporter: Noble Paul
>Assignee: Noble Paul
>  Labels: API, EaseOfUse
> Fix For: Trunk
>
>
> Solr APIs have organically evolved and they are sometimes inconsistent with 
> each other or not in sync with the widely followed conventions of HTTP 
> protocol. Trying to make incremental changes to make them modern is like 
> applying band-aid. So, we have done a complete rethink of what the APIs 
> should be. The most notable aspects of the API are as follows:
> The new set of APIs will be placed under a new path {{/solr2}}. The legacy 
> APIs will continue to work under the {{/solr}} path as they used to and they 
> will be eventually deprecated.
> There are 3 types of requests in the new API 
> * {{/solr2//*}} : Operations on specific collections 
> * {{/solr2/_cluster/*}} : Cluster-wide operations which are not specific to 
> any collections. 
> * {{/solr2/_node/*}} : Operations on the node receiving the request. This is 
> the counter part of the core admin API
> This will be released as part of a major release. Check the link given below 
> for the full specification.  Your comments are welcome
> [Solr API version 2 Specification | http://bit.ly/1JYsBMQ]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8029) Modernize and standardize Solr APIs

2015-11-03 Thread Jeffrey Stylos (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14987907#comment-14987907
 ] 

Jeffrey Stylos commented on SOLR-8029:
--

One note: for user-created resources, the REST convention is to use the name of 
the resource in plural `/v2/collections/[collection]` as opposed to 
`/v2/collection/[collection]`.

> Modernize and standardize Solr APIs
> ---
>
> Key: SOLR-8029
> URL: https://issues.apache.org/jira/browse/SOLR-8029
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: Trunk
>Reporter: Noble Paul
>Assignee: Noble Paul
>  Labels: API, EaseOfUse
> Fix For: Trunk
>
>
> Solr APIs have organically evolved and they are sometimes inconsistent with 
> each other or not in sync with the widely followed conventions of HTTP 
> protocol. Trying to make incremental changes to make them modern is like 
> applying band-aid. So, we have done a complete rethink of what the APIs 
> should be. The most notable aspects of the API are as follows:
> The new set of APIs will be placed under a new path {{/solr2}}. The legacy 
> APIs will continue to work under the {{/solr}} path as they used to and they 
> will be eventually deprecated.
> There are 3 types of requests in the new API 
> * {{/solr2//*}} : Operations on specific collections 
> * {{/solr2/_cluster/*}} : Cluster-wide operations which are not specific to 
> any collections. 
> * {{/solr2/_node/*}} : Operations on the node receiving the request. This is 
> the counter part of the core admin API
> This will be released as part of a major release. Check the link given below 
> for the full specification.  Your comments are welcome
> [Solr API version 2 Specification | http://bit.ly/1JYsBMQ]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-Tests-trunk-Java8 - Build # 574 - Failure

2015-11-03 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-Tests-trunk-Java8/574/

1 tests failed.
FAILED:  org.apache.solr.cloud.CdcrReplicationHandlerTest.doTest

Error Message:
There are still nodes recoverying - waited for 330 seconds

Stack Trace:
java.lang.AssertionError: There are still nodes recoverying - waited for 330 
seconds
at 
__randomizedtesting.SeedInfo.seed([A4ECD6F7E97EA02A:3A86E5384C5B393]:0)
at org.junit.Assert.fail(Assert.java:93)
at 
org.apache.solr.cloud.AbstractDistribZkTestBase.waitForRecoveriesToFinish(AbstractDistribZkTestBase.java:172)
at 
org.apache.solr.cloud.AbstractDistribZkTestBase.waitForRecoveriesToFinish(AbstractDistribZkTestBase.java:133)
at 
org.apache.solr.cloud.AbstractDistribZkTestBase.waitForRecoveriesToFinish(AbstractDistribZkTestBase.java:128)
at 
org.apache.solr.cloud.BaseCdcrDistributedZkTest.waitForRecoveriesToFinish(BaseCdcrDistributedZkTest.java:465)
at 
org.apache.solr.cloud.BaseCdcrDistributedZkTest.clearSourceCollection(BaseCdcrDistributedZkTest.java:319)
at 
org.apache.solr.cloud.CdcrReplicationHandlerTest.doTestPartialReplicationWithTruncatedTlog(CdcrReplicationHandlerTest.java:121)
at 
org.apache.solr.cloud.CdcrReplicationHandlerTest.doTest(CdcrReplicationHandlerTest.java:52)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1660)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:866)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:902)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:916)
at 
org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsFixedStatement.callStatement(BaseDistributedSearchTestCase.java:963)
at 
org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsStatement.evaluate(BaseDistributedSearchTestCase.java:938)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:367)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:809)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:460)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:875)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:777)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:811)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:822)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org

[JENKINS] Lucene-Solr-5.x-Solaris (multiarch/jdk1.7.0) - Build # 160 - Failure!

2015-11-03 Thread Policeman Jenkins Server
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-5.x-Solaris/160/
Java: multiarch/jdk1.7.0 -d32 -client -XX:+UseConcMarkSweepGC

1 tests failed.
FAILED:  org.apache.solr.cloud.DistributedVersionInfoTest.test

Error Message:
Captured an uncaught exception in thread: Thread[id=21689, name=Thread-10215, 
state=RUNNABLE, group=TGRP-DistributedVersionInfoTest]

Stack Trace:
com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught 
exception in thread: Thread[id=21689, name=Thread-10215, state=RUNNABLE, 
group=TGRP-DistributedVersionInfoTest]
at 
__randomizedtesting.SeedInfo.seed([DFD35E88958BFBF3:578761523B77960B]:0)
Caused by: java.lang.IllegalArgumentException: n must be positive
at __randomizedtesting.SeedInfo.seed([DFD35E88958BFBF3]:0)
at java.util.Random.nextInt(Random.java:300)
at 
org.apache.solr.cloud.DistributedVersionInfoTest$3.run(DistributedVersionInfoTest.java:202)




Build Log:
[...truncated 10839 lines...]
   [junit4] Suite: org.apache.solr.cloud.DistributedVersionInfoTest
   [junit4]   2> Creating dataDir: 
/export/home/jenkins/workspace/Lucene-Solr-5.x-Solaris/solr/build/solr-core/test/J1/temp/solr.cloud.DistributedVersionInfoTest_DFD35E88958BFBF3-001/init-core-data-001
   [junit4]   2> 2387798 INFO  
(SUITE-DistributedVersionInfoTest-seed#[DFD35E88958BFBF3]-worker) [] 
o.a.s.BaseDistributedSearchTestCase Setting hostContext system property: /
   [junit4]   2> 2387801 INFO  
(TEST-DistributedVersionInfoTest.test-seed#[DFD35E88958BFBF3]) [] 
o.a.s.c.ZkTestServer STARTING ZK TEST SERVER
   [junit4]   2> 2387801 INFO  (Thread-10131) [] o.a.s.c.ZkTestServer 
client port:0.0.0.0/0.0.0.0:0
   [junit4]   2> 2387801 INFO  (Thread-10131) [] o.a.s.c.ZkTestServer 
Starting server
   [junit4]   2> 2387901 INFO  
(TEST-DistributedVersionInfoTest.test-seed#[DFD35E88958BFBF3]) [] 
o.a.s.c.ZkTestServer start zk server on port:35026
   [junit4]   2> 2387901 INFO  
(TEST-DistributedVersionInfoTest.test-seed#[DFD35E88958BFBF3]) [] 
o.a.s.c.c.SolrZkClient Using default ZkCredentialsProvider
   [junit4]   2> 2387902 INFO  
(TEST-DistributedVersionInfoTest.test-seed#[DFD35E88958BFBF3]) [] 
o.a.s.c.c.ConnectionManager Waiting for client to connect to ZooKeeper
   [junit4]   2> 2387906 INFO  (zkCallback-3239-thread-1) [] 
o.a.s.c.c.ConnectionManager Watcher 
org.apache.solr.common.cloud.ConnectionManager@17a8e79 name:ZooKeeperConnection 
Watcher:127.0.0.1:35026 got event WatchedEvent state:SyncConnected type:None 
path:null path:null type:None
   [junit4]   2> 2387907 INFO  
(TEST-DistributedVersionInfoTest.test-seed#[DFD35E88958BFBF3]) [] 
o.a.s.c.c.ConnectionManager Client is connected to ZooKeeper
   [junit4]   2> 2387907 INFO  
(TEST-DistributedVersionInfoTest.test-seed#[DFD35E88958BFBF3]) [] 
o.a.s.c.c.SolrZkClient Using default ZkACLProvider
   [junit4]   2> 2387907 INFO  
(TEST-DistributedVersionInfoTest.test-seed#[DFD35E88958BFBF3]) [] 
o.a.s.c.c.SolrZkClient makePath: /solr
   [junit4]   2> 2387914 INFO  
(TEST-DistributedVersionInfoTest.test-seed#[DFD35E88958BFBF3]) [] 
o.a.s.c.c.SolrZkClient Using default ZkCredentialsProvider
   [junit4]   2> 2387914 INFO  
(TEST-DistributedVersionInfoTest.test-seed#[DFD35E88958BFBF3]) [] 
o.a.s.c.c.ConnectionManager Waiting for client to connect to ZooKeeper
   [junit4]   2> 2387916 INFO  (zkCallback-3240-thread-1) [] 
o.a.s.c.c.ConnectionManager Watcher 
org.apache.solr.common.cloud.ConnectionManager@d98485 name:ZooKeeperConnection 
Watcher:127.0.0.1:35026/solr got event WatchedEvent state:SyncConnected 
type:None path:null path:null type:None
   [junit4]   2> 2387916 INFO  
(TEST-DistributedVersionInfoTest.test-seed#[DFD35E88958BFBF3]) [] 
o.a.s.c.c.ConnectionManager Client is connected to ZooKeeper
   [junit4]   2> 2387917 INFO  
(TEST-DistributedVersionInfoTest.test-seed#[DFD35E88958BFBF3]) [] 
o.a.s.c.c.SolrZkClient Using default ZkACLProvider
   [junit4]   2> 2387917 INFO  
(TEST-DistributedVersionInfoTest.test-seed#[DFD35E88958BFBF3]) [] 
o.a.s.c.c.SolrZkClient makePath: /collections/collection1
   [junit4]   2> 2387920 INFO  
(TEST-DistributedVersionInfoTest.test-seed#[DFD35E88958BFBF3]) [] 
o.a.s.c.c.SolrZkClient makePath: /collections/collection1/shards
   [junit4]   2> 2387923 INFO  
(TEST-DistributedVersionInfoTest.test-seed#[DFD35E88958BFBF3]) [] 
o.a.s.c.c.SolrZkClient makePath: /collections/control_collection
   [junit4]   2> 2387925 INFO  
(TEST-DistributedVersionInfoTest.test-seed#[DFD35E88958BFBF3]) [] 
o.a.s.c.c.SolrZkClient makePath: /collections/control_collection/shards
   [junit4]   2> 2387928 INFO  
(TEST-DistributedVersionInfoTest.test-seed#[DFD35E88958BFBF3]) [] 
o.a.s.c.AbstractZkTestCase put 
/export/home/jenkins/workspace/Lucene-Solr-5.x-Solaris/solr/core/src/test-files/solr/collection1/conf/solrconfig-tlog.xml
 to /configs/conf1/solrconfig.xml
   [junit4]   2> 2387928 INFO  
(TEST-DistributedVersionInfoT

[jira] [Commented] (SOLR-7495) Unexpected docvalues type NUMERIC when grouping by a int facet

2015-11-03 Thread Kevin Cunningham (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14988029#comment-14988029
 ] 

Kevin Cunningham commented on SOLR-7495:


Changing the type to a multiValued int allowed us to work around the issue temporarily.
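For anyone else hitting this, a minimal sketch of that workaround in schema.xml 
(the field and type names are assumptions, since the schema quoted below was 
mangled in transit):
{code}
<!-- before: single-valued int facet field (fails under group.facet) -->
<field name="year" type="int" indexed="true" stored="true" multiValued="false"/>

<!-- workaround: index the facet field as multi-valued -->
<field name="year" type="int" indexed="true" stored="true" multiValued="true"/>
{code}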

> Unexpected docvalues type NUMERIC when grouping by a int facet
> --
>
> Key: SOLR-7495
> URL: https://issues.apache.org/jira/browse/SOLR-7495
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 5.0, 5.1, 5.2, 5.3
>Reporter: Fabio Batista da Silva
> Attachments: SOLR-7495.patch
>
>
> Hey All,
> After upgrading from solr 4.10 to 5.1 with solr cloud,
> I'm getting an IllegalStateException when I try to facet an int field:
> IllegalStateException: unexpected docvalues type NUMERIC for field 'year' 
> (expected=SORTED). Use UninvertingReader or index with docvalues.
> schema.xml
> {code}
> 
> 
> 
> 
> 
> 
>  multiValued="false" required="true"/>
>  multiValued="false" required="true"/>
> 
> 
>  stored="true"/>
> 
> 
> 
>  />
>  sortMissingLast="true"/>
>  positionIncrementGap="0"/>
>  positionIncrementGap="0"/>
>  positionIncrementGap="0"/>
>  precisionStep="0" positionIncrementGap="0"/>
>  positionIncrementGap="0"/>
>  positionIncrementGap="100">
> 
> 
>  words="stopwords.txt" />
> 
>  maxGramSize="15"/>
> 
> 
> 
>  words="stopwords.txt" />
>  synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
> 
> 
> 
>  positionIncrementGap="100">
> 
> 
>  words="stopwords.txt" />
> 
>  maxGramSize="15"/>
> 
> 
> 
>  words="stopwords.txt" />
>  synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
> 
> 
> 
>  class="solr.SpatialRecursivePrefixTreeFieldType" geo="true" 
> distErrPct="0.025" maxDistErr="0.09" units="degrees" />
> 
> id
> name
> 
> 
> {code}
> query :
> {code}
> http://solr.dev:8983/solr/my_collection/select?wt=json&fl=id&fq=index_type:foobar&group=true&group.field=year_make_model&group.facet=true&facet=true&facet.field=year
> {code}
> Exception :
> {code}
> null:org.apache.solr.common.SolrException: Exception during facet.field: year
> at org.apache.solr.request.SimpleFacets$3.call(SimpleFacets.java:627)
> at org.apache.solr.request.SimpleFacets$3.call(SimpleFacets.java:612)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at org.apache.solr.request.SimpleFacets$2.execute(SimpleFacets.java:566)
> at 
> org.apache.solr.request.SimpleFacets.getFacetFieldCounts(SimpleFacets.java:637)
> at 
> org.apache.solr.request.SimpleFacets.getFacetCounts(SimpleFacets.java:280)
> at 
> org.apache.solr.handler.component.FacetComponent.process(FacetComponent.java:106)
> at 
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:222)
> at 
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1984)
> at 
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:829)
> at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:446)
> at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:220)
> at 
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
> at 
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
> at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
> at 
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
> at 
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
> at 
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
> at 
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
> at 
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
> at 
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
> at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
> at 
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
> at 
> org.eclipse.jetty.server.handler.HandlerCollection

[JENKINS] Lucene-Solr-Tests-5.x-Java7 - Build # 3712 - Failure

2015-11-03 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-Tests-5.x-Java7/3712/

1 tests failed.
FAILED:  org.apache.solr.cloud.HttpPartitionTest.test

Error Message:
Captured an uncaught exception in thread: Thread[id=5912, 
name=SocketProxy-Response-43008:55280, state=RUNNABLE, 
group=TGRP-HttpPartitionTest]

Stack Trace:
com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught 
exception in thread: Thread[id=5912, name=SocketProxy-Response-43008:55280, 
state=RUNNABLE, group=TGRP-HttpPartitionTest]
at 
__randomizedtesting.SeedInfo.seed([325B00BED6F38A97:BA0F3F64780FE76F]:0)
Caused by: java.lang.RuntimeException: java.net.SocketException: Socket is 
closed
at __randomizedtesting.SeedInfo.seed([325B00BED6F38A97]:0)
at 
org.apache.solr.cloud.SocketProxy$Bridge$Pump.run(SocketProxy.java:347)
Caused by: java.net.SocketException: Socket is closed
at java.net.Socket.setSoTimeout(Socket.java:1101)
at 
org.apache.solr.cloud.SocketProxy$Bridge$Pump.run(SocketProxy.java:344)




Build Log:
[...truncated 10311 lines...]
   [junit4] Suite: org.apache.solr.cloud.HttpPartitionTest
   [junit4]   2> Creating dataDir: 
/x1/jenkins/jenkins-slave/workspace/Lucene-Solr-Tests-5.x-Java7/solr/build/solr-core/test/J2/temp/solr.cloud.HttpPartitionTest_325B00BED6F38A97-001/init-core-data-001
   [junit4]   2> 554327 INFO  
(SUITE-HttpPartitionTest-seed#[325B00BED6F38A97]-worker) [] 
o.a.s.BaseDistributedSearchTestCase Setting hostContext system property: /
   [junit4]   2> 554331 INFO  
(TEST-HttpPartitionTest.test-seed#[325B00BED6F38A97]) [] 
o.a.s.c.ZkTestServer STARTING ZK TEST SERVER
   [junit4]   2> 554349 INFO  (Thread-2398) [] o.a.s.c.ZkTestServer client 
port:0.0.0.0/0.0.0.0:0
   [junit4]   2> 554349 INFO  (Thread-2398) [] o.a.s.c.ZkTestServer 
Starting server
   [junit4]   2> 554449 INFO  
(TEST-HttpPartitionTest.test-seed#[325B00BED6F38A97]) [] 
o.a.s.c.ZkTestServer start zk server on port:56687
   [junit4]   2> 554450 INFO  
(TEST-HttpPartitionTest.test-seed#[325B00BED6F38A97]) [] 
o.a.s.c.c.SolrZkClient Using default ZkCredentialsProvider
   [junit4]   2> 554454 INFO  
(TEST-HttpPartitionTest.test-seed#[325B00BED6F38A97]) [] 
o.a.s.c.c.ConnectionManager Waiting for client to connect to ZooKeeper
   [junit4]   2> 554468 INFO  (zkCallback-856-thread-1) [] 
o.a.s.c.c.ConnectionManager Watcher 
org.apache.solr.common.cloud.ConnectionManager@73678f31 
name:ZooKeeperConnection Watcher:127.0.0.1:56687 got event WatchedEvent 
state:SyncConnected type:None path:null path:null type:None
   [junit4]   2> 554468 INFO  
(TEST-HttpPartitionTest.test-seed#[325B00BED6F38A97]) [] 
o.a.s.c.c.ConnectionManager Client is connected to ZooKeeper
   [junit4]   2> 554468 INFO  
(TEST-HttpPartitionTest.test-seed#[325B00BED6F38A97]) [] 
o.a.s.c.c.SolrZkClient Using default ZkACLProvider
   [junit4]   2> 554468 INFO  
(TEST-HttpPartitionTest.test-seed#[325B00BED6F38A97]) [] 
o.a.s.c.c.SolrZkClient makePath: /solr
   [junit4]   2> 554476 INFO  
(TEST-HttpPartitionTest.test-seed#[325B00BED6F38A97]) [] 
o.a.s.c.c.SolrZkClient Using default ZkCredentialsProvider
   [junit4]   2> 554484 INFO  
(TEST-HttpPartitionTest.test-seed#[325B00BED6F38A97]) [] 
o.a.s.c.c.ConnectionManager Waiting for client to connect to ZooKeeper
   [junit4]   2> 554485 INFO  (zkCallback-857-thread-1) [] 
o.a.s.c.c.ConnectionManager Watcher 
org.apache.solr.common.cloud.ConnectionManager@7bda783c 
name:ZooKeeperConnection Watcher:127.0.0.1:56687/solr got event WatchedEvent 
state:SyncConnected type:None path:null path:null type:None
   [junit4]   2> 554485 INFO  
(TEST-HttpPartitionTest.test-seed#[325B00BED6F38A97]) [] 
o.a.s.c.c.ConnectionManager Client is connected to ZooKeeper
   [junit4]   2> 554485 INFO  
(TEST-HttpPartitionTest.test-seed#[325B00BED6F38A97]) [] 
o.a.s.c.c.SolrZkClient Using default ZkACLProvider
   [junit4]   2> 554486 INFO  
(TEST-HttpPartitionTest.test-seed#[325B00BED6F38A97]) [] 
o.a.s.c.c.SolrZkClient makePath: /collections/collection1
   [junit4]   2> 554493 INFO  
(TEST-HttpPartitionTest.test-seed#[325B00BED6F38A97]) [] 
o.a.s.c.c.SolrZkClient makePath: /collections/collection1/shards
   [junit4]   2> 554494 INFO  
(TEST-HttpPartitionTest.test-seed#[325B00BED6F38A97]) [] 
o.a.s.c.c.SolrZkClient makePath: /collections/control_collection
   [junit4]   2> 554496 INFO  
(TEST-HttpPartitionTest.test-seed#[325B00BED6F38A97]) [] 
o.a.s.c.c.SolrZkClient makePath: /collections/control_collection/shards
   [junit4]   2> 554497 INFO  
(TEST-HttpPartitionTest.test-seed#[325B00BED6F38A97]) [] 
o.a.s.c.AbstractZkTestCase put 
/x1/jenkins/jenkins-slave/workspace/Lucene-Solr-Tests-5.x-Java7/solr/core/src/test-files/solr/collection1/conf/solrconfig-tlog.xml
 to /configs/conf1/solrconfig.xml
   [junit4]   2> 554497 INFO  
(TEST-HttpPartitionTest.test-seed#[325B00BED6F38A97]) [] 
o.a.s.c.c.SolrZkClient makePath: /configs/conf1/s

[JENKINS] Lucene-Solr-5.x-Linux (32bit/jdk1.8.0_66) - Build # 14486 - Failure!

2015-11-03 Thread Policeman Jenkins Server
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-5.x-Linux/14486/
Java: 32bit/jdk1.8.0_66 -client -XX:+UseSerialGC

3 tests failed.
FAILED:  
junit.framework.TestSuite.org.apache.solr.cloud.TestAuthenticationFramework

Error Message:
68 threads leaked from SUITE scope at 
org.apache.solr.cloud.TestAuthenticationFramework: 1) Thread[id=10072, 
name=qtp6065673-10072-selector-ServerConnectorManager@608b9c/0, state=RUNNABLE, 
group=TGRP-TestAuthenticationFramework] at 
sun.nio.ch.EPollArrayWrapper.epollWait(Native Method) at 
sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269) at 
sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79) at 
sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86) at 
sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97) at 
sun.nio.ch.SelectorImpl.select(SelectorImpl.java:101) at 
org.eclipse.jetty.io.SelectorManager$ManagedSelector.select(SelectorManager.java:600)
 at 
org.eclipse.jetty.io.SelectorManager$ManagedSelector.run(SelectorManager.java:549)
 at 
org.eclipse.jetty.util.thread.NonBlockingThread.run(NonBlockingThread.java:52)  
   at 
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
 at 
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555) 
at java.lang.Thread.run(Thread.java:745)2) Thread[id=10088, 
name=qtp4493341-10088, state=TIMED_WAITING, 
group=TGRP-TestAuthenticationFramework] at sun.misc.Unsafe.park(Native 
Method) at 
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215) 
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
 at 
org.eclipse.jetty.util.BlockingArrayQueue.poll(BlockingArrayQueue.java:389) 
at 
org.eclipse.jetty.util.thread.QueuedThreadPool.idleJobPoll(QueuedThreadPool.java:531)
 at 
org.eclipse.jetty.util.thread.QueuedThreadPool.access$700(QueuedThreadPool.java:47)
 at 
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:590) 
at java.lang.Thread.run(Thread.java:745)3) Thread[id=10065, 
name=qtp7290192-10065-selector-ServerConnectorManager@1faae0e/1, 
state=RUNNABLE, group=TGRP-TestAuthenticationFramework] at 
sun.nio.ch.EPollArrayWrapper.epollWait(Native Method) at 
sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269) at 
sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79) at 
sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86) at 
sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97) at 
sun.nio.ch.SelectorImpl.select(SelectorImpl.java:101) at 
org.eclipse.jetty.io.SelectorManager$ManagedSelector.select(SelectorManager.java:600)
 at 
org.eclipse.jetty.io.SelectorManager$ManagedSelector.run(SelectorManager.java:549)
 at 
org.eclipse.jetty.util.thread.NonBlockingThread.run(NonBlockingThread.java:52)  
   at 
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
 at 
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555) 
at java.lang.Thread.run(Thread.java:745)4) Thread[id=10104, 
name=org.eclipse.jetty.server.session.HashSessionManager@1ba3ad9Timer, 
state=TIMED_WAITING, group=TGRP-TestAuthenticationFramework] at 
sun.misc.Unsafe.park(Native Method) at 
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215) 
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
 at 
java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1093)
 at 
java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
 at 
java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)   
  at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127) 
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
at java.lang.Thread.run(Thread.java:745)5) Thread[id=10167, 
name=Thread-4826, state=WAITING, group=TGRP-TestAuthenticationFramework]
 at java.lang.Object.wait(Native Method) at 
java.lang.Object.wait(Object.java:502) at 
org.apache.solr.core.CloserThread.run(CoreContainer.java:1155)6) 
Thread[id=10085, 
name=qtp15976446-10085-acceptor-0@2406f7-ServerConnector@1d459d3{HTTP/1.1}{127.0.0.1:60543},
 state=RUNNABLE, group=TGRP-TestAuthenticationFramework] at 
sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method) at 
sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:422) 
at 
sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250) 
 

[jira] [Commented] (SOLR-6304) Transforming and Indexing custom JSON data

2015-11-03 Thread Kelly Kagen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14988279#comment-14988279
 ] 

Kelly Kagen commented on SOLR-6304:
---

I'm having some difficulty indexing custom JSON data with v5.3.1. I took the 
same example from the documentation, but it doesn't seem to work as expected. 
Can someone confirm whether this is a bug or an issue with the procedure I 
followed? The scenarios are below.

Source: [Indexing custom JSON 
data|http://lucidworks.com/blog/2014/08/12/indexing-custom-json-data], 
[Transforming and Indexing Custom 
JSON|https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Index+Handlers#UploadingDatawithIndexHandlers-TransformingandIndexingCustomJSON]

*Note:* The echo parameter has been added.

*Input:*
{code}
curl 'http://localhost:8983/solr/collection1/update/json/docs'
'?split=/exams'
'&f=first:/first'
'&f=last:/last'
'&f=grade:/grade'
'&f=subject:/exams/subject'
'&f=test:/exams/test'
'&f=marks:/exams/marks'
'&echo=true'
 -H 'Content-type:application/json' -d '
{
  "first": "John",
  "last": "Doe",
  "grade": 8,
  "exams": [
  {
"subject": "Maths",
"test"   : "term1",
"marks":90},
{
 "subject": "Biology",
 "test"   : "term1",
 "marks":86}
  ]
}'
{code}

*Output:*
{code}
{
  "error":{
"msg":"Raw data can be stored only if split=/",
"code":400
  }
}
{code}

If I pass only '/' to the split parameter as the error message suggests, but 
with a different field mapping, it doesn't index the data into the mapped 
fields. Notice the suffix 'Name' added in the input JSON and the corresponding 
field mapping.

*Input:*
{code}
curl 'http://localhost:8983/solr/collection1/update/json/docs'
'?split=/'
'&f=first:/firstName'
'&f=last:/lastName'
'&f=grade:/grade'
'&f=subject:/exams/subjectName'
'&f=test:/exams/test'
'&f=marks:/exams/marks'
'&echo=true'
 -H 'Content-type:application/json' -d '
{
  "firstName": "John",
  "lastName": "Doe",
  "grade": 8,
  "exams": [
  {
"subjectName": "Maths",
"test"   : "term1",
"marks":90},
{
 "subject": "Biology",
 "test"   : "term1",
 "marks":86}
  ]
}'
{code}

*Output:*
{code}
{"responseHeader":{"status":0,"QTime":0},"docs":[{"id":"3c5fa5a0-ff71-4fef-b3e9-8e279cc0d724","_src_":"{
  \"firstName\": \"John\",  \"lastName\": \"Doe\",  \"grade\": 8,  \"exams\": [ 
 {\"subjectName\": \"Maths\",\"test\"   : \"term1\",
\"marks\":90},{ \"subject\": \", \"test\"   : 
\"term1\", \"marks\":86}  
]}","text":["John","Doe",8,"Maths",["term1","term1"],[90,86]]}]}
{code}

If a field named "id" is present, it is reflected in the response, but all 
other fields are ignored for some reason.

*Input:*
{code}
curl 'http://localhost:8983/solr/collection1/update/json/docs'
'?split=/'
'&f=first:/firstName'
'&f=id:/lastName'
'&f=grade:/grade'
'&f=subject:/exams/subjectName'
'&f=test:/exams/test'
'&f=marks:/exams/marks'
'&echo=true'
 -H 'Content-type:application/json' -d '
{
  "firstName": "John",
  "lastName": "Doe",
  "grade": 8,
  "exams": [
  {
"subjectName": "Maths",
"test"   : "term1",
"marks":90},
{
 "subject": "Biology",
 "test"   : "term1",
 "marks":86}
  ]
}'
{code}

*Output:*
{code}
{"responseHeader":{"status":0,"QTime":1},"docs":[{"id":"Doe","_src_":"{  
\"firstName\": \"John\",  \"lastName\": \"Doe\",  \"grade\": 8,  \"exams\": [   
   {\"subjectName\": \"Maths\",\"test\"   : \"term1\",
\"marks\":90},{ \"subject\": \", \"test\"   : 
\"term1\", \"marks\":86}  
]}","text":["John","Doe",8,"Maths",["term1","term1"],[90,86]]}]}
{code}

> Transforming and Indexing custom JSON data
> --
>
> Key: SOLR-6304
> URL: https://issues.apache.org/jira/browse/SOLR-6304
> Project: Solr
>  Issue Type: Improvement
>Reporter: Noble Paul
>Assignee: Noble Paul
> Fix For: 4.10, Trunk
>
> Attachments: SOLR-6304.patch, SOLR-6304.patch
>
>
> example
> {noformat}
> curl 
> localhost:8983/update/json/docs?split=/batters/batter&f=recipeId:/id&f=recipeType:/type&f=id:/batters/batter/id&f=type:/batters/batter/type
>  -d '
> {
>   "id": "0001",
>   "type": "donut",
>   "name": "Cake",
>   "ppu": 0.55,
>   "batters": {
>   "batter":
>   [
>   { "id": "1001", "type": 
> "Regular" },
>   { "id": "1002", "type": 
> "Chocolate" },
>   { "id": "1003", "type": 
> "Blueberry" },
> 

[jira] [Commented] (LUCENE-6874) WhitespaceTokenizer should tokenize on NBSP

2015-11-03 Thread Steve Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14988290#comment-14988290
 ] 

Steve Rowe commented on LUCENE-6874:


bq. I propose that we consolidate the TokenizerFactories here into one – the 
existing WhitespaceTokenizerFactory. I think this is more user friendly. An 
attribute could select which whitespace definition the user wants: "java" or 
"unicode". What do you think?

Implicitly then, you're nixing ICUWhitespaceTokenizer, since it can't be in 
analyzers-common.

I'm okay with adding a param to WhitespaceTokenizerFactory (not sure what to 
name it though: "authority"/"style"/"definition"?). Since the default 
wouldn't change ("java" would be the default, I assume), I don't think we need 
to introduce luceneMatchVersion.
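For illustration, a sketch of what the consolidated factory could look like in 
a schema (the attribute name is exactly what's undecided above, so "definition" 
and its values are assumptions, not an agreed API):
{code}
<!-- proposed: one factory, selectable whitespace definition -->
<tokenizer class="solr.WhitespaceTokenizerFactory" definition="unicode"/>

<!-- default would stay equivalent to today's Character.isWhitespace behavior -->
<tokenizer class="solr.WhitespaceTokenizerFactory" definition="java"/>
{code}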

> WhitespaceTokenizer should tokenize on NBSP
> ---
>
> Key: LUCENE-6874
> URL: https://issues.apache.org/jira/browse/LUCENE-6874
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/analysis
>Reporter: David Smiley
>Priority: Minor
> Attachments: LUCENE-6874-jflex.patch, LUCENE-6874.patch
>
>
> WhitespaceTokenizer uses [Character.isWhitespace 
> |http://docs.oracle.com/javase/8/docs/api/java/lang/Character.html#isWhitespace-int-]
>  to decide what is whitespace.  Here's a pertinent excerpt:
> bq. It is a Unicode space character (SPACE_SEPARATOR, LINE_SEPARATOR, or 
> PARAGRAPH_SEPARATOR) but is not also a non-breaking space ('\u00A0', 
> '\u2007', '\u202F')
> Perhaps Character.isWhitespace should have been called 
> isLineBreakableWhitespace?
> I think WhitespaceTokenizer should tokenize on this.  I am aware it's easy to 
> work around but why leave this trap in by default?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6304) Transforming and Indexing custom JSON data

2015-11-03 Thread Alexandre Rafalovitch (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14988312#comment-14988312
 ] 

Alexandre Rafalovitch commented on SOLR-6304:
-

Seems like a conflict with the SOLR-6633 feature (store JSON as a blob). Check 
your solrconfig.xml for srcField and remove it.

[~noble.paul] I can debug this, but I can't explain it. Should these two things 
be possible at once? Should we document the interplay somewhere?
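For reference, a sketch of the kind of solrconfig.xml section being described 
(the values mirror the 5.x data-driven configset, and "_src_" matches the field 
visible in the output above, but verify against your own config):
{code}
<initParams path="/update/json/docs">
  <lst name="defaults">
    <!-- SOLR-6633: stores the raw JSON in this field; remove to disable -->
    <str name="srcField">_src_</str>
    <str name="mapUniqueKeyOnly">true</str>
  </lst>
</initParams>
{code}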

> Transforming and Indexing custom JSON data
> --
>
> Key: SOLR-6304
> URL: https://issues.apache.org/jira/browse/SOLR-6304
> Project: Solr
>  Issue Type: Improvement
>Reporter: Noble Paul
>Assignee: Noble Paul
> Fix For: 4.10, Trunk
>
> Attachments: SOLR-6304.patch, SOLR-6304.patch
>
>
> example
> {noformat}
> curl 
> localhost:8983/update/json/docs?split=/batters/batter&f=recipeId:/id&f=recipeType:/type&f=id:/batters/batter/id&f=type:/batters/batter/type
>  -d '
> {
>   "id": "0001",
>   "type": "donut",
>   "name": "Cake",
>   "ppu": 0.55,
>   "batters": {
>   "batter":
>   [
>   { "id": "1001", "type": 
> "Regular" },
>   { "id": "1002", "type": 
> "Chocolate" },
>   { "id": "1003", "type": 
> "Blueberry" },
>   { "id": "1004", "type": 
> "Devil's Food" }
>   ]
>   }
> }'
> {noformat}
> should produce the following output docs
> {noformat}
> { "recipeId":"001", "recipeType":"donut", "id":"1001", "type":"Regular" }
> { "recipeId":"001", "recipeType":"donut", "id":"1002", "type":"Chocolate" }
> { "recipeId":"001", "recipeType":"donut", "id":"1003", "type":"Blueberry" }
> { "recipeId":"001", "recipeType":"donut", "id":"1004", "type":"Devil's food" }
> {noformat}
> the split param is the element in the tree where it should be split into 
> multiple docs. The 'f' are field name mappings



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-6874) WhitespaceTokenizer should tokenize on NBSP

2015-11-03 Thread Jack Krupansky (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14988338#comment-14988338
 ] 

Jack Krupansky commented on LUCENE-6874:


Certainly Solr can update its example schemas to use whatever alternative 
tokenizer or option is decided on so that Solr users, many of whom are not Java 
developers, will no longer fall into this NBSP trap, but... that still feels 
like a less than desirable resolution.

[~thetaphi], could you elaborate more specifically on the existing use case 
that you are trying to preserve? I mean, like in terms of a real-world example. 
Where do some of your NBSPs actually live in the wild?

It seems to me that the vast majority of normal users would not be negatively 
impacted by having "white space" be defined using the Unicode model. I never 
objected to using the Java model, but that's because I had overlooked this 
nuance of NBSP. My concern for Solr users is that NBSP occurs somewhat commonly 
in HTML web pages - as a formatting technique more than an attempt at 
influencing tokenization.


> WhitespaceTokenizer should tokenize on NBSP
> ---
>
> Key: LUCENE-6874
> URL: https://issues.apache.org/jira/browse/LUCENE-6874
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/analysis
>Reporter: David Smiley
>Priority: Minor
> Attachments: LUCENE-6874-jflex.patch, LUCENE-6874.patch
>
>
> WhitespaceTokenizer uses [Character.isWhitespace 
> |http://docs.oracle.com/javase/8/docs/api/java/lang/Character.html#isWhitespace-int-]
>  to decide what is whitespace.  Here's a pertinent excerpt:
> bq. It is a Unicode space character (SPACE_SEPARATOR, LINE_SEPARATOR, or 
> PARAGRAPH_SEPARATOR) but is not also a non-breaking space ('\u00A0', 
> '\u2007', '\u202F')
> Perhaps Character.isWhitespace should have been called 
> isLineBreakableWhitespace?
> I think WhitespaceTokenizer should tokenize on this.  I am aware it's easy to 
> work around but why leave this trap in by default?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-NightlyTests-5.x - Build # 1006 - Still Failing

2015-11-03 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-NightlyTests-5.x/1006/

1 tests failed.
FAILED:  org.apache.lucene.search.TestGeoPointQuery.testRandomBig

Error Message:
Java heap space

Stack Trace:
java.lang.OutOfMemoryError: Java heap space
at 
__randomizedtesting.SeedInfo.seed([F40CBA610A991C8F:735BC7EE9BC0600F]:0)
at org.apache.lucene.util.ArrayUtil.grow(ArrayUtil.java:354)
at 
org.apache.lucene.codecs.memory.DirectPostingsFormat$DirectField.(DirectPostingsFormat.java:363)
at 
org.apache.lucene.codecs.memory.DirectPostingsFormat$DirectFields.(DirectPostingsFormat.java:131)
at 
org.apache.lucene.codecs.memory.DirectPostingsFormat.fieldsProducer(DirectPostingsFormat.java:115)
at 
org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsReader.(PerFieldPostingsFormat.java:261)
at 
org.apache.lucene.codecs.perfield.PerFieldPostingsFormat.fieldsProducer(PerFieldPostingsFormat.java:341)
at 
org.apache.lucene.index.SegmentCoreReaders.(SegmentCoreReaders.java:104)
at org.apache.lucene.index.SegmentReader.(SegmentReader.java:65)
at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:655)
at org.apache.lucene.util.TestUtil.checkIndex(TestUtil.java:278)
at 
org.apache.lucene.store.MockDirectoryWrapper.close(MockDirectoryWrapper.java:757)
at org.apache.lucene.util.IOUtils.close(IOUtils.java:97)
at org.apache.lucene.util.IOUtils.close(IOUtils.java:84)
at 
org.apache.lucene.util.BaseGeoPointTestCase.verify(BaseGeoPointTestCase.java:766)
at 
org.apache.lucene.util.BaseGeoPointTestCase.doTestRandom(BaseGeoPointTestCase.java:399)
at 
org.apache.lucene.util.BaseGeoPointTestCase.testRandomBig(BaseGeoPointTestCase.java:327)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1660)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:866)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:902)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:916)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:367)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:809)




Build Log:
[...truncated 8503 lines...]
   [junit4] Suite: org.apache.lucene.search.TestGeoPointQuery
   [junit4]   2> NOTE: download the large Jenkins line-docs file by running 
'ant get-jenkins-line-docs' in the lucene directory.
   [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=TestGeoPointQuery 
-Dtests.method=testRandomBig -Dtests.seed=F40CBA610A991C8F -Dtests.multiplier=2 
-Dtests.nightly=true -Dtests.slow=true 
-Dtests.linedocsfile=/x1/jenkins/lucene-data/enwiki.random.lines.txt 
-Dtests.locale=pl_PL -Dtests.timezone=WET -Dtests.asserts=true 
-Dtests.file.encoding=UTF-8
   [junit4] ERROR961s J1 | TestGeoPointQuery.testRandomBig <<<
   [junit4]> Throwable #1: java.lang.OutOfMemoryError: Java heap space
   [junit4]>at 
__randomizedtesting.SeedInfo.seed([F40CBA610A991C8F:735BC7EE9BC0600F]:0)
   [junit4]>at 
org.apache.lucene.util.ArrayUtil.grow(ArrayUtil.java:354)
   [junit4]>at 
org.apache.lucene.codecs.memory.DirectPostingsFormat$DirectField.(DirectPostingsFormat.java:363)
   [junit4]>at 
org.apache.lucene.codecs.memory.DirectPostingsFormat$DirectFields.(DirectPostingsFormat.java:131)
   [junit4]>at 
org.apache.lucene.codecs.memory.DirectPostingsFormat.fieldsProducer(DirectPostingsFormat.java:115)
   [junit4]>at 
org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsReader.(PerFieldPostingsFormat.java:261)
   [junit4]>at 
org.apache.lucene.codecs.perfield.PerFieldPostingsFormat.fieldsProducer(PerFieldPostingsFormat.java:341)
   [junit4]>at 
org.apache.lucene.ind

[jira] [Comment Edited] (LUCENE-6874) WhitespaceTokenizer should tokenize on NBSP

2015-11-03 Thread Steve Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14988404#comment-14988404
 ] 

Steve Rowe edited comment on LUCENE-6874 at 11/3/15 11:10 PM:
--

bq. My concern for Solr users is that NBSP occurs somewhat commonly in HTML web 
pages - as a formatting technique more than an attempt at influencing 
tokenization.

FYI, {{\&nbsp;}} is converted to U+0020 by {{HTMLStripCharFilter}}.


was (Author: steve_rowe):
bq. My concern for Solr users is that NBSP occurs somewhat commonly in HTML web 
pages - as a formatting technique more than an attempt at influencing 
tokenization.

FYI, {{&nbsp;}} is converted to U+0020 by {{HTMLStripCharFilter}}.

> WhitespaceTokenizer should tokenize on NBSP
> ---
>
> Key: LUCENE-6874
> URL: https://issues.apache.org/jira/browse/LUCENE-6874
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/analysis
>Reporter: David Smiley
>Priority: Minor
> Attachments: LUCENE-6874-jflex.patch, LUCENE-6874.patch
>
>
> WhitespaceTokenizer uses [Character.isWhitespace 
> |http://docs.oracle.com/javase/8/docs/api/java/lang/Character.html#isWhitespace-int-]
>  to decide what is whitespace.  Here's a pertinent excerpt:
> bq. It is a Unicode space character (SPACE_SEPARATOR, LINE_SEPARATOR, or 
> PARAGRAPH_SEPARATOR) but is not also a non-breaking space ('\u00A0', 
> '\u2007', '\u202F')
> Perhaps Character.isWhitespace should have been called 
> isLineBreakableWhitespace?
> I think WhitespaceTokenizer should tokenize on this.  I am aware it's easy to 
> work around but why leave this trap in by default?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-6874) WhitespaceTokenizer should tokenize on NBSP

2015-11-03 Thread Steve Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14988404#comment-14988404
 ] 

Steve Rowe commented on LUCENE-6874:


bq. My concern for Solr users is that NBSP occurs somewhat commonly in HTML web 
pages - as a formatting technique more than an attempt at influencing 
tokenization.

FYI, {{&nbsp;}} is converted to U+0020 by {{HTMLStripCharFilter}}.
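A minimal sketch for verifying this locally (assumes lucene-analyzers-common on 
the classpath; the class name here is made up for the demo):
{code}
import java.io.StringReader;
import org.apache.lucene.analysis.charfilter.HTMLStripCharFilter;

public class NbspStripDemo {
  public static void main(String[] args) throws Exception {
    // Print the code points produced for "a&nbsp;b"; the entity comes out as U+0020.
    HTMLStripCharFilter filter = new HTMLStripCharFilter(new StringReader("a&nbsp;b"));
    for (int c = filter.read(); c != -1; c = filter.read()) {
      System.out.printf("U+%04X%n", c);
    }
  }
}
{code}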

> WhitespaceTokenizer should tokenize on NBSP
> ---
>
> Key: LUCENE-6874
> URL: https://issues.apache.org/jira/browse/LUCENE-6874
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/analysis
>Reporter: David Smiley
>Priority: Minor
> Attachments: LUCENE-6874-jflex.patch, LUCENE-6874.patch
>
>
> WhitespaceTokenizer uses [Character.isWhitespace 
> |http://docs.oracle.com/javase/8/docs/api/java/lang/Character.html#isWhitespace-int-]
>  to decide what is whitespace.  Here's a pertinent excerpt:
> bq. It is a Unicode space character (SPACE_SEPARATOR, LINE_SEPARATOR, or 
> PARAGRAPH_SEPARATOR) but is not also a non-breaking space ('\u00A0', 
> '\u2007', '\u202F')
> Perhaps Character.isWhitespace should have been called 
> isLineBreakableWhitespace?
> I think WhitespaceTokenizer should tokenize on this.  I am aware it's easy to 
> work around but why leave this trap in by default?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (LUCENE-6874) WhitespaceTokenizer should tokenize on NBSP

2015-11-03 Thread Steve Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14988404#comment-14988404
 ] 

Steve Rowe edited comment on LUCENE-6874 at 11/3/15 11:11 PM:
--

bq. My concern for Solr users is that NBSP occurs somewhat commonly in HTML web 
pages - as a formatting technique more than an attempt at influencing 
tokenization.

FYI, {{&nbsp\;}} is converted to U+0020 by {{HTMLStripCharFilter}}.


was (Author: steve_rowe):
bq. My concern for Solr users is that NBSP occurs somewhat commonly in HTML web 
pages - as a formatting technique more than an attempt at influencing 
tokenization.

FYI, {{\&nbsp;}} is converted to U+0020 by {{HTMLStripCharFilter}}.

> WhitespaceTokenizer should tokenize on NBSP
> ---
>
> Key: LUCENE-6874
> URL: https://issues.apache.org/jira/browse/LUCENE-6874
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/analysis
>Reporter: David Smiley
>Priority: Minor
> Attachments: LUCENE-6874-jflex.patch, LUCENE-6874.patch
>
>
> WhitespaceTokenizer uses [Character.isWhitespace 
> |http://docs.oracle.com/javase/8/docs/api/java/lang/Character.html#isWhitespace-int-]
>  to decide what is whitespace.  Here's a pertinent excerpt:
> bq. It is a Unicode space character (SPACE_SEPARATOR, LINE_SEPARATOR, or 
> PARAGRAPH_SEPARATOR) but is not also a non-breaking space ('\u00A0', 
> '\u2007', '\u202F')
> Perhaps Character.isWhitespace should have been called 
> isLineBreakableWhitespace?
> I think WhitespaceTokenizer should tokenize on this.  I am aware it's easy to 
> work around but why leave this trap in by default?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (LUCENE-6874) WhitespaceTokenizer should tokenize on NBSP

2015-11-03 Thread Steve Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14988404#comment-14988404
 ] 

Steve Rowe edited comment on LUCENE-6874 at 11/3/15 11:12 PM:
--

bq. My concern for Solr users is that NBSP occurs somewhat commonly in HTML web 
pages - as a formatting technique more than an attempt at influencing 
tokenization.

FYI, {{&nbsp;}} is converted to U+0020 by {{HTMLStripCharFilter}}.


was (Author: steve_rowe):
bq. My concern for Solr users is that NBSP occurs somewhat commonly in HTML web 
pages - as a formatting technique more than an attempt at influencing 
tokenization.

FYI, {{&nbsp\;}} is converted to U+0020 by {{HTMLStripCharFilter}}.

> WhitespaceTokenizer should tokenize on NBSP
> ---
>
> Key: LUCENE-6874
> URL: https://issues.apache.org/jira/browse/LUCENE-6874
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/analysis
>Reporter: David Smiley
>Priority: Minor
> Attachments: LUCENE-6874-jflex.patch, LUCENE-6874.patch
>
>
> WhitespaceTokenizer uses [Character.isWhitespace 
> |http://docs.oracle.com/javase/8/docs/api/java/lang/Character.html#isWhitespace-int-]
>  to decide what is whitespace.  Here's a pertinent excerpt:
> bq. It is a Unicode space character (SPACE_SEPARATOR, LINE_SEPARATOR, or 
> PARAGRAPH_SEPARATOR) but is not also a non-breaking space ('\u00A0', 
> '\u2007', '\u202F')
> Perhaps Character.isWhitespace should have been called 
> isLineBreakableWhitespace?
> I think WhitespaceTokenizer should tokenize on this.  I am aware it's easy to 
> work around but why leave this trap in by default?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-6878) TopDocs.merge should use updateTop instead of pop / add

2015-11-03 Thread Daniel Jelinski (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-6878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Jelinski updated LUCENE-6878:

Attachment: speedtest.tar.gz

I merged 64 ScoreDoc lists, 100k docs each, and took the top 100k results, in 3 
different score distributions. For the time measurements, each test was repeated 
60 times, and I averaged the results of 10 subsequent runs, discarding any 
outliers. For the number of lessThan calls in the random case, I ran the test 3 
times and took an average; the number of lessThan calls for cases 1 and 2 is 
constant.
I tested score lists generated using 3 different methods.
1) All scores equal to 1. This is the case where the patch made the greatest 
difference, mostly because of tie breaks in the lessThan methods. Results:
Without the patch - 2.66 msec per merge call, 1600057 calls to lessThan
With the patch - 0.32 msec per merge call, 200071 calls to lessThan
Overall, ~88% savings on time and lessThan calls
2) Each list contains the scores 10, 9, ..., 1
Without the patch - 3.5 msec per merge call, 1100063 calls to lessThan
With the patch - 2.6 msec per merge call, 1005390 calls to lessThan
Overall, ~25% savings on time, 9% savings on lessThan calls
3) Each list starts with a doc with score 10; the score of each subsequent doc 
is the previous doc's score minus Math.random()
Without the patch - 3.5 msec per merge call, ~1156500 calls to lessThan
With the patch - 2.7 msec per merge call, ~960500 calls to lessThan
Overall, ~23% savings on time, 17% savings on lessThan calls.

In the random case the gain is much less than the advertised 2x speedup, but 
it's still a net improvement.
I attached the code I used to measure the speed, in case anyone is interested. 
Fair warning: it's not pretty.
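For readers who don't want to open the archive, a minimal self-contained sketch 
of the two patterns being compared (illustrative only, not the TopDocs.merge 
code; the toy int[] queue is an assumption to keep the example short):
{code}
import org.apache.lucene.util.PriorityQueue;

public class UpdateTopDemo {
  // Toy min-queue ordering int[] cells by their first element.
  static class IntQueue extends PriorityQueue<int[]> {
    IntQueue(int maxSize) { super(maxSize); }
    @Override
    protected boolean lessThan(int[] a, int[] b) { return a[0] < b[0]; }
  }

  public static void main(String[] args) {
    IntQueue q = new IntQueue(4);
    for (int v : new int[] {3, 1, 4, 1}) q.add(new int[] {v});

    // Old pattern: pop (one sift) then add (another sift) per merged hit.
    int[] top = q.pop();
    top[0]++;
    q.add(top);

    // Patched pattern: mutate the top in place, then a single updateTop() sift.
    q.top()[0]++;
    q.updateTop();

    System.out.println("top after updates: " + q.top()[0]);
  }
}
{code}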

> TopDocs.merge should use updateTop instead of pop / add
> ---
>
> Key: LUCENE-6878
> URL: https://issues.apache.org/jira/browse/LUCENE-6878
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/search
>Affects Versions: Trunk
>Reporter: Daniel Jelinski
>Assignee: Adrien Grand
>Priority: Trivial
> Fix For: 6.0, 5.4
>
> Attachments: LUCENE-6878.patch, speedtest.tar.gz
>
>
> The function TopDocs.merge uses PriorityQueue in a pattern: pop, update value 
> (ref.hitIndex++), add. JavaDocs for PriorityQueue.updateTop say that using 
> this function instead should be at least twice as fast.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-7981) term based ValueSourceParsers should support an option to run an analyzer for hte specified field on the input

2015-11-03 Thread Jason Gerlowski (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14988595#comment-14988595
 ] 

Jason Gerlowski commented on SOLR-7981:
---

Haha, funny; I've definitely been there.

I also don't have a huge opinion about adding this option.  I didn't pick this 
up because I wanted the feature in Solr; I just wanted to learn how to work on 
Solr.  And it's been a good first introduction, so "SUCCESS" on that front.  If 
there's a consensus that this is a thing people would like to have, I'm happy 
to keep working on it (should I assign myself on this JIRA? Or is that only for 
committers?)  If we *do* think this would be useful for people, I could use a 
bit of clarification on what the desired behavior actually is.  If not, should 
I close this JIRA?

Questions about 'Desired' Behavior:

1.) Currently, analysis is only done on things that ValueSourceParser 
identifies as being TextFields.  Are numeric/date/other fields typically 
analyzed?  If so, do we want them to be analyzed here too?  Even among fields 
containing text, this doesn't cover as much as I'd expect.  For example, I was 
writing some tests for this stuff and tried to use a field like:

{{

   

  
  







  

   
}}

but it turns out that it wasn't being analyzed by the current ValueSourceParser 
code.  Maybe this is just me being new to Solr, but I expected this to be 
considered a "TextField" by the code.

2.) Do we care whether the input-value gets analyzed to > 1 token?  The initial 
bug description mentioned error handling for this, but I didn't see any special 
error-handling for this in the default-to-query-analyzer case that's already in 
the code.

Thanks for any clarification anyone can give.  Still getting used to the 
process of working on these things.

> term based ValueSourceParsers should support an option to run an analyzer for 
> hte specified field on the input
> --
>
> Key: SOLR-7981
> URL: https://issues.apache.org/jira/browse/SOLR-7981
> Project: Solr
>  Issue Type: Improvement
>Reporter: Hoss Man
>  Labels: newdev
> Attachments: SOLR-7981.patch
>
>
> The following functions all take exactly 2 arguments: a field name, and a 
> term value...
> * idf
> * termfreq
> * tf
> * totaltermfreq
> ...we should consider adding an optional third argument to indicate if an 
> analyzer for the specified field should be used on the input to find the real 
> "Term" to consider.
> For example, the following might all result in equivalent numeric values for 
> all docs assuming simple plural stemming and lowercasing...
> {noformat}
> termfreq(foo_t,'Bicycles',query) // use the query analyzer for field foo_t on 
> input Bicycles
> termfreq(foo_t,'Bicycles',index) // use the index analyzer for field foo_t on 
> input Bicycles
> termfreq(foo_t,'bicycle',none) // no analyzer used to construct Term
> termfreq(foo_t,'bicycle') // legacy 2 arg syntax, same as 'none'
> {noformat}
> (Special error checking needed if analyzer creates more than one term for the 
> given input string)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-7981) term based ValueSourceParsers should support an option to run an analyzer for hte specified field on the input

2015-11-03 Thread Jason Gerlowski (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14988595#comment-14988595
 ] 

Jason Gerlowski edited comment on SOLR-7981 at 11/4/15 12:32 AM:
-

Haha, funny; I've definitely been there.

I also don't have a huge opinion about adding this option.  I didn't pick this 
up because I wanted the feature in Solr; I just wanted to learn how to work on 
Solr.  And it's been a good first introduction, so "SUCCESS" on that front.  If 
there's a consensus that this is a thing people would like to have, I'm happy 
to keep working on it (should I assign myself on this JIRA? Or is that only for 
committers?)  If we *do* think this would be useful for people, I could use a 
bit of clarification on what the desired behavior actually is.  If not, should 
I close this JIRA?

Questions about 'Desired' Behavior:

1.) Currently, analysis is only done on things that ValueSourceParser 
identifies as being TextFields.  Are numeric/date/other fields typically 
analyzed?  If so, do we want them to be analyzed here too?  Even among fields 
containing text, this doesn't cover as much as I'd expect.  For example, I was 
writing some tests for this stuff and tried to use a field like:

{{

   

  
  







  

   }}

but it turns out that it wasn't being analyzed by the current ValueSourceParser 
code.  Maybe this is just me being new to Solr, but I expected this to be 
considered a "TextField" by the code.

2.) Do we care whether the input-value gets analyzed to > 1 token?  The initial 
bug description mentioned error handling for this, but I didn't see any special 
error-handling for this in the default-to-query-analyzer case that's already in 
the code.

Thanks for any clarification anyone can give.  Still getting used to the 
process of working on these things.


was (Author: gerlowskija):
Haha, funny; I've definitely been there.

I also don't have a huge opinion about adding this option.  I didn't pick this 
up because I wanted the feature in Solr; I just wanted to learn how to work on 
Solr.  And it's been a good first introduction, so "SUCCESS" on that front.  If 
there's a consensus that this is a thing people would like to have, I'm happy 
to keep working on it (should I assign myself on this JIRA? Or is that only for 
committers?)  If we *do* think this would be useful for people, I could use a 
bit of clarification on what the desired behavior actually is.  If not, should 
I close this JIRA?

Questions about 'Desired' Behavior:

1.) Currently, analysis is only done on things that ValueSourceParser 
identifies as being TextFields.  Are numeric/date/other fields typically 
analyzed?  If so, do we want them to be analyzed here too?  Even among fields 
containing text, this doesn't cover as much as I'd expect.  For example, I was 
writing some tests for this stuff and tried to use a field like:

{{

   

  
  







  

   
}}

but it turns out that it wasn't being analyzed by the current ValueSourceParser 
code.  Maybe this is just me being new to Solr, but I expected this to be 
considered a "TextField" by the code.

2.) Do we care whether the input-value gets analyzed to > 1 token?  The initial 
bug description mentioned error handling for this, but I didn't see any special 
error-handling for this in the default-to-query-analyzer case that's already in 
the code.

Thanks for any clarification anyone can give.  Still getting used to the 
process of working on these things.

> term based ValueSourceParsers should support an option to run an analyzer for 
> hte specified field on the input
> --
>
> Key: SOLR-7981
> URL: https://issues.apache.org/jira/browse/SOLR-7981
> Project: Solr
>  Issue Type: Improvement
>Reporter: Hoss Man
>  Labels: newdev
> Attachments: SOLR-7981.patch
>
>
> The following functions all take exactly 2 arguments: a field name, and a 
> term value...
> * idf
> * termfreq
> * tf
> * totaltermfreq
> ...we should consider adding an optional third argument to indicate if an 
> analyzer for the specified field should be used on the input to find the real 
> "Term" to consider.
> For example, the following might all result in equivalent numeric values for 
> all docs assuming simple plural stemming and lowercasing...
> {noformat}
> termfreq(foo_t,'Bicycles',query) // use the query analyzer for field foo_t on 
> input Bicycles
> termfreq(foo_t,'Bicycles',index) // use the index analyzer for field foo_t on 
> input Bicycles
> termfreq(foo_t,'bicycle',none) // no analyzer used to construct Ter

[jira] [Comment Edited] (SOLR-7981) term based ValueSourceParsers should support an option to run an analyzer for hte specified field on the input

2015-11-03 Thread Jason Gerlowski (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14988595#comment-14988595
 ] 

Jason Gerlowski edited comment on SOLR-7981 at 11/4/15 12:35 AM:
-

Haha, funny; I've definitely been there.

I also don't have a huge opinion about adding this option.  I didn't pick this 
up because I wanted the feature in Solr; I just wanted to learn how to work on 
Solr.  And it's been a good first introduction, so "SUCCESS" on that front.  If 
there's a consensus that this is a thing people would like to have, I'm happy 
to keep working on it (should I assign myself on this JIRA? Or is that only for 
committers?)  If we *do* think this would be useful for people, I could use a 
bit of clarification on what the desired behavior actually is.  If not, should 
I close this JIRA?

Questions about 'Desired' Behavior:

1.) Currently, analysis is only done on things that ValueSourceParser 
identifies as being TextFields.  Are numeric/date/other fields typically 
analyzed?  If so, do we want them to be analyzed here too?  Even among fields 
containing text, this doesn't cover as much as I'd expect.  For example, I was 
writing some tests for this stuff and tried to use a field like:



   

  
  







  

   

(Sorry, couldn't figure out how to format that as code; I used "{{ }}" but it 
didn't seem to work.)

but it turns out that it wasn't being analyzed by the current ValueSourceParser 
code.  Maybe this is just me being new to Solr, but I expected this to be 
considered a "TextField" by the code.

2.) Do we care whether the input-value gets analyzed to > 1 token?  The initial 
bug description mentioned error handling for this, but I didn't see any special 
error-handling for this in the default-to-query-analyzer case that's already in 
the code.

Thanks for any clarification anyone can give.  Still getting used to the 
process of working on these things.


was (Author: gerlowskija):
Haha, funny; I've definitely been there.

I also don't have a huge opinion about adding this option.  I didn't pick this 
up because I wanted the feature in Solr; I just wanted to learn how to work on 
Solr.  And it's been a good first introduction, so "SUCCESS" on that front.  If 
there's a consensus that this is a thing people would like to have, I'm happy 
to keep working on it (should I assign myself on this JIRA? Or is that only for 
committers?)  If we *do* think this would be useful for people, I could use a 
bit of clarification on what the desired behavior actually is.  If not, should 
I close this JIRA?

Questions about 'Desired' Behavior:

1.) Currently, analysis is only done on things that ValueSourceParser 
identifies as being TextFields.  Are numeric/date/other fields typically 
analyzed?  If so, do we want them to be analyzed here too?  Even among fields 
containing text, this doesn't cover as much as I'd expect.  For example, I was 
writing some tests for this stuff and tried to use a field like:

{{
[field/fieldType XML definition lost in the archive rendering]
}}

but it turns out that it wasn't being analyzed by the current ValueSourceParser 
code.  Maybe this is just me being new to Solr, but I expected this to be 
considered a "TextField" by the code.

2.) Do we care whether the input-value gets analyzed to > 1 token?  The initial 
bug description mentioned error handling for this, but I didn't see any special 
error-handling for this in the default-to-query-analyzer case that's already in 
the code.

Thanks for any clarification anyone can give.  Still getting used to the 
process of working on these things.

> term based ValueSourceParsers should support an option to run an analyzer for 
> the specified field on the input
> --
>
> Key: SOLR-7981
> URL: https://issues.apache.org/jira/browse/SOLR-7981
> Project: Solr
>  Issue Type: Improvement
>Reporter: Hoss Man
>  Labels: newdev
> Attachments: SOLR-7981.patch
>
>
> The following functions all take exactly 2 arguments: a field name, and a 
> term value...
> * idf
> * termfreq
> * tf
> * totaltermfreq
> ...we should consider adding an optional third argument to indicate if an 
> analyzer for the specified field should be used on the input to find the real 
> "Term" to consider.
> For example, the following might all result in equivalent numeric values for 
> all docs assuming simple plural stemming and lowercasing...
> {noformat}
> termfreq(foo_t,'Bicycles',query) // use the query analyzer for field foo_t on 
> input Bicycles
> termfreq(foo_t,'Bicycles',index) // use the index analyzer for field foo_


[jira] [Commented] (LUCENE-6874) WhitespaceTokenizer should tokenize on NBSP

2015-11-03 Thread Jack Krupansky (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14988615#comment-14988615
 ] 

Jack Krupansky commented on LUCENE-6874:


Tika is the other (main?) approach to ingesting text from HTML web pages. I 
haven't checked exactly what it does on {{&nbsp;}}.

Maybe [~dsmiley] could elaborate on which use case he was encountering that 
inspired this Jira issue.

> WhitespaceTokenizer should tokenize on NBSP
> ---
>
> Key: LUCENE-6874
> URL: https://issues.apache.org/jira/browse/LUCENE-6874
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/analysis
>Reporter: David Smiley
>Priority: Minor
> Attachments: LUCENE-6874-jflex.patch, LUCENE-6874.patch
>
>
> WhitespaceTokenizer uses [Character.isWhitespace 
> |http://docs.oracle.com/javase/8/docs/api/java/lang/Character.html#isWhitespace-int-]
>  to decide what is whitespace.  Here's a pertinent excerpt:
> bq. It is a Unicode space character (SPACE_SEPARATOR, LINE_SEPARATOR, or 
> PARAGRAPH_SEPARATOR) but is not also a non-breaking space ('\u00A0', 
> '\u2007', '\u202F')
> Perhaps Character.isWhitespace should have been called 
> isLineBreakableWhitespace?
> I think WhitespaceTokenizer should tokenize on this.  I am aware it's easy to 
> work around but why leave this trap in by default?
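To see the trap concretely, here is a tiny plain-JDK demo of the distinction 
the excerpt above describes:

{code}
public class NbspDemo {
  public static void main(String[] args) {
    char nbsp = '\u00A0';
    // NBSP is a Unicode space character (SPACE_SEPARATOR)...
    System.out.println(Character.isSpaceChar(nbsp));   // true
    // ...but it is excluded from Character.isWhitespace(), which is the
    // test WhitespaceTokenizer relies on, so no token break happens here.
    System.out.println(Character.isWhitespace(nbsp));  // false
    System.out.println(Character.isWhitespace(' '));   // true
  }
}
{code}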



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-6874) WhitespaceTokenizer should tokenize on NBSP

2015-11-03 Thread Jack Krupansky (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14988646#comment-14988646
 ] 

Jack Krupansky commented on LUCENE-6874:


bq. Because WST and WDF should really only be used as a last resort.

Absolutely agreed. From a Solr user perspective we really need a much simpler 
model for semi-standard tokens out of the box, without the user having to 
scratch their heads and resort to WST in the first (last) place. LOL - maybe 
if we could eliminate this need to resort to WST, we wouldn't have to fret as 
much about WST.

bq.  I generally suggest to my users to use ClassicTokenizer

Personally, I've always refrained from recommending CT, since I thought ST was 
supposed to replace it and that the email and URL support was considered an 
excess not worth keeping. I've considered CT as if it were deprecated (which it 
is not), and I never see anybody else recommending it on the user list. And 
the fact that it can't handle slashes for product numbers is a deal killer. I'm 
not sure that I would argue in favor of resurrecting CT as a first-class 
recommendation, especially since it can't handle non-European languages, but...

That said, I do think it is worth separately (from this Jira) considering a 
fresh, new tokenizer that starts with the goodness of ST and adds in an 
approximation of the reasons that people resort to WST. Whether that can be an 
option on ST or has to be a separate tokenizer would need to be debated. I'd 
prefer an option on ST, either to simply allow embedded special characters or 
to specify a list or regex of special characters to be allowed or excluded.

People would still need to combine NewT with WDF, but at least the tokenization 
would be more explicit.

Personally I would prefer to see an option for whether to retain or strip 
external punctuation vs. embedded special characters. Trailing periods, 
commas, and colons, and enclosing parentheses, are just the kinds of things we 
had to resort to WDF for when using WST to retain embedded special characters.

And if people really want to be ambitious, a totally new tokenizer that 
subsumed the good parts of WDF would make the lives of a lot of Solr users 
much easier.

> WhitespaceTokenizer should tokenize on NBSP
> ---
>
> Key: LUCENE-6874
> URL: https://issues.apache.org/jira/browse/LUCENE-6874
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/analysis
>Reporter: David Smiley
>Priority: Minor
> Attachments: LUCENE-6874-jflex.patch, LUCENE-6874.patch
>
>
> WhitespaceTokenizer uses [Character.isWhitespace 
> |http://docs.oracle.com/javase/8/docs/api/java/lang/Character.html#isWhitespace-int-]
>  to decide what is whitespace.  Here's a pertinent excerpt:
> bq. It is a Unicode space character (SPACE_SEPARATOR, LINE_SEPARATOR, or 
> PARAGRAPH_SEPARATOR) but is not also a non-breaking space ('\u00A0', 
> '\u2007', '\u202F')
> Perhaps Character.isWhitespace should have been called 
> isLineBreakableWhitespace?
> I think WhitespaceTokenizer should tokenize on this.  I am aware it's easy to 
> work around but why leave this trap in by default?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-8057) Change default Sim to BM25 (w/backcompat config handling)

2015-11-03 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-8057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated SOLR-8057:
---
Attachment: SOLR-8057.patch



*NOTE:* Forgot to mention last time, but the previous patch requires the 
following svn copy before it can be applied...

{code}
svn cp 
solr/core/src/java/org/apache/solr/search/similarities/DefaultSimilarityFactory.java
 
solr/core/src/java/org/apache/solr/search/similarities/ClassicSimilarityFactory.java
{code}



Updated patch focusing on fixing test failures & improving coverage...

* straightforward test fixes/improvements to account for new defaults (mostly 
related to brittle score assumptions)...
** TestNonDefinedSimilarityFactory
** TestExtendedDismaxParser
** StatsComponentTest
** QueryElevationComponentTest
** TestReRankQParserPlugin
** TestGroupingSearch
* cloned TestPerFieldSimilarity as TestPerFieldSimilarityClassic
** TestPerFieldSimilarityClassic sets an older luceneMatchVersion
** TestPerFieldSimilarity updated to account for new BM25 defaults
* cloned TestDefaultSimilarityFactory as TestDefaultSimilarityFactoryClassic
** TestDefaultSimilarityFactoryClassic sets an older luceneMatchVersion
** TestDefaultSimilarityFactory updated to account for new BM25 defaults
** both of these tests currently trip an assert in DefaultSimilarityFactory 
because apparently nothing is calling SolrCoreAware.inform(SolrCore) on any 
per-fieldtype SimilarityFactories that implement SolrCoreAware
* added some logging to ChangedSchemaMergeTest.testOptimizeDiffSchemas to try 
and make sense of its failure

NOTE: To apply this new patch, you'll first need to copy/move the following 
files...

{code}
svn cp 
solr/core/src/java/org/apache/solr/search/similarities/DefaultSimilarityFactory.java
 
solr/core/src/java/org/apache/solr/search/similarities/ClassicSimilarityFactory.java
svn cp 
solr/core/src/test/org/apache/solr/search/similarities/TestPerFieldSimilarity.java
 
solr/core/src/test/org/apache/solr/search/similarities/TestPerFieldSimilarityClassic.java
svn cp 
solr/core/src/test/org/apache/solr/search/similarities/TestDefaultSimilarityFactory.java
 
solr/core/src/test/org/apache/solr/search/similarities/TestDefaultSimilarityFactoryClassic.java
svn mv solr/core/src/test-files/solr/collection1/conf/schema-tfidf.xml 
solr/core/src/test-files/solr/collection1/conf/schema-sim-default.xml
{code}

Tests still failing with this patch:

* BadIndexSchemaTest.testPerFieldtypeSimButNoSchemaSimFactory
** see previous comment: the javadocs say that "IndexSchema will provide such 
error checking if a non-SchemaAware instance of SimilarityFactory" is used, but 
as soon as I made DefaultSimilarityFactory implement SolrCoreAware (NOT 
SchemaAware) this seems to have broken
** seems like a tangentially related bug uncovered by this change.
* TestDefaultSimilarityFactoryClassic + TestDefaultSimilarityFactory
** Both of these tests currently trip an assert in DefaultSimilarityFactory 
because apparently nothing is calling SolrCoreAware.inform(SolrCore) on any 
per-fieldtype SimilarityFactories that implement SolrCoreAware
** bug appears independent of these changes -- any schema specifying a 
per-fieldtype similarity that is SolrCoreAware should have the same problem
* ChangedSchemaMergeTest.testOptimizeDiffSchemas + TestCloudSchemaless 
(threadleak due to core reload failures)
** something about the IndexSchemaFactory.buildIndexSchema + 
SolrCore.setLatestSchema code path isn't properly calling 
SolrCoreAware.inform(SolrCore) on the default similarity
** bug appears independent of these changes -- I'm pretty sure any schema 
specifying a similarity that is SolrCoreAware should have the same problem



> Change default Sim to BM25 (w/backcompat config handling)
> -
>
> Key: SOLR-8057
> URL: https://issues.apache.org/jira/browse/SOLR-8057
> Project: Solr
>  Issue Type: Task
>Reporter: Hoss Man
>Assignee: Hoss Man
>Priority: Blocker
> Fix For: Trunk
>
> Attachments: SOLR-8057.patch, SOLR-8057.patch
>
>
> LUCENE-6789 changed the default similarity for IndexSearcher to BM25 and 
> renamed "DefaultSimilarity" to "ClassicSimilarity"
> Solr needs to be updated accordingly:
> * a "ClassicSimilarityFactory" should exist w/expected behavior/javadocs
> * default behavior (in 6.0) when no similarity is specified in configs should 
> (ultimately) use BM25 depending on luceneMatchVersion
> ** either by assuming BM25SimilarityFactory or by changing the internal 
> behavior of DefaultSimilarityFactory
> * comments in sample configs need updated to reflect new default behavior
> * ref guide needs updated anywhere it mentions/implies that a particular 
> similarity is used (or implies TF-IDF is used by default)
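As an illustration of the version-gated default described above, here is a 
hypothetical sketch; the Version cutoff and the class names are assumptions 
drawn from the summary, not the committed patch:

{code}
import org.apache.lucene.search.similarities.BM25Similarity;
import org.apache.lucene.search.similarities.ClassicSimilarity;
import org.apache.lucene.search.similarities.Similarity;
import org.apache.lucene.util.Version;

public class DefaultSimSketch {
  /** Old configs keep TF-IDF (Classic) scoring; new ones get BM25. */
  public static Similarity defaultFor(Version luceneMatchVersion) {
    return luceneMatchVersion.onOrAfter(Version.LUCENE_6_0_0)
        ? new BM25Similarity()
        : new ClassicSimilarity();
  }
}
{code}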



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---

[jira] [Commented] (LUCENE-6659) Remove IndexWriterConfig.get/setMaxThreadStates

2015-11-03 Thread Brandon Mintern (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14988686#comment-14988686
 ] 

Brandon Mintern commented on LUCENE-6659:
-

OK, thank you.

> Remove IndexWriterConfig.get/setMaxThreadStates
> ---
>
> Key: LUCENE-6659
> URL: https://issues.apache.org/jira/browse/LUCENE-6659
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 5.3, Trunk
>
> Attachments: LUCENE-6659.patch
>
>
> Ever since LUCENE-5644, IndexWriter will aggressively reuse its internal 
> thread states across threads, whenever one is free.
> I think this means we can safely remove the sneaky maxThreadStates limit 
> (default 8) that we have today: IW will only ever allocate as many thread 
> states as there are actual concurrent threads running through it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-6874) WhitespaceTokenizer should tokenize on NBSP

2015-11-03 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14988791#comment-14988791
 ] 

David Smiley commented on LUCENE-6874:
--

Jack:  My use-case since you asked:  I've got a document store of content in 
XML that provides various markup around mostly text.  These documents 
occasionally have an NBSP.  I process it outside of Solr to produce the text I 
want indexed/stored -- it's not XML any more.  An NBSP entity, if found, is 
converted to the NBSP character naturally as part of Java's XML libraries (no 
explicit decision on my part).

bq. Implicitly then, you're nixing ICUWhitespaceTokenizer, since it can't be in 
analyzers-common.

Right; ah well.

RE what to name the attribute: I suggest "definition" or, even better, "rule" 
(or "ruleset").

I do think the first sentence of these whitespace tokenizers' docs should point 
to which definition of whitespace is chosen, and that they should reference 
each other so that anyone stumbling on one will know of the other.

RE WDF:  I prefer WhitespaceTokenizer with WDF for not just product-id data but 
also full-text.  Full-text might contain product-ids, or have things like 
"wi-fi" and many other words, like say "thread-safe" or "co-worker" that are 
sometimes hyphenated, sometimes not; some of these might be space-separated; 
etc.  WDF is very flexible, but if you use a Tokenizer like Standard* or 
Classic* then hyphens will be pre-tokenized before WDF can do its thing, 
neutering part of its benefit.  I wish WDF kept payloads and other attributes; 
but it's not the only offender here, and likewise for the bigger issue of 
positionLength.  Otherwise I'm a WDF fan :-)  Nonetheless I like some of Jack's 
ideas on a better tokenizer that subsumes WDF.

BTW, FWIW, if I had to write a WhitespaceTokenizer from scratch, I'd implement 
it as a bitset for characters < 65k (this is 8KB of memory).  For the remainder 
I'd use an array that is scanned; but it appears there are none beyond 65k as I 
look at a table of these chars from a quick Google search.  Then a 
configurable definition loader could fill named whitespace rules, and it might 
be configurable to add or remove certain codes.  But no need to bother; Steve's 
impl is fine :-)
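A rough sketch of that bitset idea, under stated assumptions (an 8 KB table 
covering the code points below 65536, filled from Java's whitespace definition 
plus the non-breaking spaces it excludes; the actual "rule" would be whatever 
the tokenizer is configured with):

{code}
import java.util.BitSet;

public class WhitespaceTable {
  private final BitSet ws = new BitSet(65536);  // 65536 bits = 8 KB

  public WhitespaceTable() {
    for (int c = 0; c < 65536; c++) {
      if (Character.isWhitespace(c)) {
        ws.set(c);
      }
    }
    // Add the non-breaking spaces that Character.isWhitespace() excludes.
    ws.set('\u00A0');
    ws.set('\u2007');
    ws.set('\u202F');
  }

  public boolean isWhitespace(int c) {
    return c >= 0 && c < 65536 && ws.get(c);
  }
}
{code}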

> WhitespaceTokenizer should tokenize on NBSP
> ---
>
> Key: LUCENE-6874
> URL: https://issues.apache.org/jira/browse/LUCENE-6874
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/analysis
>Reporter: David Smiley
>Priority: Minor
> Attachments: LUCENE-6874-jflex.patch, LUCENE-6874.patch
>
>
> WhitespaceTokenizer uses [Character.isWhitespace 
> |http://docs.oracle.com/javase/8/docs/api/java/lang/Character.html#isWhitespace-int-]
>  to decide what is whitespace.  Here's a pertinent excerpt:
> bq. It is a Unicode space character (SPACE_SEPARATOR, LINE_SEPARATOR, or 
> PARAGRAPH_SEPARATOR) but is not also a non-breaking space ('\u00A0', 
> '\u2007', '\u202F')
> Perhaps Character.isWhitespace should have been called 
> isLineBreakableWhitespace?
> I think WhitespaceTokenizer should tokenize on this.  I am aware it's easy to 
> work around but why leave this trap in by default?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-6874) WhitespaceTokenizer should tokenize on NBSP

2015-11-03 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14988805#comment-14988805
 ] 

Yonik Seeley commented on LUCENE-6874:
--

bq. I'd implement it as a bitset for characters < 65k

A single word can be useful for quickly ruling out whitespace and checking for 
common whitespace.  Example:
https://github.com/yonik/noggit/blob/master/src/main/java/org/noggit/JSONParser.java#L241
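That trick generalizes: every common whitespace code point is below 64, so a 
single long can act as a 64-slot bitset and rule characters in or out with one 
compare and one shift. A hedged sketch of the idea (illustrative, not noggit's 
exact code):

{code}
public class FastWhitespace {
  // Bits set for ' ', '\t', '\n', '\r' -- all code points below 64.
  private static final long WS_MASK =
      (1L << ' ') | (1L << '\t') | (1L << '\n') | (1L << '\r');

  public static boolean isCommonWhitespace(int ch) {
    return ch >= 0 && ch < 64 && ((WS_MASK >>> ch) & 1L) != 0;
  }

  public static void main(String[] args) {
    System.out.println(isCommonWhitespace(' '));      // true
    System.out.println(isCommonWhitespace('x'));      // false
    System.out.println(isCommonWhitespace('\u00A0')); // false: needs a wider table
  }
}
{code}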

> WhitespaceTokenizer should tokenize on NBSP
> ---
>
> Key: LUCENE-6874
> URL: https://issues.apache.org/jira/browse/LUCENE-6874
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/analysis
>Reporter: David Smiley
>Priority: Minor
> Attachments: LUCENE-6874-jflex.patch, LUCENE-6874.patch
>
>
> WhitespaceTokenizer uses [Character.isWhitespace 
> |http://docs.oracle.com/javase/8/docs/api/java/lang/Character.html#isWhitespace-int-]
>  to decide what is whitespace.  Here's a pertinent excerpt:
> bq. It is a Unicode space character (SPACE_SEPARATOR, LINE_SEPARATOR, or 
> PARAGRAPH_SEPARATOR) but is not also a non-breaking space ('\u00A0', 
> '\u2007', '\u202F')
> Perhaps Character.isWhitespace should have been called 
> isLineBreakableWhitespace?
> I think WhitespaceTokenizer should tokenize on this.  I am aware it's easy to 
> work around but why leave this trap in by default?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-trunk-Linux (32bit/jdk1.8.0_66) - Build # 14779 - Failure!

2015-11-03 Thread Policeman Jenkins Server
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Linux/14779/
Java: 32bit/jdk1.8.0_66 -server -XX:+UseG1GC

1 tests failed.
FAILED:  org.apache.solr.cloud.CdcrReplicationHandlerTest.doTest

Error Message:
Captured an uncaught exception in thread: Thread[id=8237, 
name=RecoveryThread-source_collection_shard1_replica2, state=RUNNABLE, 
group=TGRP-CdcrReplicationHandlerTest]

Stack Trace:
com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught 
exception in thread: Thread[id=8237, 
name=RecoveryThread-source_collection_shard1_replica2, state=RUNNABLE, 
group=TGRP-CdcrReplicationHandlerTest]
Caused by: org.apache.solr.common.cloud.ZooKeeperException: 
at __randomizedtesting.SeedInfo.seed([AB210AC50D39CDD1]:0)
at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:232)
Caused by: org.apache.solr.common.SolrException: java.io.FileNotFoundException: 
/home/jenkins/workspace/Lucene-Solr-trunk-Linux/solr/build/solr-core/test/J1/temp/solr.cloud.CdcrReplicationHandlerTest_AB210AC50D39CDD1-001/jetty-001/cores/source_collection_shard1_replica2/data/tlog/tlog.007.1516878490358513664
 (No such file or directory)
at 
org.apache.solr.update.CdcrTransactionLog.reopenOutputStream(CdcrTransactionLog.java:244)
at 
org.apache.solr.update.CdcrTransactionLog.incref(CdcrTransactionLog.java:173)
at 
org.apache.solr.update.UpdateLog.getRecentUpdates(UpdateLog.java:1079)
at 
org.apache.solr.update.UpdateLog.seedBucketsWithHighestVersion(UpdateLog.java:1579)
at 
org.apache.solr.update.UpdateLog.seedBucketsWithHighestVersion(UpdateLog.java:1610)
at org.apache.solr.core.SolrCore.seedVersionBuckets(SolrCore.java:877)
at 
org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:534)
at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:225)
Caused by: java.io.FileNotFoundException: 
/home/jenkins/workspace/Lucene-Solr-trunk-Linux/solr/build/solr-core/test/J1/temp/solr.cloud.CdcrReplicationHandlerTest_AB210AC50D39CDD1-001/jetty-001/cores/source_collection_shard1_replica2/data/tlog/tlog.007.1516878490358513664
 (No such file or directory)
at java.io.RandomAccessFile.open0(Native Method)
at java.io.RandomAccessFile.open(RandomAccessFile.java:316)
at java.io.RandomAccessFile.<init>(RandomAccessFile.java:243)
at 
org.apache.solr.update.CdcrTransactionLog.reopenOutputStream(CdcrTransactionLog.java:236)
... 7 more




Build Log:
[...truncated 10890 lines...]
   [junit4] Suite: org.apache.solr.cloud.CdcrReplicationHandlerTest
   [junit4]   2> Creating dataDir: 
/home/jenkins/workspace/Lucene-Solr-trunk-Linux/solr/build/solr-core/test/J1/temp/solr.cloud.CdcrReplicationHandlerTest_AB210AC50D39CDD1-001/init-core-data-001
   [junit4]   2> 1263432 INFO  
(SUITE-CdcrReplicationHandlerTest-seed#[AB210AC50D39CDD1]-worker) [] 
o.a.s.SolrTestCaseJ4 Randomized ssl (true) and clientAuth (false)
   [junit4]   2> 1263432 INFO  
(SUITE-CdcrReplicationHandlerTest-seed#[AB210AC50D39CDD1]-worker) [] 
o.a.s.BaseDistributedSearchTestCase Setting hostContext system property: /h/gz
   [junit4]   2> 1263434 INFO  
(TEST-CdcrReplicationHandlerTest.doTest-seed#[AB210AC50D39CDD1]) [] 
o.a.s.c.ZkTestServer STARTING ZK TEST SERVER
   [junit4]   2> 1263434 INFO  (Thread-3012) [] o.a.s.c.ZkTestServer client 
port:0.0.0.0/0.0.0.0:0
   [junit4]   2> 1263434 INFO  (Thread-3012) [] o.a.s.c.ZkTestServer 
Starting server
   [junit4]   2> 1263534 INFO  
(TEST-CdcrReplicationHandlerTest.doTest-seed#[AB210AC50D39CDD1]) [] 
o.a.s.c.ZkTestServer start zk server on port:35036
   [junit4]   2> 1263534 INFO  
(TEST-CdcrReplicationHandlerTest.doTest-seed#[AB210AC50D39CDD1]) [] 
o.a.s.c.c.SolrZkClient Using default ZkCredentialsProvider
   [junit4]   2> 1263535 INFO  
(TEST-CdcrReplicationHandlerTest.doTest-seed#[AB210AC50D39CDD1]) [] 
o.a.s.c.c.ConnectionManager Waiting for client to connect to ZooKeeper
   [junit4]   2> 1263536 INFO  (zkCallback-996-thread-1) [] 
o.a.s.c.c.ConnectionManager Watcher 
org.apache.solr.common.cloud.ConnectionManager@6e1f43 name:ZooKeeperConnection 
Watcher:127.0.0.1:35036 got event WatchedEvent state:SyncConnected type:None 
path:null path:null type:None
   [junit4]   2> 1263536 INFO  
(TEST-CdcrReplicationHandlerTest.doTest-seed#[AB210AC50D39CDD1]) [] 
o.a.s.c.c.ConnectionManager Client is connected to ZooKeeper
   [junit4]   2> 1263537 INFO  
(TEST-CdcrReplicationHandlerTest.doTest-seed#[AB210AC50D39CDD1]) [] 
o.a.s.c.c.SolrZkClient Using default ZkACLProvider
   [junit4]   2> 1263537 INFO  
(TEST-CdcrReplicationHandlerTest.doTest-seed#[AB210AC50D39CDD1]) [] 
o.a.s.c.c.SolrZkClient makePath: /solr
   [junit4]   2> 1263538 INFO  
(TEST-CdcrReplicationHandlerTest.doTest-seed#[AB210AC50D39CDD1]) [] 
o.a.s.c.c.SolrZkClient Using default ZkCredentialsProvider
   [junit4]   2> 1263540 INFO  
(TEST-

[jira] [Updated] (SOLR-7989) Down replica elected leader

2015-11-03 Thread Ishan Chattopadhyaya (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-7989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ishan Chattopadhyaya updated SOLR-7989:
---
Description: 
It is possible that a down replica gets elected as a leader, and that it stays 
down after the election.

Here's how I hit upon this:
* There are 3 replicas: leader, notleader0, notleader1
* Introduced network partition to isolate notleader0, notleader1 from leader 
(leader puts these two in LIR via zk).
* Kill leader, remove partition. Now leader is dead, and both of notleader0 and 
notleader1 are down. There is no leader.
* Remove LIR znodes in zk.
* Wait a while, and a (flawed?) leader election happens.
* Finally, the state is such that one of notleader0 or notleader1 (which were 
down before) becomes leader, but stays down.


  was:
It is possible that a down replica gets elected as a leader, and that it stays 
down after the election.

Here's how I hit upon this:
* There are 3 replicas: leader, notleader0, notleader1
* Introduced network partition to isolate notleader0, notleader1 from leader 
(leader puts these two in LIR via zk).
* Kill leader, remove partition. Now leader is dead, and both of notleader0 and 
notleader1 are down. There is no leader.
* Remove LIR znodes in zk.
* Wait a while, and a (flawed?) leader election happens.
* Finally, the state is such that one of notleader0 or notleader1 (which were 
down before) becomes leader, but stays down.

From the logs, I see that the recovery fails, yet the replica becomes a leader.


> Down replica elected leader
> ---
>
> Key: SOLR-7989
> URL: https://issues.apache.org/jira/browse/SOLR-7989
> Project: Solr
>  Issue Type: Bug
>Reporter: Ishan Chattopadhyaya
>Priority: Minor
> Attachments: DownLeaderTest.java
>
>
> It is possible that a down replica gets elected as a leader, and that it 
> stays down after the election.
> Here's how I hit upon this:
> * There are 3 replicas: leader, notleader0, notleader1
> * Introduced network partition to isolate notleader0, notleader1 from leader 
> (leader puts these two in LIR via zk).
> * Kill leader, remove partition. Now leader is dead, and both of notleader0 
> and notleader1 are down. There is no leader.
> * Remove LIR znodes in zk.
> * Wait a while, and a (flawed?) leader election happens.
> * Finally, the state is such that one of notleader0 or notleader1 (which were 
> down before) becomes leader, but stays down.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org


