[jira] [Updated] (SOLR-3218) Range faceting support for CurrencyField

2014-02-16 Thread Vitaliy Zhovtyuk (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vitaliy Zhovtyuk updated SOLR-3218:
---

Attachment: SOLR-3218.patch

Updated to latest trunk. Added range facet tests to 
org.apache.solr.schema.AbstractCurrencyFieldTest. Moved 
org.apache.solr.schema.CurrencyValue back to a separate class (from being a 
nested class of org.apache.solr.schema.CurrencyField), since CurrencyValue is 
used outside it, in org.apache.solr.request.SimpleFacets and other classes. It 
is probably worth wrapping and encapsulating it in 
org.apache.solr.schema.CurrencyField.

 Range faceting support for CurrencyField
 

 Key: SOLR-3218
 URL: https://issues.apache.org/jira/browse/SOLR-3218
 Project: Solr
  Issue Type: Improvement
  Components: Schema and Analysis
Reporter: Jan Høydahl
 Fix For: 4.7

 Attachments: SOLR-3218-1.patch, SOLR-3218-2.patch, SOLR-3218.patch, 
 SOLR-3218.patch, SOLR-3218.patch, SOLR-3218.patch


 Spinoff from SOLR-2202. Need to add range faceting capabilities for 
 CurrencyField



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[GitHub] lucene-solr pull request: Lucene 5092 pull 1

2014-02-16 Thread PaulElschot
Github user PaulElschot closed the pull request at:

https://github.com/apache/lucene-solr/pull/24


If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastruct...@apache.org or file a JIRA ticket with INFRA.

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[GitHub] lucene-solr pull request: Lucene 5092 pull 1

2014-02-16 Thread PaulElschot
Github user PaulElschot commented on the pull request:

https://github.com/apache/lucene-solr/pull/24#issuecomment-35196892
  
Closed because of some conflicts after LUCENE-5044.


If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastruct...@apache.org or file a JIRA ticket with INFRA.

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5092) join: don't expect all filters to be FixedBitSet instances

2014-02-16 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13902714#comment-13902714
 ] 

Paul Elschot commented on LUCENE-5092:
--

As expected, after LUCENE-5440, the patch/pull request has a few conflicts. 
I'll resolve these.

 join: don't expect all filters to be FixedBitSet instances
 --

 Key: LUCENE-5092
 URL: https://issues.apache.org/jira/browse/LUCENE-5092
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/join
Reporter: Adrien Grand
Assignee: Adrien Grand
Priority: Minor
 Attachments: LUCENE-5092.patch


 The join module throws exceptions when the parents filter isn't a 
 FixedBitSet. The reason is that the join module relies on prevSetBit to find 
 the first child document given a parent ID.
 As suggested by Uwe and Paul Elschot on LUCENE-5081, we could fix it by 
 exposing methods in the iterators to iterate backwards. When the join modules 
 gets an iterator which isn't able to iterate backwards, it would just need to 
 dump its content into another DocIdSet that supports backward iteration, 
 FixedBitSet for example.
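The fallback described above can be sketched with a plain java.util.BitSet standing in for Lucene's FixedBitSet (an assumption to keep the sketch self-contained; the real join code works with DocIdSetIterator and FixedBitSet). Once parent docs are flagged in a bit set, a prevSetBit-style backward scan finds the children of a parent, which in the block-join layout precede their parent:

```java
import java.util.BitSet;

// Sketch: why the join module needs backward iteration. Parent docs are
// flagged in the bit set; the children of a parent are the docs between the
// previous parent (exclusive) and the parent itself (exclusive).
public class ParentChildSketch {
    // Returns the first child doc id for the given parent doc id, or -1 if none.
    static int firstChild(BitSet parents, int parentDoc) {
        int prevParent = parents.previousSetBit(parentDoc - 1); // backward scan
        int first = prevParent + 1;
        return first < parentDoc ? first : -1;
    }

    public static void main(String[] args) {
        BitSet parents = new BitSet();
        parents.set(3);  // docs 0..2 are children of parent 3
        parents.set(4);  // parent 4 has no children
        parents.set(9);  // docs 5..8 are children of parent 9
        System.out.println(firstChild(parents, 3)); // 0
        System.out.println(firstChild(parents, 4)); // -1
        System.out.println(firstChild(parents, 9)); // 5
    }
}
```

An iterator that cannot scan backwards would first be dumped into such a bit set, after which the lookup above applies unchanged.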



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5044) wasted work in AllGroupHeadsCollectorTest.arrayContains()

2014-02-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13902715#comment-13902715
 ] 

ASF GitHub Bot commented on LUCENE-5044:


Github user PaulElschot commented on the pull request:

https://github.com/apache/lucene-solr/pull/24#issuecomment-35196892
  
Closed because of some conflicts after LUCENE-5044.


 wasted work in AllGroupHeadsCollectorTest.arrayContains()
 -

 Key: LUCENE-5044
 URL: https://issues.apache.org/jira/browse/LUCENE-5044
 Project: Lucene - Core
  Issue Type: Bug
Affects Versions: 4.3
 Environment: any
Reporter: Adrian Nistor
Priority: Minor
  Labels: patch, performance
 Fix For: 4.4, 5.0

 Attachments: patch.diff, patchAll.diff


 The problem appears in version 4.3.0 and in revision 1490286.  I
 attached a one-line patch that fixes it.
 In method AllGroupHeadsCollectorTest.arrayContains, the loop over
 actual should break immediately after found is set to true.  All
 the iterations after found is set to true do not perform any
 useful work, at best they just set found again to true.
 Method processWord in class CapitalizationFilter has a similar
 loop (over prefix), and this loop breaks immediately after match
 is set to false, just like in the proposed patch.  Other methods
 (e.g., Step.apply, JapaneseTokenizer.computePenalty,
 CompressingStoredFieldsWriter.saveInts, FieldQuery.checkOverlap)
 also have similar loops with similar breaks, just like in the proposed
 patch.
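The one-line fix the report describes can be illustrated with a minimal stand-in for the test helper (names are illustrative, not the actual Lucene test code): once `found` flips to true, further iterations cannot change the result, so the loop should break immediately.

```java
// Sketch of the proposed fix: break as soon as a match is seen, instead of
// scanning the rest of the array and re-setting `found` to true.
public class ContainsSketch {
    static boolean arrayContains(int[] actual, int value) {
        boolean found = false;
        for (int v : actual) {
            if (v == value) {
                found = true;
                break; // the one-line fix: no useful work remains after a match
            }
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(arrayContains(new int[]{1, 5, 9}, 5)); // true
        System.out.println(arrayContains(new int[]{1, 5, 9}, 2)); // false
    }
}
```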



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5721) ConnectionManager can become stuck in likeExpired

2014-02-16 Thread Ramkumar Aiyengar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13902718#comment-13902718
 ] 

Ramkumar Aiyengar commented on SOLR-5721:
-

Wouldn't using System.currentTimeMillis for time deltas lead to errors due to 
NTP sync or DST? It's not guaranteed to be monotonic. See

http://stackoverflow.com/questions/2978598/will-sytem-currenttimemillis-always-return-a-value-previous-calls

System.nanoTime seems to provide a better alternative, at least when the OS 
supports a monotonic clock.
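A minimal sketch of the suggested alternative: measure the delta with System.nanoTime, whose values are only meaningful as differences, never as wall-clock times.

```java
import java.util.concurrent.TimeUnit;

// Sketch: elapsed-time measurement with the monotonic(-ish) System.nanoTime
// instead of System.currentTimeMillis, which can jump when the wall clock
// is adjusted (e.g. by NTP stepping).
public class ElapsedSketch {
    public static void main(String[] args) throws InterruptedException {
        long start = System.nanoTime();   // only meaningful relative to another nanoTime
        Thread.sleep(50);                 // stand-in for the work being timed
        long elapsedNanos = System.nanoTime() - start;
        long elapsedMillis = TimeUnit.NANOSECONDS.toMillis(elapsedNanos);
        System.out.println("elapsed ms: " + elapsedMillis);
    }
}
```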

 ConnectionManager can become stuck in likeExpired
 -

 Key: SOLR-5721
 URL: https://issues.apache.org/jira/browse/SOLR-5721
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.6.1
Reporter: Gregory Chanan
Assignee: Mark Miller
 Fix For: 4.7, 5.0

 Attachments: SOLR-5721.patch, SOLR-5721test.patch


 Here are the sequence of events:
 - we disconnect
 - The disconnect timer begins to run (so it is no longer scheduled), but doesn't 
 set likelyExpired yet
 - We connect, and set likelyExpired = false
 - The disconnect thread runs and sets likelyExpired to true, and it is never 
 set back to false (note that we cancel the disconnect thread but that only 
 cancels scheduled tasks but not running tasks).
 This is pretty difficult to reproduce without doing more work in the 
 disconnect thread.  It's easy to reproduce by adding sleeps in various places 
 -- I have a test that I'll attach that does that.
 The most straightforward way to solve this would be to grab the 
 synchronization lock on ConnectionManager in the disconnect thread, ensure we 
 aren't actually connected, and only then setting likelyExpired to true.  In 
 code:
 {code}
 synchronized (ConnectionManager.this) {
   if (!connected) likelyExpired = true;
 }
 {code}
 but this is all pretty subtle and error prone.  It's easier to just get rid 
 of the disconnect thread and record the last time we disconnected.  Then, 
 when we check likelyExpired, we just do a quick calculation to see if we are 
 likelyExpired.
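A hedged sketch of that timestamp-based approach (class and field names are illustrative, not the actual ConnectionManager code): record when the disconnect happened and derive "likely expired" on demand, so there is no racing disconnect thread to cancel.

```java
// Sketch: likelyExpired computed from a recorded disconnect timestamp.
// The timeout value and all names are illustrative.
public class ExpirySketch {
    private final long timeoutNanos;
    private volatile long disconnectedAtNanos = -1; // -1 means connected

    ExpirySketch(long timeoutNanos) { this.timeoutNanos = timeoutNanos; }

    synchronized void disconnected() { disconnectedAtNanos = System.nanoTime(); }
    synchronized void connected()    { disconnectedAtNanos = -1; }

    // No flag ever needs to be reset: expiry is derived from the timestamp.
    synchronized boolean isLikelyExpired() {
        return disconnectedAtNanos != -1
            && System.nanoTime() - disconnectedAtNanos > timeoutNanos;
    }

    public static void main(String[] args) throws InterruptedException {
        ExpirySketch cm = new ExpirySketch(1_000_000L); // 1 ms timeout for the demo
        cm.disconnected();
        Thread.sleep(5);
        System.out.println(cm.isLikelyExpired()); // true once the timeout elapses
        cm.connected();
        System.out.println(cm.isLikelyExpired()); // false: reconnecting clears it
    }
}
```

Because the state is a single timestamp, the reconnect path cannot lose a race with a flag-setting thread: connecting simply clears the timestamp.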



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5514) atomic update throws exception if the schema contains uuid fields: Invalid UUID String: 'java.util.UUID:e26c4d56-e98d-41de-9b7f-f63192089670'

2014-02-16 Thread Dirk Reuss (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13902720#comment-13902720
 ] 

Dirk Reuss  commented on SOLR-5514:
---

Here is another hint at what might cause the error. I just saw it in our log 
files.
This time the error occurs when a document is read from the 
RealTimeGet component. I'm pretty sure we only store well-formed uuid values.
The error happens when I send the following command:
<add overwrite=true><doc><field name=...>...</doc>...<doc>...</doc></add>
The add command contains about 100 docs, each with about 33 fields.
I have to examine this problem later; it may be a new error.

<lst name="responseHeader"><int name="status">500</int><int 
name="QTime">2</int></lst><lst name="error"><str name="msg">For input string: 
000&#0;&#0;&#0;&#0;&#0;&#0;&#0;&#0;&#0;</str><str 
name="trace">java.lang.NumberFormatException: For input string: 
000&#0;&#0;&#0;&#0;&#0;&#0;&#0;&#0;&#0;
at 
java.lang.NumberFormatException.forInputString(NumberFormatException.java:76)
at java.lang.Long.parseLong(Long.java:452)
at java.lang.Long.valueOf(Long.java:524)
at java.lang.Long.decode(Long.java:676)
at java.util.UUID.fromString(UUID.java:217)
at org.apache.solr.schema.UUIDField.toObject(UUIDField.java:103)
at org.apache.solr.schema.UUIDField.toObject(UUIDField.java:49)
at 
org.apache.solr.handler.component.RealTimeGetComponent.toSolrInputDocument(RealTimeGetComponent.java:263)
at 
org.apache.solr.handler.component.RealTimeGetComponent.getInputDocument(RealTimeGetComponent.java:244)
at 
org.apache.solr.update.processor.DistributedUpdateProcessor.getUpdatedDocument(DistributedUpdateProcessor.java:726)
at 
org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:635)
at 
org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:435)
at 
org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:247)
at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:174)
at 
org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
at 
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1859)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:703)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:406)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:195)
at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at 
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at 
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at 
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
at 
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at 
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at 
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
at 
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:852)
at 
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
at 
org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
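The "Invalid UUID String" failure in the issue title can be reproduced minimally: java.util.UUID.fromString rejects a value with the type name prepended, while the bare uuid parses fine. The sample value below is the one from the issue title.

```java
import java.util.UUID;

// Sketch: the title's error, reproduced outside Solr. UUID.fromString throws
// (a NumberFormatException, a subclass of IllegalArgumentException) when the
// stored value has "java.util.UUID:" prepended to the actual uuid.
public class UuidSketch {
    public static void main(String[] args) {
        String stored = "java.util.UUID:e26c4d56-e98d-41de-9b7f-f63192089670";
        try {
            UUID.fromString(stored);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        // Stripping the prepended type name yields a parseable UUID:
        UUID ok = UUID.fromString(stored.substring(stored.indexOf(':') + 1));
        System.out.println(ok); // e26c4d56-e98d-41de-9b7f-f63192089670
    }
}
```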

 atomic update throws exception if the schema contains uuid fields: Invalid 
 UUID String: 'java.util.UUID:e26c4d56-e98d-41de-9b7f-f63192089670'
 -

 Key: SOLR-5514
 URL: https://issues.apache.org/jira/browse/SOLR-5514
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.5.1
 Environment: unix and windows
Reporter: Dirk Reuss 
Assignee: Shalin Shekhar Mangar

 I am updating an existing document with the statement 
 <add><doc><field name='name' update='set'>newvalue</field>
 All fields are stored and I have several UUID fields. About 10-20% of the 
 update commands will fail with a message like: (example)
 Invalid UUID String: 'java.util.UUID:532c9353-d391-4a04-8618-dc2fa1ef8b35'
 The point is that java.util.UUID seems to be prepended to the original uuid 
 stored in the field, and this error occurs when the value is written.
 I tried to check if this specific uuid field was the problem and
 added the uuid 

[jira] [Commented] (SOLR-5721) ConnectionManager can become stuck in likeExpired

2014-02-16 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13902747#comment-13902747
 ] 

Mark Miller commented on SOLR-5721:
---

Yeah, sounds like a good improvement.

 ConnectionManager can become stuck in likeExpired
 -

 Key: SOLR-5721
 URL: https://issues.apache.org/jira/browse/SOLR-5721
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.6.1
Reporter: Gregory Chanan
Assignee: Mark Miller
 Fix For: 4.7, 5.0

 Attachments: SOLR-5721.patch, SOLR-5721test.patch


 Here are the sequence of events:
 - we disconnect
 - The disconnect timer begins to run (so it is no longer scheduled), but doesn't 
 set likelyExpired yet
 - We connect, and set likelyExpired = false
 - The disconnect thread runs and sets likelyExpired to true, and it is never 
 set back to false (note that we cancel the disconnect thread but that only 
 cancels scheduled tasks but not running tasks).
 This is pretty difficult to reproduce without doing more work in the 
 disconnect thread.  It's easy to reproduce by adding sleeps in various places 
 -- I have a test that I'll attach that does that.
 The most straightforward way to solve this would be to grab the 
 synchronization lock on ConnectionManager in the disconnect thread, ensure we 
 aren't actually connected, and only then setting likelyExpired to true.  In 
 code:
 {code}
 synchronized (ConnectionManager.this) {
   if (!connected) likelyExpired = true;
 }
 {code}
 but this is all pretty subtle and error prone.  It's easier to just get rid 
 of the disconnect thread and record the last time we disconnected.  Then, 
 when we check likelyExpired, we just do a quick calculation to see if we are 
 likelyExpired.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-1913) QParserPlugin plugin for Search Results Filtering Based on Bitwise Operations on Integer Fields

2014-02-16 Thread Vitaliy Zhovtyuk (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vitaliy Zhovtyuk updated SOLR-1913:
---

Attachment: SOLR-1913.patch

Changed packages for BitwiseFilter: org.apache.lucene.search.BitwiseFilter,
and for BitwiseQueryParserPlugin: org.apache.solr.search.BitwiseQueryParserPlugin.

Added Lucene tests for BitwiseFilter, and added Solr tests checking bitwise 
parser queries for BitwiseQueryParserPlugin.

 QParserPlugin plugin for Search Results Filtering Based on Bitwise Operations 
 on Integer Fields
 ---

 Key: SOLR-1913
 URL: https://issues.apache.org/jira/browse/SOLR-1913
 Project: Solr
  Issue Type: New Feature
  Components: search
Reporter: Israel Ekpo
 Fix For: 4.7

 Attachments: SOLR-1913-src.tar.gz, SOLR-1913.bitwise.tar.gz, 
 SOLR-1913.patch, WEB-INF lib.jpg, bitwise_filter_plugin.jar, 
 solr-bitwise-plugin.jar

   Original Estimate: 1h
  Remaining Estimate: 1h

 BitwiseQueryParserPlugin is an org.apache.solr.search.QParserPlugin that 
 allows 
 users to filter the documents returned from a query
 by performing bitwise operations between a particular integer field in the 
 index
 and the specified value.
 This Solr plugin is based on the BitwiseFilter in LUCENE-2460
 See https://issues.apache.org/jira/browse/LUCENE-2460 for more details
 This is the syntax for searching in Solr:
 http://localhost:8983/path/to/solr/select/?q={!bitwise field=fieldname 
 op=OPERATION_NAME source=sourcevalue negate=boolean}remainder of query
 Example :
 http://localhost:8983/solr/bitwise/select/?q={!bitwise field=user_permissions 
 op=AND source=3 negate=true}state:FL
 The negate parameter is optional
 The field parameter is the name of the integer field
 The op parameter is the name of the operation; one of {AND, OR, XOR}
 The source parameter is the specified integer value
 The negate parameter is a boolean indicating whether or not to negate the 
 results of the bitwise operation
 To test out this plugin, simply copy the jar file containing the plugin 
 classes into your $SOLR_HOME/lib directory and then
 add the following to your solrconfig.xml file after the dismax request 
 handler:
 <queryParser name="bitwise" 
 class="org.apache.solr.bitwise.BitwiseQueryParserPlugin" basedOn="dismax" />
 Restart your servlet container.
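As a rough sketch of what the listed parameters imply, the match rule can be modeled as a plain predicate. The exact predicate of the original BitwiseFilter is not restated in this issue, so this sketch assumes a document matches when the bitwise result is non-zero, with negate flipping the outcome; all names are illustrative.

```java
// Sketch: the field/op/source/negate parameters as a plain predicate,
// under the stated non-zero-result assumption.
public class BitwiseSketch {
    enum Op { AND, OR, XOR }

    static boolean matches(int fieldValue, Op op, int source, boolean negate) {
        int result;
        switch (op) {
            case AND: result = fieldValue & source; break;
            case OR:  result = fieldValue | source; break;
            default:  result = fieldValue ^ source; break;
        }
        boolean match = result != 0;
        return negate ? !match : match;
    }

    public static void main(String[] args) {
        // user_permissions=5 (binary 101), source=3 (binary 011): 5 & 3 = 1
        System.out.println(matches(5, Op.AND, 3, false)); // true
        System.out.println(matches(5, Op.AND, 3, true));  // false
    }
}
```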



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-5734) We should use System.nanoTime rather than System.currentTimeMillis when calculating elapsed time.

2014-02-16 Thread Mark Miller (JIRA)
Mark Miller created SOLR-5734:
-

 Summary: We should use System.nanoTime rather than 
System.currentTimeMillis when calculating elapsed time. 
 Key: SOLR-5734
 URL: https://issues.apache.org/jira/browse/SOLR-5734
 Project: Solr
  Issue Type: Bug
Reporter: Mark Miller
Assignee: Mark Miller
 Fix For: 4.7, 5.0






--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5721) ConnectionManager can become stuck in likeExpired

2014-02-16 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13902766#comment-13902766
 ] 

Mark Miller commented on SOLR-5721:
---

I filed SOLR-5734, as this kind of thing spans the system.

 ConnectionManager can become stuck in likeExpired
 -

 Key: SOLR-5721
 URL: https://issues.apache.org/jira/browse/SOLR-5721
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.6.1
Reporter: Gregory Chanan
Assignee: Mark Miller
 Fix For: 4.7, 5.0

 Attachments: SOLR-5721.patch, SOLR-5721test.patch


 Here are the sequence of events:
 - we disconnect
 - The disconnect timer begins to run (so it is no longer scheduled), but doesn't 
 set likelyExpired yet
 - We connect, and set likelyExpired = false
 - The disconnect thread runs and sets likelyExpired to true, and it is never 
 set back to false (note that we cancel the disconnect thread but that only 
 cancels scheduled tasks but not running tasks).
 This is pretty difficult to reproduce without doing more work in the 
 disconnect thread.  It's easy to reproduce by adding sleeps in various places 
 -- I have a test that I'll attach that does that.
 The most straightforward way to solve this would be to grab the 
 synchronization lock on ConnectionManager in the disconnect thread, ensure we 
 aren't actually connected, and only then setting likelyExpired to true.  In 
 code:
 {code}
 synchronized (ConnectionManager.this) {
   if (!connected) likelyExpired = true;
 }
 {code}
 but this is all pretty subtle and error prone.  It's easier to just get rid 
 of the disconnect thread and record the last time we disconnected.  Then, 
 when we check likelyExpired, we just do a quick calculation to see if we are 
 likelyExpired.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-5734) We should use System.nanoTime rather than System.currentTimeMillis when calculating elapsed time.

2014-02-16 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated SOLR-5734:
--

Description: As brought up by [~andyetitmoves] in SOLR-5721.

 We should use System.nanoTime rather than System.currentTimeMillis when 
 calculating elapsed time. 
 --

 Key: SOLR-5734
 URL: https://issues.apache.org/jira/browse/SOLR-5734
 Project: Solr
  Issue Type: Bug
Reporter: Mark Miller
Assignee: Mark Miller
 Fix For: 4.7, 5.0


 As brought up by [~andyetitmoves] in SOLR-5721.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5688) Allow updating of soft and hard commit parameters using HTTP API

2014-02-16 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-5688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13902767#comment-13902767
 ] 

Rafał Kuć commented on SOLR-5688:
-

A small 'ping' from me. Do we need anything more here? Any comments?

 Allow updating of soft and hard commit parameters using HTTP API
 

 Key: SOLR-5688
 URL: https://issues.apache.org/jira/browse/SOLR-5688
 Project: Solr
  Issue Type: Improvement
Affects Versions: 4.6.1
Reporter: Rafał Kuć
 Fix For: 5.0

 Attachments: SOLR-5688-single_api_call.patch, SOLR-5688.patch


 Right now, to update the values (max time and max docs) for hard and soft 
 autocommits, one has to alter the configuration and reload the core. I think 
 it would be nice to expose an API to do that in a way that the configuration 
 is not updated, so the change is not persistent. 
 There may be various reasons for doing that - for example, one may know that 
 the application will send a large amount of data and want to prepare for that. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5408) SerializedDVStrategy -- match geometries in DocValues

2014-02-16 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13902783#comment-13902783
 ] 

ASF subversion and git services commented on LUCENE-5408:
-

Commit 1568807 from [~dsmiley] in branch 'dev/trunk'
[ https://svn.apache.org/r1568807 ]

LUCENE-5408: Spatial SerializedDVStrategy

 SerializedDVStrategy -- match geometries in DocValues
 -

 Key: LUCENE-5408
 URL: https://issues.apache.org/jira/browse/LUCENE-5408
 Project: Lucene - Core
  Issue Type: New Feature
  Components: modules/spatial
Reporter: David Smiley
Assignee: David Smiley
 Fix For: 4.7

 Attachments: LUCENE-5408_GeometryStrategy.patch, 
 LUCENE-5408_SerializedDVStrategy.patch


 I've started work on a new SpatialStrategy implementation I'm tentatively 
 calling SerializedDVStrategy.  It's similar to the [JtsGeoStrategy in 
 Spatial-Solr-Sandbox|https://github.com/ryantxu/spatial-solr-sandbox/tree/master/LSE/src/main/java/org/apache/lucene/spatial/pending/jts]
  but a little different in the details -- certainly faster.  Using Spatial4j 
 0.4's BinaryCodec, it'll serialize the shape to bytes (for polygons this is 
 internally WKB format) and the strategy will put it in a 
 BinaryDocValuesField.  In practice the shape is likely a polygon but it 
 needn't be.  Then I'll implement a Filter that returns a DocIdSetIterator 
 that evaluates a given document passed via advance(docid) to see if the 
 query shape matches a shape in DocValues. It's improper usage for it to be 
 used in a situation where it will evaluate every document id via nextDoc().  
 And in practice the DocValues format chosen should be a disk resident one 
 since each value tends to be kind of big.
 This spatial strategy in and of itself has no _index_; it's O(N) where N is 
 the number of documents that get passed thru it.  So it should be placed last 
 in the query/filter tree so that the other queries limit the documents it 
 needs to see.  At a minimum, another query/filter to use in conjunction is 
 another SpatialStrategy like RecursivePrefixTreeStrategy.
 Eventually once the PrefixTree grid encoding has a little bit more metadata, 
 it will be possible to further combine the grid & this strategy in such a way 
 that many documents won't need to be checked against the serialized geometry.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-4.x-MacOSX (64bit/jdk1.7.0) - Build # 1304 - Still Failing!

2014-02-16 Thread Policeman Jenkins Server
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-4.x-MacOSX/1304/
Java: 64bit/jdk1.7.0 -XX:-UseCompressedOops -XX:+UseConcMarkSweepGC

All tests passed

Build Log:
[...truncated 9912 lines...]
   [junit4] JVM J0: stderr was not empty, see: 
/Users/jenkins/workspace/Lucene-Solr-4.x-MacOSX/solr/build/solr-core/test/temp/junit4-J0-20140216_193116_238.syserr
   [junit4]  JVM J0: stderr (verbatim) 
   [junit4] java(1575,0x13ce57000) malloc: *** error for object 0x13ce44f10: 
pointer being freed was not allocated
   [junit4] *** set a breakpoint in malloc_error_break to debug
   [junit4]  JVM J0: EOF 

[...truncated 1 lines...]
   [junit4] ERROR: JVM J0 ended with an exception, command line: 
/Library/Java/JavaVirtualMachines/jdk1.7.0_51.jdk/Contents/Home/jre/bin/java 
-XX:-UseCompressedOops -XX:+UseConcMarkSweepGC -XX:+HeapDumpOnOutOfMemoryError 
-XX:HeapDumpPath=/Users/jenkins/workspace/Lucene-Solr-4.x-MacOSX/heapdumps 
-Dtests.prefix=tests -Dtests.seed=6B5378DCCDC69524 -Xmx512M -Dtests.iters= 
-Dtests.verbose=false -Dtests.infostream=false -Dtests.codec=random 
-Dtests.postingsformat=random -Dtests.docvaluesformat=random 
-Dtests.locale=random -Dtests.timezone=random -Dtests.directory=random 
-Dtests.linedocsfile=europarl.lines.txt.gz -Dtests.luceneMatchVersion=4.7 
-Dtests.cleanthreads=perClass 
-Djava.util.logging.config.file=/Users/jenkins/workspace/Lucene-Solr-4.x-MacOSX/lucene/tools/junit4/logging.properties
 -Dtests.nightly=false -Dtests.weekly=false -Dtests.slow=true 
-Dtests.asserts.gracious=false -Dtests.multiplier=1 -DtempDir=. 
-Djava.io.tmpdir=. 
-Djunit4.tempDir=/Users/jenkins/workspace/Lucene-Solr-4.x-MacOSX/solr/build/solr-core/test/temp
 
-Dclover.db.dir=/Users/jenkins/workspace/Lucene-Solr-4.x-MacOSX/lucene/build/clover/db
 -Djava.security.manager=org.apache.lucene.util.TestSecurityManager 
-Djava.security.policy=/Users/jenkins/workspace/Lucene-Solr-4.x-MacOSX/lucene/tools/junit4/tests.policy
 -Dlucene.version=4.7-SNAPSHOT -Djetty.testMode=1 -Djetty.insecurerandom=1 
-Dsolr.directoryFactory=org.apache.solr.core.MockDirectoryFactory 
-Djava.awt.headless=true -Djdk.map.althashing.threshold=0 
-Dtests.disableHdfs=true -Dfile.encoding=UTF-8 -classpath 

[jira] [Commented] (SOLR-5688) Allow updating of soft and hard commit parameters using HTTP API

2014-02-16 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13902802#comment-13902802
 ] 

Mark Miller commented on SOLR-5688:
---

Hopefully the guys working on a bunch of other REST API stuff can comment. I think 
we all want to pull in the same direction with that - but I haven't followed 
progress closely enough to comment helpfully yet. 

 Allow updating of soft and hard commit parameters using HTTP API
 

 Key: SOLR-5688
 URL: https://issues.apache.org/jira/browse/SOLR-5688
 Project: Solr
  Issue Type: Improvement
Affects Versions: 4.6.1
Reporter: Rafał Kuć
 Fix For: 5.0

 Attachments: SOLR-5688-single_api_call.patch, SOLR-5688.patch


 Right now, to update the values (max time and max docs) for hard and soft 
 autocommits, one has to alter the configuration and reload the core. I think 
 it would be nice to expose an API to do that in a way that the configuration 
 is not updated, so the change is not persistent. 
 There may be various reasons for doing that - for example, one may know that 
 the application will send a large amount of data and want to prepare for that. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5408) SerializedDVStrategy -- match geometries in DocValues

2014-02-16 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13902805#comment-13902805
 ] 

ASF subversion and git services commented on LUCENE-5408:
-

Commit 1568817 from [~dsmiley] in branch 'dev/trunk'
[ https://svn.apache.org/r1568817 ]

LUCENE-5408: fixed tests; some strategies require DocValues

 SerializedDVStrategy -- match geometries in DocValues
 -

 Key: LUCENE-5408
 URL: https://issues.apache.org/jira/browse/LUCENE-5408
 Project: Lucene - Core
  Issue Type: New Feature
  Components: modules/spatial
Reporter: David Smiley
Assignee: David Smiley
 Fix For: 4.7

 Attachments: LUCENE-5408_GeometryStrategy.patch, 
 LUCENE-5408_SerializedDVStrategy.patch


 I've started work on a new SpatialStrategy implementation I'm tentatively 
 calling SerializedDVStrategy.  It's similar to the [JtsGeoStrategy in 
 Spatial-Solr-Sandbox|https://github.com/ryantxu/spatial-solr-sandbox/tree/master/LSE/src/main/java/org/apache/lucene/spatial/pending/jts]
  but a little different in the details -- certainly faster.  Using Spatial4j 
 0.4's BinaryCodec, it'll serialize the shape to bytes (for polygons this is 
 internally WKB format) and the strategy will put it in a 
 BinaryDocValuesField.  In practice the shape is likely a polygon but it 
 needn't be.  Then I'll implement a Filter that returns a DocIdSetIterator 
 that evaluates a given document passed via advance(docid) to see if the 
 query shape matches a shape in DocValues. It's improper usage for it to be 
 used in a situation where it will evaluate every document id via nextDoc().  
 And in practice the DocValues format chosen should be a disk resident one 
 since each value tends to be kind of big.
 This spatial strategy in and of itself has no _index_; it's O(N) where N is 
 the number of documents that get passed thru it.  So it should be placed last 
 in the query/filter tree so that the other queries limit the documents it 
 needs to see.  At a minimum, another query/filter to use in conjunction is 
 another SpatialStrategy like RecursivePrefixTreeStrategy.
 Eventually once the PrefixTree grid encoding has a little bit more metadata, 
 it will be possible to further combine the grid & this strategy in such a way 
 that many documents won't need to be checked against the serialized geometry.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5408) SerializedDVStrategy -- match geometries in DocValues

2014-02-16 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13902806#comment-13902806
 ] 

ASF subversion and git services commented on LUCENE-5408:
-

Commit 1568818 from [~dsmiley] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1568818 ]

LUCENE-5408: Spatial SerializedDVStrategy

 SerializedDVStrategy -- match geometries in DocValues
 -

 Key: LUCENE-5408
 URL: https://issues.apache.org/jira/browse/LUCENE-5408
 Project: Lucene - Core
  Issue Type: New Feature
  Components: modules/spatial
Reporter: David Smiley
Assignee: David Smiley
 Fix For: 4.7

 Attachments: LUCENE-5408_GeometryStrategy.patch, 
 LUCENE-5408_SerializedDVStrategy.patch


 I've started work on a new SpatialStrategy implementation I'm tentatively 
 calling SerializedDVStrategy.  It's similar to the [JtsGeoStrategy in 
 Spatial-Solr-Sandbox|https://github.com/ryantxu/spatial-solr-sandbox/tree/master/LSE/src/main/java/org/apache/lucene/spatial/pending/jts]
  but a little different in the details -- certainly faster.  Using Spatial4j 
 0.4's BinaryCodec, it'll serialize the shape to bytes (for polygons this is
 internally WKB format) and the strategy will put it in a 
 BinaryDocValuesField.  In practice the shape is likely a polygon but it 
 needn't be.  Then I'll implement a Filter that returns a DocIdSetIterator 
 that evaluates a given document passed via advance(docid) to see if the 
 query shape matches a shape in DocValues. It's improper usage for it to be 
 used in a situation where it will evaluate every document id via nextDoc().  
 And in practice the DocValues format chosen should be a disk resident one 
 since each value tends to be kind of big.
 This spatial strategy in and of itself has no _index_; it's O(N) where N is 
 the number of documents that get passed thru it.  So it should be placed last 
 in the query/filter tree so that the other queries limit the documents it 
 needs to see.  At a minimum, another query/filter to use in conjunction is 
 another SpatialStrategy like RecursivePrefixTreeStrategy.
 Eventually once the PrefixTree grid encoding has a little bit more metadata, 
 it will be possible to further combine the grid & this strategy in such a way 
 that many documents won't need to be checked against the serialized geometry.






[jira] [Commented] (SOLR-5727) LBHttpSolrServer should only retry on Connection exceptions when sending updates.

2014-02-16 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13902814#comment-13902814
 ] 

Mark Miller commented on SOLR-5727:
---

There seem to be some unrelated failures that make my patch for this hard to 
test, but once that gets worked out, I'll post a patch and commit. I want to 
get this into jenkins to see the effects on chaosmonkey tests.

I think SOLR-5593 was hiding / protecting against some issues around this. It 
also fits with some failures from even before that, which I was trying to figure
out and which seemed to make no sense unless the client was resending the same
update even while we were internally retrying to send an update to a leader.

 LBHttpSolrServer should only retry on Connection exceptions when sending 
 updates.
 -

 Key: SOLR-5727
 URL: https://issues.apache.org/jira/browse/SOLR-5727
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Reporter: Mark Miller
Assignee: Mark Miller
 Fix For: 4.7, 5.0


 You don't know if the request was successful or not, so it's better to
 return an error to the user than to retry, especially because forwards to a shard leader 
 can be retried internally.
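The retry policy being described can be sketched as a small predicate. This is illustrative only, not the LBHttpSolrServer code; the class and method names are invented for the example. The key observation is that a connect failure means the request never reached the server, so retrying elsewhere is safe, while any other I/O failure is ambiguous for a non-idempotent update.

```java
import java.io.IOException;
import java.net.ConnectException;

/**
 * Sketch of "only retry on connection exceptions when sending updates":
 * a ConnectException means the request was never sent, so trying the next
 * server is safe. Other IOExceptions (e.g. a read timeout) leave it unknown
 * whether the update was applied, so the error should surface to the caller.
 */
public class RetryPolicySketch {
    static boolean shouldRetry(IOException e, boolean isUpdateRequest) {
        if (e instanceof ConnectException) {
            return true;  // request never reached the server; safe to retry
        }
        // Queries are idempotent, so retrying them is still acceptable;
        // updates may already have been applied, so don't retry those.
        return !isUpdateRequest;
    }

    public static void main(String[] args) {
        System.out.println(shouldRetry(new ConnectException("refused"), true));   // true
        System.out.println(shouldRetry(new IOException("read timed out"), true)); // false
        System.out.println(shouldRetry(new IOException("read timed out"), false)); // true
    }
}
```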






[jira] [Commented] (SOLR-5722) Add catenateShingles option to WordDelimiterFilter

2014-02-16 Thread Greg Pendlebury (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13902824#comment-13902824
 ] 

Greg Pendlebury commented on SOLR-5722:
---

I don't think it does. It has been a while since we looked into it, and that 
link is currently returning 503 for me, but my understanding was that the 
HyphenatedWordsFilter put two tokens back together when a hyphen was found on 
the end of the first token. The catenateShingles option we are using addresses 
the scenario where multiple hyphens are found internal to a single token.

 Add catenateShingles option to WordDelimiterFilter
 --

 Key: SOLR-5722
 URL: https://issues.apache.org/jira/browse/SOLR-5722
 Project: Solr
  Issue Type: Improvement
Reporter: Greg Pendlebury
Priority: Minor
  Labels: filter, newbie, patch
 Attachments: WDFconcatShingles.patch


 Apologies if I put this in the wrong spot. I'm attaching a patch (against 
 current trunk) that adds support for a 'catenateShingles' option to the 
 WordDelimiterFilter. 
 We (National Library of Australia - NLA) are currently maintaining this as an 
 internal modification to the Filter, but I believe it is generic enough to 
 contribute upstream.
 Description:
 =
 {code}
 /**
  * NLA Modification to the standard word delimiter to support various
  * hyphenation use cases. Primarily driven by requirements for
  * newspapers where words are often broken across line endings.
  *
  *  eg. hyphenated-surname is printed across a line ending and 
  * turns out like hyphen-ated-surname or hyphenated-sur-name.
  *
  *  In this scenario the stock filter, with 'catenateAll' turned on, will
  *  generate individual tokens plus one combined token, but not
  *  sub-tokens like hyphenated surname and hyphenatedsur name.
  *
  *  So we add a new 'catenateShingles' option to achieve this.
 */
 {code}
 Includes unit tests, and as is noted in one of them CATENATE_WORDS and 
 CATENATE_SHINGLES are logically considered mutually exclusive for sensible 
 usage and can cause duplicate tokens (although they should have the same 
 positions etc).
 I'm happy to work on it more if anyone finds problems with it.
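One plausible reading of the behaviour described above can be sketched outside Lucene: for a token containing multiple hyphens, catenate each adjacent pair of sub-words, so that the two halves of a line-break hyphenation are rejoined no matter where the break fell. This is a guess at the semantics for illustration only; the real filter works on token streams and positions, and the class and method names here are invented.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Illustrative sketch (not the actual WordDelimiterFilter code) of what a
 * 'catenateShingles' option might emit: for each adjacent pair of sub-words
 * split on hyphens, the pair concatenated into one token.
 */
public class CatenateShinglesSketch {
    static List<String> catenateShingles(String token) {
        String[] parts = token.split("-");
        List<String> shingles = new ArrayList<>();
        for (int i = 0; i + 1 < parts.length; i++) {
            shingles.add(parts[i] + parts[i + 1]); // join each adjacent pair
        }
        return shingles;
    }

    public static void main(String[] args) {
        // "hyphenated-sur-name" yields "hyphenatedsur" and "surname",
        // so a search for "surname" can still match the broken print.
        System.out.println(catenateShingles("hyphenated-sur-name"));
    }
}
```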






[jira] [Commented] (LUCENE-5440) Add LongFixedBitSet and replace usage of OpenBitSet

2014-02-16 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13902825#comment-13902825
 ] 

ASF subversion and git services commented on LUCENE-5440:
-

Commit 1568824 from [~shaie] in branch 'dev/trunk'
[ https://svn.apache.org/r1568824 ]

LUCENE-5440: fix bug in FacetComponent

 Add LongFixedBitSet and replace usage of OpenBitSet
 ---

 Key: LUCENE-5440
 URL: https://issues.apache.org/jira/browse/LUCENE-5440
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/search
Reporter: Shai Erera
Assignee: Shai Erera
 Fix For: 5.0, 4.7

 Attachments: LUCENE-5440-solr.patch, LUCENE-5440-solr.patch, 
 LUCENE-5440-solr.patch, LUCENE-5440.patch, LUCENE-5440.patch, 
 LUCENE-5440.patch, LUCENE-5440.patch, LUCENE-5440.patch


 Spinoff from here: http://lucene.markmail.org/thread/35gw3amo53dsqsqj. I 
 wrote a LongFixedBitSet which behaves like FixedBitSet but allows managing 
 more than 2.1B bits. It overcomes some issues I've encountered with 
 OpenBitSet, such as the use of set/fastSet as well as the implementation of 
 DocIdSet. I'll post a patch shortly and describe it in more detail.
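The core idea of a long-indexed fixed bit set can be sketched in a few lines. This is a minimal illustration, not the actual LongFixedBitSet (which has many more operations); the class name is invented, and for simplicity it assumes the word count still fits in an int.

```java
/**
 * Minimal sketch of a fixed-size bit set addressed by a long index, the idea
 * behind LongFixedBitSet: indexing with a long allows more than
 * Integer.MAX_VALUE (~2.1B) bits, packed 64 per long word.
 */
public class LongBitSetSketch {
    private final long[] words;
    private final long numBits;

    public LongBitSetSketch(long numBits) {
        this.numBits = numBits;
        // one 64-bit word per 64 bits, rounded up
        // (simplification: assumes the word count fits in an int)
        this.words = new long[(int) ((numBits + 63) >>> 6)];
    }

    public void set(long index) {
        words[(int) (index >>> 6)] |= 1L << (index & 63);
    }

    public boolean get(long index) {
        return (words[(int) (index >>> 6)] & (1L << (index & 63))) != 0;
    }

    public long length() { return numBits; }

    public static void main(String[] args) {
        LongBitSetSketch bits = new LongBitSetSketch(1_000);
        bits.set(3);
        bits.set(999);
        System.out.println(bits.get(3) && bits.get(999) && !bits.get(4)); // true
    }
}
```

Unlike OpenBitSet's lenient `get`/`fastSet` split, a fixed-size set like this has one strict accessor: reading past `length()` fails instead of silently returning false, which is exactly the behavioural change behind the FacetComponent fix committed on this issue.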






Re: svn commit: r1568824 - /lucene/dev/trunk/solr/core/src/java/org/apache/solr/handler/component/FacetComponent.java

2014-02-16 Thread Mark Miller
Thanks!

- Mark

http://about.me/markrmiller

On Feb 16, 2014, at 3:37 PM, sh...@apache.org wrote:

 Author: shaie
 Date: Sun Feb 16 20:37:28 2014
 New Revision: 1568824
 
 URL: http://svn.apache.org/r1568824
 Log:
 LUCENE-5440: fix bug in FacetComponent
 
 Modified:

 lucene/dev/trunk/solr/core/src/java/org/apache/solr/handler/component/FacetComponent.java
 
 Modified: 
 lucene/dev/trunk/solr/core/src/java/org/apache/solr/handler/component/FacetComponent.java
 URL: 
 http://svn.apache.org/viewvc/lucene/dev/trunk/solr/core/src/java/org/apache/solr/handler/component/FacetComponent.java?rev=1568824r1=1568823r2=1568824view=diff
 ==
 --- 
 lucene/dev/trunk/solr/core/src/java/org/apache/solr/handler/component/FacetComponent.java
  (original)
 +++ 
 lucene/dev/trunk/solr/core/src/java/org/apache/solr/handler/component/FacetComponent.java
  Sun Feb 16 20:37:28 2014
 @@ -451,7 +451,7 @@ public class FacetComponent extends Sear
   long maxCount = sfc.count;
   for (int shardNum=0; shardNum<rb.shards.length; shardNum++) {
 FixedBitSet fbs = dff.counted[shardNum];
 -if (fbs!=null && !fbs.get(sfc.termNum)) {  // fbs can be null if a shard request failed
 +if (fbs!=null && (sfc.termNum >= fbs.length() || !fbs.get(sfc.termNum))) {  // fbs can be null if a shard request failed
   // if missing from this shard, add the max it could be
   maxCount += dff.maxPossible(sfc,shardNum);
 }
 @@ -466,7 +466,7 @@ public class FacetComponent extends Sear
   // add a query for each shard missing the term that needs refinement
   for (int shardNum=0; shardNum<rb.shards.length; shardNum++) {
 FixedBitSet fbs = dff.counted[shardNum];
 -if(fbs!=null && !fbs.get(sfc.termNum) && dff.maxPossible(sfc,shardNum)>0) {
 +if(fbs!=null && (sfc.termNum >= fbs.length() || !fbs.get(sfc.termNum)) && dff.maxPossible(sfc,shardNum)>0) {
   dff.needRefinements = true;
   List<String> lst = dff._toRefine[shardNum];
   if (lst == null) {
 
 

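The patched condition guards `fbs.get(sfc.termNum)` with a length check. The reason is the OpenBitSet-to-FixedBitSet migration: OpenBitSet's `get` simply returned false for an out-of-range index, whereas a fixed-size bit set fails instead. A helper like the following restores the old "out of range means unset" behaviour; this is an illustrative sketch with invented names, not the Solr code (the toy bit set stands in for FixedBitSet).

```java
/**
 * Why the bounds check in the FacetComponent fix: a fixed-size bit set throws
 * for an out-of-range index, so "missing" must be decided before calling get().
 */
public class SafeBitSetGet {
    /** Toy fixed-size bit set backed by a boolean[]. */
    static class FixedBits {
        final boolean[] bits;
        FixedBits(int length) { this.bits = new boolean[length]; }
        int length() { return bits.length; }
        boolean get(int i) { return bits[i]; } // throws if i >= length()
    }

    /** Mirrors the patched condition: out-of-range counts as "not counted". */
    static boolean missingFromShard(FixedBits fbs, int termNum) {
        return fbs != null && (termNum >= fbs.length() || !fbs.get(termNum));
    }

    public static void main(String[] args) {
        FixedBits fbs = new FixedBits(4);
        fbs.bits[1] = true;
        System.out.println(missingFromShard(fbs, 0));  // true: in range, unset
        System.out.println(missingFromShard(fbs, 1));  // false: in range, set
        System.out.println(missingFromShard(fbs, 10)); // true: beyond length, no exception
    }
}
```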




[jira] [Commented] (LUCENE-5440) Add LongFixedBitSet and replace usage of OpenBitSet

2014-02-16 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13902832#comment-13902832
 ] 

ASF subversion and git services commented on LUCENE-5440:
-

Commit 1568825 from [~shaie] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1568825 ]

LUCENE-5440: fix bug in FacetComponent

 Add LongFixedBitSet and replace usage of OpenBitSet
 ---

 Key: LUCENE-5440
 URL: https://issues.apache.org/jira/browse/LUCENE-5440
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/search
Reporter: Shai Erera
Assignee: Shai Erera
 Fix For: 5.0, 4.7

 Attachments: LUCENE-5440-solr.patch, LUCENE-5440-solr.patch, 
 LUCENE-5440-solr.patch, LUCENE-5440.patch, LUCENE-5440.patch, 
 LUCENE-5440.patch, LUCENE-5440.patch, LUCENE-5440.patch


 Spinoff from here: http://lucene.markmail.org/thread/35gw3amo53dsqsqj. I 
 wrote a LongFixedBitSet which behaves like FixedBitSet but allows managing 
 more than 2.1B bits. It overcomes some issues I've encountered with 
 OpenBitSet, such as the use of set/fastSet as well as the implementation of 
 DocIdSet. I'll post a patch shortly and describe it in more detail.






[jira] [Created] (SOLR-5735) ChaosMonkey test timeouts.

2014-02-16 Thread Mark Miller (JIRA)
Mark Miller created SOLR-5735:
-

 Summary: ChaosMonkey test timeouts.
 Key: SOLR-5735
 URL: https://issues.apache.org/jira/browse/SOLR-5735
 Project: Solr
  Issue Type: Task
Reporter: Mark Miller
Assignee: Mark Miller
Priority: Critical
 Fix For: 4.7, 5.0


This started showing up in jenkins runs a while back.






Welcome Anshum Gupta as Lucene/Solr Committer!

2014-02-16 Thread Mark Miller
Hey everybody!

The Lucene PMC is happy to welcome Anshum Gupta as a committer on the Lucene /
Solr project.  Anshum has contributed to a number of issues for the
project, especially around SolrCloud.

Welcome Anshum!

It's tradition to introduce yourself with a short bio :)

-- 
- Mark

http://about.me/markrmiller


Re: Welcome Anshum Gupta as Lucene/Solr Committer!

2014-02-16 Thread Dawid Weiss
Welcome Anshum!

Dawid

On Sun, Feb 16, 2014 at 11:33 PM, Mark Miller markrmil...@gmail.com wrote:
 Hey everybody!

 The Lucene PMC is happy to welcome Anshum Gupta as a committer on the Lucene
 / Solr project.  Anshum has contributed to a number of issues for the
 project, especially around SolrCloud.

 Welcome Anshum!

 It's tradition to introduce yourself with a short bio :)

 --
 - Mark

 http://about.me/markrmiller




[GitHub] lucene-solr pull request: LUCENE-5092, 2nd try

2014-02-16 Thread PaulElschot
GitHub user PaulElschot opened a pull request:

https://github.com/apache/lucene-solr/pull/33

LUCENE-5092, 2nd try

In core introduce DocBlocksIterator.
Use this in FixedBitSet, in EliasFanoDocIdSet and in join module ToChild... 
and ToParent...
Also change BaseDocIdSetTestCase to test 
DocBlocksIterator.advanceToJustBefore.

This was simplified a lot by LUCENE-5441 and LUCENE-5440.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/apache/lucene-solr LUCENE-5092-pull-2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/lucene-solr/pull/33.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #33


commit 4f8eae48ff0441b86a0fdb130e564f646dffcc43
Author: Paul Elschot paul.j.elsc...@gmail.com
Date:   2014-02-16T22:31:58Z

Squashed commit for LUCENE-5092




If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastruct...@apache.org or file a JIRA ticket with INFRA.




Re: Welcome Anshum Gupta as Lucene/Solr Committer!

2014-02-16 Thread Uwe Schindler
Hey Anshum, welcome aboard!

Uwe

On 16. Februar 2014 23:33:11 MEZ, Mark Miller markrmil...@gmail.com wrote:
Hey everybody!

The Lucene PMC is happy to welcome Anshum Gupta as a committer on the
Lucene /
Solr project.  Anshum has contributed to a number of issues for the
project, especially around SolrCloud.

Welcome Anshum!

It's tradition to introduce yourself with a short bio :)

-- 
- Mark

http://about.me/markrmiller

--
Uwe Schindler
H.-H.-Meier-Allee 63, 28213 Bremen
http://www.thetaphi.de

[jira] [Commented] (LUCENE-5441) Decouple DocIdSet from OpenBitSet and FixedBitSet

2014-02-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13902865#comment-13902865
 ] 

ASF GitHub Bot commented on LUCENE-5441:


GitHub user PaulElschot opened a pull request:

https://github.com/apache/lucene-solr/pull/33

LUCENE-5092, 2nd try

In core introduce DocBlocksIterator.
Use this in FixedBitSet, in EliasFanoDocIdSet and in join module ToChild... 
and ToParent...
Also change BaseDocIdSetTestCase to test 
DocBlocksIterator.advanceToJustBefore.

This was simplified a lot by LUCENE-5441 and LUCENE-5440.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/apache/lucene-solr LUCENE-5092-pull-2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/lucene-solr/pull/33.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #33


commit 4f8eae48ff0441b86a0fdb130e564f646dffcc43
Author: Paul Elschot paul.j.elsc...@gmail.com
Date:   2014-02-16T22:31:58Z

Squashed commit for LUCENE-5092




 Decouple DocIdSet from OpenBitSet and FixedBitSet
 -

 Key: LUCENE-5441
 URL: https://issues.apache.org/jira/browse/LUCENE-5441
 Project: Lucene - Core
  Issue Type: Task
  Components: core/other
Affects Versions: 4.6.1
Reporter: Uwe Schindler
 Fix For: 5.0

 Attachments: LUCENE-5441.patch, LUCENE-5441.patch, LUCENE-5441.patch


 Back from the times of Lucene 2.4 when DocIdSet was introduced, we somehow 
 kept the stupid "filters can return a BitSet directly" approach in the code. So lots 
 of Filters return just FixedBitSet, because DocIdSet is the superclass (ideally 
 interface) of FixedBitSet.
 We should decouple that and *not* implement that abstract interface directly 
 by FixedBitSet. This leads to bugs, e.g. in BlockJoin, because it used Filters 
 in a wrong way, just because they were always returning BitSets. But some 
 filters actually don't do this.
 I propose to let FixedBitSet (only in trunk, because that's a major backwards 
 break) just have a method {{asDocIdSet()}}, which returns an anonymous 
 instance of DocIdSet: bits() returns the FixedBitSet itself, iterator() 
 returns a new Iterator (like it always did) and the cost/cacheable methods 
 return static values.
 Filters in trunk would need to be changed like that:
 {code:java}
 FixedBitSet bits = 
 ...
 return bits;
 {code}
 gets:
 {code:java}
 FixedBitSet bits = 
 ...
 return bits.asDocIdSet();
 {code}
 As this method returns an anonymous DocIdSet, calling code can no longer 
 rely on or check whether the implementation behind it is a FixedBitSet.
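The decoupling being proposed can be sketched with hypothetical interfaces (these are not the real Lucene types; names are invented for illustration): instead of the bit set *being* a DocIdSet, it exposes an `asDocIdSet()` view, so callers can no longer downcast the result back to the concrete bit set.

```java
/**
 * Sketch of the asDocIdSet() proposal: the anonymous view hides the
 * concrete bit-set type from calling code.
 */
public class AsDocIdSetSketch {
    interface DocIdSet {
        boolean get(int docId);  // "bits()"-style random access
        long cost();
    }

    static class FixedBits {
        private final boolean[] bits;
        FixedBits(int length) { bits = new boolean[length]; }
        void set(int docId) { bits[docId] = true; }

        /** Returns an anonymous DocIdSet view; the concrete type stays hidden. */
        DocIdSet asDocIdSet() {
            return new DocIdSet() {
                @Override public boolean get(int docId) { return bits[docId]; }
                @Override public long cost() { return bits.length; }
            };
        }
    }

    public static void main(String[] args) {
        FixedBits fbs = new FixedBits(8);
        fbs.set(3);
        DocIdSet set = fbs.asDocIdSet();
        System.out.println(set.get(3) && !set.get(4)); // true
        System.out.println(set instanceof FixedBits);  // false: impl not observable
    }
}
```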






[jira] [Commented] (LUCENE-5441) Decouple DocIdSet from OpenBitSet and FixedBitSet

2014-02-16 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13902866#comment-13902866
 ] 

Paul Elschot commented on LUCENE-5441:
--

I'm sorry that pull request #33 ended up here; I think I should have mentioned 
LUCENE-5092 as the first issue in the comment body at the pull request.

 Decouple DocIdSet from OpenBitSet and FixedBitSet
 -

 Key: LUCENE-5441
 URL: https://issues.apache.org/jira/browse/LUCENE-5441
 Project: Lucene - Core
  Issue Type: Task
  Components: core/other
Affects Versions: 4.6.1
Reporter: Uwe Schindler
 Fix For: 5.0

 Attachments: LUCENE-5441.patch, LUCENE-5441.patch, LUCENE-5441.patch


 Back from the times of Lucene 2.4 when DocIdSet was introduced, we somehow 
 kept the stupid "filters can return a BitSet directly" approach in the code. So lots 
 of Filters return just FixedBitSet, because DocIdSet is the superclass (ideally 
 interface) of FixedBitSet.
 We should decouple that and *not* implement that abstract interface directly 
 by FixedBitSet. This leads to bugs, e.g. in BlockJoin, because it used Filters 
 in a wrong way, just because they were always returning BitSets. But some 
 filters actually don't do this.
 I propose to let FixedBitSet (only in trunk, because that's a major backwards 
 break) just have a method {{asDocIdSet()}}, which returns an anonymous 
 instance of DocIdSet: bits() returns the FixedBitSet itself, iterator() 
 returns a new Iterator (like it always did) and the cost/cacheable methods 
 return static values.
 Filters in trunk would need to be changed like that:
 {code:java}
 FixedBitSet bits = 
 ...
 return bits;
 {code}
 gets:
 {code:java}
 FixedBitSet bits = 
 ...
 return bits.asDocIdSet();
 {code}
 As this method returns an anonymous DocIdSet, calling code can no longer 
 rely on or check whether the implementation behind it is a FixedBitSet.






Re: Welcome Anshum Gupta as Lucene/Solr Committer!

2014-02-16 Thread Adrien Grand
Welcome Anshum !

On Sun, Feb 16, 2014 at 11:46 PM, Uwe Schindler u...@thetaphi.de wrote:
 Hey Anshum, welcome aboard!

 Uwe


 On 16. Februar 2014 23:33:11 MEZ, Mark Miller markrmil...@gmail.com wrote:

 Hey everybody!

 The Lucene PMC is happy to welcome Anshum Gupta as a committer on the
 Lucene / Solr project.  Anshum has contributed to a number of issues for the
 project, especially around SolrCloud.

 Welcome Anshum!

 It's tradition to introduce yourself with a short bio :)


 --
 Uwe Schindler
 H.-H.-Meier-Allee 63, 28213 Bremen
 http://www.thetaphi.de



-- 
Adrien




[jira] [Commented] (LUCENE-5092) join: don't expect all filters to be FixedBitSet instances

2014-02-16 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13902867#comment-13902867
 ] 

Paul Elschot commented on LUCENE-5092:
--

A new pull request is here:
https://github.com/apache/lucene-solr/pull/33

The automated message for the pull request ended up at LUCENE-5441.

 join: don't expect all filters to be FixedBitSet instances
 --

 Key: LUCENE-5092
 URL: https://issues.apache.org/jira/browse/LUCENE-5092
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/join
Reporter: Adrien Grand
Assignee: Adrien Grand
Priority: Minor
 Attachments: LUCENE-5092.patch


 The join module throws exceptions when the parents filter isn't a 
 FixedBitSet. The reason is that the join module relies on prevSetBit to find 
 the first child document given a parent ID.
 As suggested by Uwe and Paul Elschot on LUCENE-5081, we could fix it by 
 exposing methods in the iterators to iterate backwards. When the join module 
 gets an iterator which isn't able to iterate backwards, it would just need to 
 dump its content into another DocIdSet that supports backward iteration, 
 FixedBitSet for example.
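The reason the join module needs backward iteration can be shown with `java.util.BitSet` (a stand-in for Lucene's classes; the helper and its name are invented for illustration). In Lucene's block join, the parent filter marks the *last* document of each block, so given a parent docId, the first child is one past the previous set parent bit.

```java
import java.util.BitSet;

/**
 * Why the join module relies on prevSetBit: parent filters mark the last doc
 * of each block, so the block's first child is one past the previous parent.
 */
public class JoinPrevSetBitSketch {
    /** First child docId of the block ending at parentDoc. */
    static int firstChild(BitSet parents, int parentDoc) {
        int prevParent = parents.previousSetBit(parentDoc - 1);
        return prevParent + 1; // -1 + 1 == 0 when there is no earlier parent
    }

    public static void main(String[] args) {
        // Docs 0-3 are children of parent 4; docs 5-6 are children of parent 7.
        BitSet parents = new BitSet();
        parents.set(4);
        parents.set(7);
        System.out.println(firstChild(parents, 4)); // 0
        System.out.println(firstChild(parents, 7)); // 5
    }
}
```

An iterator that only supports `nextDoc()`/`advance()` cannot answer "previous set bit", which is why a non-FixedBitSet filter would have to be dumped into a structure that can.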






Re: Welcome Anshum Gupta as Lucene/Solr Committer!

2014-02-16 Thread Christian Moen
Congrats!

On Feb 17, 2014, at 7:33 AM, Mark Miller markrmil...@gmail.com wrote:

 Hey everybody!
 
 The Lucene PMC is happy to welcome Anshum Gupta as a committer on the Lucene 
 / Solr project.  Anshum has contributed to a number of issues for the 
 project, especially around SolrCloud.
 
 Welcome Anshum!
 
 It's tradition to introduce yourself with a short bio :)
 
 -- 
 - Mark
 
 http://about.me/markrmiller



[jira] [Commented] (SOLR-5727) LBHttpSolrServer should only retry on Connection exceptions when sending updates.

2014-02-16 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13902878#comment-13902878
 ] 

ASF subversion and git services commented on SOLR-5727:
---

Commit 1568859 from [~markrmil...@gmail.com] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1568859 ]

SOLR-5727: LBHttpSolrServer should only retry on Connection exceptions when 
sending updates. Affects CloudSolrServer.

 LBHttpSolrServer should only retry on Connection exceptions when sending 
 updates.
 -

 Key: SOLR-5727
 URL: https://issues.apache.org/jira/browse/SOLR-5727
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Reporter: Mark Miller
Assignee: Mark Miller
 Fix For: 4.7, 5.0


 You don't know if the request was successful or not, so it's better to
 return an error to the user than to retry, especially because forwards to a shard leader 
 can be retried internally.






[jira] [Commented] (SOLR-5727) LBHttpSolrServer should only retry on Connection exceptions when sending updates.

2014-02-16 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13902876#comment-13902876
 ] 

ASF subversion and git services commented on SOLR-5727:
---

Commit 1568857 from [~markrmil...@gmail.com] in branch 'dev/trunk'
[ https://svn.apache.org/r1568857 ]

SOLR-5727: LBHttpSolrServer should only retry on Connection exceptions when 
sending updates. Affects CloudSolrServer.

 LBHttpSolrServer should only retry on Connection exceptions when sending 
 updates.
 -

 Key: SOLR-5727
 URL: https://issues.apache.org/jira/browse/SOLR-5727
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Reporter: Mark Miller
Assignee: Mark Miller
 Fix For: 4.7, 5.0


 You don't know if the request was successful or not, so it's better to
 return an error to the user than to retry, especially because forwards to a shard leader 
 can be retried internally.






Re: Welcome Anshum Gupta as Lucene/Solr Committer!

2014-02-16 Thread Steve Rowe
Welcome Anshum!


On Sun, Feb 16, 2014 at 5:55 PM, Christian Moen c...@atilika.com wrote:

 Congrats!

 On Feb 17, 2014, at 7:33 AM, Mark Miller markrmil...@gmail.com wrote:

 Hey everybody!

 The Lucene PMC is happy to welcome Anshum Gupta as a committer on the
 Lucene / Solr project.  Anshum has contributed to a number of issues for
 the project, especially around SolrCloud.

 Welcome Anshum!

 It's tradition to introduce yourself with a short bio :)

 --
 - Mark

 http://about.me/markrmiller





Re: Welcome Anshum Gupta as Lucene/Solr Committer!

2014-02-16 Thread Michael McCandless
Welcome Anshum!

Mike McCandless

http://blog.mikemccandless.com


On Sun, Feb 16, 2014 at 5:33 PM, Mark Miller markrmil...@gmail.com wrote:
 Hey everybody!

 The Lucene PMC is happy to welcome Anshum Gupta as a committer on the Lucene
 / Solr project.  Anshum has contributed to a number of issues for the
 project, especially around SolrCloud.

 Welcome Anshum!

 It's tradition to introduce yourself with a short bio :)

 --
 - Mark

 http://about.me/markrmiller




Re: Welcome Anshum Gupta as Lucene/Solr Committer!

2014-02-16 Thread Koji Sekiguchi

Congrats Anshum!

koji
--
http://soleami.com/blog/mahout-and-machine-learning-training-course-is-here.html

(14/02/17 7:33), Mark Miller wrote:

Hey everybody!

The Lucene PMC is happy to welcome Anshum Gupta as a committer on the Lucene /
Solr project.  Anshum has contributed to a number of issues for the
project, especially around SolrCloud.

Welcome Anshum!

It's tradition to introduce yourself with a short bio :)









Re: Welcome Anshum Gupta as Lucene/Solr Committer!

2014-02-16 Thread Han Jiang
Welcome Anshum!


On Mon, Feb 17, 2014 at 6:33 AM, Mark Miller markrmil...@gmail.com wrote:

 Hey everybody!

 The Lucene PMC is happy to welcome Anshum Gupta as a committer on the
 Lucene / Solr project.  Anshum has contributed to a number of issues for
 the project, especially around SolrCloud.

 Welcome Anshum!

 It's tradition to introduce yourself with a short bio :)

 --
 - Mark

 http://about.me/markrmiller




-- 
Han Jiang

Team of Search Engine and Web Mining,
School of Electronic Engineering and Computer Science,
Peking University, China


Re: Welcome Anshum Gupta as Lucene/Solr Committer!

2014-02-16 Thread Erick Erickson
Welcome Anshum!


On Sun, Feb 16, 2014 at 5:02 PM, Han Jiang h...@apache.org wrote:

 Welcome Anshum!


 On Mon, Feb 17, 2014 at 6:33 AM, Mark Miller markrmil...@gmail.comwrote:

 Hey everybody!

 The Lucene PMC is happy to welcome Anshum Gupta as a committer on the
 Lucene / Solr project.  Anshum has contributed to a number of issues for
 the project, especially around SolrCloud.

 Welcome Anshum!

 It's tradition to introduce yourself with a short bio :)

 --
 - Mark

 http://about.me/markrmiller




 --
 Han Jiang

 Team of Search Engine and Web Mining,
 School of Electronic Engineering and Computer Science,
 Peking University, China



Re: Welcome Anshum Gupta as Lucene/Solr Committer!

2014-02-16 Thread Joel Bernstein
Welcome Anshum!

Joel Bernstein
Search Engineer at Heliosearch


On Sun, Feb 16, 2014 at 8:03 PM, Erick Erickson erickerick...@gmail.comwrote:

 Welcome Anshum!


 On Sun, Feb 16, 2014 at 5:02 PM, Han Jiang h...@apache.org wrote:

 Welcome Anshum!


 On Mon, Feb 17, 2014 at 6:33 AM, Mark Miller markrmil...@gmail.comwrote:

 Hey everybody!

 The Lucene PMC is happy to welcome Anshum Gupta as a committer on the
 Lucene / Solr project.  Anshum has contributed to a number of issues
 for the project, especially around SolrCloud.

 Welcome Anshum!

 It's tradition to introduce yourself with a short bio :)

 --
 - Mark

 http://about.me/markrmiller




 --
 Han Jiang

 Team of Search Engine and Web Mining,
 School of Electronic Engineering and Computer Science,
 Peking University, China





Re: Welcome Anshum Gupta as Lucene/Solr Committer!

2014-02-16 Thread Robert Muir
Welcome Anshum!


On Sun, Feb 16, 2014 at 5:33 PM, Mark Miller markrmil...@gmail.com wrote:

 Hey everybody!

 The Lucene PMC is happy to welcome Anshum Gupta as a committer on the
 Lucene / Solr project.  Anshum has contributed to a number of issues for
 the project, especially around SolrCloud.

 Welcome Anshum!

 It's tradition to introduce yourself with a short bio :)

 --
 - Mark

 http://about.me/markrmiller



Re: Welcome Anshum Gupta as Lucene/Solr Committer!

2014-02-16 Thread Anshum Gupta
Thanks Mark.

I spent most of my life in New Delhi, India other than short stints in
different parts of the country (including living in a beach house on a
tropical island for 3 years when I was young). After spending the last 3
years in Bangalore, I just relocated to San Francisco to be at the
LucidWorks office in the Bay Area. Prior to this I've been a part of the
search teams at A9 (CloudSearch), Cleartrip.com and Naukri.com where I was
involved in designing and developing search and recommendation engines.

These days, I love contributing stuff to Solr, primarily around SolrCloud
and hope to continue to be at least as active towards it.

In my free time I love photography, traveling, eating out and drinking my
beer.


On Sun, Feb 16, 2014 at 2:33 PM, Mark Miller markrmil...@gmail.com wrote:

 Hey everybody!

 The Lucene PMC is happy to welcome Anshum Gupta as a committer on the
 Lucene / Solr project.  Anshum has contributed to a number of issues for
 the project, especially around SolrCloud.

 Welcome Anshum!

 It's tradition to introduce yourself with a short bio :)

 --
 - Mark

 http://about.me/markrmiller




-- 

Anshum Gupta
http://www.anshumgupta.net


[jira] [Resolved] (LUCENE-5408) SerializedDVStrategy -- match geometries in DocValues

2014-02-16 Thread David Smiley (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley resolved LUCENE-5408.
--

   Resolution: Fixed
Fix Version/s: 5.0

 SerializedDVStrategy -- match geometries in DocValues
 -

 Key: LUCENE-5408
 URL: https://issues.apache.org/jira/browse/LUCENE-5408
 Project: Lucene - Core
  Issue Type: New Feature
  Components: modules/spatial
Reporter: David Smiley
Assignee: David Smiley
 Fix For: 5.0, 4.7

 Attachments: LUCENE-5408_GeometryStrategy.patch, 
 LUCENE-5408_SerializedDVStrategy.patch


 I've started work on a new SpatialStrategy implementation I'm tentatively 
 calling SerializedDVStrategy.  It's similar to the [JtsGeoStrategy in 
 Spatial-Solr-Sandbox|https://github.com/ryantxu/spatial-solr-sandbox/tree/master/LSE/src/main/java/org/apache/lucene/spatial/pending/jts]
  but a little different in the details -- certainly faster.  Using Spatial4j 
 0.4's BinaryCodec, it'll serialize the shape to bytes (for polygons this is 
 internally WKB format) and the strategy will put it in a 
 BinaryDocValuesField.  In practice the shape is likely a polygon but it 
 needn't be.  Then I'll implement a Filter that returns a DocIdSetIterator 
 that evaluates a given document passed via advance(docid) to see if the 
 query shape matches a shape in DocValues. It's improper usage for it to be 
 used in a situation where it will evaluate every document id via nextDoc().  
 And in practice the DocValues format chosen should be a disk resident one 
 since each value tends to be kind of big.
 This spatial strategy in and of itself has no _index_; it's O(N) where N is 
 the number of documents that get passed thru it.  So it should be placed last 
 in the query/filter tree so that the other queries limit the documents it 
 needs to see.  At a minimum, another query/filter to use in conjunction is 
 another SpatialStrategy like RecursivePrefixTreeStrategy.
 Eventually once the PrefixTree grid encoding has a little bit more metadata, 
 it will be possible to further combine the grid & this strategy in such a way 
 that many documents won't need to be checked against the serialized geometry.
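
 As a rough illustration of the idea above (not Lucene code -- the real
 strategy uses Spatial4j's BinaryCodec, a BinaryDocValuesField, and arbitrary
 shapes), here is a minimal sketch with shapes reduced to bounding boxes and
 the per-document serialized bytes held in a plain map; all names are made up:

```java
import java.nio.ByteBuffer;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of the SerializedDVStrategy idea: store each document's geometry as
// serialized bytes, and test a match only for the documents the surrounding
// query actually advances to -- O(1) per checked doc, O(N) if used alone.
public class SerializedShapeFilterSketch {
    private final Map<Integer, byte[]> shapeBytesByDoc = new LinkedHashMap<>();

    // "Index" a document: serialize its bounding box (minX, minY, maxX, maxY).
    public void add(int docId, double minX, double minY, double maxX, double maxY) {
        ByteBuffer buf = ByteBuffer.allocate(4 * Double.BYTES);
        buf.putDouble(minX).putDouble(minY).putDouble(maxX).putDouble(maxY);
        shapeBytesByDoc.put(docId, buf.array());
    }

    // Called per candidate doc (the advance(docid) path): deserialize the
    // stored bytes and test intersection with the query box.
    public boolean matches(int docId, double qMinX, double qMinY, double qMaxX, double qMaxY) {
        byte[] bytes = shapeBytesByDoc.get(docId);
        if (bytes == null) return false;
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        double minX = buf.getDouble(), minY = buf.getDouble();
        double maxX = buf.getDouble(), maxY = buf.getDouble();
        // Two boxes intersect iff they overlap on both axes.
        return minX <= qMaxX && qMinX <= maxX && minY <= qMaxY && qMinY <= maxY;
    }
}
```

 This is why the description stresses placing the filter last in the
 query/filter tree: the deserialize-and-test cost is paid once per document
 that reaches it.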



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Ensuring a test uses a codec supporting DocValues

2014-02-16 Thread Smiley, David W.
I wrote a test that requires DocValues. It failed on me once because the 
codec randomization chose Lucene3x, which doesn’t support DocValues.  What’s the 
best way to adjust my test to ensure this doesn’t happen?

What I ended up doing was this:
indexWriterConfig.setCodec( _TestUtil.alwaysDocValuesFormat(new 
Lucene45DocValuesFormat()));
But I don’t like that I hard-coded a particular format.
(FYI the source file is an abstract base test class: SpatialTestCase, method 
newIndexWriterConfig )

Another approach might be to call:
assumeTrue(defaultCodecSupportsDocValues())
Although then the test sometimes won’t be run at all, rather than 
preferably forcing a compatible format.

Thoughts?

~ David
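
One possible shape for the "force a compatible format" option, sketched in
plain Java with entirely made-up names (Lucene's test framework would supply
the real codec pool and the capability check): keep the randomization, but
restrict the pool to candidates that pass the check instead of hard-coding
one format.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.function.Predicate;

// Hypothetical helper: pick randomly, but only among compatible candidates,
// so a choice like Lucene3x (no DocValues support) can never come up.
public class CompatibleFormatPicker {
    public static <T> T pickCompatible(List<T> pool, Predicate<T> supports, Random random) {
        List<T> compatible = new ArrayList<>();
        for (T candidate : pool) {
            if (supports.test(candidate)) {
                compatible.add(candidate);  // keep only capability-passing formats
            }
        }
        if (compatible.isEmpty()) {
            throw new IllegalStateException("no compatible format in pool");
        }
        // Still randomized across runs, just over a restricted pool.
        return compatible.get(random.nextInt(compatible.size()));
    }
}
```

This keeps more of the randomization than pinning Lucene45DocValuesFormat,
while avoiding the assumeTrue() downside of silently skipping the test.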


Only highlight terms that caused a search hit/match

2014-02-16 Thread mike_大雄

Hello, 

I have recently been given a requirement to improve document highlights
within our system. Unfortunately, the current functionality gives more of a
best-guess on what terms to highlight vs the actual terms to highlight that
actually did perform the match. A couple examples of issues that were found: 

1) A nested boolean clause with a term that doesn’t exist, ANDed with a term
   that does, highlights the ignored term in the query:
   Text: a b c 
   Logical Query: a OR (b AND z) 
   Result: *a* *b* c 
   Expected: *a* b c 

2) A nested span query doesn’t maintain the proper positions and offsets:
   Text: y z x y z a 
   Logical Query: (“x y z”, a) span near 10 
   Result: *y* *z* *x* *y* *z* *a* 
   Expected: y z *x* *y* *z* *a* 

I am currently using the Highlighter with a QueryScorer and a
SimpleSpanFragmenter. While looking through the code it looks like the
entire query structure is dropped in the WeightedSpanTermExtractor by just
grabbing any positive TermQuery and flattening them all into a simple Map
which is then passed on to highlight all of those terms. I believe this over
simplification of term extraction is the crux of the issue and needs to be
modified in order to produce more “exact” highlights. 

I was brainstorming with a colleague and thought perhaps we can spin up a
MemoryIndex to index that one document and start performing a depth-first
search of all queries within the overall Lucene query graph. At that point
we can start querying the MemoryIndex for leaf queries and start walking
back up the tree, pruning branches that don’t result in a search hit which
results in a map of actual matched query terms. This approach seems pretty
painful but will hopefully produce better matches. I would like to see what
the experts on the mailing list would have to say about this approach or is
there a better way to retrieve the query terms & positions that produced the
match? Or perhaps there is a different Highlighter implementation that
should be used, though our user queries are extremely complex with a lot of
nested queries of various types. 
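
A minimal sketch of the depth-first pruning described above, with made-up
names and the per-leaf MemoryIndex query replaced by a simple term-set check:
an AND branch only surfaces its terms when every child matches, so a dead
sub-clause no longer leaks highlight terms.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Sketch: evaluate the query tree bottom-up against one document's terms and
// collect highlight terms only from branches that actually matched.
public class MatchedTermCollectorSketch {
    interface Node { boolean collect(Set<String> docTerms, Set<String> matched); }

    static Node term(String t) {
        return (docTerms, matched) -> {
            if (docTerms.contains(t)) { matched.add(t); return true; }
            return false;
        };
    }

    static Node and(Node... children) {
        return (docTerms, matched) -> {
            Set<String> local = new LinkedHashSet<>();
            for (Node c : children) {
                if (!c.collect(docTerms, local)) return false; // prune whole branch
            }
            matched.addAll(local); // surface terms only from a fully satisfied AND
            return true;
        };
    }

    static Node or(Node... children) {
        return (docTerms, matched) -> {
            boolean any = false;
            for (Node c : children) {
                any |= c.collect(docTerms, matched); // each matching child contributes
            }
            return any;
        };
    }

    static Set<String> highlightTerms(Node query, List<String> docText) {
        Set<String> matched = new LinkedHashSet<>();
        query.collect(new LinkedHashSet<>(docText), matched);
        return matched;
    }
}
```

Running this on example 1 above (text "a b c", query a OR (b AND z)) yields
{a} rather than {a, b}, because the failed AND branch is pruned before its
terms reach the highlight map.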

Thanks, 





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Only-highlight-terms-that-caused-a-search-hit-match-tp4117692.html
Sent from the Lucene - Java Developer mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Welcome Anshum Gupta as Lucene/Solr Committer!

2014-02-16 Thread Noble Paul നോബിള്‍ नोब्ळ्
Welcome aboard, Anshum! Looking forward to more exciting days.

--Noble


On Mon, Feb 17, 2014 at 8:44 AM, Anshum Gupta ans...@anshumgupta.net wrote:

 Thanks Mark.

 I spent most of my life in New Delhi, India other than short stints in
 different parts of the country (including living in a beach house on a
 tropical island for 3 years when I was young). After spending the last 3
 years in Bangalore, I just relocated to San Francisco to be at the
 LucidWorks office in the Bay Area. Prior to this I've been a part of the
 search teams at A9 (CloudSearch), Cleartrip.com and Naukri.com where I was
 involved in designing and developing search and recommendation engines.

 These days, I love contributing stuff to Solr, primarily around SolrCloud
 and hope to continue to be at least as active towards it.

 In my free time I love photography, traveling, eating out and drinking my
 beer.


 On Sun, Feb 16, 2014 at 2:33 PM, Mark Miller markrmil...@gmail.com wrote:

 Hey everybody!

 The Lucene PMC is happy to welcome Anshum Gupta as a committer on the
 Lucene / Solr project.  Anshum has contributed to a number of issues for
 the project, especially around SolrCloud.

 Welcome Anshum!

 It's tradition to introduce yourself with a short bio :)

 --
 - Mark

 http://about.me/markrmiller




 --

 Anshum Gupta
 http://www.anshumgupta.net




-- 
-
Noble Paul


Re: Welcome Anshum Gupta as Lucene/Solr Committer!

2014-02-16 Thread Shalin Shekhar Mangar
Welcome Anshum!

On Mon, Feb 17, 2014 at 4:03 AM, Mark Miller markrmil...@gmail.com wrote:
 Hey everybody!

 The Lucene PMC is happy to welcome Anshum Gupta as a committer on the Lucene
 / Solr project.  Anshum has contributed to a number of issues for the
 project, especially around SolrCloud.

 Welcome Anshum!

 It's tradition to introduce yourself with a short bio :)

 --
 - Mark

 http://about.me/markrmiller



-- 
Regards,
Shalin Shekhar Mangar.

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[GitHub] lucene-solr pull request: LUCENE-5092, 2nd try

2014-02-16 Thread mkhludnev
Github user mkhludnev commented on the pull request:

https://github.com/apache/lucene-solr/pull/33#issuecomment-35233474
  
I like it. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastruct...@apache.org or file a JIRA ticket with INFRA.
---

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5092) join: don't expect all filters to be FixedBitSet instances

2014-02-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13903011#comment-13903011
 ] 

ASF GitHub Bot commented on LUCENE-5092:


Github user mkhludnev commented on the pull request:

https://github.com/apache/lucene-solr/pull/33#issuecomment-35233474
  
I like it. 


 join: don't expect all filters to be FixedBitSet instances
 --

 Key: LUCENE-5092
 URL: https://issues.apache.org/jira/browse/LUCENE-5092
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/join
Reporter: Adrien Grand
Assignee: Adrien Grand
Priority: Minor
 Attachments: LUCENE-5092.patch


 The join module throws exceptions when the parents filter isn't a 
 FixedBitSet. The reason is that the join module relies on prevSetBit to find 
 the first child document given a parent ID.
 As suggested by Uwe and Paul Elschot on LUCENE-5081, we could fix it by 
 exposing methods in the iterators to iterate backwards. When the join module 
 gets an iterator which isn't able to iterate backwards, it would just need to 
 dump its content into another DocIdSet that supports backward iteration, 
 FixedBitSet for example.
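
 The fallback described above can be sketched in plain Java using
 java.util.BitSet, whose previousSetBit is the analogue of FixedBitSet's
 prevSetBit (names here are illustrative, not the actual join-module API):

```java
import java.util.BitSet;
import java.util.PrimitiveIterator;
import java.util.stream.IntStream;

// Sketch: when an iterator cannot go backwards, dump its forward-only doc
// IDs into a bitset that can, then use previousSetBit to locate the first
// child document of a given parent.
public class BackwardIterationSketch {
    // Copy a forward-only stream of doc IDs into a random-access bitset.
    public static BitSet dump(IntStream forwardDocIds) {
        BitSet bits = new BitSet();
        PrimitiveIterator.OfInt it = forwardDocIds.iterator();
        while (it.hasNext()) {
            bits.set(it.nextInt());
        }
        return bits;
    }

    // Block-join convention: children precede their parent, so the first
    // child of a parent doc is one past the previous parent bit.
    public static int firstChildOf(BitSet parents, int parentDoc) {
        return parents.previousSetBit(parentDoc - 1) + 1;
    }
}
```

 The dump costs one forward pass and the bitset's memory, which is why it is
 framed as a fallback for DocIdSets that don't support backward iteration
 natively.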



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS-MAVEN] Lucene-Solr-Maven-4.x #591: POMs out of sync

2014-02-16 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-Maven-4.x/591/

1 tests failed.
REGRESSION:  
org.apache.solr.cloud.ChaosMonkeyNothingIsSafeTest.testDistribSearch

Error Message:
document count mismatch.  control=269 sum(shards)=268 cloudClient=268

Stack Trace:
java.lang.AssertionError: document count mismatch.  control=269 sum(shards)=268 
cloudClient=268
at 
__randomizedtesting.SeedInfo.seed([6815C01AF496ADA6:E9F34E0283C9CD9A]:0)
at org.junit.Assert.fail(Assert.java:93)
at 
org.apache.solr.cloud.AbstractFullDistribZkTestBase.checkShardConsistency(AbstractFullDistribZkTestBase.java:1230)
at 
org.apache.solr.cloud.ChaosMonkeyNothingIsSafeTest.doTest(ChaosMonkeyNothingIsSafeTest.java:208)




Build Log:
[...truncated 52398 lines...]
BUILD FAILED
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-Maven-4.x/build.xml:482: 
The following error occurred while executing this line:
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-Maven-4.x/build.xml:176: 
The following error occurred while executing this line:
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-Maven-4.x/extra-targets.xml:77:
 Java returned: 1

Total time: 139 minutes 24 seconds
Build step 'Invoke Ant' marked build as failure
Recording test results
Email was triggered for: Failure
Sending email for trigger: Failure



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org