[jira] [Commented] (SOLR-5416) CollapsingQParserPlugin bug with Tagging

2013-12-17 Thread shruti suri (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13851458#comment-13851458
 ] 

shruti suri commented on SOLR-5416:
---

Joel,

Please tell me the steps to integrate it with an svn checkout of lucene_solr_4_5_1.

Shruti

> CollapsingQParserPlugin bug with Tagging
> 
>
> Key: SOLR-5416
> URL: https://issues.apache.org/jira/browse/SOLR-5416
> Project: Solr
>  Issue Type: Bug
>  Components: search
>Affects Versions: 4.6
>Reporter: David
>Assignee: Joel Bernstein
>  Labels: group, grouping
> Fix For: 5.0, 4.7
>
> Attachments: CollapseQParserPluginPatch-solr-4.5.1.patch, 
> CollapsingQParserPlugin.java, SOLR-5416.patch, SOLR-5416.patch, 
> SOLR-5416.patch, SOLR-5416.patch, SOLR-5416.patch, SOLR-5416.patch, 
> SolrIndexSearcher.java, TestCollapseQParserPlugin.java
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> Trying to use CollapsingQParserPlugin with facet tagging throws an exception. 
> {code}
> ModifiableSolrParams params = new ModifiableSolrParams();
> params.add("q", "*:*");
> params.add("fq", "{!collapse field=group_s}");
> params.add("defType", "edismax");
> params.add("bf", "field(test_ti)");
> params.add("fq","{!tag=test_ti}test_ti:5");
> params.add("facet","true");
> params.add("facet.field","{!ex=test_ti}test_ti");
> assertQ(req(params), "*[count(//doc)=1]", 
> "//doc[./int[@name='test_ti']='5']");
> {code}



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-5372) IntArray toString has O(n^2) performance

2013-12-17 Thread Dawid Weiss (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dawid Weiss updated LUCENE-5372:


Fix Version/s: 4.7
   5.0

> IntArray toString has O(n^2) performance
> 
>
> Key: LUCENE-5372
> URL: https://issues.apache.org/jira/browse/LUCENE-5372
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/index
>Reporter: Joshua Hartman
>Priority: Minor
> Fix For: 5.0, 4.7
>
> Attachments: 5372.patch
>
>
> This is pretty minor, but I found a few issues with the toString 
> implementations while looking through the facet data structures.
> The most egregious is the use of string concatenation in the IntArray class. 
> I have fixed that using StringBuilders. I also noticed that other classes 
> were using StringBuffer instead of StringBuilder. According to the javadoc,
> "This class is designed for use as a drop-in replacement for StringBuffer in 
> places where the string buffer was being used by a single thread (as is 
> generally the case). Where possible, it is recommended that this class be 
> used in preference to StringBuffer as it will be faster under most 
> implementations."



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5372) IntArray toString has O(n^2) performance

2013-12-17 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13851447#comment-13851447
 ] 

Dawid Weiss commented on LUCENE-5372:
-

Looks good to me, and I think it's applicable to 4.x and 5.x (StringBuilder 
requires Java >= 1.5, but both of these branches require at least that, right?)

> IntArray toString has O(n^2) performance
> 
>
> Key: LUCENE-5372
> URL: https://issues.apache.org/jira/browse/LUCENE-5372
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/index
>Reporter: Joshua Hartman
>Priority: Minor
> Fix For: 5.0, 4.7
>
> Attachments: 5372.patch
>
>
> This is pretty minor, but I found a few issues with the toString 
> implementations while looking through the facet data structures.
> The most egregious is the use of string concatenation in the IntArray class. 
> I have fixed that using StringBuilders. I also noticed that other classes 
> were using StringBuffer instead of StringBuilder. According to the javadoc,
> "This class is designed for use as a drop-in replacement for StringBuffer in 
> places where the string buffer was being used by a single thread (as is 
> generally the case). Where possible, it is recommended that this class be 
> used in preference to StringBuffer as it will be faster under most 
> implementations."



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-5535) The "partialResults" header should be set for shards that error out using shards.tolerant

2013-12-17 Thread Steve Davids (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Davids updated SOLR-5535:
---

Attachment: SOLR-5535.patch

Attached a patch which reports partial results for both standard and grouped 
search requests.
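
For anyone trying the patch, client-side detection could then look roughly like 
this (a SolrJ sketch; only the "partialResults" header key comes from this 
issue, the rest is illustrative):

{code}
// Sketch: with shards.tolerant=true, check the response header instead of
// walking every shard's shards.info entry. "server" is an assumed SolrServer.
QueryResponse rsp = server.query(params);
Object partial = rsp.getResponseHeader().get("partialResults");
if (Boolean.TRUE.equals(partial)) {
  // at least one shard errored out; results are usable but incomplete
}
{code}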

> The "partialResults" header should be set for shards that error out using 
> shards.tolerant
> -
>
> Key: SOLR-5535
> URL: https://issues.apache.org/jira/browse/SOLR-5535
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: 4.5
>Reporter: Steve Davids
> Fix For: 5.0, 4.7
>
> Attachments: SOLR-5535.patch
>
>
> Currently the only way to know if shards error out while using the 
> shards.tolerant flag is to set the shards.info flag and iterate through each 
> shard's info to see if an error is reported.
> The "partialResults" response header value should be set when errors are 
> detected from distributed searches. This header value is currently being used 
> by the timeAllowed request parameter if shards don't respond in time. This 
> change will provide a more consistent mechanism to detect distributed search 
> errors.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5543) solr.xml duplicate entries after SWAP 4.6

2013-12-17 Thread Shawn Heisey (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13851329#comment-13851329
 ] 

Shawn Heisey commented on SOLR-5543:


The reason I think it's critical to include in 4.6.1 is this:  Although 
everything works perfectly after the problem appears, as soon as you restart 
Solr, the duplicate entries cause exceptions and cores don't start up properly. 
 That was my experience when I upgraded my dev server to the released 4.6.0, 
rebuilt the index (resulting in seven core swaps), and then later restarted 
Solr for some config changes.  It's not standard procedure to restart Solr 
after rebuilding, but it does occasionally happen.  It happens a LOT on my dev 
server, where I try out new configs.

If I were to do a PERSIST CoreAdmin action after each swap, would it 
effectively fix the problem even on an unpatched 4.6.0?

TL;DR: I do have the ability to instead do a "persist all" after all of the 
swaps, because I separately create and track per-server objects for CoreAdmin 
SolrJ calls.  Although I end up with 28 HttpSolrServer objects for updates and 
queries with my production build application, I only create one CoreAdmin 
server object for each host/port combination, instead of an additional 28 
objects.  I don't know what the incremental size of HttpSolrServer objects is, 
but I do try to be careful about memory.
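
For concreteness, a per-swap PERSIST would look something like this with the 
CoreAdmin objects described above (a sketch against the SolrJ 4.x legacy API as 
I remember it, so treat the exact calls as assumptions; whether it cleans up the 
duplicates on an unpatched 4.6.0 is exactly the open question):

{code}
// Sketch, assuming SolrJ 4.x legacy CoreAdmin (the PERSIST action only
// exists for legacy solr.xml):
CoreAdminRequest persist = new CoreAdminRequest();
persist.setAction(CoreAdminParams.CoreAdminAction.PERSIST);
persist.process(coreAdminServer); // the one CoreAdmin HttpSolrServer per host/port
{code}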


> solr.xml duplicate entries after SWAP 4.6
> -
>
> Key: SOLR-5543
> URL: https://issues.apache.org/jira/browse/SOLR-5543
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 4.6
>Reporter: Bill Bell
>Assignee: Alan Woodward
> Fix For: 5.0, 4.7
>
> Attachments: SOLR-5543.patch
>
>
> We are having issues with SWAP CoreAdmin in 4.6.
> Using legacy solr.xml we issue a CoreAdmin SWAP, and we want it persistent. 
> It has been running flawlessly since 4.5. Now it creates duplicate lines in 
> solr.xml.
> Even the example multi core schema doesn't work with persistent="true" - 
> it creates duplicate lines in solr.xml.
> [core listing garbled in the archive: ~16 <core .../> entries, each with 
> transient="false" and some with loadOnStartup="true"; recoverable instanceDir 
> values include citystateprovider, inactiveproviders, portalprovider, 
> providersearch, and tridioncomponents]



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-5372) IntArray toString has O(n^2) performance

2013-12-17 Thread Joshua Hartman (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joshua Hartman updated LUCENE-5372:
---

Attachment: 5372.patch

I was using git-svn, but I believe the patch should still apply with patch -p1

> IntArray toString has O(n^2) performance
> 
>
> Key: LUCENE-5372
> URL: https://issues.apache.org/jira/browse/LUCENE-5372
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/index
>Reporter: Joshua Hartman
>Priority: Minor
> Attachments: 5372.patch
>
>
> This is pretty minor, but I found a few issues with the toString 
> implementations while looking through the facet data structures.
> The most egregious is the use of string concatenation in the IntArray class. 
> I have fixed that using StringBuilders. I also noticed that other classes 
> were using StringBuffer instead of StringBuilder. According to the javadoc,
> "This class is designed for use as a drop-in replacement for StringBuffer in 
> places where the string buffer was being used by a single thread (as is 
> generally the case). Where possible, it is recommended that this class be 
> used in preference to StringBuffer as it will be faster under most 
> implementations."



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-5372) IntArray toString has O(n^2) performance

2013-12-17 Thread Joshua Hartman (JIRA)
Joshua Hartman created LUCENE-5372:
--

 Summary: IntArray toString has O(n^2) performance
 Key: LUCENE-5372
 URL: https://issues.apache.org/jira/browse/LUCENE-5372
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/index
Reporter: Joshua Hartman
Priority: Minor


This is pretty minor, but I found a few issues with the toString 
implementations while looking through the facet data structures.

The most egregious is the use of string concatenation in the IntArray class. I 
have fixed that using StringBuilders. I also noticed that other classes were 
using StringBuffer instead of StringBuilder. According to the javadoc,

"This class is designed for use as a drop-in replacement for StringBuffer in 
places where the string buffer was being used by a single thread (as is 
generally the case). Where possible, it is recommended that this class be used 
in preference to StringBuffer as it will be faster under most implementations."
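
For illustration, a minimal sketch of the pattern being fixed (not the actual 
IntArray source): concatenation copies the whole prefix on every pass, while 
StringBuilder appends in amortized constant time.

{code}
// Sketch, not the IntArray code: '+' in a loop re-copies the prefix each
// time, so total work is 1 + 2 + ... + n = O(n^2).
static String slowToString(int[] values) {
  String s = "";
  for (int v : values) {
    s += v + " ";
  }
  return s;
}

// StringBuilder grows its buffer geometrically, so each append is amortized
// O(1) and the whole loop is O(n).
static String fastToString(int[] values) {
  StringBuilder sb = new StringBuilder();
  for (int v : values) {
    sb.append(v).append(' ');
  }
  return sb.toString();
}
{code}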



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-5543) solr.xml duplicate entries after SWAP 4.6

2013-12-17 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated SOLR-5543:
--

Fix Version/s: 5.0

> solr.xml duplicate entries after SWAP 4.6
> -
>
> Key: SOLR-5543
> URL: https://issues.apache.org/jira/browse/SOLR-5543
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 4.6
>Reporter: Bill Bell
>Assignee: Alan Woodward
> Fix For: 5.0, 4.7
>
> Attachments: SOLR-5543.patch
>
>
> We are having issues with SWAP CoreAdmin in 4.6.
> Using legacy solr.xml we issue a CoreAdmin SWAP, and we want it persistent. 
> It has been running flawlessly since 4.5. Now it creates duplicate lines in 
> solr.xml.
> Even the example multi core schema doesn't work with persistent="true" - 
> it creates duplicate lines in solr.xml.
> [core listing garbled in the archive: ~16 <core .../> entries, each with 
> transient="false" and some with loadOnStartup="true"; recoverable instanceDir 
> values include citystateprovider, inactiveproviders, portalprovider, 
> providersearch, and tridioncomponents]



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-5463) Provide cursor/token based "searchAfter" support that works with arbitrary sorting (ie: "deep paging")

2013-12-17 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated SOLR-5463:
---

Attachment: SOLR-5463.patch

DeepPagingComponent is dead, long live CursorMark!

This patch removes DeepPagingComponent completely, as all of the necessary 
functionality is now integrated nicely into various places in QueryComponent, 
ResponseBuilder, and SolrIndexSearcher.

There are still plenty of nocommits, but those are mostly around test 
improvements and/or code paths that I want to give some more review before 
committing.

I'm getting ready to go on vacation for a week+, so now is a _really_ good time 
for people to take the patch for a spin and try it out with their use cases 
(nudge, nudge) without needing to worry that I'll upload a new one as soon as 
they download it.
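
For anyone taking it for a spin, the client-side loop the cursor model implies 
looks roughly like this (a sketch; the patch was still in flux at this point, so 
the param and response key names are assumptions):

{code}
// Sketch: page through all results by echoing back the cursor token instead
// of an ever-growing start offset. "server" is an assumed SolrServer.
SolrQuery q = new SolrQuery("*:*");
q.setRows(100);
q.setSort(SolrQuery.SortClause.asc("id"));   // sort must end on the uniqueKey
String cursorMark = "*";                     // "*" = start from the beginning
while (true) {
  q.set("cursorMark", cursorMark);
  QueryResponse rsp = server.query(q);
  // ... consume rsp.getResults() ...
  String next = (String) rsp.getResponse().get("nextCursorMark");
  if (cursorMark.equals(next)) break;        // mark stopped moving: done
  cursorMark = next;
}
{code}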

> Provide cursor/token based "searchAfter" support that works with arbitrary 
> sorting (ie: "deep paging")
> --
>
> Key: SOLR-5463
> URL: https://issues.apache.org/jira/browse/SOLR-5463
> Project: Solr
>  Issue Type: New Feature
>Reporter: Hoss Man
>Assignee: Hoss Man
> Attachments: SOLR-5463.patch, SOLR-5463.patch, SOLR-5463.patch, 
> SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, 
> SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, 
> SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, 
> SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, 
> SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, 
> SOLR-5463__straw_man__MissingStringLastComparatorSource.patch
>
>
> I'd like to revisit a solution to the problem of "deep paging" in Solr, 
> leveraging an HTTP-based API similar to how IndexSearcher.searchAfter works 
> at the Lucene level: require the clients to provide back a token indicating 
> the sort values of the last document seen on the previous "page".  This is 
> similar to the "cursor" model I've seen in several other REST APIs that 
> support "pagination" over large sets of results (notably the Twitter API and 
> its "since_id" param), except that we'll want something that works with 
> arbitrary multi-level sort criteria that can be either ascending or descending.
> SOLR-1726 laid some initial groundwork here and was committed quite a while 
> ago, but the key bit of argument parsing to leverage it was commented out due 
> to some problems (see comments in that issue).  It's also somewhat out of 
> date at this point: at the time it was committed, IndexSearcher only supported 
> searchAfter for simple scores, not arbitrary field sorts; and the params 
> added in SOLR-1726 suffer from this limitation as well.
> ---
> I think it would make sense to start fresh with a new issue with a focus on 
> ensuring that we have deep paging which:
> * supports arbitrary field sorts in addition to sorting by score
> * works in distributed mode



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (SOLR-5528) Change New Suggester Response and minor cleanups

2013-12-17 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved SOLR-5528.
---

Resolution: Fixed

Thanks Areek!

> Change New Suggester Response and minor cleanups
> 
>
> Key: SOLR-5528
> URL: https://issues.apache.org/jira/browse/SOLR-5528
> Project: Solr
>  Issue Type: Improvement
>  Components: SearchComponents - other
>Reporter: Areek Zillur
> Fix For: 5.0, 4.7
>
> Attachments: SOLR-5528.patch
>
>
> It would be nice to have a simplified response format for the new Suggester 
> Component. 
> The proposed format is as follows:
> XML: 
> {code}
> <?xml version="1.0" encoding="UTF-8"?>
> <response>
>   <lst name="responseHeader">
>     <int name="status">0</int>
>     <int name="QTime">32</int>
>   </lst>
>   <str name="command">build</str>
>   <lst name="suggest">
>     <lst name="ele">
>       <int name="numFound">1</int>
>       <arr name="suggestions">
>         <lst>
>           <str name="term">electronics and computer1</str>
>           <long name="weight">2199</long>
>           <str name="payload"></str>
>         </lst>
>       </arr>
>     </lst>
>   </lst>
> </response>
> {code}
> JSON:
> {code}
> {
> "responseHeader": {
> "status": 0,
> "QTime": 30
> },
> "command": "build",
> "suggest": {
> "ele": {
> "numFound": 1,
> "suggestions": [
> {
> "term": "electronics and computer1",
> "weight": 2199,
> "payload": ""
> }
> ]
> }
> }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5528) Change New Suggester Response and minor cleanups

2013-12-17 Thread Areek Zillur (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13851102#comment-13851102
 ] 

Areek Zillur commented on SOLR-5528:


Thanks for committing this!

> Change New Suggester Response and minor cleanups
> 
>
> Key: SOLR-5528
> URL: https://issues.apache.org/jira/browse/SOLR-5528
> Project: Solr
>  Issue Type: Improvement
>  Components: SearchComponents - other
>Reporter: Areek Zillur
> Fix For: 5.0, 4.7
>
> Attachments: SOLR-5528.patch
>
>
> It would be nice to have a simplified response format for the new Suggester 
> Component. 
> The proposed format is as follows:
> XML: 
> {code}
> <?xml version="1.0" encoding="UTF-8"?>
> <response>
>   <lst name="responseHeader">
>     <int name="status">0</int>
>     <int name="QTime">32</int>
>   </lst>
>   <str name="command">build</str>
>   <lst name="suggest">
>     <lst name="ele">
>       <int name="numFound">1</int>
>       <arr name="suggestions">
>         <lst>
>           <str name="term">electronics and computer1</str>
>           <long name="weight">2199</long>
>           <str name="payload"></str>
>         </lst>
>       </arr>
>     </lst>
>   </lst>
> </response>
> {code}
> JSON:
> {code}
> {
> "responseHeader": {
> "status": 0,
> "QTime": 30
> },
> "command": "build",
> "suggest": {
> "ele": {
> "numFound": 1,
> "suggestions": [
> {
> "term": "electronics and computer1",
> "weight": 2199,
> "payload": ""
> }
> ]
> }
> }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5528) Change New Suggester Response and minor cleanups

2013-12-17 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13851098#comment-13851098
 ] 

ASF subversion and git services commented on SOLR-5528:
---

Commit 1551759 from [~rcmuir] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1551759 ]

SOLR-5528: improve response format of the new SuggestComponent

> Change New Suggester Response and minor cleanups
> 
>
> Key: SOLR-5528
> URL: https://issues.apache.org/jira/browse/SOLR-5528
> Project: Solr
>  Issue Type: Improvement
>  Components: SearchComponents - other
>Reporter: Areek Zillur
> Fix For: 5.0, 4.7
>
> Attachments: SOLR-5528.patch
>
>
> It would be nice to have a simplified response format for the new Suggester 
> Component. 
> The proposed format is as follows:
> XML: 
> {code}
> <?xml version="1.0" encoding="UTF-8"?>
> <response>
>   <lst name="responseHeader">
>     <int name="status">0</int>
>     <int name="QTime">32</int>
>   </lst>
>   <str name="command">build</str>
>   <lst name="suggest">
>     <lst name="ele">
>       <int name="numFound">1</int>
>       <arr name="suggestions">
>         <lst>
>           <str name="term">electronics and computer1</str>
>           <long name="weight">2199</long>
>           <str name="payload"></str>
>         </lst>
>       </arr>
>     </lst>
>   </lst>
> </response>
> {code}
> JSON:
> {code}
> {
> "responseHeader": {
> "status": 0,
> "QTime": 30
> },
> "command": "build",
> "suggest": {
> "ele": {
> "numFound": 1,
> "suggestions": [
> {
> "term": "electronics and computer1",
> "weight": 2199,
> "payload": ""
> }
> ]
> }
> }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5528) Change New Suggester Response and minor cleanups

2013-12-17 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13851075#comment-13851075
 ] 

ASF subversion and git services commented on SOLR-5528:
---

Commit 1551753 from [~rcmuir] in branch 'dev/trunk'
[ https://svn.apache.org/r1551753 ]

SOLR-5528: improve response format of the new SuggestComponent

> Change New Suggester Response and minor cleanups
> 
>
> Key: SOLR-5528
> URL: https://issues.apache.org/jira/browse/SOLR-5528
> Project: Solr
>  Issue Type: Improvement
>  Components: SearchComponents - other
>Reporter: Areek Zillur
> Fix For: 5.0, 4.7
>
> Attachments: SOLR-5528.patch
>
>
> It would be nice to have a simplified response format for the new Suggester 
> Component. 
> The proposed format is as follows:
> XML: 
> {code}
> <?xml version="1.0" encoding="UTF-8"?>
> <response>
>   <lst name="responseHeader">
>     <int name="status">0</int>
>     <int name="QTime">32</int>
>   </lst>
>   <str name="command">build</str>
>   <lst name="suggest">
>     <lst name="ele">
>       <int name="numFound">1</int>
>       <arr name="suggestions">
>         <lst>
>           <str name="term">electronics and computer1</str>
>           <long name="weight">2199</long>
>           <str name="payload"></str>
>         </lst>
>       </arr>
>     </lst>
>   </lst>
> </response>
> {code}
> JSON:
> {code}
> {
> "responseHeader": {
> "status": 0,
> "QTime": 30
> },
> "command": "build",
> "suggest": {
> "ele": {
> "numFound": 1,
> "suggestions": [
> {
> "term": "electronics and computer1",
> "weight": 2199,
> "payload": ""
> }
> ]
> }
> }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-4x-Linux-Java7-64-test-only - Build # 6907 - Failure!

2013-12-17 Thread builder
Build: builds.flonkings.com/job/Lucene-4x-Linux-Java7-64-test-only/6907/

1 tests failed.
REGRESSION:  
org.apache.lucene.index.TestIndexWriterWithThreads.testRollbackAndCommitWithThreads

Error Message:
Captured an uncaught exception in thread: Thread[id=193, name=Thread-133, 
state=RUNNABLE, group=TGRP-TestIndexWriterWithThreads]

Stack Trace:
com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught 
exception in thread: Thread[id=193, name=Thread-133, state=RUNNABLE, 
group=TGRP-TestIndexWriterWithThreads]
Caused by: java.lang.RuntimeException: 
org.apache.lucene.store.AlreadyClosedException: this IndexWriter is closed
at __randomizedtesting.SeedInfo.seed([3FAF37E1AFFB2502]:0)
at 
org.apache.lucene.index.TestIndexWriterWithThreads$1.run(TestIndexWriterWithThreads.java:619)
Caused by: org.apache.lucene.store.AlreadyClosedException: this IndexWriter is 
closed
at org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:645)
at 
org.apache.lucene.index.IndexWriter.numDeletedDocs(IndexWriter.java:622)
at org.apache.lucene.index.IndexWriter.segString(IndexWriter.java:4265)
at 
org.apache.lucene.index.IndexWriter.publishFlushedSegment(IndexWriter.java:2324)
at 
org.apache.lucene.index.DocumentsWriterFlushQueue$FlushTicket.publishFlushedSegment(DocumentsWriterFlushQueue.java:198)
at 
org.apache.lucene.index.DocumentsWriterFlushQueue$FlushTicket.finishFlush(DocumentsWriterFlushQueue.java:213)
at 
org.apache.lucene.index.DocumentsWriterFlushQueue$SegmentFlushTicket.publish(DocumentsWriterFlushQueue.java:249)
at 
org.apache.lucene.index.DocumentsWriterFlushQueue.innerPurge(DocumentsWriterFlushQueue.java:116)
at 
org.apache.lucene.index.DocumentsWriterFlushQueue.forcePurge(DocumentsWriterFlushQueue.java:138)
at 
org.apache.lucene.index.DocumentsWriter.purgeBuffer(DocumentsWriter.java:185)
at org.apache.lucene.index.IndexWriter.purge(IndexWriter.java:4634)
at 
org.apache.lucene.index.DocumentsWriter$ForcedPurgeEvent.process(DocumentsWriter.java:701)
at 
org.apache.lucene.index.IndexWriter.processEvents(IndexWriter.java:4665)
at 
org.apache.lucene.index.IndexWriter.processEvents(IndexWriter.java:4657)
at 
org.apache.lucene.index.IndexWriter.closeInternal(IndexWriter.java:1067)
at 
org.apache.lucene.index.IndexWriter.rollbackInternal(IndexWriter.java:2106)
at org.apache.lucene.index.IndexWriter.rollback(IndexWriter.java:2024)
at 
org.apache.lucene.index.TestIndexWriterWithThreads$1.run(TestIndexWriterWithThreads.java:575)




Build Log:
[...truncated 782 lines...]
   [junit4] Suite: org.apache.lucene.index.TestIndexWriterWithThreads
   [junit4]   2> 17 déc. 2013 19:41:23 
com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler
 uncaughtException
   [junit4]   2> ATTENTION: Uncaught exception in thread: 
Thread[Thread-133,5,TGRP-TestIndexWriterWithThreads]
   [junit4]   2> java.lang.RuntimeException: 
org.apache.lucene.store.AlreadyClosedException: this IndexWriter is closed
   [junit4]   2>at 
__randomizedtesting.SeedInfo.seed([3FAF37E1AFFB2502]:0)
   [junit4]   2>at 
org.apache.lucene.index.TestIndexWriterWithThreads$1.run(TestIndexWriterWithThreads.java:619)
   [junit4]   2> Caused by: org.apache.lucene.store.AlreadyClosedException: 
this IndexWriter is closed
   [junit4]   2>at 
org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:645)
   [junit4]   2>at 
org.apache.lucene.index.IndexWriter.numDeletedDocs(IndexWriter.java:622)
   [junit4]   2>at 
org.apache.lucene.index.IndexWriter.segString(IndexWriter.java:4265)
   [junit4]   2>at 
org.apache.lucene.index.IndexWriter.publishFlushedSegment(IndexWriter.java:2324)
   [junit4]   2>at 
org.apache.lucene.index.DocumentsWriterFlushQueue$FlushTicket.publishFlushedSegment(DocumentsWriterFlushQueue.java:198)
   [junit4]   2>at 
org.apache.lucene.index.DocumentsWriterFlushQueue$FlushTicket.finishFlush(DocumentsWriterFlushQueue.java:213)
   [junit4]   2>at 
org.apache.lucene.index.DocumentsWriterFlushQueue$SegmentFlushTicket.publish(DocumentsWriterFlushQueue.java:249)
   [junit4]   2>at 
org.apache.lucene.index.DocumentsWriterFlushQueue.innerPurge(DocumentsWriterFlushQueue.java:116)
   [junit4]   2>at 
org.apache.lucene.index.DocumentsWriterFlushQueue.forcePurge(DocumentsWriterFlushQueue.java:138)
   [junit4]   2>at 
org.apache.lucene.index.DocumentsWriter.purgeBuffer(DocumentsWriter.java:185)
   [junit4]   2>at 
org.apache.lucene.index.IndexWriter.purge(IndexWriter.java:4634)
   [junit4]   2>at 
org.apache.lucene.index.DocumentsWriter$ForcedPurgeEvent.process(DocumentsWriter.java:701)
   [junit4]   2>at 
org.apache.lucene.index.IndexWriter.processEvents(IndexWriter.java:4665)
   [junit4]   2>at 
org.apache.lucene.index.Inde

Re: [JENKINS] Lucene-4x-Linux-Java7-64-test-only - Build # 6907 - Failure!

2013-12-17 Thread Michael McCandless
It actually repros with java7 too, if you beast it.

It's a scary failure: it seems like, after closing, the very last
thing IW does in closeInternal's finally clause is to process events,
and one of those events is trying to flush a segment ... weird.

Mike McCandless

http://blog.mikemccandless.com


On Tue, Dec 17, 2013 at 3:21 PM, Robert Muir  wrote:
> I just want to point out that this confused me a bit: the Jenkins job says
> java7, but it really runs java6.
>
> On Tue, Dec 17, 2013 at 1:43 PM,   wrote:
>> Build: builds.flonkings.com/job/Lucene-4x-Linux-Java7-64-test-only/6907/
>>
>> 1 tests failed.
>> REGRESSION:  
>> org.apache.lucene.index.TestIndexWriterWithThreads.testRollbackAndCommitWithThreads
>>
>> Error Message:
>> Captured an uncaught exception in thread: Thread[id=193, name=Thread-133, 
>> state=RUNNABLE, group=TGRP-TestIndexWriterWithThreads]
>>
>> Stack Trace:
>> com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an 
>> uncaught exception in thread: Thread[id=193, name=Thread-133, 
>> state=RUNNABLE, group=TGRP-TestIndexWriterWithThreads]
>> Caused by: java.lang.RuntimeException: 
>> org.apache.lucene.store.AlreadyClosedException: this IndexWriter is closed
>> at __randomizedtesting.SeedInfo.seed([3FAF37E1AFFB2502]:0)
>> at 
>> org.apache.lucene.index.TestIndexWriterWithThreads$1.run(TestIndexWriterWithThreads.java:619)
>> Caused by: org.apache.lucene.store.AlreadyClosedException: this IndexWriter 
>> is closed
>> at 
>> org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:645)
>> at 
>> org.apache.lucene.index.IndexWriter.numDeletedDocs(IndexWriter.java:622)
>> at 
>> org.apache.lucene.index.IndexWriter.segString(IndexWriter.java:4265)
>> at 
>> org.apache.lucene.index.IndexWriter.publishFlushedSegment(IndexWriter.java:2324)
>> at 
>> org.apache.lucene.index.DocumentsWriterFlushQueue$FlushTicket.publishFlushedSegment(DocumentsWriterFlushQueue.java:198)
>> at 
>> org.apache.lucene.index.DocumentsWriterFlushQueue$FlushTicket.finishFlush(DocumentsWriterFlushQueue.java:213)
>> at 
>> org.apache.lucene.index.DocumentsWriterFlushQueue$SegmentFlushTicket.publish(DocumentsWriterFlushQueue.java:249)
>> at 
>> org.apache.lucene.index.DocumentsWriterFlushQueue.innerPurge(DocumentsWriterFlushQueue.java:116)
>> at 
>> org.apache.lucene.index.DocumentsWriterFlushQueue.forcePurge(DocumentsWriterFlushQueue.java:138)
>> at 
>> org.apache.lucene.index.DocumentsWriter.purgeBuffer(DocumentsWriter.java:185)
>> at org.apache.lucene.index.IndexWriter.purge(IndexWriter.java:4634)
>> at 
>> org.apache.lucene.index.DocumentsWriter$ForcedPurgeEvent.process(DocumentsWriter.java:701)
>> at 
>> org.apache.lucene.index.IndexWriter.processEvents(IndexWriter.java:4665)
>> at 
>> org.apache.lucene.index.IndexWriter.processEvents(IndexWriter.java:4657)
>> at 
>> org.apache.lucene.index.IndexWriter.closeInternal(IndexWriter.java:1067)
>> at 
>> org.apache.lucene.index.IndexWriter.rollbackInternal(IndexWriter.java:2106)
>> at 
>> org.apache.lucene.index.IndexWriter.rollback(IndexWriter.java:2024)
>> at 
>> org.apache.lucene.index.TestIndexWriterWithThreads$1.run(TestIndexWriterWithThreads.java:575)
>>
>>
>>
>>
>> Build Log:
>> [...truncated 782 lines...]
>>[junit4] Suite: org.apache.lucene.index.TestIndexWriterWithThreads
>>[junit4]   2> 17 déc. 2013 19:41:23 
>> com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler
>>  uncaughtException
>>[junit4]   2> ATTENTION: Uncaught exception in thread: 
>> Thread[Thread-133,5,TGRP-TestIndexWriterWithThreads]
>>[junit4]   2> java.lang.RuntimeException: 
>> org.apache.lucene.store.AlreadyClosedException: this IndexWriter is closed
>>[junit4]   2>at 
>> __randomizedtesting.SeedInfo.seed([3FAF37E1AFFB2502]:0)
>>[junit4]   2>at 
>> org.apache.lucene.index.TestIndexWriterWithThreads$1.run(TestIndexWriterWithThreads.java:619)
>>[junit4]   2> Caused by: org.apache.lucene.store.AlreadyClosedException: 
>> this IndexWriter is closed
>>[junit4]   2>at 
>> org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:645)
>>[junit4]   2>at 
>> org.apache.lucene.index.IndexWriter.numDeletedDocs(IndexWriter.java:622)
>>[junit4]   2>at 
>> org.apache.lucene.index.IndexWriter.segString(IndexWriter.java:4265)
>>[junit4]   2>at 
>> org.apache.lucene.index.IndexWriter.publishFlushedSegment(IndexWriter.java:2324)
>>[junit4]   2>at 
>> org.apache.lucene.index.DocumentsWriterFlushQueue$FlushTicket.publishFlushedSegment(DocumentsWriterFlushQueue.java:198)
>>[junit4]   2>at 
>> org.apache.lucene.index.DocumentsWriterFlushQueue$FlushTicket.finishFlush(DocumentsWriterFlushQueue.java:213)
>>[junit4]   2>at 
>> org.apache.lucene.index.DocumentsWriterFlushQueue$SegmentFlushT

Re: [JENKINS] Lucene-4x-Linux-Java7-64-test-only - Build # 6907 - Failure!

2013-12-17 Thread Robert Muir
I just want to point out that this confused me a bit: the Jenkins job says
java7, but it really runs java6.

On Tue, Dec 17, 2013 at 1:43 PM,   wrote:
> Build: builds.flonkings.com/job/Lucene-4x-Linux-Java7-64-test-only/6907/
>
> 1 tests failed.
> REGRESSION:  
> org.apache.lucene.index.TestIndexWriterWithThreads.testRollbackAndCommitWithThreads
>
> Error Message:
> Captured an uncaught exception in thread: Thread[id=193, name=Thread-133, 
> state=RUNNABLE, group=TGRP-TestIndexWriterWithThreads]
>
> Stack Trace:
> com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an 
> uncaught exception in thread: Thread[id=193, name=Thread-133, state=RUNNABLE, 
> group=TGRP-TestIndexWriterWithThreads]
> Caused by: java.lang.RuntimeException: 
> org.apache.lucene.store.AlreadyClosedException: this IndexWriter is closed
> at __randomizedtesting.SeedInfo.seed([3FAF37E1AFFB2502]:0)
> at 
> org.apache.lucene.index.TestIndexWriterWithThreads$1.run(TestIndexWriterWithThreads.java:619)
> Caused by: org.apache.lucene.store.AlreadyClosedException: this IndexWriter 
> is closed
> at 
> org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:645)
> at 
> org.apache.lucene.index.IndexWriter.numDeletedDocs(IndexWriter.java:622)
> at 
> org.apache.lucene.index.IndexWriter.segString(IndexWriter.java:4265)
> at 
> org.apache.lucene.index.IndexWriter.publishFlushedSegment(IndexWriter.java:2324)
> at 
> org.apache.lucene.index.DocumentsWriterFlushQueue$FlushTicket.publishFlushedSegment(DocumentsWriterFlushQueue.java:198)
> at 
> org.apache.lucene.index.DocumentsWriterFlushQueue$FlushTicket.finishFlush(DocumentsWriterFlushQueue.java:213)
> at 
> org.apache.lucene.index.DocumentsWriterFlushQueue$SegmentFlushTicket.publish(DocumentsWriterFlushQueue.java:249)
> at 
> org.apache.lucene.index.DocumentsWriterFlushQueue.innerPurge(DocumentsWriterFlushQueue.java:116)
> at 
> org.apache.lucene.index.DocumentsWriterFlushQueue.forcePurge(DocumentsWriterFlushQueue.java:138)
> at 
> org.apache.lucene.index.DocumentsWriter.purgeBuffer(DocumentsWriter.java:185)
> at org.apache.lucene.index.IndexWriter.purge(IndexWriter.java:4634)
> at 
> org.apache.lucene.index.DocumentsWriter$ForcedPurgeEvent.process(DocumentsWriter.java:701)
> at 
> org.apache.lucene.index.IndexWriter.processEvents(IndexWriter.java:4665)
> at 
> org.apache.lucene.index.IndexWriter.processEvents(IndexWriter.java:4657)
> at 
> org.apache.lucene.index.IndexWriter.closeInternal(IndexWriter.java:1067)
> at 
> org.apache.lucene.index.IndexWriter.rollbackInternal(IndexWriter.java:2106)
> at org.apache.lucene.index.IndexWriter.rollback(IndexWriter.java:2024)
> at 
> org.apache.lucene.index.TestIndexWriterWithThreads$1.run(TestIndexWriterWithThreads.java:575)
>
>
>
>
> Build Log:
> [...truncated 782 lines...]
>[junit4] Suite: org.apache.lucene.index.TestIndexWriterWithThreads
>[junit4]   2> 17 déc. 2013 19:41:23 
> com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler
>  uncaughtException
>[junit4]   2> ATTENTION: Uncaught exception in thread: 
> Thread[Thread-133,5,TGRP-TestIndexWriterWithThreads]
>[junit4]   2> java.lang.RuntimeException: 
> org.apache.lucene.store.AlreadyClosedException: this IndexWriter is closed
>[junit4]   2>at 
> __randomizedtesting.SeedInfo.seed([3FAF37E1AFFB2502]:0)
>[junit4]   2>at 
> org.apache.lucene.index.TestIndexWriterWithThreads$1.run(TestIndexWriterWithThreads.java:619)
>[junit4]   2> Caused by: org.apache.lucene.store.AlreadyClosedException: 
> this IndexWriter is closed
>[junit4]   2>at 
> org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:645)
>[junit4]   2>at 
> org.apache.lucene.index.IndexWriter.numDeletedDocs(IndexWriter.java:622)
>[junit4]   2>at 
> org.apache.lucene.index.IndexWriter.segString(IndexWriter.java:4265)
>[junit4]   2>at 
> org.apache.lucene.index.IndexWriter.publishFlushedSegment(IndexWriter.java:2324)
>[junit4]   2>at 
> org.apache.lucene.index.DocumentsWriterFlushQueue$FlushTicket.publishFlushedSegment(DocumentsWriterFlushQueue.java:198)
>[junit4]   2>at 
> org.apache.lucene.index.DocumentsWriterFlushQueue$FlushTicket.finishFlush(DocumentsWriterFlushQueue.java:213)
>[junit4]   2>at 
> org.apache.lucene.index.DocumentsWriterFlushQueue$SegmentFlushTicket.publish(DocumentsWriterFlushQueue.java:249)
>[junit4]   2>at 
> org.apache.lucene.index.DocumentsWriterFlushQueue.innerPurge(DocumentsWriterFlushQueue.java:116)
>[junit4]   2>at 
> org.apache.lucene.index.DocumentsWriterFlushQueue.forcePurge(DocumentsWriterFlushQueue.java:138)
>[junit4]   2>at 
> org.apache.lucene.index.DocumentsWriter.purgeBuffer(DocumentsWriter.java:185)
>[junit4]   2>at 
> org.apa

[jira] [Commented] (LUCENE-5354) Blended score in AnalyzingInfixSuggester

2013-12-17 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13850713#comment-13850713
 ] 

Michael McCandless commented on LUCENE-5354:


Thanks Remi, patch looks great!

Can you move that {{boolean finished}} inside the {{if (lastToken != null)}}?  
(If there was no lastToken then we should not be calling 
{{offsetEnd.endOffset}}).

Can we leave AnalyzingInfixSuggester with DOCS_ONLY?  I.e., open up a method 
(maybe getTextFieldType?) that the subclass would override and set to 
DOCS_AND_FREQS_AND_POSITIONS.

In createCoefficient, instead of splitting the incoming key on space, I think 
you should ask the analyzer to do so?  In fact, since the lookup (in super) 
already did that (break into tokens, figure out if last token is a "prefix" or 
not), maybe we can just pass that down to createResult?

If the query has more than one term, it looks like you only use the first?  
Maybe instead we should visit all the terms and record which one has the lowest 
position?

Have you done any performance testing?  Visiting term vectors for each hit can 
be costly.  It should be more performant to pull a DocsAndPositionsEnum up 
front and then do .advance to each (sorted) docID to get the position ... but 
this is likely more complex (it inverts the "stride", so you'd do term by term 
on the outer loop, then
docs on the inner loop, vs the opposite that you have now).
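
A minimal sketch of that inverted loop, to make the suggestion concrete (the 
field name and surrounding plumbing are assumptions, not from the patch):

{code}
// Sketch: one postings enum per query term, advanced through the sorted hit
// docIDs, instead of opening term vectors per hit.
Arrays.sort(hitDocs);                         // advance() only moves forward
for (BytesRef term : queryTerms) {
  DocsAndPositionsEnum dpe =
      MultiFields.getTermPositionsEnum(reader, null, "text", term);
  if (dpe == null) continue;                  // term absent from the index
  int d = dpe.nextDoc();
  for (int doc : hitDocs) {
    if (d < doc) d = dpe.advance(doc);        // never seek backwards
    if (d == DocIdSetIterator.NO_MORE_DOCS) break;
    if (d == doc) {
      int pos = dpe.nextPosition();           // lowest position of term in doc
      // ... blend pos into this suggestion's weight ...
    }
  }
}
{code}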

key.toString() can be pulled out of the while loop and done once up front.

Why do you use key.toString().contains(docTerm) for the finished case? Won't 
that result in false positives, e.g. if key is "foobar" and docTerm is "oba"?

Can you rewrite the embedded ternary operator in the LookUpComparator to just 
use simple if statements?  I think that's more readable...


> Blended score in AnalyzingInfixSuggester
> 
>
> Key: LUCENE-5354
> URL: https://issues.apache.org/jira/browse/LUCENE-5354
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/spellchecker
>Affects Versions: 4.4
>Reporter: Remi Melisson
>Priority: Minor
>  Labels: suggester
> Attachments: LUCENE-5354.patch
>
>
> I'm working on a custom suggester derived from the AnalyzingInfix. I require 
> what is called a "blended score" (//TODO ln.399 in AnalyzingInfixSuggester) 
> to transform the suggestion weights depending on the position of the searched 
> term(s) in the text.
> Right now, I'm using an easy solution:
> If I want 10 suggestions, then I search against the current ordered index for 
> the first 100 results and transform the weight:
> bq. a) by using the term position in the text (found with TermVector and 
> DocsAndPositionsEnum)
> or
> bq. b) by multiplying the weight by the score of a SpanQuery that I add when 
> searching
> and return the updated 10 most weighted suggestions.
> Since we usually don't need to suggest so many things, the bigger search + 
> rescoring overhead is not so significant, but I agree that this is not the 
> most elegant solution.
> We could include this factor (here the position of the term) directly in 
> the index.
> So, I can contribute to this if you think it's worth adding.
> Do you think I should tweak AnalyzingInfixSuggester, subclass it, or create a 
> dedicated class?



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5543) solr.xml duplicate entries after SWAP 4.6

2013-12-17 Thread Alan Woodward (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13850591#comment-13850591
 ] 

Alan Woodward commented on SOLR-5543:
-

It doesn't seem to actually make a difference to the running application, 
because the ConfigSolr object deduplicates the entries internally, which is 
also why it only writes out the latest swap (the current set, plus the two 
extra core definitions).  But it makes things confusing for people who are 
editing solr.xml offline.

> solr.xml duplicate entries after SWAP 4.6
> -
>
> Key: SOLR-5543
> URL: https://issues.apache.org/jira/browse/SOLR-5543
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 4.6
>Reporter: Bill Bell
>Assignee: Alan Woodward
> Fix For: 4.7
>
> Attachments: SOLR-5543.patch
>
>
> We are having issues with SWAP CoreAdmin in 4.6.
> Using legacy solr.xml we issue a CoreAdmin SWAP, and we want it persistent. 
> It has been running flawlessly since 4.5. Now it creates duplicate lines in 
> solr.xml.
> Even the example multi core schema doesn't work with persistent="true" - 
> it creates duplicate lines in solr.xml.
> [core listing garbled in the archive: ~16 <core .../> entries, each with 
> transient="false" and some with loadOnStartup="true"; recoverable instanceDir 
> values include citystateprovider, inactiveproviders, portalprovider, 
> providersearch, and tridioncomponents]



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5543) solr.xml duplicate entries after SWAP 4.6

2013-12-17 Thread Shawn Heisey (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13850584#comment-13850584
 ] 

Shawn Heisey commented on SOLR-5543:


[~romseygeek] ... I hear we're doing a 4.6.1 in the near future.  IMHO this is 
a strong candidate for backporting.

Interesting detail, which I don't think needs to change your patch at all:  My 
'full index rebuild' process does seven core swaps.  It appears that the 
duplication introduced by a core swap is eliminated by a subsequent core swap 
-- there were only extra entries for the last core swapped.


> solr.xml duplicate entries after SWAP 4.6
> -
>
> Key: SOLR-5543
> URL: https://issues.apache.org/jira/browse/SOLR-5543
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 4.6
>Reporter: Bill Bell
>Assignee: Alan Woodward
> Fix For: 4.7
>
> Attachments: SOLR-5543.patch
>
>
> We are having issues with SWAP CoreAdmin in 4.6.
> Using legacy solr.xml we issue a CoreAdmin SWAP, and we want it persistent. 
> It has been running flawlessly since 4.5. Now it creates duplicate lines in 
> solr.xml.
> Even the example multi core schema doesn't work with persistent="true" - 
> it creates duplicate lines in solr.xml.
> [core listing garbled in the archive: ~16 <core .../> entries, each with 
> transient="false" and some with loadOnStartup="true"; recoverable instanceDir 
> values include citystateprovider, inactiveproviders, portalprovider, 
> providersearch, and tridioncomponents]



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-5371) Range faceting should use O(log(N)) search per hit

2013-12-17 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-5371:
---

Attachment: LUCENE-5371.patch

Patch (applies to the LUCENE-5339 branch), implementing a simple
segment tree solution.  The per-hit increment is a binary search,
O(log(N)).

This only handles longs right now; I still need to do doubles.  Maybe
I can convert all doubles to longs and then reuse the LongRangeCounter
class for doubles...

I didn't use ASM; just "normal" java sources.
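
If the doubles-to-longs conversion pans out, the order-preserving mapping 
already in NumericUtils would presumably do (a one-line sketch, not from the 
patch):

{code}
// doubleToSortableLong preserves ordering, so range endpoints and values can
// all be counted in long space by the same LongRangeCounter.
long sortable = NumericUtils.doubleToSortableLong(doubleValue);
{code}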


> Range faceting should use O(log(N)) search per hit
> --
>
> Key: LUCENE-5371
> URL: https://issues.apache.org/jira/browse/LUCENE-5371
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/facet
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 5.0, 4.7
>
> Attachments: LUCENE-5371.patch
>
>
> Today, Lucene's dynamic range faceting uses a simple linear search to
> find which ranges match, but there are known data structures to do
> this in log(N) time.  I played with segment trees and wrote up a blog
> post here:
>   
> http://blog.mikemccandless.com/2013/12/fast-range-faceting-using-segment-trees.html
> O(N) cost is actually OK when number of ranges is smallish, which is
> typical for facet use cases, but then scales badly if there are many
> ranges.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-5371) Range faceting should use O(log(N)) search per hit

2013-12-17 Thread Michael McCandless (JIRA)
Michael McCandless created LUCENE-5371:
--

 Summary: Range faceting should use O(log(N)) search per hit
 Key: LUCENE-5371
 URL: https://issues.apache.org/jira/browse/LUCENE-5371
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/facet
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 5.0, 4.7


Today, Lucene's dynamic range faceting uses a simple linear search to
find which ranges match, but there are known data structures to do
this in log(N) time.  I played with segment trees and wrote up a blog
post here:

  
http://blog.mikemccandless.com/2013/12/fast-range-faceting-using-segment-trees.html

O(N) cost is actually OK when number of ranges is smallish, which is
typical for facet use cases, but then scales badly if there are many
ranges.
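
For intuition, here is a toy sketch of the per-hit O(log(N)) step from the blog 
post's technique (not the patch itself): collapse all range endpoints into 
sorted elementary-interval boundaries, binary-search each value into its 
interval, and roll interval counts up into the (possibly overlapping) requested 
ranges once at the end.

{code}
import java.util.Arrays;

// Toy sketch of O(log N)-per-hit range counting; not the LUCENE-5371 patch.
class ElementaryIntervalCounter {
  private final long[] boundaries; // sorted, distinct endpoints of all ranges
  private final int[] counts;      // one counter per elementary interval

  ElementaryIntervalCounter(long[] sortedBoundaries) {
    boundaries = sortedBoundaries;
    counts = new int[sortedBoundaries.length + 1];
  }

  // Binary search replaces the linear scan over ranges: O(log N) per hit.
  void add(long value) {
    int idx = Arrays.binarySearch(boundaries, value);
    // intervals are half-open [b[i-1], b[i]); an exact match on a boundary
    // belongs to the interval that starts at that boundary
    counts[idx >= 0 ? idx + 1 : -idx - 1]++;
  }

  // Each requested range covers a contiguous run of elementary intervals,
  // so its total is summed once after all hits have been counted.
  int sum(int firstInterval, int lastInterval) {
    int total = 0;
    for (int i = firstInterval; i <= lastInterval; i++) total += counts[i];
    return total;
  }
}
{code}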




--
This message was sent by Atlassian JIRA
(v6.1.4#6159)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: New JDK 8 tool: jdeps

2013-12-17 Thread Dalibor Topic
On 12/17/13 2:48 PM, Uwe Schindler wrote:
> Hi Rory,
> 
> Thanks, too! Maybe we can merge those tools in the future! If you have a 
> pointer to the source code of your tool (I assume it also uses ASM 
> internally), I can have a look.

See 
http://hg.openjdk.java.net/jdk8/jdk8/langtools/file/tip/src/share/classes/com/sun/tools/jdeps/

cheers,
dalibor topic
 


-- 
Oracle 
Dalibor Topic | Principal Product Manager
Phone: +494089091214  | Mobile: +491737185961 

Oracle Java Platform Group

ORACLE Deutschland B.V. & Co. KG | Kühnehöfe 5 | 22761 Hamburg

ORACLE Deutschland B.V. & Co. KG
Hauptverwaltung: Riesstr. 25, D-80992 München
Registergericht: Amtsgericht München, HRA 95603
Geschäftsführer: Jürgen Kunz

Komplementärin: ORACLE Deutschland Verwaltung B.V.
Hertogswetering 163/167, 3543 AS Utrecht, Niederlande
Handelsregister der Handelskammer Midden-Niederlande, Nr. 30143697
Geschäftsführer: Alexander van der Ven, Astrid Kepper, Val Maher

Green Oracle  Oracle is committed to 
developing practices and products that help protect the environment

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5416) CollapsingQParserPlugin bug with Tagging

2013-12-17 Thread Joel Bernstein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13850479#comment-13850479
 ] 

Joel Bernstein commented on SOLR-5416:
--

Hi Shruti,

I suspect you have certain parts of the patch already applied. David created 
this patch, but I believe it applies the CollapsingQParserPlugin all the way 
back to the original commit. 

Joel

> CollapsingQParserPlugin bug with Tagging
> 
>
> Key: SOLR-5416
> URL: https://issues.apache.org/jira/browse/SOLR-5416
> Project: Solr
>  Issue Type: Bug
>  Components: search
>Affects Versions: 4.6
>Reporter: David
>Assignee: Joel Bernstein
>  Labels: group, grouping
> Fix For: 5.0, 4.7
>
> Attachments: CollapseQParserPluginPatch-solr-4.5.1.patch, 
> CollapsingQParserPlugin.java, SOLR-5416.patch, SOLR-5416.patch, 
> SOLR-5416.patch, SOLR-5416.patch, SOLR-5416.patch, SOLR-5416.patch, 
> SolrIndexSearcher.java, TestCollapseQParserPlugin.java
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> Trying to use CollapsingQParserPlugin with facet tagging throws an exception. 
> {code}
> ModifiableSolrParams params = new ModifiableSolrParams();
> params.add("q", "*:*");
> params.add("fq", "{!collapse field=group_s}");
> params.add("defType", "edismax");
> params.add("bf", "field(test_ti)");
> params.add("fq","{!tag=test_ti}test_ti:5");
> params.add("facet","true");
> params.add("facet.field","{!ex=test_ti}test_ti");
> assertQ(req(params), "*[count(//doc)=1]", 
> "//doc[./int[@name='test_ti']='5']");
> {code}



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



RE: New JDK 8 tool: jdeps

2013-12-17 Thread Uwe Schindler
Hi Rory,

Thanks, too! Maybe we can merge those tools in the future! If you have a 
pointer to the source code of your tool (I assume it also uses ASM internally), 
I can have a look.

About your initial question, I can confirm: Lucene and Solr do not use any 
internal JDK APIs directly. If we need to use them (like Unsafe to discover 
some internal JVM properties for RAM accounting or to unmap mmapped files), it 
is done solely using reflection, with proper fallback and error handling.

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


> -Original Message-
> From: Rory O'Donnell [mailto:rory.odonn...@oracle.com]
> Sent: Tuesday, December 17, 2013 2:20 PM
> To: Uwe Schindler
> Cc: 'Rory O'Donnell'; 'Dawid Weiss'; dev@lucene.apache.org;
> 'TOPIC,DALIBOR'; 'BORG,CECILIA'; balchandra.vai...@oracle.com
> Subject: Re: New JDK 8 tool: jdeps
> 
> Thanks for that Uwe!
> On 17/12/2013 13:13, Uwe Schindler wrote:
> > Hey Rory, I already left a comment on the blog entry - Oracle should have
> > looked at what's already available on the open source market!
> >
> > We use a more generic tool available via Maven Central that has done the
> > same for a few years:
> > https://code.google.com/p/forbidden-apis/
> >
> > It was written by me, but it is used by more and more projects,
> > especially those who need 100% correct locale, charset and timezone
> > independence (like text-processing tools). The main use case of
> > this tool is to scan your application classes for things like
> > opening text files without specifying a charset, and fail the build. This
> > tool also allows finding calls to internal JDK APIs. We use this tool
> > in Lucene. See the docs: you can pass internalRuntimeForbidden="true"
> > and it will fail your build:
> > https://code.google.com/p/forbidden-apis/wiki/AntUsage or
> > https://code.google.com/p/forbidden-apis/wiki/MavenUsage. It is also
> > available as a command line tool:
> > https://code.google.com/p/forbidden-apis/wiki/CliUsage
> >
> > See also my blog post:
> > http://blog.thetaphi.de/2012/07/default-locales-default-charsets-and.h
> > tml
> >
> > Uwe
> >
> > -
> > Uwe Schindler
> > H.-H.-Meier-Allee 63, D-28213 Bremen
> > http://www.thetaphi.de
> > eMail: u...@thetaphi.de
> >
> >
> >> -Original Message-
> >> From: Rory O'Donnell [mailto:rory.odonn...@oracle.com]
> >> Sent: Tuesday, December 17, 2013 1:50 PM
> >> To: Uwe Schindler; Dawid Weiss
> >> Cc: rory.odonn...@oracle.com; dev@lucene.apache.org; TOPIC,DALIBOR;
> >> BORG,CECILIA; balchandra.vai...@oracle.com
> >> Subject: New JDK 8 tool: jdeps
> >>
> >> Hi Uwe/Dawid,
> >>
> >> Here's a blog from Erik Costlow on a new tool in JDK 8 that lets you
> >> analyze your code for dependencies on JDK internal APIs :
> >>
> >> https://blogs.oracle.com/java-platform-
> >> group/entry/closing_the_closed_apis
> >>
> >> Please let me know if you have any feedback - I'd be interested to
> >> hear if you use any internal APIs.
> >>
> >> Rgds,Rory
> >>
> >> --
> >> Rgds,Rory O'Donnell
> >> Quality Engineering Manager
> >> Oracle EMEA , Dublin, Ireland
> 
> --
> Rgds,Rory O'Donnell
> Quality Engineering Manager
> Oracle EMEA , Dublin, Ireland
> 
> 
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-5473) Make one state.json per collection

2013-12-17 Thread Noble Paul (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul updated SOLR-5473:
-

Attachment: SOLR-5473.patch

I could get all tests to pass. 

> Make one state.json per collection
> --
>
> Key: SOLR-5473
> URL: https://issues.apache.org/jira/browse/SOLR-5473
> Project: Solr
>  Issue Type: Sub-task
>  Components: SolrCloud
>Reporter: Noble Paul
>Assignee: Noble Paul
> Attachments: SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, 
> SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, 
> SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch
>
>
> As defined in the parent issue, store the states of each collection under 
> /collections/collectionname/state.json node



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



RE: Transforming Index Data

2013-12-17 Thread Uwe Schindler
Hi,

 

You were mainly talking about XML, so I did not see you mention JSON, sorry.

There is no support for handling JSON data at the moment.

 

I just want to mention that there is no need for a new transformer tool when 
using XML.

 

Uwe

 

-

Uwe Schindler

H.-H.-Meier-Allee 63, D-28213 Bremen

  http://www.thetaphi.de

eMail: u...@thetaphi.de

 

From: Furkan KAMACI [mailto:furkankam...@gmail.com] 
Sent: Tuesday, December 17, 2013 2:22 PM
To: dev@lucene.apache.org
Subject: Re: Transforming Index Data

 

Hi Uwe;

 

How about JSON data?

 

Thanks;

Furkan KAMACI

 

2013/12/17 Uwe Schindler 

UpdateRequestHandler can apply an XSLT stylesheet to the data coming in. 
This does everything you would like to do here: 
http://wiki.apache.org/solr/XsltUpdateRequestHandler
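
For reference, posting through it from SolrJ looks roughly like this (a sketch: 
the stylesheet name and input file are illustrative, and the calls assume the 
4.x ContentStreamUpdateRequest API):

{code}
// Sketch: push non-schema XML through the update handler with an XSLT
// transform; "updateXml.xsl" would live in conf/xslt/ and map the input
// onto the schema.
ContentStreamUpdateRequest up = new ContentStreamUpdateRequest("/update");
up.addFile(new File("arbitrary.xml"), "text/xml");
up.setParam("tr", "updateXml.xsl");
up.setParam("commit", "true");
solrServer.request(up);
{code}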

 

Uwe

 

-

Uwe Schindler

H.-H.-Meier-Allee 63, D-28213 Bremen

http://www.thetaphi.de  

eMail: u...@thetaphi.de

 

From: Furkan KAMACI [mailto:furkankam...@gmail.com] 
Sent: Tuesday, December 17, 2013 1:50 PM
To: dev@lucene.apache.org
Subject: Transforming Index Data

 

Hi;

 

Can we put a transformer (not the transformer of DIH) in front of the indexing 
mechanism and let people index data in arbitrary formats?

I mean that people may send data that does not obey the schema. The transformer 
would use XPath for XML data and something like JSONPath for JSON data, so we 
can transform the data to fit the schema.

Such an improvement could remove client-side development whose only job is 
transforming data to make it ready for indexing according to the schema. If you 
agree, I can file a JIRA issue and take responsibility for contributing it.

 

Thanks;

Furkan KAMACI

 



Re: Transforming Index Data

2013-12-17 Thread Furkan KAMACI
Hi Uwe;

How about JSON data?

Thanks;
Furkan KAMACI


2013/12/17 Uwe Schindler 

> UpdateRequestHandler can apply an XSLT stylesheet to the data coming
> in. This does everything you would like to use here:
> http://wiki.apache.org/solr/XsltUpdateRequestHandler
>
>
>
> Uwe
>
>
>
> -
>
> Uwe Schindler
>
> H.-H.-Meier-Allee 63, D-28213 Bremen
>
> http://www.thetaphi.de
>
> eMail: u...@thetaphi.de
>
>
>
> *From:* Furkan KAMACI [mailto:furkankam...@gmail.com]
> *Sent:* Tuesday, December 17, 2013 1:50 PM
> *To:* dev@lucene.apache.org
> *Subject:* Transforming Index Data
>
>
>
> Hi;
>
>
>
> Can we put a transformer (not the transformer of DIH) in front of the
> indexing mechanism and let people index data in arbitrary formats?
>
> I mean: people may send data that does not conform to the schema. The
> transformer would use XPath for XML data and something like JSONPath
> for JSON data, so we can transform the incoming data into a form that
> fits the schema.
>
> Such an improvement could remove client-side development whose only purpose
> is to transform data into a schema-ready form for indexing. If you agree, I
> can file a JIRA issue and take responsibility for contributing it.
>
>
>
> Thanks;
>
> Furkan KAMACI
>


Re: New JDK 8 tool: jdeps

2013-12-17 Thread Rory O'Donnell

Thanks for that Uwe!
On 17/12/2013 13:13, Uwe Schindler wrote:

Hey Rory, I already left a comment on the blog entry - Oracle should have 
looked at what's already available on the open source market!

We use a more generic tool, available via Maven Central, that has done the same 
for a few years:
https://code.google.com/p/forbidden-apis/

It was written by me, but it is used by more and more projects, especially those that need 
100% correct locale, charset and timezone independence (like text processing tools). 
The main use case of this tool is to scan your application classes for things like 
opening text files without giving a charset, and to fail the build. The tool can also 
find calls to internal JDK APIs. We use it in Lucene. See the docs: you can pass 
internalRuntimeForbidden="true" and it will fail your build: 
https://code.google.com/p/forbidden-apis/wiki/AntUsage or 
https://code.google.com/p/forbidden-apis/wiki/MavenUsage. It is also available as a command 
line tool: https://code.google.com/p/forbidden-apis/wiki/CliUsage

See also my blog post: 
http://blog.thetaphi.de/2012/07/default-locales-default-charsets-and.html

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de



-Original Message-
From: Rory O'Donnell [mailto:rory.odonn...@oracle.com]
Sent: Tuesday, December 17, 2013 1:50 PM
To: Uwe Schindler; Dawid Weiss
Cc: rory.odonn...@oracle.com; dev@lucene.apache.org; TOPIC,DALIBOR;
BORG,CECILIA; balchandra.vai...@oracle.com
Subject: New JDK 8 tool: jdeps

Hi Uwe/Dawid,

Here's a blog from Erik Costlow on a new tool in JDK 8 that lets you analyze
your code for dependencies on JDK internal APIs:

https://blogs.oracle.com/java-platform-group/entry/closing_the_closed_apis

Please let me know if you have any feedback - I'd be interested to hear if you
use any internal APIs.

Rgds,Rory

--
Rgds,Rory O'Donnell
Quality Engineering Manager
Oracle EMEA , Dublin, Ireland


--
Rgds,Rory O'Donnell
Quality Engineering Manager
Oracle EMEA , Dublin, Ireland





RE: Transforming Index Data

2013-12-17 Thread Uwe Schindler
UpdateRequestHandler can apply an XSLT stylesheet to the data coming in. 
This does everything you would like to use here: 
http://wiki.apache.org/solr/XsltUpdateRequestHandler
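
A minimal SolrJ sketch of that setup (assuming Solr 4.x, and that a stylesheet 
named toSolrDoc.xsl has been placed under conf/xslt/ - both names are made up 
for illustration):

{code}
// Posts arbitrary XML to /update and lets the XSLT stylesheet named by the
// "tr" parameter rewrite it into Solr's <add><doc> update format.
import java.io.File;

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

public class XsltUpdateExample {
  public static void main(String[] args) throws Exception {
    HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr");
    ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update");
    req.addFile(new File("books.xml"), "application/xml"); // arbitrary XML
    req.setParam("tr", "toSolrDoc.xsl"); // hypothetical stylesheet in conf/xslt/
    req.setParam("commit", "true");
    server.request(req);
    server.shutdown();
  }
}
{code}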

 

Uwe

 

-

Uwe Schindler

H.-H.-Meier-Allee 63, D-28213 Bremen

http://www.thetaphi.de  

eMail: u...@thetaphi.de

 

From: Furkan KAMACI [mailto:furkankam...@gmail.com] 
Sent: Tuesday, December 17, 2013 1:50 PM
To: dev@lucene.apache.org
Subject: Transforming Index Data

 

Hi;

 

Can we put a transformer (not the transformer of DIH) in front of the indexing 
mechanism and let people index data in arbitrary formats?

I mean: people may send data that does not conform to the schema. The 
transformer would use XPath for XML data and something like JSONPath for JSON 
data, so we can transform the incoming data into a form that fits the schema.

Such an improvement could remove client-side development whose only purpose is 
to transform data into a schema-ready form for indexing. If you agree, I can 
file a JIRA issue and take responsibility for contributing it.

 

Thanks;

Furkan KAMACI



RE: New JDK 8 tool: jdeps

2013-12-17 Thread Uwe Schindler
Hey Rory, I already left a comment on the blog entry - Oracle should have 
looked at what's already available on the open source market!

We use a more generic tool, available via Maven Central, that has done the same 
for a few years:
https://code.google.com/p/forbidden-apis/

It was written by me, but it is used by more and more projects, especially 
those that need 100% correct locale, charset and timezone independence (like 
text processing tools). The main use case of this tool is to scan your 
application classes for things like opening text files without giving a 
charset, and to fail the build. The tool can also find calls to internal JDK 
APIs. We use it in Lucene. See the docs: you can pass 
internalRuntimeForbidden="true" and it will fail your build: 
https://code.google.com/p/forbidden-apis/wiki/AntUsage or 
https://code.google.com/p/forbidden-apis/wiki/MavenUsage. It is also available 
as a command line tool: https://code.google.com/p/forbidden-apis/wiki/CliUsage

See also my blog post: 
http://blog.thetaphi.de/2012/07/default-locales-default-charsets-and.html
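
To make the charset point concrete, here is a sketch of the kind of call such 
a checker flags, next to a locale-independent variant (the file name is made 
up):

{code}
// The first reader silently uses the platform default charset (exactly what
// forbidden-apis is designed to catch); the second one pins the charset.
import java.io.FileInputStream;
import java.io.FileReader;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

public class CharsetExample {
  public static void main(String[] args) throws Exception {
    Reader bad = new FileReader("data.txt"); // default charset, locale-dependent
    Reader good = new InputStreamReader(
        new FileInputStream("data.txt"), StandardCharsets.UTF_8);
    bad.close();
    good.close();
  }
}
{code}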

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


> -Original Message-
> From: Rory O'Donnell [mailto:rory.odonn...@oracle.com]
> Sent: Tuesday, December 17, 2013 1:50 PM
> To: Uwe Schindler; Dawid Weiss
> Cc: rory.odonn...@oracle.com; dev@lucene.apache.org; TOPIC,DALIBOR;
> BORG,CECILIA; balchandra.vai...@oracle.com
> Subject: New JDK 8 tool: jdeps
> 
> Hi Uwe/Dawid,
> 
> Here's a blog from Erik Costlow on a new tool in JDK 8 that lets you analyze
> your code for dependencies on JDK internal APIs:
> 
> https://blogs.oracle.com/java-platform-group/entry/closing_the_closed_apis
> 
> Please let me know if you have any feedback - I'd be interested to hear if you
> use any internal APIs.
> 
> Rgds,Rory
> 
> --
> Rgds,Rory O'Donnell
> Quality Engineering Manager
> Oracle EMEA , Dublin, Ireland





Re: Lucene Hierarchical Taxonomy Search

2013-12-17 Thread Furkan KAMACI
Hi;

It is better to ask this question on the user list instead of the dev list.

Thanks;
Furkan KAMACI


2013/12/16 Nino_87 

> I have a set of documents annotated with hierarchical taxonomy tags, e.g.
>
> [
> {
> "id": 1,
> "title": "a funny book",
> "authors": ["Jean Bon", "Alex Terieur"],
> "book_category": "/novel/comedy/new"
> },
> {
> "id": 2,
> "title": "a dramatic book",
> "authors": ["Alex Terieur"],
> "book_category": "/novel/drama"
> },
> {
> "id": 3,
> "title": "A hilarious book",
> "authors": ["Marc Assin", "Harry Covert"],
> "book_category": "/novel/comedy"
> },
> {
> "id": 4,
> "title": "A sad story",
> "authors": ["Gerard Menvusa", "Alex Terieur"],
> "book_category": "/novel/drama"
> },
> {
> "id": 5,
> "title": "A very sad story",
> "authors": ["Gerard Menvusa", "Alain Terieur"],
> "book_category": "/novel"
> }]
> I need to search books by "book_category". The search must return books that
> match the query category exactly or partially (with a defined depth
> threshold) and score them according to the degree of the match.
>
> E.g.: query "book_category=/novel/comedy" and "depth_threshold=1" must
> return books with book_category=/novel/comedy (score=100%), /novel and
> /novel/comedy/new (score < 100%).
>
> I tried TopScoreDocCollector in the search, but it returns every book whose
> book_category merely contains the query category, and gives them all the
> same score.
>
> How can I implement a search that also returns the more general categories
> and assigns different match scores to the results?
>
> P.S.: I don't need a faceted search.
>
> Thanks
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Lucene-Hierarchial-Taxonomy-Search-tp4106928.html
> Sent from the Lucene - Java Developer mailing list archive at Nabble.com.
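
For what it's worth, one possible approach (a rough Lucene 4.x sketch under my 
own assumptions - field handling and boosts are made up, and this is not an 
authoritative answer) is to index each book's category path and, at query 
time, combine a boosted exact-path term with lower-boosted ancestor paths up 
to the depth threshold:

{code}
// Index category paths as untokenized terms; query the exact path with a
// high boost plus its parent with the default boost, so exact matches score
// higher than more general categories.
import org.apache.lucene.analysis.core.WhitespaceAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StoredField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.*;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class CategoryScoreSketch {
  public static void main(String[] args) throws Exception {
    RAMDirectory dir = new RAMDirectory();
    IndexWriterConfig cfg = new IndexWriterConfig(
        Version.LUCENE_46, new WhitespaceAnalyzer(Version.LUCENE_46));
    try (IndexWriter w = new IndexWriter(dir, cfg)) {
      addBook(w, "A hilarious book", "/novel/comedy");
      addBook(w, "A very sad story", "/novel");
    }

    String category = "/novel/comedy";
    BooleanQuery q = new BooleanQuery();
    TermQuery exact = new TermQuery(new Term("book_category", category));
    exact.setBoost(2.0f); // exact category match scores highest
    q.add(exact, BooleanClause.Occur.SHOULD);
    // One ancestor level (depth_threshold=1); loop further up for deeper thresholds.
    String parent = category.substring(0, category.lastIndexOf('/'));
    q.add(new TermQuery(new Term("book_category", parent)), BooleanClause.Occur.SHOULD);

    IndexSearcher s = new IndexSearcher(DirectoryReader.open(dir));
    for (ScoreDoc sd : s.search(q, 10).scoreDocs) {
      System.out.println(s.doc(sd.doc).get("title") + "  score=" + sd.score);
    }
  }

  static void addBook(IndexWriter w, String title, String category) throws Exception {
    Document d = new Document();
    d.add(new StoredField("title", title));
    d.add(new StringField("book_category", category, Field.Store.YES));
    w.addDocument(d);
  }
}
{code}

Descendant paths like /novel/comedy/new could be matched the same way from a 
precomputed list of known category paths.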


[jira] [Commented] (SOLR-4809) OpenOffice document body is not indexed by SolrCell

2013-12-17 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13850406#comment-13850406
 ] 

Uwe Schindler commented on SOLR-4809:
-

I opened TIKA-1211. The bug was introduced when parsing of headers and footers 
was added by [~mikemccand] in TIKA-736.

> OpenOffice document body is not indexed by SolrCell
> ---
>
> Key: SOLR-4809
> URL: https://issues.apache.org/jira/browse/SOLR-4809
> Project: Solr
>  Issue Type: Bug
>  Components: contrib - Solr Cell (Tika extraction)
>Affects Versions: 3.6.1, 4.3
>Reporter: Jack Krupansky
> Attachments: HelloWorld.docx, HelloWorld.odp, HelloWorld.odt, 
> HelloWorld.txt, SOLR-4809.patch
>
>
> As reported on the solr user mailing list, SolrCell is not indexing document 
> body content for OpenOffice documents.
> I tested with Apache Open Office 3.4.1 on Solr 4.3 and 3.6.1, for both 
> OpenWriter (.ODT) and Impress (.ODS).
> The extractOnly option does return the document body text, but Solr does not 
> index the document body text. In my test cases (.ODS and .ODT), all I see for 
> the "content" attribute in Solr are a few spaces.
> Using the example schema, I indexed HelloWorld.odt using:
> {code}
>  curl 
> "http://localhost:8983/solr/update/extract?literal.id=doc-1&uprefix=attr_&commit=true";
>  -F "myfile=@HelloWorld.odt"
> {code}
> It queries as:
> {code}
> [XML query response omitted: the element names were stripped by the mail
> archiver, leaving only field values; the indexed content field holds just a
> few spaces]
> {code}
> Command to extract as text:
> {code}
> curl 
> "http://localhost:8983/solr/update/extract?literal.id=doc-1&indent=true&extractOnly=true&extractFormat=text&commit=true";
>  -F "myfile=@HelloWorld.odt"
> {code}
> The response:
> {code}
> Hello World, from OpenOffice!
> Third line.
> Fourth line.
> The end.
> [extracted metadata omitted: the element names were stripped by the mail
> archiver, leaving only field values]
> {code}
Transforming Index Data

2013-12-17 Thread Furkan KAMACI
Hi;

Can we put a transformer (not the transformer of DIH) in front of the
indexing mechanism and let people index data in arbitrary formats?
I mean: people may send data that does not conform to the schema. The
transformer would use XPath for XML data and something like JSONPath
for JSON data, so we can transform the incoming data into a form that
fits the schema.

Such an improvement could remove client-side development whose only purpose
is to transform data into a schema-ready form for indexing. If you agree, I
can file a JIRA issue and take responsibility for contributing it.
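
To sketch the XML side of the idea (everything here - element names, paths,
fields - is made up for illustration, using only the JDK's built-in XPath):

{code}
// Map schema fields to XPath expressions over arbitrary incoming XML and
// build a field->value map that matches the schema.
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class XPathTransformSketch {
  public static void main(String[] args) throws Exception {
    String incoming = "<book><info><name>A funny book</name></info></book>";

    // Hypothetical mapping: schema field -> XPath over the incoming XML
    Map<String, String> fieldToXPath = new LinkedHashMap<>();
    fieldToXPath.put("title", "/book/info/name/text()");

    Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
        .parse(new InputSource(new StringReader(incoming)));
    XPath xpath = XPathFactory.newInstance().newXPath();

    Map<String, String> solrDoc = new LinkedHashMap<>();
    for (Map.Entry<String, String> e : fieldToXPath.entrySet()) {
      solrDoc.put(e.getKey(), xpath.evaluate(e.getValue(), doc));
    }
    System.out.println(solrDoc); // {title=A funny book}
  }
}
{code}

The JSON side would work the same way with a JSONPath implementation in place
of XPath.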

Thanks;
Furkan KAMACI


New JDK 8 tool: jdeps

2013-12-17 Thread Rory O'Donnell

Hi Uwe/Dawid,

Here's a blog from Erik Costlow on a new tool in JDK 8 that lets you 
analyze your code for dependencies on JDK internal APIs:

https://blogs.oracle.com/java-platform-group/entry/closing_the_closed_apis

Please let me know if you have any feedback - I'd be interested to hear 
if you use any internal APIs.
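
Purely for illustration, a tiny hypothetical example of the kind of 
internal-API dependency jdeps reports (sun.misc is unsupported and may change 
between JDKs):

{code}
// jdeps would flag the sun.misc dependency in a class like this.
import sun.misc.BASE64Encoder;

public class UsesInternalApi {
  public static void main(String[] args) {
    System.out.println(new BASE64Encoder().encode("hello".getBytes()));
  }
}
{code}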

Rgds,Rory

--
Rgds,Rory O'Donnell
Quality Engineering Manager
Oracle EMEA , Dublin, Ireland





DIH for JSON Data

2013-12-17 Thread Furkan KAMACI
Hi;

There was a question on the mailing list about using JSON data with DIH.
I want to make a contribution for it because I see that people may need it.
Currently we support REST/XML and RSS/ATOM feeds via HttpDataSource.

What do folks think about it? If you think it is a reasonable thing to do,
I can file a JIRA issue and contribute a patch.

Thanks;
Furkan KAMACI


[jira] [Commented] (SOLR-5416) CollapsingQParserPlugin bug with Tagging

2013-12-17 Thread shruti suri (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13850390#comment-13850390
 ] 

shruti suri commented on SOLR-5416:
---

Joel,

I am getting the following error when I apply 
CollapseQParserPluginPatch-solr-4.5.1.patch to solr-4.5.1.

Error
1 out of 1 hunk ignored -- saving rejects to file 
solr/core/src/java/org/apache/solr/search/QParserPlugin.java.rej
patching file solr/core/ivy.xml
Hunk #1 FAILED at 42.
1 out of 1 hunk FAILED -- saving rejects to file solr/core/ivy.xml.rej
patching file solr/core/src/test/org/apache/solr/search/QueryEqualityTest.java
Hunk #1 succeeded at 295 (offset 100 lines).
patching file solr/core/src/java/org/apache/solr/search/ScoreFilter.java
patching file solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java
Reversed (or previously applied) patch detected!  Assume -R? [n] 
Apply anyway? [n] 
Skipping patch.
2 out of 2 hunks ignored -- saving rejects to file 
solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java.rej
patching file 
solr/core/src/test/org/apache/solr/search/TestCollapseQParserPlugin.java
patching file 
solr/core/src/java/org/apache/solr/search/CollapsingQParserPlugin.java

Regards
Shruti 


> [quoted issue description elided; see the earlier SOLR-5416 message above]






[jira] [Comment Edited] (SOLR-4809) OpenOffice document body is not indexed by SolrCell

2013-12-17 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13850373#comment-13850373
 ] 

Uwe Schindler edited comment on SOLR-4809 at 12/17/13 11:43 AM:


It looks like the problem with double endDocument() is fixed in TIKA, but it 
does not prevent multiple startDocument() calls, see TIKA-646. The 
EndDocumentShieldContentHandler that is used should also prevent double 
startDocument() calls. When startDocument() is called twice, the data from the 
first call is lost. We have to open an issue in TIKA.


was (Author: thetaphi):
It looks like the problem with double endDocument() is fixed in Solr, but it 
does not prevent double startDocument() calls, see TIKA-646. The 
EndDocumentShieldContentHandler that is used should also prevent double 
startDocument() calls. We have to open an issue in TIKA.
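
For illustration only, a minimal sketch of the shielding idea (class name and 
the exact shielding policy are my assumptions, not the actual Solr/Tika code):

{code}
// Passes only the first startDocument() to the delegate and swallows
// endDocument() until the caller explicitly finishes, so nested parser
// calls cannot reset or truncate the accumulated content.
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.XMLFilterImpl;

class StartEndShieldContentHandler extends XMLFilterImpl {
  private boolean started = false;

  StartEndShieldContentHandler(ContentHandler delegate) {
    setContentHandler(delegate); // XMLFilterImpl forwards all other events
  }

  @Override
  public void startDocument() throws SAXException {
    if (!started) {
      started = true;
      super.startDocument(); // only the first call reaches the delegate
    }
  }

  @Override
  public void endDocument() {
    // swallowed; call reallyEndDocument() once when extraction is complete
  }

  void reallyEndDocument() throws SAXException {
    super.endDocument();
  }
}
{code}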

> [quoted issue description and response dumps elided; see the earlier 
> SOLR-4809 message above]

[jira] [Commented] (SOLR-4809) OpenOffice document body is not indexed by SolrCell

2013-12-17 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13850373#comment-13850373
 ] 

Uwe Schindler commented on SOLR-4809:
-

It looks like the problem with double endDocument() is fixed in Solr, but it 
does not prevent double startDocument() calls, see TIKA-646. The 
EndDocumentShieldContentHandler that is used should also prevent double 
startDocument() calls. We have to open an issue in TIKA.

> [quoted issue description and response dumps elided; see the earlier 
> SOLR-4809 message above]

[jira] [Commented] (SOLR-4809) OpenOffice document body is not indexed by SolrCell

2013-12-17 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13850370#comment-13850370
 ] 

Uwe Schindler commented on SOLR-4809:
-

I don't think this is a bug in Solr. The patch seems to fix the issue, but it 
is using a hack. On startDocument, the StringBuilder must be cleared; 
otherwise, later indexed documents may contain text from the previous ones.

The bug seems to be in TIKA, so it has to be fixed there. I am the author of 
the OpenOffice parser, so it could be that an additional, incorrect 
startDocument() event is inserted into the XHTML output.
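
A sketch of the non-hacky variant of that fix (the accumulator class is 
hypothetical, not the actual SolrCell code):

{code}
// Clear the accumulated text whenever a document starts, so a later
// document can never inherit text from a previous one.
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class ContentAccumulator extends DefaultHandler {
  private final StringBuilder sb = new StringBuilder();

  @Override
  public void startDocument() throws SAXException {
    sb.setLength(0); // reset instead of relying on endDocument() shielding
  }

  @Override
  public void characters(char[] ch, int start, int length) {
    sb.append(ch, start, length);
  }

  public String getContent() {
    return sb.toString();
  }
}
{code}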

> [quoted issue description and response dumps elided; see the earlier 
> SOLR-4809 message above]

[jira] [Commented] (SOLR-4809) OpenOffice document body is not indexed by SolrCell

2013-12-17 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-4809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13850304#comment-13850304
 ] 

Jan Høydahl commented on SOLR-4809:
---

Any chance you could add a unit test to the patch, demonstrating how the bug is 
fixed?

> [quoted issue description and response dumps elided; see the earlier 
> SOLR-4809 message above]