[jira] [Commented] (PYLUCENE-9) QueryParser replacing stop words with wildcards
[ https://issues.apache.org/jira/browse/PYLUCENE-9?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034961#comment-13034961 ]

Christopher Currens commented on PYLUCENE-9:
--------------------------------------------

We can close it. Thanks for the help.

QueryParser replacing stop words with wildcards
-----------------------------------------------

Key: PYLUCENE-9
URL: https://issues.apache.org/jira/browse/PYLUCENE-9
Project: PyLucene
Issue Type: Bug
Environment: Windows XP 32-bit SP3, Ubuntu 10.04.2 LTS i686 GNU/Linux, jdk1.6.0_23
Reporter: Christopher Currens

I was using the query parser to build a query. In Java Lucene (as well as Lucene.Net), the query "Calendar Item as Msg" (quotes included) is parsed properly as FullText:"calendar item msg". In pylucene, it is parsed as FullText:"calendar item ? msg". This causes obvious problems when comparing search results from Python, Java and .NET.

Initially, I thought it was the Analyzer I was using, but I've tried the StandardAnalyzer and StopAnalyzer, which work properly in Java and .NET, but not in pylucene.

Here is the code I've used to reproduce the issue:

    >>> import lucene
    >>> lucene.initVM()  # the JVM must be started before any Lucene class is used
    >>> from lucene import StandardAnalyzer, StopAnalyzer, QueryParser, Version
    >>> analyzer = StandardAnalyzer(Version.LUCENE_30)
    >>> query = QueryParser(Version.LUCENE_30, "FullText", analyzer)
    >>> parsedQuery = query.parse("\"Calendar Item as Msg\"")
    >>> parsedQuery
    <Query: FullText:"calendar item ? msg">
    >>> analyzer = StopAnalyzer(Version.LUCENE_30)
    >>> query = QueryParser(Version.LUCENE_30, "FullText", analyzer)
    >>> parsedQuery = query.parse("\"Calendar Item as Msg\"")
    >>> parsedQuery
    <Query: FullText:"calendar item ? msg">

I've noticed this in pylucene 2.9.4, 2.9.3, and 3.0.3.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PYLUCENE-9) QueryParser replacing stop words with wildcards
[ https://issues.apache.org/jira/browse/PYLUCENE-9?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13033877#comment-13033877 ]

Andi Vajda commented on PYLUCENE-9:
-----------------------------------

Hi Christopher,

Have you elucidated this yet? Can this bug be closed or is there still something to be done for it?
[jira] [Commented] (PYLUCENE-9) QueryParser replacing stop words with wildcards
[ https://issues.apache.org/jira/browse/PYLUCENE-9?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13031259#comment-13031259 ]

Christopher Currens commented on PYLUCENE-9:
--------------------------------------------

I've posted a question to the java-lucene list; however, I doubt it will help. The simple fact is that the Lucene 3.0 jar parses the query as ft:"calendar item msg". The *same* Lucene 3.0 jar, when invoked from pylucene, produces ft:"calendar item ? msg" for me, on both Windows and Ubuntu boxes. I suppose this might be an issue with JCC?

I've been able to reproduce this on my boxes at work and on my box at home, all producing the incorrect output. What I'm most curious about is whether any pylucene developer can reproduce this, or whether it's just some crazy environment issue happening on my boxes and those of everyone else I know.
[jira] [Commented] (PYLUCENE-9) QueryParser replacing stop words with wildcards
[ https://issues.apache.org/jira/browse/PYLUCENE-9?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13031284#comment-13031284 ]

Christopher Currens commented on PYLUCENE-9:
--------------------------------------------

Hmm, the code I have is nearly identical, and when I pull it out of the code that contains it, it behaves as it should. I can't post the whole code, but I suppose the issue must be that there's a lingering Version.LUCENE_24 somewhere. I'll try figuring it out on my own. I'm glad to see it's something idiotic I've done. :)
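One way to hunt for a lingering version constant like the one suspected above is to list every Version.LUCENE_* reference in a source file along with its line numbers. The following is only a minimal sketch: the helper name `find_version_constants` and the sample text are made up for illustration, and it uses nothing beyond the Python standard library.

```python
import re
from collections import defaultdict

# Matches constants such as Version.LUCENE_24 or Version.LUCENE_30.
VERSION_RE = re.compile(r"Version\.(LUCENE_\w+)")


def find_version_constants(source):
    """Map each Version.LUCENE_* constant to the line numbers that use it.

    Hypothetical helper for illustration; it only scans text and does not
    import or call Lucene.
    """
    found = defaultdict(list)
    for line_number, line in enumerate(source.splitlines(), start=1):
        for match in VERSION_RE.finditer(line):
            found[match.group(1)].append(line_number)
    return dict(found)


# Made-up sample: the analyzer and the parser disagree on the version.
sample = (
    "analyzer = StandardAnalyzer(Version.LUCENE_30)\n"
    "parser = QueryParser(Version.LUCENE_24, 'ft', analyzer)\n"
)
constants = find_version_constants(sample)
mixed = len(constants) > 1  # more than one distinct constant is worth a look
```

If `mixed` is true, the analyzer and the parser may be following different version defaults, which is the kind of mismatch described in this thread.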
[jira] [Commented] (PYLUCENE-9) QueryParser replacing stop words with wildcards
[ https://issues.apache.org/jira/browse/PYLUCENE-9?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029666#comment-13029666 ]

Andi Vajda commented on PYLUCENE-9:
-----------------------------------

Are you sure you're comparing the right versions? Lucene.Net is quite behind Java Lucene, and in more recent versions lots of things changed.

For instance, trying different Version instances gives different results; notably, LUCENE_24 works as you seem to expect:

    >>> qp = QueryParser(Version.LUCENE_29, "ft", StandardAnalyzer(Version.LUCENE_29))
    >>> qp.parse('"Calendar Item as Msg"')
    <Query: ft:"calendar item ? msg">
    (the 'as' stop word gets replaced by a hole, as expected in that version)

    >>> qp = QueryParser(Version.LUCENE_24, "ft", StandardAnalyzer(Version.LUCENE_24))
    >>> qp.parse('"Calendar Item as Msg"')
    <Query: ft:"calendar item msg">
    (works as Lucene.Net does, probably, as I've never run it)

I'm inclined to resolve this bug as INVALID unless I'm missing something here. Please let me know.
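The "hole" described in the comment above can be imitated without Lucene. The sketch below is a toy model, not Lucene's implementation: the stop-word set, `analyze`, and `render_phrase` are all hypothetical names chosen for illustration. It shows that when a removed stop word still consumes a position, the phrase is left with a gap, which is printed here as '?'. When positions are not preserved, the remaining terms close up.

```python
# Toy model only: nothing here calls Lucene.
STOP_WORDS = {"a", "as", "of", "the"}  # small illustrative stop set (assumption)


def analyze(text, keep_positions):
    """Return (term, position) pairs with stop words removed.

    keep_positions=True imitates the behaviour described for LUCENE_29,
    keep_positions=False imitates the behaviour described for LUCENE_24.
    """
    tokens = []
    position = -1
    for word in text.lower().split():
        if word in STOP_WORDS:
            if keep_positions:
                position += 1  # the removed word still occupies a slot
            continue
        position += 1
        tokens.append((word, position))
    return tokens


def render_phrase(field, tokens):
    """Render a phrase with '?' marking every unoccupied position."""
    if not tokens:
        return '%s:""' % field
    slots = ["?"] * (tokens[-1][1] + 1)
    for term, position in tokens:
        slots[position] = term
    return '%s:"%s"' % (field, " ".join(slots))


with_hole = render_phrase("ft", analyze("Calendar Item as Msg", True))
without_hole = render_phrase("ft", analyze("Calendar Item as Msg", False))
print(with_hole)     # ft:"calendar item ? msg"
print(without_hole)  # ft:"calendar item msg"
```

Under this model, the '?' is a placeholder for the position the stop word used to occupy rather than a wildcard typed into the query.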
[jira] [Commented] (PYLUCENE-9) QueryParser replacing stop words with wildcards
[ https://issues.apache.org/jira/browse/PYLUCENE-9?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029674#comment-13029674 ]

Christopher Currens commented on PYLUCENE-9:
--------------------------------------------

I was very hesitant to report this as a bug, since pylucene isn't a port, but rather just recompiled. I am positive I am comparing the correct versions (I'm a committer on Lucene.Net). Here are all the configurations I've tested:

    Lucene.Net 2.9.2 - Valid
    Lucene.Net 2.9.4 - Valid
    Java Lucene (via Luke 1.0.1 (uses Lucene 2.9.4)) - Valid
    Java Lucene (via Luke 3.1.0 (uses Lucene 3.0)) - Valid
    pyLucene (Lucene 2.9.2) - Invalid, replaced by single wildcard ('?')
    pyLucene (Lucene 2.9.4) - Invalid, replaced by single wildcard ('?')
    pyLucene (Lucene 3.0.3) - Invalid, replaced by single wildcard ('?')

Those tests were all run on the 32-bit Win-XP machine. The Ubuntu box I've used was running pyLucene with Lucene 2.9.2.

One thing I hadn't considered, though, was to see whether it can be replicated outside of the many machines I've used myself to test, specifically whether there's an issue with our building of it via JCC, or something in our environment. But considering I've tried it at work and at home, there's no real other place I can test it.
[jira] [Commented] (PYLUCENE-9) QueryParser replacing stop words with wildcards
[ https://issues.apache.org/jira/browse/PYLUCENE-9?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029691#comment-13029691 ]

Andi Vajda commented on PYLUCENE-9:
-----------------------------------

Could you please ask on the java-u...@lucene.apache.org list what the expected behavior actually is, from Java Lucene's point of view, with versions Version.LUCENE_24, 29 and 30 passed to both the QueryParser and StandardAnalyzer constructors? I remember this changing at some point, but I'm not sure when. Nor do I see, without further investigation, how PyLucene could be different there, as it just invokes the embedded Java Lucene jar.

Thanks!