[jira] [Commented] (LUCENE-8565) SimpleQueryParser to support field filtering (aka Add field:text operator)

2019-08-16 Thread Itamar Syn-Hershko (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16909145#comment-16909145
 ] 

Itamar Syn-Hershko commented on LUCENE-8565:


Heya - is this waiting on anything in particular that I can help with in 
finalizing? Would really like to see this merged in. Thanks

> SimpleQueryParser to support field filtering (aka Add field:text operator)
> --
>
> Key: LUCENE-8565
> URL: https://issues.apache.org/jira/browse/LUCENE-8565
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/queryparser
>        Reporter: Itamar Syn-Hershko
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> SimpleQueryParser lacks support for the `field:` operator for creating 
> queries which operate on fields other than the default field. Seems like one 
> can either get the parsed query to operate on a single field, or on ALL 
> defined fields (+ weights). No support for specifying `field:value` in the 
> query.
> It probably wasn't forgotten, but rather left out for simplicity, but since 
> we are using this QP implementation more and more (mostly through 
> Elasticsearch) we thought it would be useful to have it in.
> Seems like this is not too hard to pull off and I'll be happy to contribute a 
> patch for it.
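For context, the requested operator can be sketched in a few lines of plain Java. This is illustrative only, not the actual patch in the linked PR; the class and method names here are made up. The idea: split a leading `field:` prefix off a term when the field is in an allow-list, and otherwise fall back to the default field.

```java
import java.util.AbstractMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the proposed behavior, not Lucene code:
// recognize a "field:" prefix only for known fields, otherwise treat
// the whole term as text against the default field.
public class FieldPrefixSketch {
    static Map.Entry<String, String> extractField(
            String term, String defaultField, Set<String> allowedFields) {
        int colon = term.indexOf(':');
        if (colon > 0) {
            String field = term.substring(0, colon);
            if (allowedFields.contains(field)) {
                return new AbstractMap.SimpleEntry<>(field, term.substring(colon + 1));
            }
        }
        // No recognized "field:" prefix -> query the default field.
        return new AbstractMap.SimpleEntry<>(defaultField, term);
    }

    public static void main(String[] args) {
        System.out.println(extractField("title:lucene", "body", Set.of("title")));
        System.out.println(extractField("word:word", "body", Set.of("title")));
    }
}
```

A real implementation would also have to handle escaping (`foo\:bar`) and decide how unrecognized field names degrade, which is where most of the review discussion on such a change tends to go.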



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8565) SimpleQueryParser to support field filtering (aka Add field:text operator)

2019-02-19 Thread Itamar Syn-Hershko (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16771783#comment-16771783
 ] 

Itamar Syn-Hershko commented on LUCENE-8565:


I'm not sure what the Lucene versioning policy on that would be, but we can 
always change the default flag to turn off field-filtering support.
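SimpleQueryParser already gates each operator behind an int flags bitmask (e.g. `PHRASE_OPERATOR`, `PREFIX_OPERATOR`), so an off-by-default bit for field filtering would follow the existing pattern. A stdlib-only sketch of that pattern; `FIELD_OPERATOR` is a hypothetical name for illustration, not an actual Lucene constant:

```java
public class FlagSketch {
    // SimpleQueryParser enables operators via bits like these;
    // FIELD_OPERATOR is the hypothetical new one discussed in the issue.
    static final int AND_OPERATOR = 1 << 0;
    static final int OR_OPERATOR = 1 << 1;
    static final int FIELD_OPERATOR = 1 << 20;  // hypothetical

    // Backwards-compatible default: every existing operator, new one off.
    static final int DEFAULT_FLAGS = AND_OPERATOR | OR_OPERATOR;

    static boolean fieldFilteringEnabled(int flags) {
        return (flags & FIELD_OPERATOR) != 0;
    }

    public static void main(String[] args) {
        System.out.println(fieldFilteringEnabled(DEFAULT_FLAGS));                   // false
        System.out.println(fieldFilteringEnabled(DEFAULT_FLAGS | FIELD_OPERATOR));  // true
    }
}
```

Keeping the new bit out of the default mask would sidestep the back-compat concern: existing callers see no behavior change unless they opt in.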







Re: SimpleQueryParser to support field filtering?

2019-02-19 Thread Itamar Syn-Hershko
Anyone?

--

Itamar Syn-Hershko
CTO, Founder
BigData Boutique <http://bigdataboutique.com/>
Elasticsearch Consulting Partner
Microsoft MVP | Lucene.NET PMC
http://code972.com | @synhershko <https://twitter.com/synhershko>


On Mon, Jan 14, 2019 at 10:19 AM Itamar Syn-Hershko wrote:

> Hi all,
>
> I sent a PR back in November to resolve the title and would appreciate
> feedback.
>
> Summary:
>
> SimpleQueryParser lacks support for the `field:` operator for creating
> queries which operate on fields other than the default field. Seems
> like one can either get the parsed query to operate on a single field, or
> on ALL defined fields (+ weights). No support for specifying `field:value`
> in the query.
>
> It probably wasn't forgotten, but rather left out for simplicity, but
> since we are using this QP implementation more and more (mostly through
> Elasticsearch) we thought it would be useful to have it in.
>
> JIRA: https://issues.apache.org/jira/browse/LUCENE-8565
>
> PR: https://github.com/apache/lucene-solr/pull/498
>
> What do people think?
>
> Cheers,
>
> --
>
> Itamar Syn-Hershko
> CTO, Founder
> BigData Boutique <http://bigdataboutique.com/>
> Elasticsearch Consulting Partner
> http://code972.com | @synhershko <https://twitter.com/synhershko>
>
>


SimpleQueryParser to support field filtering?

2019-01-14 Thread Itamar Syn-Hershko
Hi all,

I sent a PR back in November to resolve the title and would appreciate
feedback.

Summary:

SimpleQueryParser lacks support for the `field:` operator for creating
queries which operate on fields other than the default field. Seems
like one can either get the parsed query to operate on a single field, or
on ALL defined fields (+ weights). No support for specifying `field:value`
in the query.

It probably wasn't forgotten, but rather left out for simplicity, but since
we are using this QP implementation more and more (mostly through
Elasticsearch) we thought it would be useful to have it in.

JIRA: https://issues.apache.org/jira/browse/LUCENE-8565

PR: https://github.com/apache/lucene-solr/pull/498

What do people think?

Cheers,

--

Itamar Syn-Hershko
CTO, Founder
BigData Boutique <http://bigdataboutique.com/>
Elasticsearch Consulting Partner
http://code972.com | @synhershko <https://twitter.com/synhershko>


[jira] [Updated] (LUCENE-8565) SimpleQueryParser to support field filtering (aka Add field:text operator)

2018-11-14 Thread Itamar Syn-Hershko (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Itamar Syn-Hershko updated LUCENE-8565:
---
Summary: SimpleQueryParser to support field filtering (aka Add field:text 
operator)  (was: SimpleQueryString to support field filtering (aka Add 
field:text operator))







[jira] [Updated] (LUCENE-8565) SimpleQueryParser to support field filtering (aka Add field:text operator)

2018-11-14 Thread Itamar Syn-Hershko (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Itamar Syn-Hershko updated LUCENE-8565:
---
Description: 
SimpleQueryParser lacks support for the `field:` operator for creating queries 
which operate on fields other than the default field. Seems like one can either 
get the parsed query to operate on a single field, or on ALL defined fields (+ 
weights). No support for specifying `field:value` in the query.

It probably wasn't forgotten, but rather left out for simplicity, but since we 
are using this QP implementation more and more (mostly through Elasticsearch) 
we thought it would be useful to have it in.

Seems like this is not too hard to pull off and I'll be happy to contribute a 
patch for it.

  was:
SimpleQueryString lacks support for the `field:` operator for creating queries 
which operate on fields other than the default field. Seems like one can either 
get the parsed query to operate on a single field, or on ALL defined fields (+ 
weights). No support for specifying `field:value` in the query.

It probably wasn't forgotten, but rather left out for simplicity, but since we 
are using this QP implementation more and more (mostly through Elasticsearch) 
we thought it would be useful to have it in.

Seems like this is not too hard to pull off and I'll be happy to contribute a 
patch for it.








[jira] [Commented] (LUCENE-8565) SimpleQueryString to support field filtering (aka Add field:text operator)

2018-11-14 Thread Itamar Syn-Hershko (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16686301#comment-16686301
 ] 

Itamar Syn-Hershko commented on LUCENE-8565:


PR submitted on GitHub: https://github.com/apache/lucene-solr/pull/498. 
Reviews appreciated.







[jira] [Updated] (LUCENE-8565) SimpleQueryString to support field filtering (aka Add field:text operator)

2018-11-14 Thread Itamar Syn-Hershko (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Itamar Syn-Hershko updated LUCENE-8565:
---
Description: 
SimpleQueryString lacks support for the `field:` operator for creating queries 
which operate on fields other than the default field. Seems like one can either 
get the parsed query to operate on a single field, or on ALL defined fields (+ 
weights). No support for specifying `field:value` in the query.

It probably wasn't forgotten, but rather left out for simplicity, but since we 
are using this QP implementation more and more (mostly through Elasticsearch) 
we thought it would be useful to have it in.

Seems like this is not too hard to pull off and I'll be happy to contribute a 
patch for it.

  was:
SimpleQueryString lacks support for the `field:` operator for creating queries 
which operate on fields other than the default field. Seems like one can either 
get the parsed query to operate on a single field, or on ALL defined fields (+ 
weights). No support for specifying `field:value` in the query.

It probably wasn't forgotten, but rather left out for simplicity, but since we 
are using this QP implementation more and more (mostly through Elasticsearch) 
we thought it would be 

Seems like this is not too hard to pull off and I'll be happy to contribute a 
patch for it.








[jira] [Created] (LUCENE-8565) SimpleQueryString to support field filtering (aka Add field:text operator)

2018-11-13 Thread Itamar Syn-Hershko (JIRA)
Itamar Syn-Hershko created LUCENE-8565:
--

 Summary: SimpleQueryString to support field filtering (aka Add 
field:text operator)
 Key: LUCENE-8565
 URL: https://issues.apache.org/jira/browse/LUCENE-8565
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/queryparser
Reporter: Itamar Syn-Hershko


SimpleQueryString lacks support for the `field:` operator for creating queries 
which operate on fields other than the default field. Seems like one can either 
get the parsed query to operate on a single field, or on ALL defined fields (+ 
weights). No support for specifying `field:value` in the query.

It probably wasn't forgotten, but rather left out for simplicity, but since we 
are using this QP implementation more and more (mostly through Elasticsearch) 
we thought it would be 

Seems like this is not too hard to pull off and I'll be happy to contribute a 
patch for it.






[jira] [Commented] (LUCENE-6302) Adding Date Math support to Lucene Expressions module

2015-02-26 Thread Itamar Syn-Hershko (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14338677#comment-14338677
 ] 

Itamar Syn-Hershko commented on LUCENE-6302:


Sent a PR for the latter: https://github.com/apache/lucene-solr/pull/129

 Adding Date Math support to Lucene Expressions module
 -

 Key: LUCENE-6302
 URL: https://issues.apache.org/jira/browse/LUCENE-6302
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/expressions
Affects Versions: 4.10.3
Reporter: Itamar Syn-Hershko

 Lucene Expressions are great, but they don't allow for date math. More 
 specifically, they don't allow inferring date parts from a numeric 
 representation of a date stamp, nor do they allow parsing string 
 representations into dates.
 Some of the features requested here are easy to implement via a ValueSource 
 implementation (and potentially minor changes to the lexer definition); some 
 are more involved. I'll be happy if we could get half of those in, and will 
 be happy to work on a PR for the parts we can agree on.
 The items we will be happy to have:
 - A now() function (with or without TZ support) to return the current long 
 date/time value as a numeric, which we could use against indexed datetime 
 fields (which are in fact numerics)
 - Parsing methods - to allow expressing datetimes as strings, and/or reading 
 them from stored fields and parsing them there. Parse errors would render a 
 value of zero.
 - Given a numeric value, allow specifying that it is a date value and then 
 inferring date parts - e.g. Date(1424963520).Year == 2015, or Date(now()) - 
 Date(1424963520).Year. Basically methods which return numerics but internally 
 create and use Date objects.






[jira] [Commented] (LUCENE-6302) Adding Date Math support to Lucene Expressions module

2015-02-26 Thread Itamar Syn-Hershko (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14338563#comment-14338563
 ] 

Itamar Syn-Hershko commented on LUCENE-6302:


I actually expected the main objection would be to adding date parsing methods 
:)

Maybe it would make sense to explain the use cases this is trying to solve.

We are using Elasticsearch & Kibana, and since the latest version switched to 
using Lucene Expressions (from Groovy) we found ourselves limited in what we 
can do with Kibana's scripted fields.

For example, given a user's DOB, how can we do aggregations on their age? Or 
compute how many years (or days) have passed between two given dates?

Yes, we can subtract the epochs (except that it doesn't seem to work: 
https://github.com/elasticsearch/elasticsearch/issues/9884), but translating 
the result into days, hours, or years is even uglier using an expression.

I think introducing ValueSources to do this should be enough, but if changing 
the lexer is the preferred way I can do that as well. With regard to syntax, 
I'm not locked into any preferred syntax.

Either way, adding a now() function seems like the easiest change, and I can 
send a PR with that change alone to start with.







[jira] [Created] (LUCENE-6302) Adding Date Math support to Lucene Expressions module

2015-02-26 Thread Itamar Syn-Hershko (JIRA)
Itamar Syn-Hershko created LUCENE-6302:
--

 Summary: Adding Date Math support to Lucene Expressions module
 Key: LUCENE-6302
 URL: https://issues.apache.org/jira/browse/LUCENE-6302
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/expressions
Affects Versions: 4.10.3
Reporter: Itamar Syn-Hershko


Lucene Expressions are great, but they don't allow for date math. More 
specifically, they don't allow inferring date parts from a numeric 
representation of a date stamp, nor do they allow parsing string 
representations into dates.

Some of the features requested here are easy to implement via a ValueSource 
implementation (and potentially minor changes to the lexer definition); some 
are more involved. I'll be happy if we could get half of those in, and will be 
happy to work on a PR for the parts we can agree on.

The items we will be happy to have:

- A now() function (with or without TZ support) to return the current long 
date/time value as a numeric, which we could use against indexed datetime 
fields (which are in fact numerics)
- Parsing methods - to allow expressing datetimes as strings, and/or reading 
them from stored fields and parsing them there. Parse errors would render a 
value of zero.
- Given a numeric value, allow specifying that it is a date value and then 
inferring date parts - e.g. Date(1424963520).Year == 2015, or Date(now()) - 
Date(1424963520).Year. Basically methods which return numerics but internally 
create and use Date objects.
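The three requested items map fairly directly onto `java.time`; the sketch below shows what such expression functions could compute internally. It is illustrative stdlib Java, not the Expressions module API, and the method names are made up:

```java
import java.time.Instant;
import java.time.ZoneOffset;

public class DateMathSketch {
    // now() as a long epoch value, comparable against indexed numeric date fields
    static long now() {
        return Instant.now().toEpochMilli();
    }

    // Infer a date part from an epoch-seconds value,
    // e.g. yearOf(1424963520) == 2015 (the example from the issue)
    static int yearOf(long epochSeconds) {
        return Instant.ofEpochSecond(epochSeconds).atZone(ZoneOffset.UTC).getYear();
    }

    // Parse a string representation, rendering parse errors as zero
    static long parseOrZero(String text) {
        try {
            return Instant.parse(text).toEpochMilli();
        } catch (Exception e) {
            return 0L;
        }
    }

    public static void main(String[] args) {
        System.out.println(yearOf(1424963520L));        // 2015
        System.out.println(parseOrZero("not a date"));  // 0
        System.out.println(parseOrZero("2015-02-26T00:00:00Z") > 0);
    }
}
```

Wiring functions like these into the Expressions module would presumably go through ValueSource implementations, as the comment above suggests, with the lexer untouched unless a nicer syntax is wanted.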






Re: FSDirectory and creating directory

2015-02-04 Thread Itamar Syn-Hershko
Thanks guys, we will mimic the current behavior and ignore the comment.
Mike, I did promise to find bugs!

--

Itamar Syn-Hershko
http://code972.com | @synhershko https://twitter.com/synhershko
Freelance Developer & Consultant
Lucene.NET committer and PMC member

On Wed, Feb 4, 2015 at 11:20 AM, Uwe Schindler u...@thetaphi.de wrote:

 Hi Mike,

 This is why I ask here! So I think we should fix this before the release of
 5.0! Maybe Robert has an explanation for why he does the createDirectories() in
 the ctor.
 In any case I will now commit the removal of the bogus comment in the 4.10
 branch.

 Uwe

 -
 Uwe Schindler
 H.-H.-Meier-Allee 63, D-28213 Bremen
 http://www.thetaphi.de
 eMail: u...@thetaphi.de


  -Original Message-
  From: Michael McCandless [mailto:luc...@mikemccandless.com]
  Sent: Wednesday, February 04, 2015 10:07 AM
  To: Lucene/Solr dev
  Cc: d...@lucenenet.apache.org
  Subject: Re: FSDirectory and creating directory
 
  In the past we considered this (mkdir when creating FSDir) a bug:
  https://issues.apache.org/jira/browse/LUCENE-1464
 
  Mike McCandless
 
  http://blog.mikemccandless.com
 
 
  On Wed, Feb 4, 2015 at 4:03 AM, Uwe Schindler uschind...@apache.org
  wrote:
   Hi,
  
   on the Lucene.NET mailing list there were some issues with porting over
   Lucene 4.8's FSDirectory class to .NET. In fact the following comment on a
   method caused confusion:
   
     // returns the canonical version of the directory, creating it if it doesn't exist.
     private static File getCanonicalPath(File file) throws IOException {
       return new File(file.getCanonicalPath());
     }
  
   In fact, the comment is not correct (and the whole method is useless - one
   could call file.getCanonicalFile() to do the same). According to the Javadocs
   and my tests, this method does *not* generate the directory. If the directory
   does not exist, it just returns a synthetic canonical name (modifying only
   known parts of the path). In fact we should maybe fix this comment and
   remove this method in 4.10.x (if we get a further bugfix release).
  
   We also have a test that validates that a directory is not created by
  FSDirectory's ctor (a side effect of some IndexWriter test).
  
   Nevertheless, in Lucene 5 we changed the behavior of the FSDirectory
   CTOR with NIO.2:
   
     protected FSDirectory(Path path, LockFactory lockFactory) throws IOException {
       super(lockFactory);
       Files.createDirectories(path);  // create directory, if it doesn't exist
       directory = path.toRealPath();
     }
  
   The question is now: do we really intend to create the directory in
   Lucene 5? What about opening an IndexReader on a non-existent directory on a
   read-only filesystem? I know that Robert added this to make path.toRealPath()
   work correctly.
   
   I just want to discuss this before we release 5.0. To me it sounds wrong to
   create the directory in the constructor...
  
   Uwe
  
   -
   Uwe Schindler
   uschind...@apache.org
   Apache Lucene PMC Member / Committer
   Bremen, Germany
   http://lucene.apache.org/
  
  
  
  
   -
   To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For
   additional commands, e-mail: dev-h...@lucene.apache.org
  
 
  -
  To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional
  commands, e-mail: dev-h...@lucene.apache.org


 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org
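The behavioral difference debated in this thread can be reproduced with plain `java.nio.file`, independent of Lucene: the old `File.getCanonicalPath()` fabricates a canonical name without touching the filesystem, while NIO.2's `Path.toRealPath()` throws unless the path exists, which is the stated reason the Lucene 5 constructor calls `Files.createDirectories` first. A standalone sketch, not Lucene code:

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class DirCreationSketch {
    // Resolving a real path for a missing directory fails under NIO.2,
    // which is why the Lucene 5 ctor creates the directory first.
    static boolean realPathFails(Path missing) {
        try {
            missing.toRealPath();
            return false;
        } catch (IOException e) {
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        Path base = Files.createTempDirectory("fsdir-sketch");
        Path missing = base.resolve("no-such-index-dir");

        // Old java.io.File API: returns a synthetic canonical name and does
        // NOT create the directory (contrary to the bogus comment).
        new File(missing.toString()).getCanonicalPath();
        System.out.println(Files.exists(missing));   // false: nothing was created

        System.out.println(realPathFails(missing));  // true: toRealPath() needs the path

        // What the Lucene 5 FSDirectory constructor does:
        Files.createDirectories(missing);
        System.out.println(realPathFails(missing));  // false: now it resolves
    }
}
```

This also illustrates Uwe's read-only-filesystem concern: `Files.createDirectories` in a constructor will fail where a purely read-only open used to succeed.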




Re: FSDirectory and creating directory

2015-02-04 Thread Itamar Syn-Hershko
Rob, what is the intended behavior, and what is the reasoning behind it?

Doesn't this affect only attempts to open a non-existent index directory -
and whether or not there will be an empty folder left behind?

--

Itamar Syn-Hershko
http://code972.com | @synhershko https://twitter.com/synhershko
Freelance Developer & Consultant
Lucene.NET committer and PMC member

On Wed, Feb 4, 2015 at 2:45 PM, Robert Muir rcm...@gmail.com wrote:

 Personally, I am completely against changing this for 5.0

 This is the worst possible thing you can do; it will trickle into more
 bugs in lockfactory etc. Please don't make this last-minute risky
 change. It has no benefits and will only cause bugs.

 On Wed, Feb 4, 2015 at 7:44 AM, Robert Muir rcm...@gmail.com wrote:
  On Wed, Feb 4, 2015 at 4:03 AM, Uwe Schindler uschind...@apache.org
 wrote:
 
  The question is now: Do we really intend to create the directory in
 Lucene 5 ? What about opening an IndexReader on a non-existent directory on
 a read-only filesystem? I know that Robert added this to make
 path.getRealPath() to work correctly?
 
  I just want to discuss this before we release 5.0. To me it sounds
 wrong to create the directory in the constructor...
 
 
  Please don't call this a bug until you understand why the change was
  made. Please, read the behavior of getCanonicalPath and understand
  exactly why and how it fails: and it's this nonexistent case.

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org




[jira] [Commented] (LUCENE-6103) StandardTokenizer doesn't tokenize word:word

2014-12-10 Thread Itamar Syn-Hershko (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14241214#comment-14241214
 ] 

Itamar Syn-Hershko commented on LUCENE-6103:


Maybe out of scope for this ticket, but how do we go about #2? I'll be happy 
to take this discussion offline as well.

 StandardTokenizer doesn't tokenize word:word
 

 Key: LUCENE-6103
 URL: https://issues.apache.org/jira/browse/LUCENE-6103
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/analysis
Affects Versions: 4.9
Reporter: Itamar Syn-Hershko
Assignee: Steve Rowe

 StandardTokenizer (and as a result most default analyzers) will not tokenize 
 word:word and will preserve it as one token. This can be easily seen using 
 Elasticsearch's analyze API:
 localhost:9200/_analyze?tokenizer=standard&text=word%20word:word
 If this is the intended behavior, then why? I can't really see the logic 
 behind it.
 If not, I'll be happy to join in the effort of fixing this.






[jira] [Commented] (LUCENE-6103) StandardTokenizer doesn't tokenize word:word

2014-12-10 Thread Itamar Syn-Hershko (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14241306#comment-14241306
 ] 

Itamar Syn-Hershko commented on LUCENE-6103:


Sent them a request. I'll buy Robert beers if that could help push this 
forward!







[jira] [Created] (LUCENE-6103) StandardTokenizer doesn't tokenizer word:word

2014-12-09 Thread Itamar Syn-Hershko (JIRA)
Itamar Syn-Hershko created LUCENE-6103:
--

 Summary: StandardTokenizer doesn't tokenizer word:word
 Key: LUCENE-6103
 URL: https://issues.apache.org/jira/browse/LUCENE-6103
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/analysis
Affects Versions: 4.9
Reporter: Itamar Syn-Hershko


StandardTokenizer (and as a result most default analyzers) will not tokenize 
word:word and will preserve it as one token. This can be easily seen using 
Elasticsearch's analyze API:

localhost:9200/_analyze?tokenizer=standard&text=word%20word:word

If this is the intended behavior, then why? I can't really see the logic behind 
it.

If not, I'll be happy to join in the effort of fixing this.
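For what it's worth, this behavior follows from the Unicode UAX#29 word-break rules that StandardTokenizer implements: `:` is classified as MidLetter, and rules WB6/WB7 forbid breaking between letters separated by a single MidLetter (the motivating case is Swedish usage of the colon). A toy stdlib-only approximation of that rule, not the real JFlex grammar:

```java
import java.util.ArrayList;
import java.util.List;

// Toy illustration of UAX#29 rules WB6/WB7: do not break a letter
// sequence around a single MidLetter character such as ':'.
public class MidLetterSketch {
    static boolean isMidLetter(char c) {
        return c == ':' || c == '\u00B7' || c == '\u2027';  // colon, middle dot, hyphenation point
    }

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        char[] cs = text.toCharArray();
        for (int i = 0; i < cs.length; i++) {
            char c = cs[i];
            // Keep a MidLetter only when it sits between two letters (WB6/WB7)
            boolean joinMid = isMidLetter(c) && cur.length() > 0
                    && i + 1 < cs.length && Character.isLetter(cs[i + 1]);
            if (Character.isLetter(c) || Character.isDigit(c) || joinMid) {
                cur.append(c);
            } else if (cur.length() > 0) {
                tokens.add(cur.toString());
                cur.setLength(0);
            }
        }
        if (cur.length() > 0) tokens.add(cur.toString());
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("word word:word"));  // [word, word:word]
    }
}
```

Under this rule `word:word` stays one token, exactly as the issue describes, while a colon not flanked by letters on both sides still splits.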






[jira] [Updated] (LUCENE-6103) StandardTokenizer doesn't tokenize word:word

2014-12-09 Thread Itamar Syn-Hershko (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-6103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Itamar Syn-Hershko updated LUCENE-6103:
---
Summary: StandardTokenizer doesn't tokenize word:word  (was: 
StandardTokenizer doesn't tokenizer word:word)







[jira] [Commented] (LUCENE-5997) StandardFilter redundant

2014-12-09 Thread Itamar Syn-Hershko (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14239697#comment-14239697
 ] 

Itamar Syn-Hershko commented on LUCENE-5997:


Sounds good!

 StandardFilter redundant
 

 Key: LUCENE-5997
 URL: https://issues.apache.org/jira/browse/LUCENE-5997
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/search
Affects Versions: 4.10.1
Reporter: Itamar Syn-Hershko
Priority: Trivial

 Any reason why StandardFilter is still around? its just a no-op class now:
   @Override
   public final boolean incrementToken() throws IOException {
     return input.incrementToken(); // TODO: add some niceties for the new grammar
   }
 https://github.com/apache/lucene-solr/blob/trunk/lucene/analysis/common/src/java/org/apache/lucene/analysis/standard/StandardFilter.java






[jira] [Commented] (LUCENE-5723) Performance improvements for FastCharStream

2014-12-09 Thread Itamar Syn-Hershko (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14239728#comment-14239728
 ] 

Itamar Syn-Hershko commented on LUCENE-5723:


Reported as https://java.net/jira/browse/JAVACC-285

 Performance improvements for FastCharStream
 ---

 Key: LUCENE-5723
 URL: https://issues.apache.org/jira/browse/LUCENE-5723
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/queryparser
Reporter: Itamar Syn-Hershko
Priority: Minor

 Hello from the .NET land,
 A user of ours has identified an optimization opportunity; although minor, I 
 think it points to a valid principle - we should avoid using exceptions for 
 controlling flow when possible.
 Here's the original ticket + commits to our codebase. If this looks valid to 
 you too I can go ahead and prepare a PR.
 https://issues.apache.org/jira/browse/LUCENENET-541
 https://github.com/apache/lucene.net/commit/ac8c9fa809110ddb180bf7b2ce93e86270b39ff6
 https://git-wip-us.apache.org/repos/asf?p=lucenenet.git;a=blobdiff;f=src/core/QueryParser/QueryParserTokenManager.cs;h=ec09c8e451f7a7d1572fbdce4c7598e362526a7c;hp=17583d20f660fdb6e4aa86105c7574383f965ebe;hb=41ebbc2d;hpb=ac8c9fa809110ddb180bf7b2ce93e86270b39ff6






[jira] [Commented] (LUCENE-6103) StandardTokenizer doesn't tokenize word:word

2014-12-09 Thread Itamar Syn-Hershko (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14239784#comment-14239784
 ] 

Itamar Syn-Hershko commented on LUCENE-6103:


Yes, I figured it would come down to some Unicode rules. Can you give a rationale 
for this, mainly out of curiosity?

Not a Unicode expert, but I'd assume that just as you wouldn't want English words 
to not-break on Hebrew Punctuation Gershayim (e.g. Test"Word is actually 2 
tokens while מנכ"לים is one), maybe this rule is meant for specific scenarios and 
not for the general use case?

On another note, any type of Gershayim should be preserved within Hebrew words, 
not only U+05F4. This is mainly because the keyboards and editors in use produce 
the standard " character in most cases. I had a chat with Robert a while back where 
he said that's the case; I'm just making sure you didn't follow the specs to 
the letter in that regard...

 StandardTokenizer doesn't tokenize word:word
 

 Key: LUCENE-6103
 URL: https://issues.apache.org/jira/browse/LUCENE-6103
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/analysis
Affects Versions: 4.9
Reporter: Itamar Syn-Hershko
Assignee: Steve Rowe

 StandardTokenizer (and as a result most default analyzers) will not tokenize 
 word:word and will preserve it as one token. This can be easily seen using 
 Elasticsearch's analyze API:
 localhost:9200/_analyze?tokenizer=standard&text=word%20word:word
 If this is the intended behavior, then why? I can't really see the logic 
 behind it.
 If not, I'll be happy to join in the effort of fixing this.






[jira] [Commented] (LUCENE-6103) StandardTokenizer doesn't tokenize word:word

2014-12-09 Thread Itamar Syn-Hershko (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14240090#comment-14240090
 ] 

Itamar Syn-Hershko commented on LUCENE-6103:


Good stuff, thanks Steve. I'm going to see if the rest of the UAX is good for 
us, and if so, see if I can comply with the 6.2.5 version of the specs.

It's a good thing StandardTokenizer is no longer English-centric, but I cannot 
imagine what use the colon rule has, especially since in most cases the result 
is not something reasonable :)

 StandardTokenizer doesn't tokenize word:word
 

 Key: LUCENE-6103
 URL: https://issues.apache.org/jira/browse/LUCENE-6103
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/analysis
Affects Versions: 4.9
Reporter: Itamar Syn-Hershko
Assignee: Steve Rowe

 StandardTokenizer (and as a result most default analyzers) will not tokenize 
 word:word and will preserve it as one token. This can be easily seen using 
 Elasticsearch's analyze API:
 localhost:9200/_analyze?tokenizer=standard&text=word%20word:word
 If this is the intended behavior, then why? I can't really see the logic 
 behind it.
 If not, I'll be happy to join in the effort of fixing this.






[jira] [Commented] (LUCENE-6103) StandardTokenizer doesn't tokenize word:word

2014-12-09 Thread Itamar Syn-Hershko (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14240133#comment-14240133
 ] 

Itamar Syn-Hershko commented on LUCENE-6103:


Ok, so I did some homework. In Swedish, a colon is a way to shortcut the writing 
of words. So "c:a" is in fact "cirka", which means "approximately". I guess it 
can be thought of like English acronyms, only apparently it's way less commonly 
used in Swedish (my source says "very very seldomly used; old style and not 
used in modern Swedish at all").

Not only is it hardly used, apparently it's only legal in 3-letter 
combinations ("c:a" but not "c:ka").

And also, the effects it has are quite severe at the moment - 2 words with a 
colon in between that didn't have a space will be output as one token even 
though it's 100% certain this is not applicable to Swedish, since each word has 
more than 2 characters.

I'm not aiming at changing the Unicode standards, that's way beyond my limited 
powers, but:

1. Given the above, does it really make sense to use this tokenizer in all 
language-specific analyzers as well? e.g. 
https://github.com/apache/lucene-solr/blob/lucene_solr_4_9_1/lucene/analysis/common/src/java/org/apache/lucene/analysis/en/EnglishAnalyzer.java#L105

I'd think for language-specific analyzers we'd want tokenizers aimed at that 
language, with limited support for others. So, in this case, the colon would 
always be considered a tokenizing char.

2. Can we change the JFlex definition to at least limit the effects of this, 
e.g. only support colon as MidLetter if the overall token length == 3, so "c:a" 
is a valid token and "word:word" is not?
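Proposal #2 could be expressed as a post-processing step on the token stream; a hypothetical batch-mode sketch (not the actual JFlex change, which would enforce this during scanning):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ColonLengthRule {
    // Hypothetical sketch of proposal #2: keep a colon inside a token only
    // when the whole token is exactly 3 characters long (the Swedish "c:a"
    // abbreviation pattern); otherwise split the token on the colon.
    static List<String> limitColon(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            if (t.indexOf(':') >= 0 && t.length() != 3) {
                out.addAll(Arrays.asList(t.split(":")));
            } else {
                out.add(t);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(limitColon(Arrays.asList("c:a", "word:word")));
        // [c:a, word, word]
    }
}
```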

 StandardTokenizer doesn't tokenize word:word
 

 Key: LUCENE-6103
 URL: https://issues.apache.org/jira/browse/LUCENE-6103
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/analysis
Affects Versions: 4.9
Reporter: Itamar Syn-Hershko
Assignee: Steve Rowe

 StandardTokenizer (and as a result most default analyzers) will not tokenize 
 word:word and will preserve it as one token. This can be easily seen using 
 Elasticsearch's analyze API:
 localhost:9200/_analyze?tokenizer=standard&text=word%20word:word
 If this is the intended behavior, then why? I can't really see the logic 
 behind it.
 If not, I'll be happy to join in the effort of fixing this.






[jira] [Commented] (LUCENE-6103) StandardTokenizer doesn't tokenize word:word

2014-12-09 Thread Itamar Syn-Hershko (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14240392#comment-14240392
 ] 

Itamar Syn-Hershko commented on LUCENE-6103:


0. You mean it implements UAX#29 version 6.3 :)

1. I'll likely be sending a PR for #1 sometime soon. Would you recommend using 
UAX#29 minus specific non-English tweaks, or falling back to 
ClassicStandardTokenizer, which is English-specific, or something else?

2. Here's the thing: the standard is wrong, or buggy. Ask any Swede and they 
will tell you, and any non-Swedish corpus wouldn't care. And basically this is 
a bug in every Lucene-based system today because of the word:word scenario; it's 
a bit of an edge case, but I bet I can find multiple occurrences in every big 
enough system. What can we do about that?

We already solved this using char filters, converting colons to commas. It 
feels a bit hacky though, and again - this _is_ a flaw in Lucene's analysis 
even though it conforms to a Unicode standard.
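The char-filter workaround mentioned above amounts to rewriting colons before the tokenizer ever sees them, so the MidLetter rule never applies. A bare-bones sketch of the idea (in Lucene this would typically be done with a CharFilter, e.g. MappingCharFilter, in front of StandardTokenizer — an assumption about the setup, not a description of the actual Elasticsearch configuration):

```java
public class ColonCharFilterSketch {
    // Bare-bones version of the workaround described above: map ':' to ','
    // before tokenization, so the UAX#29 MidLetter rule never fires.
    // Real code would do this with a CharFilter in the analysis chain.
    static String preFilter(String input) {
        return input.replace(':', ',');
    }

    public static void main(String[] args) {
        System.out.println(preFilter("word:word")); // word,word
    }
}
```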

 StandardTokenizer doesn't tokenize word:word
 

 Key: LUCENE-6103
 URL: https://issues.apache.org/jira/browse/LUCENE-6103
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/analysis
Affects Versions: 4.9
Reporter: Itamar Syn-Hershko
Assignee: Steve Rowe

 StandardTokenizer (and as a result most default analyzers) will not tokenize 
 word:word and will preserve it as one token. This can be easily seen using 
 Elasticsearch's analyze API:
 localhost:9200/_analyze?tokenizer=standard&text=word%20word:word
 If this is the intended behavior, then why? I can't really see the logic 
 behind it.
 If not, I'll be happy to join in the effort of fixing this.






JFlex, tokenization, and custom token exceptions

2014-11-13 Thread Itamar Syn-Hershko
Hey all,

I posted this question also to the JFlex[1] list as it seems a more
appropriate place, but I thought I should raise this here as well.

I'm looking for ways to use Lucene's tokenizers, but preserve some custom
tokens defined by the user. For example, use StandardTokenizer but preserve
C++, C# and i-phone as whole tokens. The gotcha here is that I want that list to
be loaded at runtime, not compiled into the tokenizer - mainly because
it will change over time.

The problem is there's no real way of doing this currently. While I could
implement this myself, JFlex doesn't seem to support it (other than
defining new macros and regenerating the Java pieces, recompiling, etc.).

I discussed this with Rob Muir a couple of months back and he seemed
interested, will be happy to see if there's interest in pursuing this, or
get any new ideas on how to enable this more easily on the JFlex layer or
otherwise. I'll be happy to take this on but every approach I'm looking at
currently has some significant flaws.
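One way to frame the idea without touching JFlex at all - at the cost of a pre-scan - is to protect the user-supplied terms before regular tokenization runs. Everything below is hypothetical sketch code, not a proposed Lucene API:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ProtectedTermsSketch {
    // Sketch: tokens matching a runtime-loaded list (e.g. "c++", "c#",
    // "i-phone") are emitted whole; everything else gets leading and
    // trailing punctuation stripped, crudely standing in for the real
    // tokenizer. The list could be reloaded at any time.
    static List<String> tokenize(String text, Set<String> protectedTerms) {
        List<String> out = new ArrayList<>();
        for (String raw : text.split("\\s+")) {
            if (raw.isEmpty()) continue;
            if (protectedTerms.contains(raw.toLowerCase())) {
                out.add(raw); // preserve user-defined token as-is
            } else {
                String cleaned = raw
                    .replaceAll("^[^\\p{L}\\p{N}]+", "")
                    .replaceAll("[^\\p{L}\\p{N}]+$", "");
                if (!cleaned.isEmpty()) out.add(cleaned);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> protect = new HashSet<>(Arrays.asList("c++", "c#", "i-phone"));
        System.out.println(tokenize("I love C++ and C#", protect));
        // [I, love, C++, and, C#]
    }
}
```

The obvious flaw, as hinted above, is that the protected terms must align with whitespace boundaries here; handling terms that span what the tokenizer would otherwise split is exactly the hard part.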

Cheers,

  [1] http://sourceforge.net/p/jflex/mailman/jflex-users/?viewmonth=201411

--

Itamar Syn-Hershko
http://code972.com | @synhershko https://twitter.com/synhershko
Freelance Developer  Consultant
Author of RavenDB in Action http://manning.com/synhershko/


[jira] [Created] (LUCENE-5997) StandardFilter redundant

2014-10-07 Thread Itamar Syn-Hershko (JIRA)
Itamar Syn-Hershko created LUCENE-5997:
--

 Summary: StandardFilter redundant
 Key: LUCENE-5997
 URL: https://issues.apache.org/jira/browse/LUCENE-5997
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/search
Affects Versions: 4.10.1
Reporter: Itamar Syn-Hershko
Priority: Trivial


Any reason why StandardFilter is still around? its just a no-op class now:

  @Override
  public final boolean incrementToken() throws IOException {
    return input.incrementToken(); // TODO: add some niceties for the new grammar
  }

https://github.com/apache/lucene-solr/blob/trunk/lucene/analysis/common/src/java/org/apache/lucene/analysis/standard/StandardFilter.java






[jira] [Commented] (LUCENE-4601) ivy availability check isn't quite right

2014-06-18 Thread Itamar Syn-Hershko (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14035885#comment-14035885
 ] 

Itamar Syn-Hershko commented on LUCENE-4601:


May not be directly related, but I just tried running this: 
http://wiki.apache.org/lucene-java/HowtoConfigureIntelliJ on OSX Mavericks, 
with ant and ivy both installed via homebrew. Ivy was not found by ant and IDEA 
even when I placed a manually downloaded jar locally myself.

I had to run ivy-bootstrap to get off the ground - maybe it's worth adding that 
to the docs.

 ivy availability check isn't quite right
 

 Key: LUCENE-4601
 URL: https://issues.apache.org/jira/browse/LUCENE-4601
 Project: Lucene - Core
  Issue Type: Bug
  Components: general/build
Reporter: Robert Muir
 Fix For: 4.1, 5.0

 Attachments: LUCENE-4601.patch


 remove ivy from your .ant/lib but load it up on a build file like so:
 You have to lie to lucene's build, overriding ivy.available, because for some 
 reason the detection is wrong and will tell you ivy is not available, when it 
 actually is.
 I tried changing the detector to use <available classname="some.ivy.class"> and 
 this didn't work either... so I don't actually know what the correct fix is.
 {noformat}
 <project name="test" default="test" basedir=".">
   <path id="ivy.lib.path">
     <fileset dir="/Users/rmuir" includes="ivy-2.2.0.jar" />
   </path>
   <taskdef resource="org/apache/ivy/ant/antlib.xml"
            uri="antlib:org.apache.ivy.ant" classpathref="ivy.lib.path" />
   <target name="test">
     <subant target="test" inheritAll="false" inheritRefs="false"
             failonerror="true">
       <fileset dir="lucene-trunk/lucene" includes="build.xml"/>
       <!-- lie -->
       <property name="ivy.available" value="true"/>
     </subant>
   </target>
 </project>
 {noformat}






[jira] [Commented] (LUCENE-2841) CommonGramsFilter improvements

2014-06-18 Thread Itamar Syn-Hershko (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14035978#comment-14035978
 ] 

Itamar Syn-Hershko commented on LUCENE-2841:


Can anyone review and comment?

 CommonGramsFilter improvements
 --

 Key: LUCENE-2841
 URL: https://issues.apache.org/jira/browse/LUCENE-2841
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/analysis
Affects Versions: 3.1, 4.0-ALPHA
Reporter: Steve Rowe
Priority: Minor
 Fix For: 4.9, 5.0

 Attachments: commit-6402a55.patch


 Currently CommonGramsFilter expects users to remove the common words around 
 which output token ngrams are formed, by appending a StopFilter to the 
 analysis pipeline.  This is inefficient in two ways: captureState() is called 
 on (trailing) stopwords, and then the whole stream has to be re-examined by 
 the following StopFilter.
 The current ctor should be deprecated, and another ctor added with a boolean 
 option controlling whether the common words should be output as unigrams.
 If common words *are* configured to be output as unigrams, captureState() 
 will still need to be called, as it is now.
 If the common words are *not* configured to be output as unigrams, rather 
 than calling captureState() for the trailing token in each output token 
 ngram, the term text, position and offset can be maintained in the same way 
 as they are now for the leading token: using a System.arraycopy()'d term 
 buffer and a few ints for positionIncrement and offset.  The user then no 
 longer would need to append a StopFilter to the analysis chain.
 An example illustrating both possibilities should also be added.
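To make the two configurations concrete, here is a rough batch-mode sketch of the ngram-forming logic with the proposed boolean flag (hypothetical code - the real filter works incrementally on the token stream and uses "_" as its gram separator):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class CommonGramsSketch {
    // Rough sketch of CommonGramsFilter-style output: whenever a token
    // pair involves a common word, emit the joined "a_b" gram. The
    // outputUnigrams flag mirrors the proposed ctor option: when false,
    // common words are dropped as standalone tokens, so no trailing
    // StopFilter is needed.
    static List<String> commonGrams(List<String> tokens, Set<String> common,
                                    boolean outputUnigrams) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            String t = tokens.get(i);
            boolean isCommon = common.contains(t);
            if (!isCommon || outputUnigrams) {
                out.add(t);
            }
            if (i + 1 < tokens.size()
                    && (isCommon || common.contains(tokens.get(i + 1)))) {
                out.add(t + "_" + tokens.get(i + 1));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> common = new HashSet<>(Arrays.asList("the", "of"));
        System.out.println(commonGrams(
            Arrays.asList("the", "quick", "fox"), common, false));
        // [the_quick, quick, fox]
    }
}
```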






[jira] [Created] (LUCENE-5723) Performance improvements for FastCharStream

2014-05-31 Thread Itamar Syn-Hershko (JIRA)
Itamar Syn-Hershko created LUCENE-5723:
--

 Summary: Performance improvements for FastCharStream
 Key: LUCENE-5723
 URL: https://issues.apache.org/jira/browse/LUCENE-5723
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/queryparser
Reporter: Itamar Syn-Hershko
Priority: Minor


Hello from the .NET land,

A user of ours has identified an optimization opportunity; although minor, I 
think it points to a valid principle - we should avoid using exceptions for 
controlling flow when possible.

Here's the original ticket + commits to our codebase. If this looks valid to 
you too I can go ahead and prepare a PR.

https://issues.apache.org/jira/browse/LUCENENET-541
https://github.com/apache/lucene.net/commit/ac8c9fa809110ddb180bf7b2ce93e86270b39ff6
https://git-wip-us.apache.org/repos/asf?p=lucenenet.git;a=blobdiff;f=src/core/QueryParser/QueryParserTokenManager.cs;h=ec09c8e451f7a7d1572fbdce4c7598e362526a7c;hp=17583d20f660fdb6e4aa86105c7574383f965ebe;hb=41ebbc2d;hpb=ac8c9fa809110ddb180bf7b2ce93e86270b39ff6
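The underlying pattern is general: test the condition explicitly instead of letting an exception signal it. A generic illustration (not the actual FastCharStream code):

```java
public class FlowControlDemo {
    // Generic illustration of the point above, not FastCharStream itself:
    // signalling end-of-input by throwing and catching an exception forces
    // the JVM to build a stack trace on every miss; an explicit bounds
    // check costs a single comparison.
    static int readWithCheck(char[] buf, int pos) {
        if (pos >= buf.length) {
            return -1; // explicit end-of-stream sentinel, no exception
        }
        return buf[pos];
    }

    public static void main(String[] args) {
        char[] buf = {'a', 'b'};
        System.out.println(readWithCheck(buf, 1)); // 98 ('b')
        System.out.println(readWithCheck(buf, 5)); // -1 (end of input)
    }
}
```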






Re: ICUFoldingFilter obsolete?

2014-03-03 Thread Itamar Syn-Hershko
This makes sense, thanks Rob

--

Itamar Syn-Hershko
http://code972.com | @synhershko https://twitter.com/synhershko
Freelance Developer  Consultant
Author of RavenDB in Action http://manning.com/synhershko/


On Sun, Mar 2, 2014 at 3:54 PM, Robert Muir rcm...@gmail.com wrote:

 I use it too, its fine. Its just not really standardized, and never was :)

 that UTR had that status when i wrote it!

 On Sun, Mar 2, 2014 at 8:52 AM, Shawn Heisey s...@elyograg.org wrote:
  On 3/2/2014 6:37 AM, Robert Muir wrote:
  It was always this way. i don't think such kinds of normalization
  should be standards either (what this stuff is doing is heuristical in
  nature).
 
  I use ICUFoldingFilterFactory in my Solr schema, with the idea that it's
  a smart and single-pass way to fold and lowercase.
 
  Is support from IBM and Lucene expected to continue, or should I be
  looking for another solution?
 
  Thanks,
  Shawn
 
 




ICUFoldingFilter obsolete?

2014-03-02 Thread Itamar Syn-Hershko
Hi all,

I may have missed the train on this, but what is the status of
ICUFoldingFilter?

Documentation suggests it follows foldings specified in UTR#30 (
http://lucene.apache.org/core/4_6_1/analyzers-icu/org/apache/lucene/analysis/icu/ICUFoldingFilter.html),
but UTR#30 is a draft that was later withdrawn (
http://www.unicode.org/reports/tr30/).

I'm not up-to-date with the greatest and latest in the Unicode world so I'm
not sure why it was withdrawn, but given the delicacy of term normalization
I suppose this is worth revisiting?
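For context on the kind of operation in question, here is a minimal diacritic-folding sketch using only the JDK; ICUFoldingFilter itself performs a much larger set of foldings (accent removal, case folding, kana folding, etc.) drawn from the withdrawn UTR#30 draft, so this only illustrates the general idea:

```java
import java.text.Normalizer;

public class FoldingSketch {
    // Minimal stand-in for one thing a folding filter does: decompose to
    // NFD, strip combining marks, lowercase. Not ICUFoldingFilter's actual
    // logic - just an illustration of what "folding" means here.
    static String fold(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "").toLowerCase();
    }

    public static void main(String[] args) {
        System.out.println(fold("Résumé")); // resume
    }
}
```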

Thanks,

--

Itamar Syn-Hershko
http://code972.com | @synhershko https://twitter.com/synhershko
Freelance Developer  Consultant
Author of RavenDB in Action http://manning.com/synhershko/


[jira] [Created] (LUCENE-5358) Code cleanup on KStemmer

2013-12-03 Thread Itamar Syn-Hershko (JIRA)
Itamar Syn-Hershko created LUCENE-5358:
--

 Summary: Code cleanup on KStemmer
 Key: LUCENE-5358
 URL: https://issues.apache.org/jira/browse/LUCENE-5358
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/analysis
Affects Versions: 4.6, 4.5.1, 4.5, 3.0
Reporter: Itamar Syn-Hershko
Priority: Minor


This affects all versions with KStemmer in them

The code of KStemmer needs some intensive cleanup, just to give you some idea 
on something that immediately popped up:

https://github.com/apache/lucene-solr/blob/trunk/lucene/analysis/common/src/java/org/apache/lucene/analysis/en/KStemmer.java#L283-286

I'll be happy to do this myself, just wanted to check in advance to see if this 
is something you'd consider accepting.






[jira] [Created] (LUCENE-5011) MemoryIndex and FVH don't play along with multi-value fields

2013-05-21 Thread Itamar Syn-Hershko (JIRA)
Itamar Syn-Hershko created LUCENE-5011:
--

 Summary: MemoryIndex and FVH don't play along with multi-value 
fields
 Key: LUCENE-5011
 URL: https://issues.apache.org/jira/browse/LUCENE-5011
 Project: Lucene - Core
  Issue Type: Bug
Affects Versions: 4.3
Reporter: Itamar Syn-Hershko


When multi-value fields are indexed to a MemoryIndex, positions are computed 
correctly on search but the start and end offsets and the values array index 
aren't correct.

Comparing the same execution path for IndexReader on a Directory impl and 
MemoryIndex (same document, same query, same analyzer, different Index impl), 
the difference first shows in FieldTermStack.java line 125:

termList.add( new TermInfo( term, dpEnum.startOffset(), dpEnum.endOffset(), 
pos, weight ) );

dpEnum.startOffset() and dpEnum.endOffset() don't match between implementations.

This looks like a bug in MemoryIndex, which doesn't seem to handle tokenized 
multi-value fields all too well when positions and offsets are required.

I should also mention we are using an Analyzer which outputs several tokens at 
a position (a la SynonymFilter), but I don't believe this is related.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (LUCENE-5011) MemoryIndex and FVH don't play along with multi-value fields

2013-05-21 Thread Itamar Syn-Hershko (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13662950#comment-13662950
 ] 

Itamar Syn-Hershko commented on LUCENE-5011:


The actual test case we have now is very tightly coupled with ElasticSearch and 
our custom analysis chain, it may take me some time to decouple it into a 
stand-alone Lucene test. Alternatively, I'll be happy to work this out with you 
via Skype using our existing test case.

 MemoryIndex and FVH don't play along with multi-value fields
 

 Key: LUCENE-5011
 URL: https://issues.apache.org/jira/browse/LUCENE-5011
 Project: Lucene - Core
  Issue Type: Bug
Affects Versions: 4.3
Reporter: Itamar Syn-Hershko

 When multi-value fields are indexed to a MemoryIndex, positions are computed 
 correctly on search but the start and end offsets and the values array index 
 aren't correct.
 Comparing the same execution path for IndexReader on a Directory impl and 
 MemoryIndex (same document, same query, same analyzer, different Index impl), 
 the difference first shows in FieldTermStack.java line 125:
 termList.add( new TermInfo( term, dpEnum.startOffset(), dpEnum.endOffset(), 
 pos, weight ) );
 dpEnum.startOffset() and dpEnum.endOffset() don't match between implementations.
 This looks like a bug in MemoryIndex, which doesn't seem to handle tokenized 
 multi-value fields all too well when positions and offsets are required.
 I should also mention we are using an Analyzer which outputs several tokens 
 at a position (a la SynonymFilter), but I don't believe this is related.




[jira] [Created] (LUCENE-4673) TermQuery.toString() doesn't play nicely with whitespace

2013-01-09 Thread Itamar Syn-Hershko (JIRA)
Itamar Syn-Hershko created LUCENE-4673:
--

 Summary: TermQuery.toString() doesn't play nicely with whitespace
 Key: LUCENE-4673
 URL: https://issues.apache.org/jira/browse/LUCENE-4673
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/search
Affects Versions: 3.6.2, 4.0-BETA, 4.1
Reporter: Itamar Syn-Hershko


A TermQuery where term.text() contains whitespace outputs an incorrect string 
representation: field:foo bar instead of field:"foo bar". A correct 
representation is one that could be parsed again into the correct Query object 
(using the correct analyzer, yes, but still).

This may not be so critical, but in our system we use Lucene's QP to parse and 
then pre-process and optimize user queries. To do that we use Query.toString on 
some clauses to rebuild the query string.

This can be easily resolved by always adding quote marks before and after the 
term text in TermQuery.toString. Testing to see if they are required or not is 
too much work, and TermQuery is ignorant of quote marks anyway.

Some other scenarios which could benefit from this change are places where 
escaped characters are used, such as URLs.
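The suggested fix is essentially one line; a hypothetical helper showing the round-trippable form (not Lucene's actual toString()):

```java
public class TermQueryToStringSketch {
    // Hypothetical helper for the fix suggested above: always quote the
    // term text (escaping embedded quotes) so the string form can be fed
    // back through a query parser even when the term contains whitespace.
    static String toQueryString(String field, String text) {
        return field + ":\"" + text.replace("\"", "\\\"") + "\"";
    }

    public static void main(String[] args) {
        System.out.println(toQueryString("field", "foo bar")); // field:"foo bar"
    }
}
```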




[jira] [Commented] (LUCENE-4673) TermQuery.toString() doesn't play nicely with whitespace

2013-01-09 Thread Itamar Syn-Hershko (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13548874#comment-13548874
 ] 

Itamar Syn-Hershko commented on LUCENE-4673:


I figured as much, yet we would definitely like to have this behavior 
built-in. Are there any plans for an interface to perform a proper 
Query -> String conversion?

 TermQuery.toString() doesn't play nicely with whitespace
 

 Key: LUCENE-4673
 URL: https://issues.apache.org/jira/browse/LUCENE-4673
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/search
Affects Versions: 4.0-BETA, 4.1, 3.6.2
Reporter: Itamar Syn-Hershko

 A TermQuery where term.text() contains whitespace outputs an incorrect string 
 representation: field:foo bar instead of field:"foo bar". A correct 
 representation is one that could be parsed again into the correct Query object 
 (using the correct analyzer, yes, but still).
 This may not be so critical, but in our system we use Lucene's QP to parse 
 and then pre-process and optimize user queries. To do that we use 
 Query.toString on some clauses to rebuild the query string.
 This can be easily resolved by always adding quote marks before and after the 
 term text in TermQuery.toString. Testing to see if they are required or not 
 is too much work, and TermQuery is ignorant of quote marks anyway.
 Some other scenarios which could benefit from this change are places where 
 escaped characters are used, such as URLs.




[jira] [Commented] (LUCENE-2841) CommonGramsFilter improvements

2012-12-24 Thread Itamar Syn-Hershko (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13539310#comment-13539310
 ] 

Itamar Syn-Hershko commented on LUCENE-2841:


Attached is a patch to fix this, including tests. There is no regression, and 
the new behavior when keepOrig is set to true is as described in the comments 
here.

The only thing I wasn't sure about was CommonGramsQueryFilter - should it be 
deprecated, or how should it be made to work with this change?

 CommonGramsFilter improvements
 --

 Key: LUCENE-2841
 URL: https://issues.apache.org/jira/browse/LUCENE-2841
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/analysis
Affects Versions: 3.1, 4.0-ALPHA
Reporter: Steven Rowe
Priority: Minor
 Fix For: 4.1

 Attachments: commit-6402a55.patch


 Currently CommonGramsFilter expects users to remove the common words around 
 which output token ngrams are formed, by appending a StopFilter to the 
 analysis pipeline.  This is inefficient in two ways: captureState() is called 
 on (trailing) stopwords, and then the whole stream has to be re-examined by 
 the following StopFilter.
 The current ctor should be deprecated, and another ctor added with a boolean 
 option controlling whether the common words should be output as unigrams.
 If common words *are* configured to be output as unigrams, captureState() 
 will still need to be called, as it is now.
 If the common words are *not* configured to be output as unigrams, rather 
 than calling captureState() for the trailing token in each output token 
 ngram, the term text, position and offset can be maintained in the same way 
 as they are now for the leading token: using a System.arraycopy()'d term 
 buffer and a few ints for positionIncrement and offset.  The user then no 
 longer would need to append a StopFilter to the analysis chain.
 An example illustrating both possibilities should also be added.




Re: pro coding style

2012-12-01 Thread Itamar Syn-Hershko
In the past git had bad tooling; that is not the case today. I've been
using git without the github screens - and while they definitely add a
lot, git is still ten times more usable than SVN.

As I told the Lucene.NET mailing list, you should all watch the following
video and give git a few days of your time before continuing with this
discussion: http://www.youtube.com/watch?v=4XpnKHJAok8

Also, Apache mirrors to github, so basically you work against github all
the time.


On Fri, Nov 30, 2012 at 4:15 PM, Robert Muir rcm...@gmail.com wrote:



 On Fri, Nov 30, 2012 at 9:10 AM, Mark Miller markrmil...@gmail.comwrote:


 On Nov 30, 2012, at 8:56 AM, Robert Muir rcm...@gmail.com wrote:

  but git by itself, is pretty unusable.

 Given the number of committers that eat some pain to use git when
 developing lucene/solr, and have no github or pull requests, I'm not sure
that's a common thought :)


 Sure, some people might disagree with me.
 I'm more than willing to eat some pain if it makes contributions easier.

 I just feel like a lot of what makes github successful is
 unfortunately actually in github and not git.

  It's like if your development team is screaming for linux machines. You
 have to be careful how to interpret that. If you hand them a bunch of
 machines with just linux kernels, they probably won't be productive. When
 they scream for linux they want a userland with a shell, compiler,
 X-windows, editor and so on too.




[jira] [Commented] (LUCENE-4208) Spatial distance relevancy should use score of 1/distance

2012-09-08 Thread Itamar Syn-Hershko (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13451430#comment-13451430
 ] 

Itamar Syn-Hershko commented on LUCENE-4208:


What's the status of this? Are query results being properly sorted based on 
distance?

 Spatial distance relevancy should use score of 1/distance
 -

 Key: LUCENE-4208
 URL: https://issues.apache.org/jira/browse/LUCENE-4208
 Project: Lucene - Core
  Issue Type: New Feature
  Components: modules/spatial
Reporter: David Smiley
 Fix For: 4.0


 The SpatialStrategy.makeQuery() at the moment uses the distance as the score 
 (although some strategies -- TwoDoubles if I recall might not do anything 
 which would be a bug).  The distance is a poor value to use as the score 
 because the score should be related to relevancy, and the distance itself is 
 inversely related to that.  A score of 1/distance would be nice.  Another 
 alternative is earthCircumference/2 - distance, although I like 1/distance 
 better.  Maybe use a different constant than 1.
 Credit: this is Chris Male's idea.
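The two transforms being compared can be sketched numerically. This is a hypothetical helper, not the SpatialStrategy API; the +1 in the inverse form is only there to guard against division by zero:

```java
public class DistanceScore {
    static final double EARTH_CIRCUMFERENCE_KM = 40075.0; // approximate

    // Proposed: score inversely related to distance (the 1/distance idea).
    static double inverseDistance(double distanceKm) {
        return 1.0 / (1.0 + distanceKm);
    }

    // Alternative from the issue: earthCircumference/2 - distance.
    static double linearFalloff(double distanceKm) {
        return EARTH_CIRCUMFERENCE_KM / 2 - distanceKm;
    }

    public static void main(String[] args) {
        // Under both transforms a nearer hit scores higher, unlike raw distance.
        System.out.println(inverseDistance(10) > inverseDistance(100)); // true
        System.out.println(linearFalloff(10) > linearFalloff(100));     // true
    }
}
```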




[jira] [Commented] (LUCENE-4186) Lucene spatial's distErrPct is treated as a fraction, not a percent.

2012-09-02 Thread Itamar Syn-Hershko (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13447037#comment-13447037
 ] 

Itamar Syn-Hershko commented on LUCENE-4186:


distErrPct makes sense to me - it makes more sense to talk about the expected 
error rate rather than the actual precision given. Hence the name Distance Error 
Percentage makes perfect sense, although it is tough to make an acronym of...

And while at it throw a bug fix in: SpatialArgs.toString should multiply 
distPrecision by 100, not divide it.
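The toString bug in question is just a fraction-to-percent conversion; a minimal sketch (hypothetical names, not the SpatialArgs code) of the intended behavior:

```java
// Sketch of the reporting fix: a distErrPct stored as a fraction (0.025)
// should be displayed as "2.5%" (multiply by 100), not divided by 100.
public class DistErrPctDisplay {
    static String toPercent(double fraction) {
        return (fraction * 100) + "%";
    }

    public static void main(String[] args) {
        System.out.println(toPercent(0.025)); // 2.5%
    }
}
```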

 Lucene spatial's distErrPct is treated as a fraction, not a percent.
 --

 Key: LUCENE-4186
 URL: https://issues.apache.org/jira/browse/LUCENE-4186
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/spatial
Reporter: David Smiley
Assignee: David Smiley
Priority: Critical
 Fix For: 4.0


 The distance-error-percent of a query shape in Lucene spatial is, in a 
 nutshell, the percent of the shape's area that is an error epsilon when 
 considering search detail at its edges.  The default is 2.5%, for reference.  
 However, as configured, it is read in as a fraction:
 {code:xml}
 <fieldType name="location_2d_trie"
   class="solr.SpatialRecursivePrefixTreeFieldType"
   distErrPct="0.025" maxDetailDist="0.001" />
 {code}




[jira] [Commented] (LUCENE-4342) Issues with prefix tree's Distance Error Percentage

2012-08-31 Thread Itamar Syn-Hershko (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13445807#comment-13445807
 ] 

Itamar Syn-Hershko commented on LUCENE-4342:


I can confirm this is fixed now. Thanks for the fast turnaround!

 Issues with prefix tree's Distance Error Percentage 
 

 Key: LUCENE-4342
 URL: https://issues.apache.org/jira/browse/LUCENE-4342
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/spatial
Affects Versions: 4.0-ALPHA, 4.0-BETA
Reporter: Itamar Syn-Hershko
Assignee: David Smiley
 Fix For: 4.0

 Attachments: 
 LUCENE-4342_fix_distance_precision_lookup_for_prefix_trees,_and_modify_the_default_algorit.patch,
  unnamed.patch


 See attached patch for a failing test
 Basically, it's a simple point and radius scenario that works great as long 
 as args.setDistPrecision(0.0); is called. Once the default precision is used 
 (2.5%), it doesn't work as expected.
 The distance between the 2 points in the patch is 35.75 KM. Taking into 
 account the 2.5% error the effective radius without false negatives/positives 
 should be around 34.8 KM. This test fails with a radius of 33 KM.




[jira] [Created] (LUCENE-4342) Issues with prefix tree's Distance Error Percentage

2012-08-29 Thread Itamar Syn-Hershko (JIRA)
Itamar Syn-Hershko created LUCENE-4342:
--

 Summary: Issues with prefix tree's Distance Error Percentage 
 Key: LUCENE-4342
 URL: https://issues.apache.org/jira/browse/LUCENE-4342
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/spatial
Affects Versions: 4.0-BETA, 4.0-ALPHA
Reporter: Itamar Syn-Hershko
 Attachments: unnamed.patch

See attached patch for a failing test

Basically, it's a simple point and radius scenario that works great as long as 
args.setDistPrecision(0.0); is called. Once the default precision is used 
(2.5%), it doesn't work as expected.

The distance between the 2 points in the patch is 35.75 KM. Taking into account 
the 2.5% error the effective radius without false negatives/positives should be 
around 34.8 KM. This test fails with a radius of 33 KM.
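The arithmetic behind the expected numbers, as a quick check (hypothetical helper, not the test code):

```java
// With distErrPct = 0.025, shrinking the 35.75 KM separation by the error
// margin gives the radius that should still match without false results.
public class EffectiveRadius {
    static double shrinkByError(double distanceKm, double distErrPct) {
        return distanceKm * (1 - distErrPct);
    }

    public static void main(String[] args) {
        double safe = shrinkByError(35.75, 0.025);
        System.out.println(safe);        // ~34.86, the "around 34.8 KM" above
        System.out.println(33.0 < safe); // the failing 33 KM radius is well inside
    }
}
```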




[jira] [Updated] (LUCENE-4342) Issues with prefix tree's Distance Error Percentage

2012-08-29 Thread Itamar Syn-Hershko (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Itamar Syn-Hershko updated LUCENE-4342:
---

Attachment: unnamed.patch

A failing test

 Issues with prefix tree's Distance Error Percentage 
 

 Key: LUCENE-4342
 URL: https://issues.apache.org/jira/browse/LUCENE-4342
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/spatial
Affects Versions: 4.0-ALPHA, 4.0-BETA
Reporter: Itamar Syn-Hershko
 Attachments: unnamed.patch


 See attached patch for a failing test
 Basically, it's a simple point and radius scenario that works great as long 
 as args.setDistPrecision(0.0); is called. Once the default precision is used 
 (2.5%), it doesn't work as expected.
 The distance between the 2 points in the patch is 35.75 KM. Taking into 
 account the 2.5% error the effective radius without false negatives/positives 
 should be around 34.8 KM. This test fails with a radius of 33 KM.




Re: svn commit: r1375282 - /incubator/lucene.net/trunk/src/core/Util/Parameter.cs

2012-08-20 Thread Itamar Syn-Hershko
This will probably require releasing the core again as well as a new RC...

The spatial module was updated, still doing some integration tests, will
send more updates soon

On Tue, Aug 21, 2012 at 1:14 AM, synhers...@apache.org wrote:

 Author: synhershko
 Date: Mon Aug 20 22:14:01 2012
 New Revision: 1375282

 URL: http://svn.apache.org/viewvc?rev=1375282&view=rev
 Log:
 Fixing a possible NRE which can be thrown during a race condition on
 accessing allParameters

 This is not an air-tight solution, as an ArgumentException can still be
 thrown. I don't care much about doing this within a lock as it will never
 be a bottleneck.


 https://groups.google.com/group/ravendb/browse_thread/thread/a5cf07e80f70c856

 Modified:
 incubator/lucene.net/trunk/src/core/Util/Parameter.cs

 Modified: incubator/lucene.net/trunk/src/core/Util/Parameter.cs
 URL:
 http://svn.apache.org/viewvc/incubator/lucene.net/trunk/src/core/Util/Parameter.cs?rev=1375282&r1=1375281&r2=1375282&view=diff

 ==
 --- incubator/lucene.net/trunk/src/core/Util/Parameter.cs (original)
 +++ incubator/lucene.net/trunk/src/core/Util/Parameter.cs Mon Aug 20 22:14:01
 2012
 @@ -39,11 +39,13 @@ namespace Lucene.Net.Util
 // typesafe enum pattern, no public constructor
 this.name = name;
 string key = MakeKey(name);
 -
 -   if (allParameters.ContainsKey(key))
 -   throw new
 System.ArgumentException("Parameter name " + key + " already used!");
 -
 -   allParameters[key] = this;
 +
 +   lock (allParameters)
 +   {
 +   if (allParameters.ContainsKey(key))
 +   throw new
 System.ArgumentException("Parameter name " + key + " already used!");
 +   allParameters[key] = this;
 +   }
 }

 private string MakeKey(string name)






Re: svn commit: r1375282 - /incubator/lucene.net/trunk/src/core/Util/Parameter.cs

2012-08-20 Thread Itamar Syn-Hershko
That won't work, the Occur flags need to be statically and publicly
available

Since the entire point of that Parameter class is to make the enum
serializable, which is in fact the case with C# (while it is not in Java 5),
I just removed it and made Occur a native enum again

All core tests pass (aside from 2 in TestOpenBitSet and
TestWeakDictionaryBehavior, but they aren't related to this change).

Commit details: http://svn.apache.org/viewvc?view=revision&revision=1375296
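For readers following along in Java, the locked-registry pattern the original patch applies (and that the enum change makes unnecessary) looks roughly like this. This is an illustrative sketch, not Lucene's actual Parameter class:

```java
import java.util.HashMap;
import java.util.Map;

// Typesafe-enum registry sketch: registration must be synchronized, or two
// threads racing through static initializers can corrupt the shared map.
public class Parameter {
    private static final Map<String, Parameter> allParameters = new HashMap<>();
    private final String name;

    Parameter(String name) {
        this.name = name;
        String key = makeKey(name);
        synchronized (allParameters) {      // plays the role of the C# lock
            if (allParameters.containsKey(key)) {
                throw new IllegalArgumentException(
                        "Parameter name " + key + " already used!");
            }
            allParameters.put(key, this);
        }
    }

    private String makeKey(String name) {
        return getClass().getName() + " " + name;
    }

    static int registered() {
        synchronized (allParameters) {
            return allParameters.size();
        }
    }

    public static void main(String[] args) {
        new Parameter("MUST");
        System.out.println(registered() >= 1); // true
    }
}
```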

On Tue, Aug 21, 2012 at 1:21 AM, Oren Eini (Ayende Rahien) 
aye...@ayende.com wrote:

 Instead of doing it this way, do NOT create Occur using separate static
 fields.
 Merge Parameter into Occur (only used there) and create the entire
 dictionary once.
 Otherwise, you run into risk of the ArgumentException.
 If that happens, because this is raised from the static ctor, you'll have
 killed the entire app domain.

 On Tue, Aug 21, 2012 at 1:19 AM, Itamar Syn-Hershko ita...@code972.com
 wrote:

  This will probably require releasing the core again as well as a new
 RC...
 
  The spatial module was updated, still doing some integration tests, will
  send more updates soon
 
  On Tue, Aug 21, 2012 at 1:14 AM, synhers...@apache.org wrote:
 
   Author: synhershko
   Date: Mon Aug 20 22:14:01 2012
   New Revision: 1375282
  
    URL: http://svn.apache.org/viewvc?rev=1375282&view=rev
   Log:
   Fixing a possible NRE which can be thrown during a race condition on
   accessing allParameters
  
   This is not an air-tight solution, as an ArgumentException can still be
   thrown. I don't care much about doing this within a lock as it will
 never
   be a bottleneck.
  
  
  
 
 https://groups.google.com/group/ravendb/browse_thread/thread/a5cf07e80f70c856
  
   Modified:
   incubator/lucene.net/trunk/src/core/Util/Parameter.cs
  
   Modified: incubator/lucene.net/trunk/src/core/Util/Parameter.cs
   URL:
  
 
  http://svn.apache.org/viewvc/incubator/lucene.net/trunk/src/core/Util/Parameter.cs?rev=1375282&r1=1375281&r2=1375282&view=diff
  
  
 
 ==
   --- incubator/lucene.net/trunk/src/core/Util/Parameter.cs (original)
   +++ incubator/lucene.net/trunk/src/core/Util/Parameter.cs Mon Aug 20
 22
  :14:01
   2012
   @@ -39,11 +39,13 @@ namespace Lucene.Net.Util
   // typesafe enum pattern, no public constructor
   this.name = name;
   string key = MakeKey(name);
   -
   -   if (allParameters.ContainsKey(key))
   -   throw new
    System.ArgumentException("Parameter name " + key + " already used!");
   -
   -   allParameters[key] = this;
   +
   +   lock (allParameters)
   +   {
   +   if (allParameters.ContainsKey(key))
   +   throw new
    System.ArgumentException("Parameter name " + key + " already used!");
   +   allParameters[key] = this;
   +   }
   }
  
   private string MakeKey(string name)
  
  
  
  
 



Re: Outstanding issues for 3.0.3

2012-08-02 Thread Itamar Syn-Hershko
Nowadays git works just great on Windows, and it's much easier to work
with than Hg

On Wed, Aug 1, 2012 at 9:41 PM, Zachary Gramana zgram...@feature23.comwrote:

 On Aug 1, 2012, at 12:51 PM, Itamar Syn-Hershko wrote:

  And for heaven's sake, can we move to git when graduating?

 Given that we're a .NET-focused community, and many of us are likely
 primarily using Windows as both our primary development and deployment
 platforms, I'd suggest looking at Mercurial before committing to git.

 Either way, +1 for any DVCS.



Re: Outstanding issues for 3.0.3

2012-08-02 Thread Itamar Syn-Hershko
The point is to make the code better, not to satisfy R# :)

The main benefit of this process is marking fields as readonly, finding
code paths with stupid behavior and moving simple aggregations to use LINQ.
I don't apply the LINQ syntax to non-trivial operations, to make it
easier to keep track of the Java version.

My thoughts on the points you raised inline

On Thu, Aug 2, 2012 at 6:53 PM, Zachary Gramana zgram...@gmail.com wrote:

 I would like to pitch into this effort and put my ReSharper license to
 use. I pulled down trunk, and picked a yellow item at random, and started
 to dig in. I quickly generated more questions than answers, realized I
 needed to stop munging code and consult the wiki and list archives. After
 digging through both, I'm still not entirely certain about what the style
 guidelines are for 3.x onward.

 I also noted this[1] discussion regarding some other guidelines, but it
 didn't see if it made it beyond the proposal stage.

 [1]
 http://mail-archives.apache.org/mod_mbox/lucene-lucene-net-dev/201112.mbox/%3ccajtrbsrdbzkocwln6d6ywhzn2fno91mko1acrp-pflx62du...@mail.gmail.com%3E

 Here are some of the things Re# is catching that I'm unsure of:

 1) Usage of this prefix when not required.

 this.blah = blah;  - required this.
 this.aBlah = blah; - optional this, which Re# doesn't like.

 I'm assuming consistency wins here, and 'this.' stays, but wanted to
 double check.


Doesn't really matter IMO. I just hit Alt-enter when I have it in focus,
otherwise I ignore that.



 2) Using different conventions for fields and parameters\local vars.

 blah vs. _blah

 Combined with 1, Re# wants (and I'm personally accustomed to):

 _blah = blah;

 However, that seems to violate the adopted style.


I think we should stick to the Java naming conventions in the private parts
(minus the function casings) as much as possible. Main reason is the
ability to apply patches from Java Lucene and support future ports more
easily. This is why I kept variable names untouched.



 3) Full qualification of type names.

 Re # wants to remove redundant namespace qualifiers. Leave them or remove
 them?


Same as Alt-Enter argument as above...



 4) Removing unreferenced classes.

 Should I remove non-public unreferenced classes? The ones I've come across
 so far are private.


It's .NET, not C++, but I still usually remove them, not really sure why
tho...



 5) var vs. explicit

 I know this has been brought up before, but not sure of the final
 disposition. FWIW, I prefer var.


 There are some non-Re# issues I came across as well that look like
 artifacts of code generation:


I move to var because it *might* help in the future when the API changes,
and it doesn't really affect anything now



 6) Weird param names.

 Param1 vs. directory

 I assume it's okay to replace 'Param1' with something a descriptive name
 like 'directory'.


Yes. Also rename vars like out_Renamed to @out. This one is important.



 7) Field names that follow local variable naming conventions.

 Lots of issues related to private vars with names like i, j, k, etc. It
 feels like the right thing to do is to change the scope so that they go
 back to being local vars instead of fields. However, this requires a much
 more significant refactoring, and I didn't want to assume it was okay to do
 that.


See above, I don't think we should touch those.



 If these questions have already been answered elsewhere and I missed the
 documentation/FAQ/developer guide, then I apologize and would appreciate
 the links. Alternatively, if someone has a Re# rule config that they are
 willing to post somewhere, I would be glad to use it.

 - Zack


 On Jul 27, 2012, at 12:00 PM, Itamar Syn-Hershko wrote:

  The cleanup consists mainly of going file by file with ReSharper and
 trying
  to get them as green as possible. Making a lot of fields readonly,
 removing
  unused vars and stuff like that. There are still loads of files left.
 
  I was also hoping to get to updating the spatial module with some recent
  updates, and to also support polygon searches. But that may take a bit
 more
  time, so it's really up to you guys (or we can open a vote for it).





Re: Outstanding issues for 3.0.3

2012-08-02 Thread Itamar Syn-Hershko
Prescott - we could make an RC and push it to Nuget as a PreRelease, to get
real feedback.

On Thu, Aug 2, 2012 at 7:13 PM, Prescott Nasser geobmx...@hotmail.comwrote:

 I don't think we ever fully adopted the style guidelines, probably not a
 terrible discussion to have. As for this release, I think that by lazy
 consensus we should branch the trunk at the end of this weekend (say
 monday), and begin the process of cutting a release. - my $.02 below


  1) Usage of this prefix when not required.
 
  this.blah = blah; - required this.
  this.aBlah = blah; - optional this, which Re# doesn't like.
 
  I'm assuming consistency wins here, and 'this.' stays, but wanted to
 double check.

 I'd err with consistency


 
  2) Using different conventions for fields and parameters\local vars.
 
  blah vs. _blah
 

  Combined with 1, Re# wants (and I'm personally accustomed to):
 
  _blah = blah;
 


 For private variables _ is ok, for anything else, don't use _ as it's not
 CLR compliant


  However, that seems to violate the adopted style.
 
  3) Full qualification of type names.
 
  Re # wants to remove redundant namespace qualifiers. Leave them or
 remove them?
 

 I try to remove them

  4) Removing unreferenced classes.
 
  Should I remove non-public unreferenced classes? The ones I've come
 across so far are private.
 

 I'm not sure I understand - are you saying we have classes that are never
 used in random places? If so, I think before removing them we should have a
 conversation; what are they, why are they there, etc. - I'm hoping there
 aren't too many of these..

  5) var vs. explicit
 
  I know this has been brought up before, but not sure of the final
 disposition. FWIW, I prefer var.
 

 I use var when it's plainly obvious what the object is: var obj = new MyClass(). I
 usually use explicit when it's an object returned from some function that
 makes it unclear what the return value is:


 var items = search.GetResults();

 vs

 IList<SearchResult> items = search.GetResults(); // preferred


 
  There are some non-Re# issues I came across as well that look like
 artifacts of code generation:
 
  6) Weird param names.
 
  Param1 vs. directory
 
  I assume it's okay to replace 'Param1' with something a descriptive name
 like 'directory'.
 

 Weird - I think a rename is OK for this release (Since we're ticking up a
 full version number), but I believe changing param names can potentially
 break code. That said, I don't really think we need to change the names and
 push the 3.0.3 release out, and if it does in fact cause breaking changes,
 I'd be a little careful about how we do it going forward to 3.6.

  7) Field names that follow local variable naming conventions.
 
  Lots of issues related to private vars with names like i, j, k, etc. It
 feels like the right thing to do is to change the scope so that they go
 back to being local vars instead of fields. However, this requires a much
 more significant refactoring, and I didn't want to assume it was okay to do
 that.
 

 I'd avoid this for now - a lot of this is a carry over from the java
 version and to rename all those, it starts to get a bit confusing if we
 have to compare java to C# and these are all changed around.



  If these questions have already been answered elsewhere and I missed the
 documentation/FAQ/developer guide, then I apologize and would appreciate
 the links. Alternatively, if someone has a Re# rule config that they are
 willing to post somewhere, I would be glad to use it.
 

 I think we talked about Re#'s rules at one point, I'll try to dig that
 conversation up and see where it landed. It's probably a good idea for us
 to build rules though.

  - Zack
 
 
  On Jul 27, 2012, at 12:00 PM, Itamar Syn-Hershko wrote:
 
   The cleanup consists mainly of going file by file with ReSharper and
 trying
   to get them as green as possible. Making a lot of fields readonly,
 removing
   unused vars and stuff like that. There are still loads of files left.
  
   I was also hoping to get to updating the spatial module with some
 recent
   updates, and to also support polygon searches. But that may take a bit
 more
   time, so it's really up to you guys (or we can open a vote for it).
 



Re: Lucene Nuget

2012-08-01 Thread Itamar Syn-Hershko
Yes, with the due release. I, for one, always mistake one for the other.

On Wed, Aug 1, 2012 at 4:09 AM, Prescott Nasser geobmx...@hotmail.comwrote:




 There are two Lucene packages on nuget that are deprecated. With
 some updates nuget made a while ago, we have the ability to remove those
 packages. Do we want to?


Re: Outstanding issues for 3.0.3

2012-08-01 Thread Itamar Syn-Hershko
+1 from me too, then

On Wed, Aug 1, 2012 at 7:42 PM, Prescott Nasser geobmx...@hotmail.comwrote:

 Spatial could be something cool to look forward to in 3.6 IMO.

 I'm good with tagging what we have and I'd like to take a week to allow
 the community test the tag code against their stuff before cutting release
 binaries.

 +1 to going now.


 
  Date: Wed, 1 Aug 2012 19:31:45 +0300
  Subject: Re: Outstanding issues for 3.0.3
  From: ita...@code972.com
  To: lucene-net-dev@lucene.apache.org
 
  I agree
 
  What about the spatial stuff? you guys want to wait for it?
 
  On Wed, Aug 1, 2012 at 7:19 PM, Christopher Currens 
 currens.ch...@gmail.com
   wrote:
 
   I think that while it would be nice to get it done, it's a fairly large
   effort, and we might be better off with doing a release. The tests are
   massively changed between 3.0.3 and 3.6, so I think a lot of it will
 get
   cleaned up anyway during the port. Also, a little while back, I did
 clean
   up a lot of the test code to use Assert.Throws and to remove
 unnecessary
   variables, though that might have only been in catch statements. Either
   way, I think we just might be ready as it is.
  
   I am eager to start working on porting 3.6.
  
  
   Thanks,
   Christopher
  
   On Wed, Aug 1, 2012 at 9:14 AM, Itamar Syn-Hershko ita...@code972.com
   wrote:
  
I still have plenty to go on, but on a second thought we could do
 that
   work
just the same when we work towards 3.6, so I won't hold you off
 anymore
   
Up to Chris - he wanted to do some tests cleanup
   
Also, I'll be updating the Spatial contrib during the next week or so
   with
polygon support. I think we should hold off the release so we can
 provide
that as well, but I suggest we will take a vote on it, don't let me
 hold
you off.
   
On Wed, Aug 1, 2012 at 6:58 PM, Prescott Nasser 
 geobmx...@hotmail.com
wrote:
   
 Just wanted to check in - where do we feel like we stand? What is
 left
   to
 do - is there anything I can help with specifically? I'll have some
   spare
 cycles this weekend. I want to really make a push to get this
 ready to
roll
 and not let it languish

 ~P

 
  Date: Sat, 28 Jul 2012 20:38:10 +0300
  Subject: Re: Outstanding issues for 3.0.3
  From: ita...@code972.com
  To: lucene-net-dev@lucene.apache.org
 
  Go ahead with contrib and tests, ill resume with core and
 coordinate
  further later
  On Jul 27, 2012 7:04 PM, Christopher Currens 
currens.ch...@gmail.com
  wrote:
 
   I've got resharper and can help with that if you'd like to
   coordinate
 it.
    I can take one or some of the contrib projects or part of the
   main
    library, or *shudder* any of the test libraries. The code
 has
 needed
    some cleaning up for a while and some of the clean up work is
 an
    optimization on some levels, so I'm definitely okay with spending
 some
 time
   doing that. I'm okay with waiting longer as long as something
 is
 getting
   done.
  
  
   Thanks,
   Christopher
  
   On Fri, Jul 27, 2012 at 9:00 AM, Itamar Syn-Hershko 
 ita...@code972.com
   wrote:
  
The cleanup consists mainly of going file by file with
 ReSharper
and
   trying
to get them as green as possible. Making a lot of fields
   readonly,
   removing
unused vars and stuff like that. There are still loads of
 files
left.
   
I was also hoping to get to updating the spatial module with
 some
 recent
updates, and to also support polygon searches. But that may
 take
   a
 bit
   more
time, so it's really up to you guys (or we can open a vote
 for
   it).
   
On Fri, Jul 27, 2012 at 6:35 PM, Christopher Currens 
currens.ch...@gmail.com wrote:
   
 Itamar,

 Where do we stand on the clean up now? Is there anything in
 particular
 that you're doing that you'd like help with? I have some
 free
time
   today
 and am eager to get this version released.


 Thanks,
 Christopher


 On Sat, Jul 21, 2012 at 1:02 PM, Prescott Nasser 
   geobmx...@hotmail.com
 wrote:

 
  Alright, I'll hold off a bit.
 
  
   Date: Sat, 21 Jul 2012 22:59:32 +0300
   Subject: Re: Outstanding issues for 3.0.3
   From: ita...@code972.com
   To: lucene-net-u...@lucene.apache.org
   CC: lucene-net-dev@lucene.apache.org
  
   Actually there was some clean up work I started doing
 and
would
   want
to
   complete, and also sign off on the suspected corruption
   issue
 we

Re: Outstanding issues for 3.0.3

2012-08-01 Thread Itamar Syn-Hershko
Yes, we could also release a 3.0.10 or something with the improved spatial
module. Or I can race Prescott's week and get it in before it ends :)

And for heaven's sake, can we move to git when graduating? A live crash
course to all committers is on me.

On Wed, Aug 1, 2012 at 7:42 PM, Christopher Currens currens.ch...@gmail.com
 wrote:

 Ah, I did overlook that.  I imagine that the move from 3.0.3 to 3.6 will
 realistically take a while, so if we can't get spatial stuff out before
 then, would it take until 3.6 to be able to release new functionality into
 the spatial contrib project?  Along those lines, I propose that we move
 3.0.3 into a new branch instead of just tagging the release and merging in
 3.6.  That way, during the time it takes to port 3.6, we can still do any
 critical bug fixes and features like these and release new versions.  At
 least then, people won't be waiting for months for bug fixes.


 If we did that, then it also might not be critical to get the spatial stuff
 out with this release, since we could get out a new release in a few weeks
 with updated spatial libraries...not that I'm against waiting for it now.
  It was just a suggestion on how we can move forward with the project.
  Thoughts either way on this?



 On Wed, Aug 1, 2012 at 9:31 AM, Itamar Syn-Hershko ita...@code972.com
 wrote:

  I agree
 
  What about the spatial stuff? you guys want to wait for it?
 
  On Wed, Aug 1, 2012 at 7:19 PM, Christopher Currens 
  currens.ch...@gmail.com
   wrote:
 
   I think that while it would be nice to get it done, it's a fairly large
   effort, and we might be better off with doing a release.  The tests are
   massively changed between 3.0.3 and 3.6, so I think a lot of it will
 get
   cleaned up anyway during the port.  Also, a little while back, I did
  clean
   up a lot of the test code to use Assert.Throws and to remove
 unnecessary
   variables, though that might have only been in catch statements.
  Either
   way, I think we just might be ready as it is.
  
   I am eager to start working on porting 3.6.
  
  
   Thanks,
   Christopher
  
   On Wed, Aug 1, 2012 at 9:14 AM, Itamar Syn-Hershko ita...@code972.com
   wrote:
  
I still have plenty to go on, but on a second thought we could do
 that
   work
just the same when we work towards 3.6, so I won't hold you off
 anymore
   
Up to Chris - he wanted to do some tests cleanup
   
Also, I'll be updating the Spatial contrib during the next week or so
   with
polygon support. I think we should hold off the release so we can
  provide
that as well, but I suggest we will take a vote on it, don't let me
  hold
you off.
   
On Wed, Aug 1, 2012 at 6:58 PM, Prescott Nasser 
 geobmx...@hotmail.com
wrote:
   
 Just wanted to check in - where do we feel like we stand? What is
  left
   to
 do - is there anything I can help with specifically? I'll have some
   spare
 cycles this weekend. I want to really make a push to get this ready
  to
roll
 and not let it languish

 ~P

 
  Date: Sat, 28 Jul 2012 20:38:10 +0300
  Subject: Re: Outstanding issues for 3.0.3
  From: ita...@code972.com
  To: lucene-net-dev@lucene.apache.org
 
  Go ahead with contrib and tests, ill resume with core and
  coordinate
  further later
  On Jul 27, 2012 7:04 PM, Christopher Currens 
currens.ch...@gmail.com
  wrote:
 
   I've got resharper and can help with that if you'd like to
   coordinate
 it.
    I can take one or some of the contrib projects or part of the
   main
    library, or *shudder* any of the test libraries. The code
 has
 needed
    some cleaning up for a while and some of the clean up work is
 an
    optimization on some levels, so I'm definitely okay with spending
  some
 time
   doing that. I'm okay with waiting longer as long as something
 is
 getting
   done.
  
  
   Thanks,
   Christopher
  
   On Fri, Jul 27, 2012 at 9:00 AM, Itamar Syn-Hershko 
 ita...@code972.com
   wrote:
  
The cleanup consists mainly of going file by file with
  ReSharper
and
   trying
to get them as green as possible. Making a lot of fields
   readonly,
   removing
unused vars and stuff like that. There are still loads of
 files
left.
   
I was also hoping to get to updating the spatial module with
  some
 recent
updates, and to also support polygon searches. But that may
  take
   a
 bit
   more
time, so it's really up to you guys (or we can open a vote
 for
   it).
   
On Fri, Jul 27, 2012 at 6:35 PM, Christopher Currens 
currens.ch...@gmail.com wrote:
   
 Itamar,

 Where do we stand on the clean up now? Is there anything in
 particular
 that you're doing that you'd like help with? I have some
 free

Re: Outstanding issues for 3.0.3

2012-08-01 Thread Itamar Syn-Hershko
On that note, see git-flow
http://nvie.com/posts/a-successful-git-branching-model/  :)

On Wed, Aug 1, 2012 at 7:49 PM, Prescott Nasser geobmx...@hotmail.com wrote:

 That's probably not a bad idea - we should probably move to a structure
 like that anyway going forward so that it's easier to manage bug fixes and
 minor updates in between the big work

 
  Date: Wed, 1 Aug 2012 09:42:40 -0700
  Subject: Re: Outstanding issues for 3.0.3
  From: currens.ch...@gmail.com
  To: lucene-net-dev@lucene.apache.org
 
  Ah, I did overlook that. I imagine that the move from 3.0.3 to 3.6 will
  realistically take a while, so if we can't get spatial stuff out before
  then, would it take until 3.6 to be able to release new functionality
 into
  the spatial contrib project? Along those lines, I propose that we move
  3.0.3 into a new branch instead of just tagging the release and merging
 in
  3.6. That way, during the time it takes to port 3.6, we can still do any
  critical bug fixes and features like these and release new versions. At
  least then, people won't be waiting for months for bug fixes.
 
  If we did that, then it also might not be critical to get the spatial
 stuff
  out with this release, since we could get out a new release in a few
 weeks
  with updated spatial libraries...not that I'm against waiting for it now.
  It was just a suggestion on how we can move forward with the project.
  Thoughts either way on this?
 
 
  On Wed, Aug 1, 2012 at 9:31 AM, Itamar Syn-Hershko ita...@code972.com
 wrote:
 
   I agree
  
   What about the spatial stuff? you guys want to wait for it?
  
   On Wed, Aug 1, 2012 at 7:19 PM, Christopher Currens 
   currens.ch...@gmail.com
wrote:
  
I think that while it would be nice to get it done, it's a fairly
 large
effort, and we might be better off with doing a release. The tests
 are
massively changed between 3.0.3 and 3.6, so I think a lot of it will
 get
cleaned up anyway during the port. Also, a little while back, I did
   clean
up a lot of the test code to use Assert.Throws and to remove
 unnecessary
variables, though that might have only been in catch statements.
 Either
way, I think we just might be ready as it is.
   
I am eager to start working on porting 3.6.
   
   
Thanks,
Christopher
   
On Wed, Aug 1, 2012 at 9:14 AM, Itamar Syn-Hershko 
 ita...@code972.com
wrote:
   
 I still have plenty to go on, but on a second thought we could do
 that
work
 just the same when we work towards 3.6, so I won't hold you off
 anymore

 Up to Chris - he wanted to do some tests cleanup

 Also, I'll be updating the Spatial contrib during the next week or
 so
with
 polygon support. I think we should hold off the release so we can
   provide
 that as well, but I suggest we will take a vote on it, don't let me
   hold
 you off.

 On Wed, Aug 1, 2012 at 6:58 PM, Prescott Nasser 
 geobmx...@hotmail.com
 wrote:

  Just wanted to check in - where do we feel like we stand? What is
   left
to
  do - is there anything I can help with specifically? I'll have
 some
spare
  cycles this weekend. I want to really make a push to get this
 ready
   to
 roll
  and not let it languish
 
  ~P
 
  
   Date: Sat, 28 Jul 2012 20:38:10 +0300
   Subject: Re: Outstanding issues for 3.0.3
   From: ita...@code972.com
   To: lucene-net-dev@lucene.apache.org
  
    Go ahead with contrib and tests, I'll resume with core and
   coordinate
   further later
   On Jul 27, 2012 7:04 PM, Christopher Currens 
 currens.ch...@gmail.com
   wrote:
  
I've got resharper and can help with that if you'd like to
coordinate
  it.
I can take one or some of the contrib projects or part of the
main library, or *shudder* any of the test libraries. The code
has needed some cleaning up for a while, and some of the clean
up work is an optimization at some levels, so I'm definitely
okay with spending some time doing that. I'm okay with waiting
longer as long as something is getting done.
   
   
Thanks,
Christopher
   
On Fri, Jul 27, 2012 at 9:00 AM, Itamar Syn-Hershko 
  ita...@code972.com
wrote:
   
 The cleanup consists mainly of going file by file with
   ReSharper
 and
trying
 to get them as green as possible. Making a lot of fields
readonly,
removing
 unused vars and stuff like that. There are still loads of
 files
 left.

 I was also hoping to get to updating the spatial module
 with
   some
  recent
 updates, and to also support polygon searches. But that may
   take
a
  bit
more
 time, so it's really up

Re: Outstanding issues for 3.0.3

2012-08-01 Thread Itamar Syn-Hershko
I agree

What about the spatial stuff? you guys want to wait for it?

On Wed, Aug 1, 2012 at 7:19 PM, Christopher Currens currens.ch...@gmail.com
 wrote:

 I think that while it would be nice to get it done, it's a fairly large
 effort, and we might be better off with doing a release.  The tests are
 massively changed between 3.0.3 and 3.6, so I think a lot of it will get
 cleaned up anyway during the port.  Also, a little while back, I did clean
 up a lot of the test code to use Assert.Throws and to remove unnecessary
 variables, though that might have only been in catch statements.  Either
 way, I think we just might be ready as it is.

 I am eager to start working on porting 3.6.


 Thanks,
 Christopher

 On Wed, Aug 1, 2012 at 9:14 AM, Itamar Syn-Hershko ita...@code972.com
 wrote:

  I still have plenty to go on, but on second thought we could do that
 work
  just the same when we work towards 3.6, so I won't hold you off anymore
 
  Up to Chris - he wanted to do some tests cleanup
 
  Also, I'll be updating the Spatial contrib during the next week or so
 with
  polygon support. I think we should hold off the release so we can provide
  that as well, but I suggest we will take a vote on it, don't let me hold
  you off.
 
  On Wed, Aug 1, 2012 at 6:58 PM, Prescott Nasser geobmx...@hotmail.com
  wrote:
 
   Just wanted to check in - where do we feel like we stand? What is left
 to
   do - is there anything I can help with specifically? I'll have some
 spare
   cycles this weekend. I want to really make a push to get this ready to
  roll
   and not let it languish
  
   ~P
  
   
Date: Sat, 28 Jul 2012 20:38:10 +0300
Subject: Re: Outstanding issues for 3.0.3
From: ita...@code972.com
To: lucene-net-...@lucene.apache.org
   
 Go ahead with contrib and tests, I'll resume with core and coordinate
further later
On Jul 27, 2012 7:04 PM, Christopher Currens 
  currens.ch...@gmail.com
wrote:
   
 I've got resharper and can help with that if you'd like to
 coordinate
   it.
 I can take one or some of the contrib projects or part of the
 main library, or *shudder* any of the test libraries. The code has
 needed some cleaning up for a while, and some of the clean up work
 is an optimization at some levels, so I'm definitely okay with
 spending some time doing that. I'm okay with waiting longer as
 long as something is getting done.


 Thanks,
 Christopher

 On Fri, Jul 27, 2012 at 9:00 AM, Itamar Syn-Hershko 
   ita...@code972.com
 wrote:

  The cleanup consists mainly of going file by file with ReSharper
  and
 trying
  to get them as green as possible. Making a lot of fields
 readonly,
 removing
  unused vars and stuff like that. There are still loads of files
  left.
 
  I was also hoping to get to updating the spatial module with some
   recent
  updates, and to also support polygon searches. But that may take
 a
   bit
 more
  time, so it's really up to you guys (or we can open a vote for
 it).
 
  On Fri, Jul 27, 2012 at 6:35 PM, Christopher Currens 
  currens.ch...@gmail.com wrote:
 
   Itamar,
  
   Where do we stand on the clean up now? Is there anything in
   particular
   that you're doing that you'd like help with? I have some free
  time
 today
   and am eager to get this version released.
  
  
   Thanks,
   Christopher
  
  
   On Sat, Jul 21, 2012 at 1:02 PM, Prescott Nasser 
 geobmx...@hotmail.com
   wrote:
  
   
Alright, I'll hold off a bit.
   

 Date: Sat, 21 Jul 2012 22:59:32 +0300
 Subject: Re: Outstanding issues for 3.0.3
 From: ita...@code972.com
 To: lucene-net-u...@lucene.apache.org
 CC: lucene-net-...@lucene.apache.org

 Actually there was some clean up work I started doing and
  would
 want
  to
 complete, and also sign off on the suspected corruption
 issue
   we
   raised.
 I'm afraid I won't have much time this week to properly do
  all
 that,
   but
 I'll keep you posted.

 On Sat, Jul 21, 2012 at 10:20 PM, Prescott Nasser 
   geobmx...@hotmail.com
wrote:

 
  Alright, latest patch fixed what could be done with the
 cls
 issues
  at
  present. With that, I think we are ready to roll with a
   release.
 If
people
  could please take some time to run all the test as well
 as
 whatever
other
  tests they might run. We've had some issues with tests
 only
  happening
on
  some systems so I want to make sure we have those bases
   covered.
   Unless
  there is anything else that should be done, I'll leave
  every
   one

Re: Outstanding issues for 3.0.3

2012-07-28 Thread Itamar Syn-Hershko
Go ahead with contrib and tests, I'll resume with core and coordinate
further later
On Jul 27, 2012 7:04 PM, Christopher Currens currens.ch...@gmail.com
wrote:

 I've got resharper and can help with that if you'd like to coordinate it.
  I can take one or some of the contrib projects or part of the main
 library, or *shudder* any of the test libraries.  The code has needed
 some cleaning up for a while, and some of the clean up work is an
 optimization at some levels, so I'm definitely okay with spending some
 time doing that.  I'm okay with waiting longer as long as something is
 getting done.


 Thanks,
 Christopher

 On Fri, Jul 27, 2012 at 9:00 AM, Itamar Syn-Hershko ita...@code972.com
 wrote:

  The cleanup consists mainly of going file by file with ReSharper and
 trying
  to get them as green as possible. Making a lot of fields readonly,
 removing
  unused vars and stuff like that. There are still loads of files left.
 
  I was also hoping to get to updating the spatial module with some recent
  updates, and to also support polygon searches. But that may take a bit
 more
  time, so it's really up to you guys (or we can open a vote for it).
 
  On Fri, Jul 27, 2012 at 6:35 PM, Christopher Currens 
  currens.ch...@gmail.com wrote:
 
   Itamar,
  
   Where do we stand on the clean up now?  Is there anything in particular
   that you're doing that you'd like help with?  I have some free time
 today
   and am eager to get this version released.
  
  
   Thanks,
   Christopher
  
  
   On Sat, Jul 21, 2012 at 1:02 PM, Prescott Nasser 
 geobmx...@hotmail.com
   wrote:
  
   
Alright, I'll hold off a bit.
   

 Date: Sat, 21 Jul 2012 22:59:32 +0300
 Subject: Re: Outstanding issues for 3.0.3
 From: ita...@code972.com
 To: lucene-net-u...@lucene.apache.org
 CC: lucene-net-dev@lucene.apache.org

 Actually there was some clean up work I started doing and would
 want
  to
 complete, and also sign off on the suspected corruption issue we
   raised.
 I'm afraid I won't have much time this week to properly do all
 that,
   but
 I'll keep you posted.

 On Sat, Jul 21, 2012 at 10:20 PM, Prescott Nasser 
   geobmx...@hotmail.com
wrote:

 
  Alright, latest patch fixed what could be done with the cls
 issues
  at
  present. With that, I think we are ready to roll with a release.
 If
people
  could please take some time to run all the test as well as
 whatever
other
  tests they might run. We've had some issues with tests only
  happening
on
  some systems so I want to make sure we have those bases covered.
   Unless
  there is anything else that should be done, I'll leave every one
 a
week to
  run their tests. Next saturday I will tag the trunk and cut a
  release
with
  both 3.5 and 4.0 binaries. Great work everyone. ~P
   Date: Mon, 9 Jul 2012 18:02:30 -0700
   Subject: Re: Outstanding issues for 3.0.3
   From: currens.ch...@gmail.com
   To: lucene-net-dev@lucene.apache.org
  
   I can set a different build target, but I can't set the actual
framework
  to
   3.5 without doing it for all build configurations. On top of
  that,
3.5
   needs System.Core to be referenced, which is done automatically
  in
.NET 4
   (I'm not sure if MSBuild v4 does it automatically?). I did
 kinda
   get
it
   working by putting a TargetFrameworkVersion tag of 4.0 in Debug
  and
  Release
   configurations and 3.5 in Debug 3.5 and Release 3.5
  configurations,
but
   that's a little...well, difficult to maintain by hand since
  visual
studio
   doesn't allow you to set different framework versions per
configuration,
   and visual studio seemed to be having trouble with references,
   since
both
   frameworks were being referenced.
  
   On Mon, Jul 9, 2012 at 5:57 PM, Prescott Nasser 
geobmx...@hotmail.com
  wrote:
  
   
What do you mean doesn't work at the project level? I
 created a
  different
build target NET35 and then we had Debug and Release still,
  that
  seemed to
work for me. But I feel like I'm missing something in your
   explanation.
Good work though!
 Date: Mon, 9 Jul 2012 17:51:36 -0700
 Subject: Re: Outstanding issues for 3.0.3
 From: currens.ch...@gmail.com
 To: lucene-net-dev@lucene.apache.org

 I've got it working, compiling and all test passing...The
  only
  caveat is
 that I'm not sure the best way to multi-target. It doesn't
   really
  work
on
 a project level, so you'd have to create two separate
  projects,
one
  for
 .NET 4 and the other for 3.5. To aid me, I wrote a small
 tool
that
creates
 copies of all of the 4.0 projects and solutions to work
  against

Re: Outstanding issues for 3.0.3

2012-07-21 Thread Itamar Syn-Hershko
 in the community. I
 would
 love to
see that make it into 3.0.3, and would be able to pick up
 where
 anyone
   had
left off or take part of it, if they don't have time to work
 on
   it.
 In
regards to LUCENENET-446, I agree that it is pretty much
   complete. I
   think
I've looked several times at it to confirm most/all methods
 have
   been
converted, so this week I'll do a final check and close it
 out.
   
   
Thanks,
Christopher
   
On Sun, Jul 8, 2012 at 12:28 PM, Simon Svensson 
   si...@devhost.se
   wrote:
   
 The tests that failed when using culture=sv-se seems fixed.


 On 2012-07-08 20:44, Itamar Syn-Hershko wrote:

 What's the status on the failing tests we had?

 On Sun, Jul 8, 2012 at 9:02 PM, Prescott Nasser 
   geobmx...@hotmail.com
 wrote:

 Three issues left that I see:



 Fixing the build output, I did some work, but I'm good on
   this,
 we
   can
 move the rest of work to 3.6
  https://issues.apache.org/jira/browse/LUCENENET-456



 CLS Compliance
  https://issues.apache.org/jira/browse/LUCENENET-446.
 Are
 we ok with this as for now? There are still a good number of
 issues; some we can't really fix (sbyte and volatile are out of
 scope imo). In a similar vein, our own code uses some obsolete
 methods and we have a lot of variables declared but never used
 warnings (mentally, I treat most warnings as errors)



 GetX/SetX -
  https://issues.apache.org/jira/browse/LUCENENET-470.
 I think
 much of this has been removed, there are probably some
 pieces
 that
   left
 (and we have a difference of opinion in the group as
 well).





 I really think the only outstanding issue is the CLS
   compliance
 one,
   the
 rest can be moved to 3.6. With CLS compliance we have to
 ask
   if
 we've
 done
 enough for that so far, or if more is needed. I
 personally
   would
   like to
 see us make any API changes now, with the 3.0.3 release,
 but
   if
 we
   are
  comfortable with it, let's roll.



 What are your thoughts?



 ~P





 --**--

 From: thowar...@gmail.com
 Date: Mon, 25 Jun 2012 10:34:37 -0700
 Subject: Re: Outstanding issues for 3.0.3
  To: lucene-net-dev@lucene.apache.org

 Assuming we're talking about the packaging/filesystem
   structure
 in
   the
 releases, the structure is a little of both (ours vs
 Apache's)...
 Basically, I went through most of the Apache projects to
   see how
   they
 packaged releases and developed a structure that was
 very
 similar
   but
 encompassed everything we needed. So, it's informed by
 the
   organically
 emergent structures that ASF uses.

 -T


 On Mon, Jun 25, 2012 at 7:32 AM, Prescott Nasser 
   geobmx...@hotmail.com
 

 wrote:

 I have no idea why I thought we were using Nant.
 I think it's just our release structure. I figured a
   little
 out
   this

 weekend, splitting the XML and .dll files into separate
   directories. The
 documentation you have on the wiki was actually pretty
   helpful.

 Whatever more you can add would be great

 ~P

 Date: Mon, 25 Jun 2012 10:04:21 -0400
 Subject: Re: Outstanding issues for 3.0.3
 From: mhern...@wickedsoftware.net
  To: lucene-net-dev@lucene.apache.org

 On Sat, Jun 23, 2012 at 1:38 AM, Prescott Nasser 

  geobmx...@hotmail.com wrote:


 -- Task 470, a non-serious one, is listed only
 because
   it's

 mostly done

 and

 just need a few loose ends tied up. I'll hopefully
 have
 time to

 take care

 of that this weekend.


 How many GetX/SetX are left? I did a quick search for
 'public *

 Get*()'

 Most

Re: Outstanding issues for 3.0.3

2012-07-08 Thread Itamar Syn-Hershko
What's the status on the failing tests we had?

On Sun, Jul 8, 2012 at 9:02 PM, Prescott Nasser geobmx...@hotmail.com wrote:


 Three issues left that I see:



 Fixing the build output, I did some work, but I'm good on this, we can
 move the rest of work to 3.6
 https://issues.apache.org/jira/browse/LUCENENET-456



 CLS Compliance https://issues.apache.org/jira/browse/LUCENENET-446. Are
 we ok with this as for now? There are still a good number of issues;
 some we can't really fix (sbyte and volatile are out of scope imo). In a
 similar vein, our own code uses some obsolete methods and we have a lot of
 variables declared but never used warnings (mentally, I treat most
 warnings as errors)



 GetX/SetX - https://issues.apache.org/jira/browse/LUCENENET-470. I think
 much of this has been removed, there are probably some pieces that left
 (and we have a difference of opinion in the group as well).





 I really think the only outstanding issue is the CLS compliance one, the
 rest can be moved to 3.6. With CLS compliance we have to ask if we've done
 enough for that so far, or if more is needed. I personally would like to
 see us make any API changes now, with the 3.0.3 release, but if we are
 comfortable with it, let's roll.



 What are your thoughts?



 ~P





 
  From: thowar...@gmail.com
  Date: Mon, 25 Jun 2012 10:34:37 -0700
  Subject: Re: Outstanding issues for 3.0.3
  To: lucene-net-dev@lucene.apache.org
 
  Assuming we're talking about the packaging/filesystem structure in the
  releases, the structure is a little of both (ours vs Apache's)...
  Basically, I went through most of the Apache projects to see how they
  packaged releases and developed a structure that was very similar but
  encompassed everything we needed. So, it's informed by the organically
  emergent structures that ASF uses.
 
  -T
 
 
  On Mon, Jun 25, 2012 at 7:32 AM, Prescott Nasser geobmx...@hotmail.com
 wrote:
  
   I have no idea why I thought we were using Nant.
   I think it's just our release structure. I figured a little out this
 weekend, splitting the XML and .dll files into separate directories. The
 documentation you have on the wiki was actually pretty helpful.
   Whatever more you can add would be great
  
   ~P
  
   Date: Mon, 25 Jun 2012 10:04:21 -0400
   Subject: Re: Outstanding issues for 3.0.3
   From: mhern...@wickedsoftware.net
   To: lucene-net-dev@lucene.apache.org
  
   On Sat, Jun 23, 2012 at 1:38 AM, Prescott Nasser 
  geobmx...@hotmail.com wrote:
  
   
   
 -- Task 470, a non-serious one, is listed only because it's
 mostly done
and
 just need a few loose ends tied up. I'll hopefully have time to
 take care
 of that this weekend.
   
   
How many GetX/SetX are left? I did a quick search for 'public *
 Get*()'
Most of them looked to be actual methods - perhaps a few to replace
   
   
 -- Task 446 (CLS Compliance), is important, but there's no way we
 can get
 this done quickly. The current state of this issue is that all of
 the
 names of public members are now compliant. There are a few things
 that
 aren't, the use of sbyte (particularly those related to the
 FieldCache)
and
 some conflicts with *protected or internal* fields (some with
 public
 members). Opinions on this one will be appreciated the most. My
 opinion
 is that we should draw a line on the amount of CLS compliance to
 have in
 this release, and push the rest into 3.5.
   
   
   
I count roughly 53 CLS compliance issues. The sbyte stuff will run
into trouble when you do bit shifting (I ran into this issue when
trying to do this for 2.9.4). I'd like to see if we can't get rid of
the easier stuff (internal/protected stuff). I would not try getting
rid of sbyte or volatile for this release. It's going to take some
serious consideration to get rid of those.
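[Editor's note: the sign-extension hazard described above is easy to demonstrate. Java's byte is signed just like C#'s sbyte, so the same pitfall exists in the original code being ported. A minimal, illustrative Java sketch (not code from either project) of why a bare right shift misbehaves without masking:]

```java
public class SignedByteShift {
    public static void main(String[] args) {
        byte b = (byte) 0xB1;          // bit pattern 1011_0001, value -79

        // Naive port: the byte is promoted to int with sign extension,
        // so the high bits are all ones and the shift result is negative.
        int naive = b >> 4;            // 0xFFFFFFFB == -5, not 0x0B

        // Correct: mask to the unsigned value first, then shift.
        int masked = (b & 0xFF) >> 4;  // 0x0B == 11

        System.out.println(naive);     // -5
        System.out.println(masked);    // 11
    }
}
```

The same masking discipline is what a CLS-compliant byte-based replacement for sbyte would have to apply everywhere the Java code shifts bytes.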
   
   
 -- Improvement 337 - Are we going to add this code (not present
 in java)
to
 the core library?
   
   
   
I'd skip it and re-evaluate the community desire for this in 3.5.
   
   
 -- Improvement 456 - This is related to builds being output in
 Apache's
 release format. Do we want to do this for this release?

   
   
I looked into this last weekend - I'm terrible with Nant, so I
 didn't get
anywhere. It would be nice to have, but I don't think I'll figure
 it out.
If Michael has some time to maybe make the adjustment, he knows
 these
scripts best. If not I'm going to look into it, but I don't call
 this a
show stopper - either we have it or we don't when the rest is done.
   
  
    With some Flo Rida and espresso shots, anything is possible.
  
   Did we switch to Nant?
  
   I saw the jira ticket for this. Is there an official apache release
    structure or is this just *our* apache release structure that we are
 using?
   Can I take the latest release and use that to model the structure you
 

Re: [VOTE] Apache Lucene.Net ready for graduation?

2012-07-08 Thread Itamar Syn-Hershko
+1 for graduation

I still think graduation should be in sync with the 3.0.3 release and a
press release on work towards 3.6 and 4.0 releases.

On Sun, Jul 8, 2012 at 8:44 PM, Prescott Nasser geobmx...@hotmail.com wrote:


 Hey All,

 This is the first step for graduation for the Apache Lucene.Net project
 (incubating of course..). We're taking a vote for the Lucene.Net community
 to see if the community is ready to govern itself as a top level project.


 Here is a short list of our accomplishments which I believe make us ready
 for graduation:
 - Released 2.9.4

 - Released 2.9.4g (Generics version)

 - Created a new website, with a new logo (a 99designs contest graciously
 supported by stackoverflow)

 - Added two new committers bringing our total to 9.

 - Preparing for 3.0.3 Release within the next couple of weeks

 - Started work on 3.5 release.

 This is the process we will follow:
 - Community vote (this email). All votes count, there is no non-binding /
 binding status for this
 - We will propose a resolution for review (
 https://cwiki.apache.org/confluence/display/LUCENENET/Graduation+-+Resolution+Template
 )
 - We will call a vote on the resolution in general @ incubator
 - A Board resolution will be submitted.





 As a community, if you would please vote:



 [1] Ready for graduation

 [-1] Not ready because...




 I know I speak for all the developers on this project, we appreciate (and
 will continue to appreciate) everyone's contributions via the mailing list
 and jira.




 ~Prescott



Re: svn commit: r1353075 - /incubator/lucene.net/branches/Lucene.Net_3_5/

2012-06-23 Thread Itamar Syn-Hershko
Why 3.5 and not 3.6?

In my opinion we should skip all versions in between 3.0.3 and 3.6, and
just port 3.6 after we released 3.0.3. Lucene 4 will probably be released
by the time we are done, and then we could move on to porting it.

On Sat, Jun 23, 2012 at 9:35 AM, pnas...@apache.org wrote:

 Author: pnasser
 Date: Sat Jun 23 06:35:44 2012
 New Revision: 1353075

 URL: http://svn.apache.org/viewvc?rev=1353075&view=rev
 Log:
 Branching for 3.5

 Added:
incubator/lucene.net/branches/Lucene.Net_3_5/   (props changed)
  - copied from r1353074, incubator/lucene.net/trunk/

 Propchange: incubator/lucene.net/branches/Lucene.Net_3_5/

 --
 --- svn:mergeinfo (added)
 +++ svn:mergeinfo Sat Jun 23 06:35:44 2012
 @@ -0,0 +1,2 @@
 +/incubator/lucene.net/branches/Lucene.Net.3.0.3/trunk:1199075-1294851*
 +/incubator/lucene.net/trunk:1199072-1294798*






Re: Endian types

2012-06-20 Thread Itamar Syn-Hershko
To add to this - Lucene 4x is still under active development on the Java
front. We'd rather put effort into porting v3.6 and start on v4 once there
is an official Java release.

Thanks for your efforts!

On Wed, Jun 20, 2012 at 6:19 PM, Prescott Nasser geobmx...@hotmail.com wrote:

 How much are you trying to port? I've got it on my roadmap to work with
 sharpen to try and get most of it auto ported. Any porting help is of
 course appreciated and welcome - but if you do have some time and are so
 inclined we could use more people helping on the sharpen front.
 
 From: Oren Eini (Ayende Rahien)
 Sent: 6/20/2012 7:52 AM
 To: lucene-net-...@lucene.apache.org
 Subject: Re: Endian types

 I would assume that you would have to match the java behavior, if only to
 make sure that the index format matched.

 On Wed, Jun 20, 2012 at 5:47 PM, Kim Christensen k...@dubex.dk wrote:

  Hi all,
 
  I was looking into porting some Lucene 4x code, and ran into the issue
  about Big-Endian and Little-Endian.
  What is the standpoint on this? Always Big-Endian as Java does it?
 
  Regards,
  Kim
 

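[Editor's note: for context on the "Big-Endian as Java does it" answer above — Java's java.io.DataOutput writes multi-byte values most-significant-byte first, which is why the Lucene index format is big-endian regardless of platform. A small illustrative Java sketch (not code from either project) of what matching that byte layout by hand looks like:]

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Arrays;

public class BigEndianDemo {
    // Write an int most-significant-byte first, matching DataOutputStream.
    static byte[] toBigEndian(int v) {
        return new byte[] {
            (byte) (v >>> 24), (byte) (v >>> 16),
            (byte) (v >>> 8),  (byte) v
        };
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        new DataOutputStream(bos).writeInt(0x12345678);

        byte[] manual = toBigEndian(0x12345678);
        // Both layouts are 12 34 56 78 - big-endian, as the index format expects.
        System.out.println(Arrays.equals(bos.toByteArray(), manual)); // true
    }
}
```

A port on a little-endian platform has to preserve exactly this on-disk ordering so indexes stay interchangeable with Java Lucene.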


[jira] [Commented] (LUCENENET-495) Use of DateTime.Now causes huge amount of System.Globalization.DaylightTime object allocations

2012-06-17 Thread Itamar Syn-Hershko (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENENET-495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13393629#comment-13393629
 ] 

Itamar Syn-Hershko commented on LUCENENET-495:
--

1. IMO, if there is a thread safety bug, it needs to be fixed

2. Why do we have AddIfNotContains(Hashtable, object), and why are we not using 
ConcurrentDictionary?

 Use of DateTime.Now causes huge amount of System.Globalization.DaylightTime 
 object allocations
 --

 Key: LUCENENET-495
 URL: https://issues.apache.org/jira/browse/LUCENENET-495
 Project: Lucene.Net
  Issue Type: Bug
  Components: Lucene.Net Core
Affects Versions: Lucene.Net 2.9.4, Lucene.Net 3.0.3
Reporter: Christopher Currens
Assignee: Christopher Currens
Priority: Critical
 Fix For: Lucene.Net 3.0.3


 This issue mostly just affects RAMDirectory.  However, RAMFile and 
 RAMOutputStream are used in other (all?) directory implementations, including 
 FSDirectory types.
 In RAMOutputStream, the file last modified property for the RAMFile is 
 updated when the stream is flushed.  It's calculated using 
 {{DateTime.Now.Ticks / TimeSpan.TicksPerMillisecond}}.  I've read before that 
 Microsoft has regretted making DateTime.Now a property instead of a method, 
 and after seeing what it's doing, I'm starting to understand why.  
 DateTime.Now is returning local time.  In order for it to calculate that, it 
 has to get the UTC offset for the machine, which requires the creation of a 
 _class_, System.Globalization.DaylightTime.  This is bad for performance.
 Using code to write 10,000 small documents to an index (4kb sizes), it 
 created 1,570,157 of these DaylightTime classes, a total of 62MB of extra 
 memory...clearly RAMOutputStream.Flush() is called a lot.
 A fix I'd like to propose is to change the RAMFile from storing the 
 LastModified date to UTC instead of local.  DateTime.UtcNow doesn't create 
 any additional objects and is very fast.  For this small benchmark, the 
 performance increase is 31%.
 I've set it to convert to local time when {{RAMDirectory.LastModified(string 
 name)}} is called to make sure it has the same behavior (tests fail 
 otherwise).  Are there any other side-effects to making this change?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
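[Editor's note: the fix proposed in the issue above amounts to the store-UTC, convert-on-read pattern that the Java original gets for free from System.currentTimeMillis(). A hedged Java sketch of that pattern — class and member names here are illustrative, not the actual port:]

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZonedDateTime;

// Illustrative sketch: keep the timestamp as UTC epoch millis on the hot
// path, and convert to local time only when a caller reads the value.
class SketchRamFile {
    private volatile long lastModifiedUtcMillis;

    // Called on every flush: cheap, no time-zone/DST objects allocated.
    void touch() {
        lastModifiedUtcMillis = System.currentTimeMillis();
    }

    // Called rarely: pay the time-zone conversion cost only here.
    ZonedDateTime lastModifiedLocal() {
        return Instant.ofEpochMilli(lastModifiedUtcMillis)
                      .atZone(ZoneId.systemDefault());
    }
}

public class SketchDemo {
    public static void main(String[] args) {
        SketchRamFile f = new SketchRamFile();
        f.touch();                              // hot path: UTC only
        System.out.println(f.lastModifiedLocal() != null);
    }
}
```

In .NET terms this is the DateTime.UtcNow-on-write, ToLocalTime-on-read split the issue describes: the allocation cost moves from every Flush() to the rare LastModified lookup.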




[jira] [Commented] (LUCENENET-495) Use of DateTime.Now causes huge amount of System.Globalization.DaylightTime object allocations

2012-06-17 Thread Itamar Syn-Hershko (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENENET-495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13393633#comment-13393633
 ] 

Itamar Syn-Hershko commented on LUCENENET-495:
--

Makes sense





[jira] [Commented] (LUCENENET-495) Use of DateTime.Now causes huge amount of System.Globalization.DaylightTime object allocations

2012-06-16 Thread Itamar Syn-Hershko (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENENET-495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13393363#comment-13393363
 ] 

Itamar Syn-Hershko commented on LUCENENET-495:
--

+1

Take a look at DateTimeOffset as well - this becomes the standard for .NET 4+

 Use of DateTime.Now causes huge amount of System.Globalization.DaylightTime 
 object allocations
 --

 Key: LUCENENET-495
 URL: https://issues.apache.org/jira/browse/LUCENENET-495
 Project: Lucene.Net
  Issue Type: Bug
  Components: Lucene.Net Core
Affects Versions: Lucene.Net 2.9.4, Lucene.Net 3.0.3
Reporter: Christopher Currens
Assignee: Christopher Currens
Priority: Critical
 Fix For: Lucene.Net 3.0.3


 This issue mostly just affects RAMDirectory.  However, RAMFile and 
 RAMOutputStream are used in other (all?) directory implementations, including 
 FSDirectory types.
 In RAMOutputStream, the file last modified property for the RAMFile is 
 updated when the stream is flushed.  It's calculated using 
 {{DateTime.Now.Ticks / TimeSpan.TicksPerMillisecond}}.  I've read before that 
 Microsoft has regretted making DateTime.Now a property instead of a method, 
 and after seeing what it's doing, I'm starting to understand why.  
 DateTime.Now is returning local time.  In order for it to calculate that, it 
 has to get the UTC offset for the machine, which requires the creation of a
 _class_, System.Globalization.DaylightTime.  This is bad for performance.
 Using code to write 10,000 small documents to an index (4kb sizes), it 
 created 1,570,157 of these DaylightTime classes, a total of 62MB of extra 
 memory...clearly RAMOutputStream.Flush() is called a lot.
 A fix I'd like to propose is to change the RAMFile from storing the 
 LastModified date to UTC instead of local.  DateTime.UtcNow doesn't create 
 any additional objects and is very fast.  For this small benchmark, the 
 performance increase is 31%.
 I've set it to convert to local time when {{RAMDirectory.LastModified(string
 name)}} is called, to make sure it has the same behavior (tests fail
 otherwise).  Are there any other side-effects to making this change?





Re: Lets talk graduation

2012-06-15 Thread Itamar Syn-Hershko
+1 for releasing after graduation, then

With some careful PR and our sponsorship offer, we can get the project
flying

There's still some work to do anyway

On Fri, Jun 15, 2012 at 1:59 PM, Stefan Bodewig bode...@apache.org wrote:

 On 2012-06-14, Christopher Currens wrote:

  I've gone back and forth on whether I think we're ready for graduation or
  not.  I had always felt like we weren't because the project isn't as
 active
  as I'd like it to be.  However, I think I've been looking at it wrong.
   We've got a good enough process and we *have* made progress.

 Absolutely, and I think you are ready to graduate as well.

 As a response to Itamar: Lucene.Net could get more exposure by becoming
 a top level project.  In particular you could craft a press release
 together with the ASF's PR folks to celebrate the re-birth.

 The sponsoring offer is a great thing, IMHO.

  I'm up for starting this process, but I don't want it to take any time
  away from getting 3.0.3 released.

 Understood.  OTOH if you'd graduate first then 3.0.3 would be an
 official Apache release and wouldn't have to wear the incubating tag.
 Your call.

 If you want to do the 3.0.3 release first, I don't think that will be
 much of delay as it seems to be around the corner anyway.

 Stefan




Releasing 3.0.3

2012-06-14 Thread Itamar Syn-Hershko
Where do we stand with this?

I want to push to a 3.0.3 release, what items are still pending?

Itamar.


Re: Lets talk graduation

2012-06-14 Thread Itamar Syn-Hershko
IMHO, whatever brings more attention to the project, and I'm not sure
graduation is what this project needs right now. In the end it's just
semantics.

I'd focus those efforts on getting more work done and having more frequent
releases. Hence our proposition to sponsor dev, which still stands.

On Thu, Jun 14, 2012 at 6:24 PM, Prescott Nasser geobmx...@hotmail.com wrote:


 I think with the addition of two new committers we've made some progress
 in community growth. I think we'll have 3.0.3 out the door soon - are there
 any other items we think we need to address before looking to graduate?
 ~P


Re: Releasing 3.0.3

2012-06-14 Thread Itamar Syn-Hershko
Ok, and is the code in 100% compliance with the 3.0.3 Java code?

I'll be spending some time on fixing the index corruption issue, and it is
probably best for Chris to wrap up the work he has started

Anyone else on board to close some tickets?

On Thu, Jun 14, 2012 at 6:19 PM, Prescott Nasser geobmx...@hotmail.com wrote:


 Agreed -
 JIRA for 3.0.3
 https://issues.apache.org/jira/browse/LUCENENET/fixforversion/12316215#selectedTab=com.atlassian.jira.plugin.system.project%3Aversion-issues-panel
 We should evaluate all of these - fix them, mark as won't fix, or move
 them to another release version. I think the biggest hold up currently is:
 https://issues.apache.org/jira/browse/LUCENENET-484. Chris has made a
 huge dent, but there are two test cases that are still listed as failing (I
 can't even duplicate those failures to know where to start)
 Also we should look at all the other jira tickets and make updates where
 appropriate
 ~P
  Date: Thu, 14 Jun 2012 13:21:04 +0300
  Subject: Releasing 3.0.3
  From: ita...@code972.com
  To: lucene-net-dev@lucene.apache.org
 
  Where do we stand with this?
 
  I want to push to a 3.0.3 release, what items are still pending?
 
  Itamar.




Re: Corrupt index

2012-06-14 Thread Itamar Syn-Hershko
I'm quite certain this shouldn't happen even when Commit wasn't called.

Mike, can you comment on that?

On Thu, Jun 14, 2012 at 8:03 PM, Christopher Currens 
currens.ch...@gmail.com wrote:

 Well, the only thing I see is that there is no place where writer.Commit()
 is called in the delegate assigned to corpusReader.OnDocument.  I know that
 lucene is very transactional, and at least in 3.x, the writer will never
 auto commit to the index.  You can write millions of documents, but if
 commit is never called, those documents aren't actually part of the index.
  Committing isn't a cheap operation, so you definitely don't want to do it
 on every document.

 You can test it yourself with this (naive) solution.  Right below the
 writer.SetUseCompoundFile(false) line, add int numDocsAdded = 0;.  At the
 end of the corpusReader.OnDocument delegate add:

 // Example only.  I wouldn't suggest committing this often
 if(++numDocsAdded % 5 == 0)
 {
writer.Commit();
 }

 I had the application crash for real on this file:

 http://dumps.wikimedia.org/gawiktionary/20120613/gawiktionary-20120613-pages-meta-history.xml.bz2
 ,
 about 20% into the operation.  Without the commit, the index is empty.  Add
 it in, and I get 755 files in the index after it crashes.


 Thanks,
 Christopher

 On Wed, Jun 13, 2012 at 6:13 PM, Itamar Syn-Hershko ita...@code972.com
 wrote:

  Yes, reproduced in first try. See attached program - I referenced it to
  current trunk.
 
 
  On Thu, Jun 14, 2012 at 3:54 AM, Itamar Syn-Hershko ita...@code972.com
 wrote:
 
  Christopher,
 
  I used the IndexBuilder app from here
  https://github.com/synhershko/Talks/tree/master/LuceneNeatThings with a
  8.5GB wikipedia dump.
 
  After running for 2.5 days I had to forcefully close it (infinite loop
 in
  the wiki-markdown parser at 92%, go figure), and the 40-something GB
 index
  I had by then was unusable. I then was able to reproduce this
 
  Please note I now added a few safe-guards you might want to remove to
  make sure the app really crashes on process kill.
 
  I'll try to come up with a better way to reproduce this - hopefully Mike
  will be able to suggest better ways than manual process kill...
 
  On Thu, Jun 14, 2012 at 1:41 AM, Christopher Currens 
  currens.ch...@gmail.com wrote:
 
  Mike, The codebase for lucene.net should be almost identical to java's
  3.0.3 release, and LUCENE-1044 is included in that.
 
  Itamar, are you committing the index regularly?  I only ask because I
  can't
  reproduce it myself by forcibly terminating the process while it's
  indexing.  I've tried both 3.0.3 and 2.9.4.  If I don't commit at all
 and
  terminate the process (even with a 10,000 4K documents created), there
  will
  be no documents in the index when I open it in luke, which I expect.
  If
  I
  commit at 10,000 documents, and terminate it a few thousand after that,
  the
  index has the first ten thousand that were committed.  I've even
  terminated
  it *while* a second commit was taking place, and it still had all of
 the
  documents I expected.
 
  It may be that I'm not trying to reproduce it correctly.  Do you
 have a
  minimal amount of code that can reproduce it?
 
 
  Thanks,
  Christopher
 
  On Wed, Jun 13, 2012 at 9:31 AM, Michael McCandless 
  luc...@mikemccandless.com wrote:
 
   Hi Itamar,
  
   One quick question: does Lucene.Net include the fixes done for
   LUCENE-1044 (to fsync files on commit)?  Those are very important for
   an index to be intact after OS/JVM crash or power loss.
  
   More responses below:
  
   On Tue, Jun 12, 2012 at 8:20 PM, Itamar Syn-Hershko 
  ita...@code972.com
   wrote:
  
I'm a Lucene.Net committer, and there is a chance we have a bug in
  our
FSDirectory implementation that causes indexes to get corrupted
 when
indexing is cut while the IW is still open. As it roots from some
retroactive fixes you made, I'd appreciate your feedback.
   
Correct me if I'm wrong, but by design Lucene should be able to
  recover
rather quickly from power failures or app crashes. Since existing
  segment
files are read only, only new segments that are still being written
  can
   get
corrupted. Hence, recovering from worst-case scenarios is done by
  simply
removing the write.lock file. The worst that could happen then is
  having
   the
last segment damaged, and that can be fixed by removing those
 files,
possibly by running CheckIndex on the index.
  
   You shouldn't even have to run CheckIndex ... because (as of
   LUCENE-1044) we now fsync all segment files before writing the new
   segments_N file, and then removing old segments_N files (and any
   segments that are no longer referenced).
  
   You do have to remove the write.lock if you aren't using
   NativeFSLockFactory (but this has been the default lock impl for a
   while now).
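The fsync-then-publish ordering Mike describes can be sketched roughly like this (illustrative Java only, not Lucene's actual implementation; the file names such as _segN.dat are made up): segment data is forced to stable storage first, and only then is the new segments_N pointer atomically published, so a crash can never leave segments_N referencing unsynced data.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.*;

// Minimal sketch of the LUCENE-1044 idea: fsync data files before
// atomically publishing the commit-point file that references them.
class CommitSketch {
    static void fsync(Path p) throws IOException {
        try (FileChannel ch = FileChannel.open(p, StandardOpenOption.WRITE)) {
            ch.force(true);   // flush contents and metadata to stable storage
        }
    }

    static void commit(Path dir, byte[] segmentData, int gen) throws IOException {
        // 1. Write and sync the segment data itself.
        Path seg = dir.resolve("_seg" + gen + ".dat");
        Files.write(seg, segmentData);
        fsync(seg);

        // 2. Write the new commit point to a temp name, sync it, then
        //    atomically rename it into place as segments_N.
        Path tmp = dir.resolve("pending_segments_" + gen);
        Files.write(tmp, seg.getFileName().toString().getBytes());
        fsync(tmp);
        Files.move(tmp, dir.resolve("segments_" + gen),
                   StandardCopyOption.ATOMIC_MOVE);
    }
}
```

With this ordering, a crash at any point leaves either the old commit point or the new one, never a commit point that names unsynced files.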
  
Last week I have been playing with rather large indexes and crashed
  my
   app
while it was indexing. I wasn't able

Re: Releasing 3.0.3

2012-06-14 Thread Itamar Syn-Hershko
Sorry, misread your question

This can be easily done with xUnit, using Theories.

On Thu, Jun 14, 2012 at 9:26 PM, Itamar Syn-Hershko ita...@code972.com wrote:

 Something like:

 Thread.CurrentThread.CurrentCulture = cultureInfo;
 Thread.CurrentThread.CurrentUICulture = cultureInfo;

 And setting it back later when the test is done.

 You can easily do this with an IDisposable like this:

 using(new TemporaryCulture(culture)){
 ...
 }
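The TemporaryCulture idea above can be completed along these lines; shown here as a hypothetical Java analogue (swapping the JVM default Locale via try-with-resources) rather than the original C# IDisposable that would set Thread.CurrentThread.CurrentCulture and restore it on Dispose.

```java
import java.util.Locale;

// Hypothetical analogue of TemporaryCulture: an AutoCloseable that swaps
// the default locale for the duration of a try-with-resources block.
class TemporaryLocale implements AutoCloseable {
    private final Locale previous;

    TemporaryLocale(Locale locale) {
        previous = Locale.getDefault();
        Locale.setDefault(locale);
    }

    @Override
    public void close() {
        Locale.setDefault(previous);   // restore even if the test body throws
    }
}
```

Usage mirrors the C# snippet: try (TemporaryLocale ignored = new TemporaryLocale(new Locale("sv", "SE"))) { ... } runs the body under sv-SE and restores the previous locale afterwards, even on exception.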

 On Thu, Jun 14, 2012 at 9:10 PM, Simon Svensson si...@devhost.se wrote:

 I've been thinking about LUCENENET-493 (Make Lucene.Net culture
 insensitive). It's easy to fix the code, and verify it on my machine
 (running CurrentCulture=sv-SE), but there are no tests to confirm the
 changes. I've been looking for ways to build test cases for different
 cultures, like the overridden runBare method used originally in the java
 code, but NUnit does not seem to have any such abilities within the tests
 themselves.

 1) It is possible to build NUnit addins that could execute every test
 [with special annotation?] once for every culture. Resharper supports NUnit
 addins, provided they are manually placed in the correct folder under the
 resharper application folder.
 2) We could rewrite culture sensitive tests into method that holds the
 logic, and several test methods with [SetCulture(...)], but this requires
 knowledge about what tests are culture sensitive. We could also rewrite
 every method into a foreach-loop, executing the test logic with every
 culture.
 3) Change unit testing framework.

 Any thoughts?


 On 2012-06-14 17:58, Prescott Nasser wrote:

 I'm going to try and review some of them - looking at the 3.5 ticket
 atm. The code should be in compliance with 3.0.3. We might want to do some
 spot checking various parts of the code. I'm not sure about the tests.
 Also, we should probably run some code coverage tools to see how much
 coverage we have.
 ~P

 Date: Thu, 14 Jun 2012 18:37:12 +0300
 Subject: Re: Releasing 3.0.3
 From: ita...@code972.com
 To: lucene-net-dev@lucene.apache.org

 Ok, and is the code in 100% compliance with the 3.0.3 Java code?

 I'll be spending some time on fixing the index corruption issue, and it
 is
 probably best for Chris to wrap up the work he has started

 Anyone else on board to close some tickets?

 On Thu, Jun 14, 2012 at 6:19 PM, Prescott Nasser geobmx...@hotmail.com wrote:

  Agreed -
 JIRA for 3.0.3
 https://issues.apache.org/jira/browse/LUCENENET/fixforversion/12316215#selectedTab=com.atlassian.jira.plugin.system.project%3Aversion-issues-panel
 We should evaluate all of these - fix them, mark as won't fix, or move
 them to another release version. I think the biggest hold up currently
 is:
 https://issues.apache.org/jira/browse/LUCENENET-484.
 Chris has made a
 huge dent, but there are two test cases that are still listed as
 failing (I
 can't even duplicate those failures to know where to start)
 Also we should look at all the other jira tickets and make updates
 where
 appropriate
 ~P

 Date: Thu, 14 Jun 2012 13:21:04 +0300
 Subject: Releasing 3.0.3
 From: ita...@code972.com
 To: lucene-net-dev@lucene.apache.org

 Where do we stand with this?

 I want to push to a 3.0.3 release, what items are still pending?

 Itamar.









Re: Corrupt index

2012-06-14 Thread Itamar Syn-Hershko
I can confirm 2.9.4 had autoCommit, but it is gone in 3.0.3 already, so
Lucene.Net doesn't have autoCommit.

So I don't have autoCommit set to true, but I can clearly see a segments_1
file there along with the other files. If that helps, it always keeps the
name segments_1 at 32 bytes and never changes.

And again, if I kill the process and try to open the index with Luke
3.3, the index folder is being wiped out.

Not sure what to make of all that.

On Fri, Jun 15, 2012 at 3:21 AM, Michael McCandless 
luc...@mikemccandless.com wrote:

 Hmm, OK: in 2.9.4 / 3.0.x, if you open IW on a new directory, it will
 make a zero-segment commit.  This was changed/fixed in 3.1 with
 LUCENE-2386.

 In 2.9.x (not 3.0.x) there is still an autoCommit parameter,
 defaulting to false, but if you set it to true then IndexWriter will
 periodically commit.

 Seeing segment files created and merged is definitely expected, but
 it's not expected to see segments_N files unless you pass
 autoCommit=true.

 Mike McCandless

 http://blog.mikemccandless.com

 On Thu, Jun 14, 2012 at 8:14 PM, Itamar Syn-Hershko ita...@code972.com
 wrote:
  Not what I'm seeing. I actually see a lot of segments created and merged
  while it operates. Expected?
 
  Reminding you, this is 2.9.4 / 3.0.3
 
  On Fri, Jun 15, 2012 at 3:10 AM, Michael McCandless
  luc...@mikemccandless.com wrote:
 
  Right: Lucene never autocommits anymore ...
 
  If you create a new index, add a bunch of docs, and things crash
  before you have a chance to commit, then there is no index (not even a
  0 doc one) in that directory.
 
  Mike McCandless
 
  http://blog.mikemccandless.com
 
  On Thu, Jun 14, 2012 at 1:41 PM, Itamar Syn-Hershko ita...@code972.com
 
  wrote:
   I'm quite certain this shouldn't happen also when Commit wasn't
 called.
  
   Mike, can you comment on that?
  
   On Thu, Jun 14, 2012 at 8:03 PM, Christopher Currens
   currens.ch...@gmail.com wrote:
  
   Well, the only thing I see is that there is no place where
   writer.Commit()
   is called in the delegate assigned to corpusReader.OnDocument.  I
 know
   that
   lucene is very transactional, and at least in 3.x, the writer will
   never
   auto commit to the index.  You can write millions of documents, but
 if
   commit is never called, those documents aren't actually part of the
   index.
Committing isn't a cheap operation, so you definitely don't want to
 do
   it
   on every document.
  
   You can test it yourself with this (naive) solution.  Right below the
   writer.SetUseCompoundFile(false) line, add int numDocsAdded = 0;.
  At
   the
   end of the corpusReader.OnDocument delegate add:
  
   // Example only.  I wouldn't suggest committing this often
   if(++numDocsAdded % 5 == 0)
   {
  writer.Commit();
   }
  
   I had the application crash for real on this file:
  
  
  
 http://dumps.wikimedia.org/gawiktionary/20120613/gawiktionary-20120613-pages-meta-history.xml.bz2
 ,
   about 20% into the operation.  Without the commit, the index is
 empty.
Add
   it in, and I get 755 files in the index after it crashes.
  
  
   Thanks,
   Christopher
  
   On Wed, Jun 13, 2012 at 6:13 PM, Itamar Syn-Hershko
    ita...@code972.com wrote:
  
  
Yes, reproduced in first try. See attached program - I referenced
 it
to
current trunk.
   
   
On Thu, Jun 14, 2012 at 3:54 AM, Itamar Syn-Hershko
ita...@code972.comwrote:
   
Christopher,
   
I used the IndexBuilder app from here
https://github.com/synhershko/Talks/tree/master/LuceneNeatThings
with a
8.5GB wikipedia dump.
   
After running for 2.5 days I had to forcefully close it (infinite
loop
in
the wiki-markdown parser at 92%, go figure), and the 40-something
 GB
index
I had by then was unusable. I then was able to reproduce this
   
Please note I now added a few safe-guards you might want to remove
to
make sure the app really crashes on process kill.
   
I'll try to come up with a better way to reproduce this -
 hopefully
Mike
will be able to suggest better ways than manual process kill...
   
On Thu, Jun 14, 2012 at 1:41 AM, Christopher Currens 
currens.ch...@gmail.com wrote:
   
Mike, The codebase for lucene.net should be almost identical to
java's
3.0.3 release, and LUCENE-1044 is included in that.
   
Itamar, are you committing the index regularly?  I only ask
 because
I
can't
reproduce it myself by forcibly terminating the process while
 it's
indexing.  I've tried both 3.0.3 and 2.9.4.  If I don't commit at
all
and
terminate the process (even with a 10,000 4K documents created),
there
will
be no documents in the index when I open it in luke, which I
expect.
 If
I
commit at 10,000 documents, and terminate it a few thousand after
that,
the
index has the first ten thousand that were committed.  I've even
terminated
it *while* a second commit was taking

Re: Corrupt index

2012-06-14 Thread Itamar Syn-Hershko
Not what I'm seeing. I actually see a lot of segments created and merged
while it operates. Expected?

Reminding you, this is 2.9.4 / 3.0.3

On Fri, Jun 15, 2012 at 3:10 AM, Michael McCandless 
luc...@mikemccandless.com wrote:

 Right: Lucene never autocommits anymore ...

 If you create a new index, add a bunch of docs, and things crash
 before you have a chance to commit, then there is no index (not even a
 0 doc one) in that directory.

 Mike McCandless

 http://blog.mikemccandless.com

 On Thu, Jun 14, 2012 at 1:41 PM, Itamar Syn-Hershko ita...@code972.com
 wrote:
  I'm quite certain this shouldn't happen also when Commit wasn't called.
 
  Mike, can you comment on that?
 
  On Thu, Jun 14, 2012 at 8:03 PM, Christopher Currens
  currens.ch...@gmail.com wrote:
 
  Well, the only thing I see is that there is no place where
 writer.Commit()
  is called in the delegate assigned to corpusReader.OnDocument.  I know
  that
  lucene is very transactional, and at least in 3.x, the writer will never
  auto commit to the index.  You can write millions of documents, but if
  commit is never called, those documents aren't actually part of the
 index.
   Committing isn't a cheap operation, so you definitely don't want to do
 it
  on every document.
 
  You can test it yourself with this (naive) solution.  Right below the
  writer.SetUseCompoundFile(false) line, add int numDocsAdded = 0;.  At
  the
  end of the corpusReader.OnDocument delegate add:
 
  // Example only.  I wouldn't suggest committing this often
  if(++numDocsAdded % 5 == 0)
  {
 writer.Commit();
  }
 
  I had the application crash for real on this file:
 
 
 http://dumps.wikimedia.org/gawiktionary/20120613/gawiktionary-20120613-pages-meta-history.xml.bz2
 ,
  about 20% into the operation.  Without the commit, the index is empty.
   Add
  it in, and I get 755 files in the index after it crashes.
 
 
  Thanks,
  Christopher
 
  On Wed, Jun 13, 2012 at 6:13 PM, Itamar Syn-Hershko
   ita...@code972.com wrote:
 
 
   Yes, reproduced in first try. See attached program - I referenced it
 to
   current trunk.
  
  
   On Thu, Jun 14, 2012 at 3:54 AM, Itamar Syn-Hershko
    ita...@code972.com wrote:
  
   Christopher,
  
   I used the IndexBuilder app from here
    https://github.com/synhershko/Talks/tree/master/LuceneNeatThings with a
   8.5GB wikipedia dump.
  
   After running for 2.5 days I had to forcefully close it (infinite
 loop
   in
   the wiki-markdown parser at 92%, go figure), and the 40-something GB
   index
   I had by then was unusable. I then was able to reproduce this
  
   Please note I now added a few safe-guards you might want to remove to
   make sure the app really crashes on process kill.
  
   I'll try to come up with a better way to reproduce this - hopefully
   Mike
   will be able to suggest better ways than manual process kill...
  
   On Thu, Jun 14, 2012 at 1:41 AM, Christopher Currens 
   currens.ch...@gmail.com wrote:
  
   Mike, The codebase for lucene.net should be almost identical to
 java's
   3.0.3 release, and LUCENE-1044 is included in that.
  
   Itamar, are you committing the index regularly?  I only ask because
 I
   can't
   reproduce it myself by forcibly terminating the process while it's
   indexing.  I've tried both 3.0.3 and 2.9.4.  If I don't commit at
 all
   and
   terminate the process (even with a 10,000 4K documents created),
 there
   will
   be no documents in the index when I open it in luke, which I expect.
If
   I
   commit at 10,000 documents, and terminate it a few thousand after
   that,
   the
   index has the first ten thousand that were committed.  I've even
   terminated
   it *while* a second commit was taking place, and it still had all of
   the
   documents I expected.
  
   It may be that I'm not trying to reproducing it correctly.  Do you
   have a
   minimal amount of code that can reproduce it?
  
  
   Thanks,
   Christopher
  
   On Wed, Jun 13, 2012 at 9:31 AM, Michael McCandless 
   luc...@mikemccandless.com wrote:
  
Hi Itamar,
   
One quick question: does Lucene.Net include the fixes done for
LUCENE-1044 (to fsync files on commit)?  Those are very important
for
an index to be intact after OS/JVM crash or power loss.
   
More responses below:
   
On Tue, Jun 12, 2012 at 8:20 PM, Itamar Syn-Hershko 
   ita...@code972.com
wrote:
   
 I'm a Lucene.Net committer, and there is a chance we have a bug
 in
   our
 FSDirectory implementation that causes indexes to get corrupted
 when
 indexing is cut while the IW is still open. As it roots from
 some
 retroactive fixes you made, I'd appreciate your feedback.

 Correct me if I'm wrong, but by design Lucene should be able to
   recover
 rather quickly from power failures or app crashes. Since
 existing
   segment
 files are read only, only new segments that are still being
 written
   can
get
 corrupted. Hence, recovering from worst-case scenarios is done

Re: Corrupt index

2012-06-14 Thread Itamar Syn-Hershko
I can confirm 2.9.4 had autoCommit, but it is gone in 3.0.3 already, so
Lucene.Net doesn't have autoCommit.

So I don't have autoCommit set to true, but I can clearly see a segments_1
file there along with the other files. If that helpes, it always keeps with
the name segments_1 with 32 bytes, never changes.

And as again, if I kill the process and try to open the index with Luke
3.3, the index folder is being wiped out.

Not sure what to make of all that.

On Fri, Jun 15, 2012 at 3:21 AM, Michael McCandless 
luc...@mikemccandless.com wrote:

 Hmm, OK: in 2.9.4 / 3.0.x, if you open IW on a new directory, it will
 make a zero-segment commit.  This was changed/fixed in 3.1 with
 LUCENE-2386.

 In 2.9.x (not 3.0.x) there is still an autoCommit parameter,
 defaulting to false, but if you set it to true then IndexWriter will
 periodically commit.

 Seeing segment files created and merge is definitely expected, but
 it's not expected to see segments_N files unless you pass
 autoCommit=true.

 Mike McCandless

 http://blog.mikemccandless.com

 On Thu, Jun 14, 2012 at 8:14 PM, Itamar Syn-Hershko ita...@code972.com
 wrote:
  Not what I'm seeing. I actually see a lot of segments created and merged
  while it operates. Expected?
 
  Reminding you, this is 2.9.4 / 3.0.3
 
  On Fri, Jun 15, 2012 at 3:10 AM, Michael McCandless
  luc...@mikemccandless.com wrote:
 
  Right: Lucene never autocommits anymore ...
 
  If you create a new index, add a bunch of docs, and things crash
  before you have a chance to commit, then there is no index (not even a
  0 doc one) in that directory.
 
  Mike McCandless
 
  http://blog.mikemccandless.com
 
 On Thu, Jun 14, 2012 at 1:41 PM, Itamar Syn-Hershko ita...@code972.com
 wrote:
  I'm quite certain this shouldn't happen also when Commit wasn't called.
 
  Mike, can you comment on that?
 
  On Thu, Jun 14, 2012 at 8:03 PM, Christopher Currens
  currens.ch...@gmail.com wrote:
 
   Well, the only thing I see is that there is no place where
   writer.Commit() is called in the delegate assigned to
   corpusReader.OnDocument.  I know that lucene is very transactional,
   and at least in 3.x, the writer will never auto commit to the index.
   You can write millions of documents, but if commit is never called,
   those documents aren't actually part of the index.  Committing isn't
   a cheap operation, so you definitely don't want to do it on every
   document.
  
   You can test it yourself with this (naive) solution.  Right below the
   writer.SetUseCompoundFile(false) line, add int numDocsAdded = 0;.
   At the end of the corpusReader.OnDocument delegate add:
  
   // Example only.  I wouldn't suggest committing this often
   if(++numDocsAdded % 5 == 0)
   {
       writer.Commit();
   }
  
   I had the application crash for real on this file:
   http://dumps.wikimedia.org/gawiktionary/20120613/gawiktionary-20120613-pages-meta-history.xml.bz2
   about 20% into the operation.  Without the commit, the index is
   empty.  Add it in, and I get 755 files in the index after it crashes.
  
   Thanks,
   Christopher
  
   On Wed, Jun 13, 2012 at 6:13 PM, Itamar Syn-Hershko
   ita...@code972.com wrote:
  
    Yes, reproduced in first try. See attached program - I referenced it
    to current trunk.
   
    On Thu, Jun 14, 2012 at 3:54 AM, Itamar Syn-Hershko
    ita...@code972.com wrote:
   
     Christopher,
    
     I used the IndexBuilder app from here
     https://github.com/synhershko/Talks/tree/master/LuceneNeatThings
     with a 8.5GB wikipedia dump.
    
     After running for 2.5 days I had to forcefully close it (infinite
     loop in the wiki-markdown parser at 92%, go figure), and the
     40-something GB index I had by then was unusable. I then was able
     to reproduce this
    
     Please note I now added a few safe-guards you might want to remove
     to make sure the app really crashes on process kill.
    
     I'll try to come up with a better way to reproduce this - hopefully
     Mike will be able to suggest better ways than manual process
     kill...
    
     On Thu, Jun 14, 2012 at 1:41 AM, Christopher Currens
     currens.ch...@gmail.com wrote:
    
      Mike, The codebase for lucene.net should be almost identical to
      java's 3.0.3 release, and LUCENE-1044 is included in that.
     
      Itamar, are you committing the index regularly?  I only ask
      because I can't reproduce it myself by forcibly terminating the
      process while it's indexing.  I've tried both 3.0.3 and 2.9.4.
      If I don't commit at all and terminate the process (even with a
      10,000 4K documents created), there will be no documents in the
      index when I open it in luke, which I expect.  If I commit at
      10,000 documents, and terminate it a few thousand after that, the
      index has the first ten thousand that were committed.  I've even
      terminated it *while* a second commit was taking
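Christopher's commit-every-N-documents advice in the quoted thread is the standard batching pattern: per-document commits are too expensive, and never committing leaves nothing recoverable after a crash. A minimal sketch (Python, with a hypothetical `writer` object standing in for IndexWriter; not the Lucene.Net API):

```python
def index_all(writer, docs, batch_size=10_000):
    """Add documents and commit every batch_size of them.

    `writer` is a hypothetical object exposing add() and commit(),
    standing in for an IndexWriter. Commits are durable checkpoints:
    anything added after the last commit is lost if the process dies.
    """
    pending = 0
    for doc in docs:
        writer.add(doc)
        pending += 1
        if pending >= batch_size:
            writer.commit()   # durable checkpoint; survives a crash
            pending = 0
    if pending:
        writer.commit()       # flush the final partial batch
```

The batch size is a throughput-versus-data-loss knob: after a crash the index reverts to the last commit, so at most one batch of work is lost.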

Re: Corrupt index

2012-06-13 Thread Itamar Syn-Hershko
Christopher,

I used the IndexBuilder app from here
https://github.com/synhershko/Talks/tree/master/LuceneNeatThings with a
8.5GB wikipedia dump.

After running for 2.5 days I had to forcefully close it (infinite loop in
the wiki-markdown parser at 92%, go figure), and the 40-something GB index
I had by then was unusable. I then was able to reproduce this

Please note I now added a few safe-guards you might want to remove to make
sure the app really crashes on process kill.

I'll try to come up with a better way to reproduce this - hopefully Mike
will be able to suggest better ways than manual process kill...

On Thu, Jun 14, 2012 at 1:41 AM, Christopher Currens 
currens.ch...@gmail.com wrote:

 Mike, The codebase for lucene.net should be almost identical to java's
 3.0.3 release, and LUCENE-1044 is included in that.

 Itamar, are you committing the index regularly?  I only ask because I can't
 reproduce it myself by forcibly terminating the process while it's
 indexing.  I've tried both 3.0.3 and 2.9.4.  If I don't commit at all and
 terminate the process (even with a 10,000 4K documents created), there will
 be no documents in the index when I open it in luke, which I expect.  If I
 commit at 10,000 documents, and terminate it a few thousand after that, the
 index has the first ten thousand that were committed.  I've even terminated
 it *while* a second commit was taking place, and it still had all of the
 documents I expected.

 It may be that I'm not trying to reproduce it correctly.  Do you have a
 minimal amount of code that can reproduce it?


 Thanks,
 Christopher

 On Wed, Jun 13, 2012 at 9:31 AM, Michael McCandless 
 luc...@mikemccandless.com wrote:

  Hi Itamar,
 
  One quick question: does Lucene.Net include the fixes done for
  LUCENE-1044 (to fsync files on commit)?  Those are very important for
  an index to be intact after OS/JVM crash or power loss.
 
  More responses below:
 
  On Tue, Jun 12, 2012 at 8:20 PM, Itamar Syn-Hershko ita...@code972.com
  wrote:
 
   I'm a Lucene.Net committer, and there is a chance we have a bug in our
   FSDirectory implementation that causes indexes to get corrupted when
   indexing is cut while the IW is still open. As it roots from some
   retroactive fixes you made, I'd appreciate your feedback.
  
   Correct me if I'm wrong, but by design Lucene should be able to recover
   rather quickly from power failures or app crashes. Since existing
 segment
   files are read only, only new segments that are still being written can
  get
   corrupted. Hence, recovering from worst-case scenarios is done by
 simply
   removing the write.lock file. The worst that could happen then is
 having
  the
   last segment damaged, and that can be fixed by removing those files,
   possibly by running CheckIndex on the index.
 
  You shouldn't even have to run CheckIndex ... because (as of
  LUCENE-1044) we now fsync all segment files before writing the new
  segments_N file, and then removing old segments_N files (and any
  segments that are no longer referenced).
 
  You do have to remove the write.lock if you aren't using
  NativeFSLockFactory (but this has been the default lock impl for a
  while now).
 
   Last week I have been playing with rather large indexes and crashed my
  app
   while it was indexing. I wasn't able to open the index, and Luke was
 even
   kind enough to wipe the index folder clean even though I opened it in
   read-only mode. I re-ran this, and after another crash running
 CheckIndex
   revealed nothing - the index was detected to be an empty one. I am not
   entirely sure what could be the cause for this, but I suspect it has
   been corrupted by the crash.
 
  Had no commit completed (no segments file written)?
 
  If you don't fsync then all sorts of crazy things are possible...
 
   I've been looking at these:
  
  
 
 https://issues.apache.org/jira/browse/LUCENE-3418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
  
 
 https://issues.apache.org/jira/browse/LUCENE-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 
  (And LUCENE-1044 before that ... it was LUCENE-1044 that LUCENE-2328
  broke...).
 
   And it seems like this is what I was experiencing. Mike and Mark will
   probably be able to tell if this is what they saw or not, but as far
 as I
   can tell this is not an expected behavior of a Lucene index.
 
  Definitely not expected behavior: assuming nothing is flipping bits,
  then on OS/JVM crash or power loss your index should be fine, just
  reverted to the last successful commit.
 
   What I'm looking for at the moment is some advice on what FSDirectory
   implementation to use to make sure no corruption can happen. The 3.4
  version
   (which is where LUCENE-3418 was committed to) seems to handle a lot of
   things the 3.0 doesn't, but on the other hand LUCENE-3418 was
 introduced
  by
   changes made to the 3.0 codebase.
 
  Hopefully it's just that you are missing fsync!
 
   Also

Corrupt index

2012-06-13 Thread Itamar Syn-Hershko
Hi Java devs,

I'm a Lucene.Net committer, and there is a chance we have a bug in our
FSDirectory implementation that causes indexes to get corrupted when
indexing is cut while the IW is still open. As it roots from some
retroactive fixes you made, I'd appreciate your feedback.

Correct me if I'm wrong, but by design Lucene should be able to recover
rather quickly from power failures or app crashes. Since existing segment
files are read only, only new segments that are still being written can get
corrupted. Hence, recovering from worst-case scenarios is done by simply
removing the write.lock file. The worst that could happen then is having
the last segment damaged, and that can be fixed by removing those files,
possibly by running CheckIndex on the index.

Last week I have been playing with rather large indexes and crashed my app
while it was indexing. I wasn't able to open the index, and Luke was even
kind enough to wipe the index folder clean even though I opened it in
read-only mode. I re-ran this, and after another crash running CheckIndex
revealed nothing - the index was detected to be an empty one. I am not
entirely sure what could be the cause for this, but I suspect it has
been corrupted by the crash.

I've been looking at these:

https://issues.apache.org/jira/browse/LUCENE-3418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
https://issues.apache.org/jira/browse/LUCENE-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

And it seems like this is what I was experiencing. Mike and Mark will
probably be able to tell if this is what they saw or not, but as far as I
can tell this is not an expected behavior of a Lucene index.

What I'm looking for at the moment is some advice on what FSDirectory
implementation to use to make sure no corruption can happen. The 3.4
version (which is where LUCENE-3418 was committed to) seems to handle a lot
of things the 3.0 doesn't, but on the other hand LUCENE-3418 was introduced
by changes made to the 3.0 codebase.

Also, is there any test in the suite checking for those scenarios?

Will appreciate any help on this,

Itamar.
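The recovery model described above (write-once segment files plus a commit-point file naming them) can be sketched generically. This is an illustrative simplification, not Lucene's actual on-disk format: real segments_N files are binary, generations are base-36, and the file list is richer.

```python
import os

def latest_commit_point(index_dir):
    """Return the newest segments_N manifest whose referenced files all
    exist, or None if no commit ever completed.

    Simplified commit-point model: data files are write-once, and a
    manifest published at commit time lists them (here, as a plain-text
    space-separated list, an invention for illustration). Recovery picks
    the newest intact manifest and ignores any incomplete newer state.
    """
    gens = sorted(
        (int(name.rsplit("_", 1)[1]), name)
        for name in os.listdir(index_dir)
        if name.startswith("segments_")
    )
    for _, name in reversed(gens):
        with open(os.path.join(index_dir, name)) as manifest:
            referenced = manifest.read().split()
        if all(os.path.exists(os.path.join(index_dir, f)) for f in referenced):
            return name  # newest intact commit point
    return None  # no completed commit: the index is effectively empty
```

Under this model, deleting a stale write.lock and opening at the returned commit point is the whole recovery story; an index with no manifest at all is simply empty, which matches what CheckIndex reported above.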


Re: Corrupt index

2012-06-13 Thread Itamar Syn-Hershko
Mike,

On Wed, Jun 13, 2012 at 7:31 PM, Michael McCandless 
luc...@mikemccandless.com wrote:

 Hi Itamar,

 One quick question: does Lucene.Net include the fixes done for
 LUCENE-1044 (to fsync files on commit)?  Those are very important for
 an index to be intact after OS/JVM crash or power loss.


Definitely, as Christopher noted we are about to release a 3.0.3 compatible
version, which is a line-by-line port of the Java version.


 You shouldn't even have to run CheckIndex ... because (as of
 LUCENE-1044) we now fsync all segment files before writing the new
 segments_N file, and then removing old segments_N files (and any
 segments that are no longer referenced).

 You do have to remove the write.lock if you aren't using
 NativeFSLockFactory (but this has been the default lock impl for a
 while now).
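The ordering Mike describes, sync the segment data files before publishing the commit point that references them, can be sketched with POSIX primitives. Illustrative only (the plain-text manifest and its name are inventions, and the directory-fsync step assumes Linux semantics); Lucene's real commit path differs in detail:

```python
import os

def durable_commit(index_dir, manifest_name, data_files):
    """Persist data files, then publish a manifest referencing them.

    The ordering matters: if the manifest became visible before its data
    files reached stable storage, a crash could leave a commit point that
    points at garbage. Sketch only; not Lucene's actual commit code.
    """
    for f in data_files:
        fd = os.open(os.path.join(index_dir, f), os.O_RDONLY)
        os.fsync(fd)              # 1. segment data reaches stable storage
        os.close(fd)
    tmp_path = os.path.join(index_dir, manifest_name + ".tmp")
    with open(tmp_path, "w") as m:
        m.write("\n".join(data_files))
        m.flush()
        os.fsync(m.fileno())      # 2. the new commit point itself is durable
    os.rename(tmp_path, os.path.join(index_dir, manifest_name))  # 3. atomic publish
    dir_fd = os.open(index_dir, os.O_RDONLY)
    os.fsync(dir_fd)              # 4. persist the rename (Linux semantics)
    os.close(dir_fd)
```

This is the shape of the LUCENE-1044 fix being discussed: skipping step 1 (the bug LUCENE-2328 reintroduced and LUCENE-3418 fixed) is exactly what lets a crash corrupt an apparently committed index.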


Somewhat unrelated to this thread, but what should I expect to see? From
time to time we do see write.lock present after an app-crash or power
failure. Also, what are the steps that are expected to be performed in such
cases?



  Last week I have been playing with rather large indexes and crashed my
 app
  while it was indexing. I wasn't able to open the index, and Luke was even
  kind enough to wipe the index folder clean even though I opened it in
  read-only mode. I re-ran this, and after another crash running CheckIndex
  revealed nothing - the index was detected to be an empty one. I am not
  entirely sure what could be the cause for this, but I suspect it has
  been corrupted by the crash.

 Had no commit completed (no segments file written)?

 If you don't fsync then all sorts of crazy things are possible...


Ok, so we do have fsync since LUCENE-1044 is present, and there were
segments present from previous commits. Any idea what went wrong?


  I've been looking at these:
 
 
 https://issues.apache.org/jira/browse/LUCENE-3418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 
 https://issues.apache.org/jira/browse/LUCENE-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

 (And LUCENE-1044 before that ... it was LUCENE-1044 that LUCENE-2328 broke...).


So 2328 broke 1044, and this was fixed only in 3.4, right? so 2328 made it
to a 3.0.x release while the fix for it (3418) was only released in 3.4. Am
I right?

If this is the case, 2328 probably made its way to Lucene.Net since we are
using the released sources for porting, and we now need to apply 3418 in
the current version.

Does it make sense to just port FSDirectory from 3.4 to 3.0.3? Or were
there API or other changes that will make our life miserable if we do that?



  And it seems like this is what I was experiencing. Mike and Mark will
  probably be able to tell if this is what they saw or not, but as far as I
  can tell this is not an expected behavior of a Lucene index.

 Definitely not expected behavior: assuming nothing is flipping bits,
 then on OS/JVM crash or power loss your index should be fine, just
 reverted to the last successful commit.


What I suspected. Will try to reproduce reliably - any recommendations? Not
really feeling like reinventing the wheel here...

MockDirectoryWrapper wasn't ported yet, as it appears only in 3.4, and as
you said it won't really help here anyway



  What I'm looking for at the moment is some advice on what FSDirectory
  implementation to use to make sure no corruption can happen. The 3.4
 version
  (which is where LUCENE-3418 was committed to) seems to handle a lot of
  things the 3.0 doesn't, but on the other hand LUCENE-3418 was
 introduced by
  changes made to the 3.0 codebase.

 Hopefully it's just that you are missing fsync!

  Also, is there any test in the suite checking for those scenarios?

 Our test framework has a sneaky MockDirectoryWrapper that, after a
 test finishes, goes and corrupts any unsync'd files and then verifies
 the index is still OK... it's good because it'll catch any times we
 are missing calls to sync, but, it's not low level enough such that if
 FSDir is failing to actually call fsync (that was the bug in
 LUCENE-3418) then it won't catch that...
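The MockDirectoryWrapper idea Mike describes, corrupt whatever was never fsync'd and then check the index still opens, can be mimicked with a toy wrapper. This is an assumption-laden sketch of the technique, not the Lucene test-framework API:

```python
import os

class CrashSimDir:
    """Toy crash simulator: track which files were fsync'd, and on a
    simulated crash garble everything that wasn't.

    Mimics the idea behind MockDirectoryWrapper's crash check only; the
    class and its methods are inventions for illustration. As Mike notes,
    this catches missing sync() *calls* but not a directory impl whose
    sync() silently does nothing.
    """
    def __init__(self, path):
        self.path = path
        self.synced = set()
        self.unsynced = set()

    def write(self, name, data):
        with open(os.path.join(self.path, name), "wb") as f:
            f.write(data)
        self.unsynced.add(name)      # written, but not yet durable

    def sync(self, name):
        with open(os.path.join(self.path, name), "rb") as f:
            os.fsync(f.fileno())
        self.unsynced.discard(name)
        self.synced.add(name)

    def crash(self):
        # After a real crash, anything not fsync'd may be lost or garbled.
        for name in self.unsynced:
            with open(os.path.join(self.path, name), "wb") as f:
                f.write(os.urandom(8))
        self.unsynced.clear()
```

A test would index, call crash(), and then assert the index still opens at the last commit; only synced files are guaranteed to survive intact.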

 Mike McCandless

 http://blog.mikemccandless.com

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org





Re: Corrupt index

2012-06-13 Thread Itamar Syn-Hershko
Yes, reproduced in first try. See attached program - I referenced it to
current trunk.

On Thu, Jun 14, 2012 at 3:54 AM, Itamar Syn-Hershko ita...@code972.comwrote:

 Christopher,

 I used the IndexBuilder app from here
 https://github.com/synhershko/Talks/tree/master/LuceneNeatThings with a
 8.5GB wikipedia dump.

 After running for 2.5 days I had to forcefully close it (infinite loop in
 the wiki-markdown parser at 92%, go figure), and the 40-something GB index
 I had by then was unusable. I then was able to reproduce this

 Please note I now added a few safe-guards you might want to remove to make
 sure the app really crashes on process kill.

 I'll try to come up with a better way to reproduce this - hopefully Mike
 will be able to suggest better ways than manual process kill...

 On Thu, Jun 14, 2012 at 1:41 AM, Christopher Currens 
 currens.ch...@gmail.com wrote:

 Mike, The codebase for lucene.net should be almost identical to java's
 3.0.3 release, and LUCENE-1044 is included in that.

 Itamar, are you committing the index regularly?  I only ask because I
 can't
 reproduce it myself by forcibly terminating the process while it's
 indexing.  I've tried both 3.0.3 and 2.9.4.  If I don't commit at all and
 terminate the process (even with a 10,000 4K documents created), there
 will
 be no documents in the index when I open it in luke, which I expect.  If I
 commit at 10,000 documents, and terminate it a few thousand after that,
 the
 index has the first ten thousand that were committed.  I've even
 terminated
 it *while* a second commit was taking place, and it still had all of the
 documents I expected.

 It may be that I'm not trying to reproduce it correctly.  Do you have a
 minimal amount of code that can reproduce it?


 Thanks,
 Christopher

 On Wed, Jun 13, 2012 at 9:31 AM, Michael McCandless 
 luc...@mikemccandless.com wrote:

  Hi Itamar,
 
  One quick question: does Lucene.Net include the fixes done for
  LUCENE-1044 (to fsync files on commit)?  Those are very important for
  an index to be intact after OS/JVM crash or power loss.
 
  More responses below:
 
  On Tue, Jun 12, 2012 at 8:20 PM, Itamar Syn-Hershko ita...@code972.com
 
  wrote:
 
   I'm a Lucene.Net committer, and there is a chance we have a bug in our
   FSDirectory implementation that causes indexes to get corrupted when
   indexing is cut while the IW is still open. As it roots from some
   retroactive fixes you made, I'd appreciate your feedback.
  
   Correct me if I'm wrong, but by design Lucene should be able to
 recover
   rather quickly from power failures or app crashes. Since existing
 segment
   files are read only, only new segments that are still being written
 can
  get
   corrupted. Hence, recovering from worst-case scenarios is done by
 simply
   removing the write.lock file. The worst that could happen then is
 having
  the
   last segment damaged, and that can be fixed by removing those files,
   possibly by running CheckIndex on the index.
 
  You shouldn't even have to run CheckIndex ... because (as of
  LUCENE-1044) we now fsync all segment files before writing the new
  segments_N file, and then removing old segments_N files (and any
  segments that are no longer referenced).
 
  You do have to remove the write.lock if you aren't using
  NativeFSLockFactory (but this has been the default lock impl for a
  while now).
 
   Last week I have been playing with rather large indexes and crashed my
  app
   while it was indexing. I wasn't able to open the index, and Luke was
 even
   kind enough to wipe the index folder clean even though I opened it in
   read-only mode. I re-ran this, and after another crash running
 CheckIndex
   revealed nothing - the index was detected to be an empty one. I am not
   entirely sure what could be the cause for this, but I suspect it has
   been corrupted by the crash.
 
  Had no commit completed (no segments file written)?
 
  If you don't fsync then all sorts of crazy things are possible...
 
   I've been looking at these:
  
  
 
 https://issues.apache.org/jira/browse/LUCENE-3418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
  
 
 https://issues.apache.org/jira/browse/LUCENE-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 
  (And LUCENE-1044 before that ... it was LUCENE-1044 that LUCENE-2328
  broke...).
 
   And it seems like this is what I was experiencing. Mike and Mark will
   probably be able to tell if this is what they saw or not, but as far
 as I
   can tell this is not an expected behavior of a Lucene index.
 
  Definitely not expected behavior: assuming nothing is flipping bits,
  then on OS/JVM crash or power loss your index should be fine, just
  reverted to the last successful commit.
 
   What I'm looking for at the moment is some advice on what FSDirectory
   implementation to use to make sure no corruption can happen. The 3.4
  version
   (which is where LUCENE-3418 was committed to) seems to handle a lot

[jira] [Commented] (LUCENENET-438) replace java doc notation with ms style xml comments notation.

2012-06-12 Thread Itamar Syn-Hershko (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENENET-438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293983#comment-13293983
 ] 

Itamar Syn-Hershko commented on LUCENENET-438:
--

This should be done by a tool, really

 replace java doc notation with ms style xml comments notation.  
 

 Key: LUCENENET-438
 URL: https://issues.apache.org/jira/browse/LUCENENET-438
 Project: Lucene.Net
  Issue Type: Improvement
  Components: Lucene.Net Contrib, Lucene.Net Core
Affects Versions: Lucene.Net 2.9.4g
 Environment: all
Reporter: michael herndon
  Labels: documentation,

 There are a ton of javadoc-style notations inside the XML code comments, e.g. 
 {@link #IncrementToken} 
 These need to use the ms xml code comment style if there is an existing 
 equivalent.  I'm not assigning this one. If you come across this on code you 
 are working on, please take an extra few minutes to fix up the comments. 
 If you need help, grab me on #lucene.net on irc or michaelherndon on skype. 
 Just let me know who you are and what help you need. 
 A guide for code documentation, it includes a table that shows JavaDoc and 
 XML doc comment equivalents:
 https://cwiki.apache.org/confluence/display/LUCENENET/Documenting+Lucene.Net  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




Sponsoring porting work

2012-06-11 Thread Itamar Syn-Hershko
Hi devs,

We are looking to sponsor porting work, to help with keeping up the pace of
development and help Lucene.Net be closer to Java Lucene. Unfortunately the
amount of work I can put on this is very limited, and being up to speed
with Lucene is important to us, hence the idea to offer sponsorship.

I'm not entirely sure how these things work under the Apache umbrella, but
I'd imagine there isn't a real issue doing that. All work will be handed
back to the project under the ASL of course. I'd appreciate any guidance if
needed.

In the meantime, interested parties are welcome to contact me privately.

Itamar.


Re: EOLs in Code

2012-06-02 Thread Itamar Syn-Hershko
Yes, seems like all of them. Will look into it.

On Sat, Jun 2, 2012 at 9:25 PM, Stefan Bodewig bode...@apache.org wrote:

 On 2012-06-02, Itamar Syn-Hershko wrote:

  I'm using git-svn, with auto-crlf set to true

 I'm not familiar enough with git-svn.  auto-crlf will cover the git side
 but I don't think it sets the eol-style property in svn.

  - I think this will cover it but let me know if my commits are bad...

 I think some of the files I touched with
 https://svn.apache.org/viewvc?view=revision&revision=1344562 are from
 your commit.

 Thanks

Stefan




New Spatial module checked in

2012-05-30 Thread Itamar Syn-Hershko
I was finally able to get git and svn to talk to one another, and pushed my
recent changes into trunk.

The new Spatial contrib is bearing the non-standard version of 2.9.9, on
purpose. It also contains Spatial4n in a binary form, mimicking the way it
works in Java Lucene.

The few tests that are present pass, but when run in a chain I get the
following failure - hadn't had time to run it down:

Test
'Lucene.Net.Contrib.Spatial.Test.Prefix.TestRecursivePrefixTreeStrategy.BaseRecursivePrefixTreeStrategyTestCase.testFilterWithVariableScanLevel'
failed:
Lucene.Net.Store.AlreadyClosedException : this IndexReader is closed
Index\IndexReader.cs(204,0): at Lucene.Net.Index.IndexReader.EnsureOpen()
Index\DirectoryReader.cs(497,0): at
Lucene.Net.Index.DirectoryReader.DoReopen(Boolean openReadOnly, IndexCommit
commit)
Index\DirectoryReader.cs(462,0): at
Lucene.Net.Index.DirectoryReader.Reopen()
SpatialTestCase.cs(111,0): at
Lucene.Net.Contrib.Spatial.Test.SpatialTestCase.commit()
SpatialTestCase.cs(94,0): at
Lucene.Net.Contrib.Spatial.Test.SpatialTestCase.addDocumentsAndCommit(List`1
documents)
StrategyTestCase.cs(67,0): at
Lucene.Net.Contrib.Spatial.Test.StrategyTestCase`1.getAddAndVerifyIndexedDocuments(String
testDataFile)
Prefix\BaseRecursivePrefixTreeStrategyTestCase.cs(53,0): at
Lucene.Net.Contrib.Spatial.Test.Prefix.BaseRecursivePrefixTreeStrategyTestCase.testFilterWithVariableScanLevel()

Ideas welcome.


Re: Welcome Simon Svensson as a new committer

2012-05-24 Thread Itamar Syn-Hershko
Welcome!

On Thu, May 24, 2012 at 9:40 PM, Digy digyd...@gmail.com wrote:

 Welcome Simon

 DIGY

 -Original Message-
 From: Prescott Nasser [mailto:geobmx...@hotmail.com]
 Sent: Thursday, May 24, 2012 10:06 AM
 To: lucene-net-dev@lucene.apache.org; lucene-net-u...@lucene.apache.org
 Subject: Welcome Simon Svensson as a new committer





  Hey All, Our roster is growing a bit, I'd like to welcome Simon as a new
 committer. Simon has been quite active on the user mailing list helping
 answer community questions, he also maintains a C# port of the
 lucene-hunspell project (java: http://code.google.com/p/lucene-hunspell/,
 Simons c# port: https://github.com/sisve/Lucene.Net.Analysis.Hunspell)
 which
 is commonly used for spell checking (but has a wide array of purposes).
 Please join me in welcoming Simon to the team, ~Prescott

 -

 Checked by AVG - www.avg.com
 Version: 2012.0.1913 / Virus Database: 2425/5019 - Release Date: 05/24/12





Re: Welcome Itamar Syn-Hershk​o as a new committer

2012-05-23 Thread Itamar Syn-Hershko
Thanks guys

On Wed, May 23, 2012 at 1:14 AM, zoolette gaufre...@gmail.com wrote:

 Welcome in Itamar !

 2012/5/22 Prescott Nasser geobmx...@hotmail.com

 
  Hey all,
  I'd like to officially welcome Itamar as a new committer. I know the
  community appreciates the work you've been doing with the Spatial contrib
  project and the past help you've provided on the mailing lists.
  Please join me in welcoming Itamar,
  ~Prescott



[jira] [Commented] (LUCENENET-483) Spatial Search skipping records when one location is close to origin, another one is away and radius is wider

2012-05-21 Thread Itamar Syn-Hershko (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENENET-483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13280126#comment-13280126
 ] 

Itamar Syn-Hershko commented on LUCENENET-483:
--

That's not the newest spatial module

Get it from here https://github.com/synhershko/lucene.net/tree/spatial2trunk or 
the 2.9.4 compatible version here 
https://github.com/synhershko/lucene.net/tree/spatial

 Spatial Search skipping records when one location is close to origin, another 
 one is away and radius is wider
 -

 Key: LUCENENET-483
 URL: https://issues.apache.org/jira/browse/LUCENENET-483
 Project: Lucene.Net
  Issue Type: Bug
  Components: Lucene.Net Contrib
Affects Versions: Lucene.Net 2.9.4, Lucene.Net 2.9.4g
 Environment: .Net framework 4.0
Reporter: Aleksandar Panov
  Labels: lucene, spatialsearch
 Fix For: Lucene.Net 3.0.3


 Running a spatial query against two locations where one location is close to 
 origin (less than a mile), another one is away (24 miles) and radius is wider 
 (52 miles) returns only one result. Running query with a bit wider radius 
 (53.8) returns 2 results.
 IMPORTANT UPDATE: Problem can't be reproduced in Java when using the original 
 Lucene.Spatial (2.9.4 version) library.
 {code}
 // Origin
 private double _lat = 42.350153;
 private double _lng = -71.061667;
 private const string LatField = "lat";
 private const string LngField = "lng";

 // Locations
 AddPoint(writer, "Location 1", 42.0, -71.0);   // 24 miles away from origin
 AddPoint(writer, "Location 2", 42.35, -71.06); // less than a mile

 [TestMethod]
 public void TestAntiM()
 {
     _directory = new RAMDirectory();
     var writer = new IndexWriter(_directory, new WhitespaceAnalyzer(), true,
         IndexWriter.MaxFieldLength.UNLIMITED);
     SetUpPlotter(2, 15);
     AddData(writer);
     _searcher = new IndexSearcher(_directory, true);
     //const double miles = 53.8; // Correct. Returns 2 Locations.
     const double miles = 52;     // Incorrect. Returns 1 Location.
     Console.WriteLine("TestAntiM");
     // Create a distance query
     var dq = new DistanceQueryBuilder(_lat, _lng, miles, LatField, LngField,
         CartesianTierPlotter.DefaltFieldPrefix, true);
     Console.WriteLine(dq);
     // Create a term query to search against all documents
     Query tq = new TermQuery(new Term("metafile", "doc"));
     var dsort = new DistanceFieldComparatorSource(dq.DistanceFilter);
     Sort sort = new Sort(new SortField("foo", dsort, false));
     // Perform the search, using the term query, the distance filter,
     // and the distance sort
     TopDocs hits = _searcher.Search(tq, dq.Filter, 1000, sort);
     int results = hits.TotalHits;
     ScoreDoc[] scoreDocs = hits.ScoreDocs;
     // Get a list of distances
     Dictionary<int, Double> distances = dq.DistanceFilter.Distances;
     Console.WriteLine("Distance Filter filtered: " + distances.Count);
     Console.WriteLine("Results: " + results);
     Console.WriteLine("=");
     Console.WriteLine("Distances should be 2: " + distances.Count);
     Console.WriteLine("Results should be 2: " + results);
     Assert.AreEqual(2, distances.Count); // fixed a store of only needed distances
     Assert.AreEqual(2, results);
 }
 {code}
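For reference, the two distances from the origin can be double-checked with a standalone haversine calculation (this snippet and its class/method names are not part of the test above; only the coordinates are taken from it). Both points fall well inside the 52-mile radius, so a correct filter would have to return 2 results:

```java
// Standalone check of the two distances from the origin (42.350153, -71.061667),
// using the haversine formula with a mean Earth radius of 3958.8 miles.
public class HaversineCheck {
    static double miles(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * 3958.8 * Math.asin(Math.sqrt(a));
    }

    public static void main(String[] args) {
        // "Location 1" comes out around 24.4 miles; "Location 2" under 0.1 miles.
        System.out.printf("Location 1: %.1f miles%n", miles(42.350153, -71.061667, 42.0, -71.0));
        System.out.printf("Location 2: %.2f miles%n", miles(42.350153, -71.061667, 42.35, -71.06));
    }
}
```

With both documents inside the radius, the fact that widening it from 52 to 53.8 miles changes the result count is consistent with the cartesian tier/bounding-box selection dropping a grid cell, though the thread doesn't pin down the root cause.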

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (LUCENENET-483) Spatial Search skipping records when one location is close to origin, another one is away and radius is wider

2012-05-21 Thread Itamar Syn-Hershko (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENENET-483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13280266#comment-13280266
 ] 

Itamar Syn-Hershko commented on LUCENENET-483:
--

This code isn't mine :)

Try using the code from the spatial branch instead; this is what I'm using. The
DLLs I linked to above are compiled that way.





[jira] [Commented] (LUCENENET-483) Spatial Search skipping records when one location is close to origin, another one is away and radius is wider

2012-05-21 Thread Itamar Syn-Hershko (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENENET-483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13280552#comment-13280552
 ] 

Itamar Syn-Hershko commented on LUCENENET-483:
--

Great.

I updated the Lucene.Net.Contrib.Spatial.dll to version 2.9.9 to avoid
future confusion. This unique version number signals that it is a
non-standard contrib release - Java Lucene will only have this module in
version 4.0.





[jira] [Commented] (LUCENENET-462) Spatial Search skipping records with small radius e.g. 1 mile

2012-05-19 Thread Itamar Syn-Hershko (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENENET-462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13279642#comment-13279642
 ] 

Itamar Syn-Hershko commented on LUCENENET-462:
--

This is now fixed with the new spatial module

https://issues.apache.org/jira/browse/LUCENENET-489

 Spatial Search skipping records with small radius e.g. 1 mile
 -

 Key: LUCENENET-462
 URL: https://issues.apache.org/jira/browse/LUCENENET-462
 Project: Lucene.Net
  Issue Type: Bug
  Components: Lucene.Net Contrib
Affects Versions: Lucene.Net 2.9.4
 Environment: .Net framework 4.0
Reporter: Mark Rodseth
  Labels: lucene, spatialsearch

 Running a spatial query against a list of locations, all within 1 mile of a
 location, returns correct results for 2 miles but incorrect results for 1
 mile. For the one-mile query, only 2 of the 8 rows are returned.
 Locations and test below:
 {code}
 // Origin
 private double _lat = 51.508129;
 private double _lng = -0.128005;
 private const string LatField = "lat";
 private const string LngField = "lng";

 // Locations
 AddPoint(writer, "Location 1", 51.5073802128877, -0.124669075012207);
 AddPoint(writer, "Location 2", 51.5091, -0.1235);
 AddPoint(writer, "Location 3", 51.5093, -0.1232);
 AddPoint(writer, "Location 4", 51.5112531582845, -0.12509822845459);
 AddPoint(writer, "Location 5", 51.5107, -0.123);
 AddPoint(writer, "Location 6", 51.512, -0.1246);
 AddPoint(writer, "Location 8", 51.5088760101322, -0.143165588378906);
 AddPoint(writer, "Location 9", 51.5087958793819, -0.143508911132813);
 {code}
 {code}
 [Test]
 public void TestAntiM()
 {
     _searcher = new IndexSearcher(_directory, true);
     const double miles = 1.0;    // Bug? Only returns 2 locations. Should return 8.
     // const double miles = 2.0; // Correct. Returns 8 Locations.
     Console.WriteLine("TestAntiM");
     // Create a distance query
     var dq = new DistanceQueryBuilder(_lat, _lng, miles, LatField, LngField,
         CartesianTierPlotter.DefaltFieldPrefix, true);
     Console.WriteLine(dq);
     // Create a term query to search against all documents
     Query tq = new TermQuery(new Term("metafile", "doc"));
     var dsort = new DistanceFieldComparatorSource(dq.DistanceFilter);
     Sort sort = new Sort(new SortField("foo", dsort, false));
     // Perform the search, using the term query, the distance filter,
     // and the distance sort
     TopDocs hits = _searcher.Search(tq, dq.Filter, 1000, sort);
     int results = hits.TotalHits;
     ScoreDoc[] scoreDocs = hits.ScoreDocs;
     // Get a list of distances
     Dictionary<int, Double> distances = dq.DistanceFilter.Distances;
     Console.WriteLine("Distance Filter filtered: " + distances.Count);
     Console.WriteLine("Results: " + results);
     Console.WriteLine("=");
     Console.WriteLine("Distances should be 8: " + distances.Count);
     Console.WriteLine("Results should be 8: " + results);
     Assert.AreEqual(8, distances.Count); // fixed a store of only needed distances
     Assert.AreEqual(8, results);
 }
 {code}
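As a sanity check on the report above (this snippet and its class/method names are mine, not part of the test; the coordinates are copied from it), a plain haversine pass over the eight points shows they all lie well inside one mile of the origin, so the 1.0-mile query should indeed return all 8:

```java
// Verify the eight test points all lie within 1 mile of (51.508129, -0.128005),
// using the haversine formula with a mean Earth radius of 3958.8 miles.
public class OneMileCheck {
    static double miles(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * 3958.8 * Math.asin(Math.sqrt(a));
    }

    // Distance to the farthest of the eight indexed points.
    static double maxDistance() {
        double[][] points = {
            {51.5073802128877, -0.124669075012207}, {51.5091, -0.1235},
            {51.5093, -0.1232}, {51.5112531582845, -0.12509822845459},
            {51.5107, -0.123}, {51.512, -0.1246},
            {51.5088760101322, -0.143165588378906}, {51.5087958793819, -0.143508911132813},
        };
        double max = 0;
        for (double[] p : points) {
            max = Math.max(max, miles(51.508129, -0.128005, p[0], p[1]));
        }
        return max;
    }

    public static void main(String[] args) {
        // The farthest point ("Location 9") comes out around 0.67 miles.
        System.out.printf("max distance: %.2f miles%n", maxDistance());
    }
}
```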





[jira] [Commented] (LUCENENET-483) Spatial Search skipping records when one location is close to origin, another one is away and radius is wider

2012-05-19 Thread Itamar Syn-Hershko (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENENET-483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13279643#comment-13279643
 ] 

Itamar Syn-Hershko commented on LUCENENET-483:
--

This should be fixed with the new spatial module, can you check?

https://issues.apache.org/jira/browse/LUCENENET-489





[jira] [Commented] (SOLR-3304) Add Solr support for the new Lucene spatial module

2012-05-19 Thread Itamar Syn-Hershko (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13279609#comment-13279609
 ] 

Itamar Syn-Hershko commented on SOLR-3304:
--

Following up on the discussion on the spatial4j list: +1 for having all the
tests with actual spatial logic reside in the Lucene spatial module, and having
the Solr tests rely on them.

 Add Solr support for the new Lucene spatial module
 --

 Key: SOLR-3304
 URL: https://issues.apache.org/jira/browse/SOLR-3304
 Project: Solr
  Issue Type: New Feature
Affects Versions: 4.0
Reporter: Bill Bell
Assignee: David Smiley
  Labels: spatial
 Attachments: SOLR-3304_Solr_fields_for_Lucene_spatial_module.patch


 Get the Solr spatial module integrated with the lucene spatial module.




-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Spatial4n

2012-05-17 Thread Itamar Syn-Hershko
What are you trying to do - work on it, or incorporate my changes?

I'm not done yet - everything was ported, but there's a nasty failing
test I'm hunting down atm. You should be able to commit all my changes back
to SVN with git-svn, but you can also get the latest sources from here as a
zipball: https://github.com/synhershko/lucene.net/zipball/spatial

There are some very good git tutorials - worth reading. Check GitHub's, for
example. Basically you just do "git clone git://github.com/synhershko/lucene.net.git"
followed by "git checkout spatial" and you are done. You'll never want to go
back to SVN :)

On Thu, May 17, 2012 at 8:52 PM, Prescott Nasser geobmx...@hotmail.comwrote:


 Itamar - I'm terrible with git; the last two weekends I tried cutting out
 and making a patch of the work you've done with spatial, with no luck (I do
 like learning new things, so I wanted to give it a shot before reaching
 out). Do you know how to do that? Or some way to see all the changed files
 / lines in git? Sorry, I'm slow this month ;) ~P
   From: geobmx...@hotmail.com
  To: lucene-net-...@lucene.apache.org
  Subject: RE: Spatial4n
  Date: Thu, 3 May 2012 20:13:50 -0700
 
 
  I'll try to give you a hand this weekend great work ~P
Date: Fri, 4 May 2012 05:48:51 +0300
   Subject: Re: Spatial4n
   From: ita...@code972.com
   To: lucene-net-...@lucene.apache.org
  
   Status update:
  
   The Spatial4j project is completely ported to .NET, including tests,
 all of
   which green. It is available from
 https://github.com/synhershko/Spatial4n
  
   The Lucene spatial module which takes dependency on spatial4j is also
   ported now: https://github.com/synhershko/lucene.net/tree/spatial . I
 had
   to hack around quite a lot there, and created many compatibility
 classes
   and methods, since that module was originally written for the Lucene 4
 API.
  
   There is only one issue in FixedBitSet preventing it from compiling,
 I'll
   take a look at it sometime soon (or if any of you can have a look,
 that'd
   be great...)
  
   I'm now working on porting the spatial test suite. As before, any help
 will
   be appreciated.
  
   Itamar.
  
   On Thu, Apr 26, 2012 at 6:45 PM, Itamar Syn-Hershko 
 ita...@code972.comwrote:
  
Hi again,
   
I completed the port of the external Spatial library, and now am
 moving to
porting the Lucene integration.
   
The library, Spatial4n, is under ASL2 and can be found here
https://github.com/synhershko/Spatial4n
   
Anyone who can chip in and help port the tests, that would greatly
 help.
There are not so many :)
   
Itamar.
   
 




Re: including external code under apache 2.0

2012-04-30 Thread Itamar Syn-Hershko
ICLA signed and sent

On Mon, Apr 30, 2012 at 11:27 AM, Stefan Bodewig bode...@apache.org wrote:

 On 2012-04-28, Itamar Syn-Hershko wrote:

  That mail from  Stephan got lost in my inbox, so I never followed up on
  that. I guess now would be a good chance to tie up all loose ends.

  How do I do the ICLA?

 In addition to what Troy said, you can also fill in the text form and
 PGP-sign it when you send it by email.

 See http://www.apache.org/licenses/#clas

 Stefan




Re: Spatial4n

2012-04-28 Thread Itamar Syn-Hershko
No, but let me know what do I need to do.

On Sat, Apr 28, 2012 at 1:20 AM, Prescott Nasser geobmx...@hotmail.comwrote:


 Itamar, have you filed an ICLA? If so we are good to go on this, and I'll
 put this in place of the current spatial code in contrib
   From: geobmx...@hotmail.com
  To: lucene-net-dev@lucene.apache.org
  Subject: RE: Spatial4n
  Date: Thu, 26 Apr 2012 16:46:05 -0700
 
 
  Hey Stefan - can you confirm that porting Spatial4n is ok to include in
 our contrib? It is also under the apache 2.0 license, but we wanted to be
 100%. ~P
Date: Thu, 26 Apr 2012 18:45:48 +0300
   Subject: Spatial4n
   From: ita...@code972.com
   To: lucene-net-dev@lucene.apache.org
  
   Hi again,
  
   I completed the port of the external Spatial library, and now am
 moving to
   porting the Lucene integration.
  
   The library, Spatial4n, is under ASL2 and can be found here
   https://github.com/synhershko/Spatial4n
  
   Anyone who can chip in and help port the tests, that would greatly
 help.
   There are not so many :)
  
   Itamar.
 




[jira] [Commented] (LUCENENET-484) Some possibly major tests intermittently fail

2012-04-28 Thread Itamar Syn-Hershko (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENENET-484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13264399#comment-13264399
 ] 

Itamar Syn-Hershko commented on LUCENENET-484:
--

That's probably a matter of things not being cleaned up properly in some tests? 
(didn't actually look at the tests, just the immediate thing that comes to mind)

 Some possibly major tests intermittently fail 
 --

 Key: LUCENENET-484
 URL: https://issues.apache.org/jira/browse/LUCENENET-484
 Project: Lucene.Net
  Issue Type: Bug
  Components: Lucene.Net Core, Lucene.Net Test
Affects Versions: Lucene.Net 3.0.3
Reporter: Christopher Currens
 Fix For: Lucene.Net 3.0.3


 These tests will fail intermittently in Debug or Release mode, in the core 
 test suite:
 # -Lucene.Net.Index:-
 #- -TestConcurrentMergeScheduler.TestFlushExceptions-
 # Lucene.Net.Store:
 #- TestLockFactory.TestStressLocks
 # Lucene.Net.Search:
 #- TestSort.TestParallelMultiSort
 # Lucene.Net.Util:
 #- TestFieldCacheSanityChecker.TestInsanity1
 #- TestFieldCacheSanityChecker.TestInsanity2
 #- (It's possible all of the insanity tests fail at one point or another)
 # Lucene.Net.Support
 #- TestWeakHashTableMultiThreadAccess.Test
 TestWeakHashTableMultiThreadAccess should be fine to remove along with the 
 WeakHashTable in the Support namespace, since it's been replaced with 
 WeakDictionary.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




Spatial4n

2012-04-26 Thread Itamar Syn-Hershko
Hi again,

I completed the port of the external Spatial library, and now am moving to
porting the Lucene integration.

The library, Spatial4n, is under ASL2 and can be found here
https://github.com/synhershko/Spatial4n

Anyone who can chip in and help port the tests, that would greatly help.
There are not so many :)

Itamar.




Re: Spatial contrib bug fixing

2012-04-24 Thread Itamar Syn-Hershko
Great

The actual library lives outside of Lucene (
https://github.com/spatial4j/spatial4j ) and only some integration classes
are within the Lucene project itself. I linked to the (long) discussions
about this in my previous message. I will be following that approach with
this port, and really hope there will be no API differences I won't be able
to overcome.

I'm going to start doing this sometime tomorrow, but my main effort will
be on Thursday. I can certainly use help in dividing up the work, etc. -
please, anyone who can, join on Thursday for live collaboration or chip in
on the discussion later.

I'll keep you posted.

On Wed, Apr 25, 2012 at 12:00 AM, Christopher Currens 
currens.ch...@gmail.com wrote:

 Yes, the contrib is a MESS.  I've been favoring complete re-implementations
 over porting changes, since contrib has been in such a poor, overlooked
 state for so long.

 I'm not opposed to porting LSP over the Spatial contrib project in Java,
 though it might pose some porting challenges both now, since the Lucene
 versions are different, and as Lucene.NET evolves.  It also might not; I'm
 not familiar with the LSP code.  Contrib is just that, contributed software
 that is not part of the core library, and there will be projects in Java we
 can't port over.  In fact, I think there are .NET-specific contrib projects
 that aren't in Java.  Either way, my point is that I am happy and willing
 to have LSP included if that's going to wind up being better than Spatial.
 I think we can use all the help and contributions we can get in
 Lucene.NET.

 Of course, we'd need to look and see what is possible, with porting over
 LSP (not sure if it relies on any version specific features that may not
 yet be in 3.0.3).  So, I say let's go for it, and if you need any help/want
 to divide work between other committers, we can arrange that, and create
 issues for it, that is, if the other committers don't object to this.

 On Tue, Apr 24, 2012 at 1:45 PM, Itamar Syn-Hershko ita...@code972.com
 wrote:

  Thanks for your reply.
 
 
Aside from the original port which had many divergences from java, the
   only other issue applied to spatial is LUCENENET-431, which would be
 easy
   to include.
  
  
  That is not correct. LUCENENET-431 was committed, but some fixes from
  Java Lucene 3.0.3 are in as well. The whole thing is a mess.
 
  The reason for this mess is the amount of bugs in the original Java
  implementation of Spatial. This is also why it has been deprecated in
 3.6:
  https://issues.apache.org/jira/browse/LUCENE-2599
 
  I think the best route at this point is to port LSP aka Spatial4j to .NET
  and start using it as the Spatial module for Lucene.NET
  https://issues.apache.org/jira/browse/LUCENE-3795
 
  This is a Java Lucene 4 feature, but the current spatial implementation
  is pretty unusable.
 
  I'm going to start looking into this, and would definitely appreciate
 your
  input.
 
  Itamar.
 



Re: guestimation on -pre nuget package.

2012-04-24 Thread Itamar Syn-Hershko
If it is known to be stable for actual use, we on the RavenDB dev team will
update to it in a branch and provide feedback. A -Pre NuGet package released
for every RC can definitely help here.

On Tue, Apr 24, 2012 at 5:25 PM, Michael Herndon 
mhern...@wickedsoftware.net wrote:

 Do you all think we're at a point to do a -pre nuget package that users can
 tinker with and provide feedback?  The -pre flag means that it is only
 meant to be a pre release in order to get feedback. We might get more
 feedback if we package the binaries.

 Those that pushed the last package, what do you think the amount of effort
 / time that will take to get something like this done?  (I'm asking so that
 I can block off enough time in my schedule to do this. )

 I'm guessing it shouldn't be as rigorous as a typical Apache release, as it's
 meant just to package alpha/beta binaries, not an official RTW.

 - michael.



Re: Spatial contrib bug fixing

2012-04-24 Thread Itamar Syn-Hershko
Uhm... I was referring to the .NET port, which I can see DIGY ported.

Never mind, I will get it from the original commit.

@Prescott any idea re "CartesianPolyFilterBuilder.GetBoxShape() is not an
exact port" - do you remember why?

On Tue, Apr 24, 2012 at 12:26 AM, Christopher Currens 
currens.ch...@gmail.com wrote:

 It's in a weird place.  And for the 3.0.3 version, it's easiest to find the
 code in the tags, rather than the branches.


 http://svn.apache.org/viewvc/lucene/java/tags/lucene_3_0_3/contrib/misc/src/java/org/apache/lucene/misc/


 On Mon, Apr 23, 2012 at 2:20 PM, Prescott Nasser geobmx...@hotmail.com
 wrote:

 
  I'm having trouble finding chained filter in the java lucene svn...
 
 http://svn.apache.org/viewvc/lucene/dev/branches/branch_3x/lucene/contrib/?pathrev=990167
  Am I looking around in the wrong place?
Date: Mon, 23 Apr 2012 11:19:51 +0300
   Subject: Re: Spatial contrib bug fixing
   From: ita...@code972.com
   To: lucene-net-...@lucene.apache.org
  
   One more thing - what's the deal with ChainedFilter? I can see a commit
  by
   DIGY on 7/7/2011 but it seems to have been removed since?
  
   On Mon, Apr 23, 2012 at 11:06 AM, Itamar Syn-Hershko 
 ita...@code972.com
  wrote:
  
For starters - CartesianPolyFilterBuilder.GetBoxShape() is not an
 exact
port - do you remember why?
   
Anyway, if it was never fully ported as you say maybe I'll just go
  ahead
and complete that
   
For your reference, here are 2 failing tests which pass in Java
 Lucene
(can send the java file) -
   
 
 https://github.com/synhershko/lucene.net/commit/234da7eca7cb08be5a0c2a7375ffc3f4a03bfd92
   
   
   
On Mon, Apr 23, 2012 at 1:39 AM, Prescott Nasser 
  geobmx...@hotmail.comwrote:
   
   
I think that was a while ago, and I don't even recall if I fully
  ported
it or just put up the start. I had some other stuff to deal with the
  last
few months, so my memory is a bit lacking. I'll review the code,
  meanwhile
ask whatever questions you have - lets get this fixed up. ~P
  Date: Sun, 22 Apr 2012 22:10:27 +0300
 Subject: Spatial contrib bug fixing
 From: ita...@code972.com
 To: lucene-net-...@lucene.apache.org

 Hi all,

 We encountered several bugs with the Spatial contrib, and the ones we
 tested with Java Lucene worked there (with 2.9.4). There are about 3 open
 tickets in the Jira bug tracker on similar issues.

 I'm now sitting down with the ultimate goal of fixing this once and for
 all, but some code parts are commented out in favor of other, non-line-by-line
 ports of some implementations, without a comment giving the reasons. I was
 wondering if there's anyone who could answer a few questions there, instead
 of me changing things back and forth?

 Git history (I use the Git mirror, yes) tells me Prescott Nasser
 is
behind
 porting this - maybe he will have the answers?

 Cheers,

 Itamar.
   
   
   
   
 
 



Re: Spatial contrib bug fixing

2012-04-24 Thread Itamar Syn-Hershko
Thanks for your reply.


  Aside from the original port which had many divergences from java, the
 only other issue applied to spatial is LUCENENET-431, which would be easy
 to include.


That is not correct. LUCENENET-431 was committed, but some fixes from Java
Lucene 3.0.3 are in as well. The whole thing is a mess.

The reason for this mess is the amount of bugs in the original Java
implementation of Spatial. This is also why it has been deprecated in 3.6:
https://issues.apache.org/jira/browse/LUCENE-2599

I think the best route at this point is to port LSP aka Spatial4j to .NET
and start using it as the Spatial module for Lucene.NET
https://issues.apache.org/jira/browse/LUCENE-3795

This is a Java Lucene 4 feature, but the current spatial implementation is
pretty unusable.

I'm going to start looking into this, and would definitely appreciate your
input.

Itamar.


Re: Spatial contrib bug fixing

2012-04-23 Thread Itamar Syn-Hershko
For starters - CartesianPolyFilterBuilder.GetBoxShape() is not an exact
port - do you remember why?

Anyway, if it was never fully ported as you say maybe I'll just go ahead
and complete that

For your reference, here are 2 failing tests which pass in Java Lucene (can
send the java file) -
https://github.com/synhershko/lucene.net/commit/234da7eca7cb08be5a0c2a7375ffc3f4a03bfd92


On Mon, Apr 23, 2012 at 1:39 AM, Prescott Nasser geobmx...@hotmail.comwrote:


 I think that was a while ago, and I don't even recall if I fully ported it
 or just put up the start. I had some other stuff to deal with the last few
 months, so my memory is a bit lacking. I'll review the code; meanwhile, ask
 whatever questions you have - let's get this fixed up. ~P
   Date: Sun, 22 Apr 2012 22:10:27 +0300
  Subject: Spatial contrib bug fixing
  From: ita...@code972.com
  To: lucene-net-dev@lucene.apache.org
 
  Hi all,
 
  We encountered several bugs with the Spatial contrib, and the ones we
  tested with Java Lucene worked there (with 2.9.4). There are about 3 open
  tickets in the Jira bug tracker on similar issues.
 
  I'm now sitting down with the ultimate goal of fixing this once and for
  all, but some code parts are commented out in favor of other, non-line-by-line
  ports of some implementations, without a comment giving the reasons. I was
  wondering if there's anyone who could answer a few questions there, instead
  of me changing things back and forth?
 
  Git history (I use the Git mirror, yes) tells me Prescott Nasser is
 behind
  porting this - maybe he will have the answers?
 
  Cheers,
 
  Itamar.



