[jira] [Commented] (LUCENE-8565) SimpleQueryParser to support field filtering (aka Add field:text operator)
[ https://issues.apache.org/jira/browse/LUCENE-8565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16909145#comment-16909145 ]

Itamar Syn-Hershko commented on LUCENE-8565:

Heya - is this waiting on anything in particular that I can help finalize? I would really like to see this merged in. Thanks

> SimpleQueryParser to support field filtering (aka Add field:text operator)
> --
>
> Key: LUCENE-8565
> URL: https://issues.apache.org/jira/browse/LUCENE-8565
> Project: Lucene - Core
> Issue Type: Improvement
> Components: core/queryparser
> Reporter: Itamar Syn-Hershko
> Priority: Minor
> Time Spent: 10m
> Remaining Estimate: 0h
>
> SimpleQueryParser lacks support for the `field:` operator for creating
> queries which operate on fields other than the default field. It seems one
> can either have the parsed query operate on a single field, or on ALL
> defined fields (+ weights); there is no support for specifying `field:value`
> in the query itself.
>
> It probably wasn't forgotten, but rather left out for simplicity. Since
> we are using this QP implementation more and more (mostly through
> Elasticsearch), we thought it would be useful to have it in.
>
> This doesn't seem too hard to pull off, and I'll be happy to contribute a
> patch for it.

--
This message was sent by Atlassian JIRA (v7.6.14#76016)

---
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8565) SimpleQueryParser to support field filtering (aka Add field:text operator)
[ https://issues.apache.org/jira/browse/LUCENE-8565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16771783#comment-16771783 ]

Itamar Syn-Hershko commented on LUCENE-8565:

I'm not sure what the Lucene versioning policy on that would be, but we can always change the default flag to turn off field filtering support.
Re: SimpleQueryParser to support field filtering?
Anyone?

--
Itamar Syn-Hershko
CTO, Founder
BigData Boutique <http://bigdataboutique.com/>
Elasticsearch Consulting Partner
Microsoft MVP | Lucene.NET PMC
http://code972.com | @synhershko <https://twitter.com/synhershko>

On Mon, Jan 14, 2019 at 10:19 AM Itamar Syn-Hershko wrote:
> Hi all,
>
> I sent a PR back in November to resolve the title and would appreciate
> feedback.
>
> JIRA: https://issues.apache.org/jira/browse/LUCENE-8565
>
> PR: https://github.com/apache/lucene-solr/pull/498
>
> What do people think?
SimpleQueryParser to support field filtering?
Hi all,

I sent a PR back in November to resolve the title and would appreciate feedback.

Summary: SimpleQueryParser lacks support for the `field:` operator for creating queries which operate on fields other than the default field. It seems one can either have the parsed query operate on a single field, or on ALL defined fields (+ weights); there is no support for specifying `field:value` in the query itself.

It probably wasn't forgotten, but rather left out for simplicity. Since we are using this QP implementation more and more (mostly through Elasticsearch), we thought it would be useful to have it in.

JIRA: https://issues.apache.org/jira/browse/LUCENE-8565

PR: https://github.com/apache/lucene-solr/pull/498

What do people think?

Cheers,

--
Itamar Syn-Hershko
CTO, Founder
BigData Boutique <http://bigdataboutique.com/>
Elasticsearch Consulting Partner
http://code972.com | @synhershko <https://twitter.com/synhershko>
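For readers unfamiliar with the proposal, the intent of the `field:` operator can be illustrated with a small standalone sketch. This is plain Java with no Lucene dependency; the class name, method name, and the "body" default-field name are hypothetical and are not taken from the PR - the actual change hooks into SimpleQueryParser's parsing state machine:

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map.Entry;

// Illustration only: splits a single clause of the form "field:text" into a
// (field, text) pair, falling back to the default field when there is no
// usable "field:" prefix. This mirrors the proposed query syntax, not
// SimpleQueryParser's internals.
public class FieldClauseDemo {
    static final String DEFAULT_FIELD = "body"; // assumed default field name

    static Entry<String, String> split(String clause) {
        int i = clause.indexOf(':');
        if (i <= 0 || i == clause.length() - 1) {
            // no field prefix (or empty field/term) -> search the default field
            return new SimpleEntry<>(DEFAULT_FIELD, clause);
        }
        return new SimpleEntry<>(clause.substring(0, i), clause.substring(i + 1));
    }

    public static void main(String[] args) {
        System.out.println(split("title:lucene")); // title=lucene
        System.out.println(split("lucene"));       // body=lucene
    }
}
```

Today the closest one can get with the stock parser is the constructor that takes a field-to-weight map, which applies every clause to ALL of those fields rather than letting the query text pick one.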
[jira] [Updated] (LUCENE-8565) SimpleQueryParser to support field filtering (aka Add field:text operator)
[ https://issues.apache.org/jira/browse/LUCENE-8565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Itamar Syn-Hershko updated LUCENE-8565:

Summary: SimpleQueryParser to support field filtering (aka Add field:text operator)
(was: SimpleQueryString to support field filtering (aka Add field:text operator))
[jira] [Updated] (LUCENE-8565) SimpleQueryParser to support field filtering (aka Add field:text operator)
[ https://issues.apache.org/jira/browse/LUCENE-8565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Itamar Syn-Hershko updated LUCENE-8565:

Description:
SimpleQueryParser lacks support for the `field:` operator for creating queries which operate on fields other than the default field. Seems like one can either get the parsed query to operate on a single field, or on ALL defined fields (+ weights). No support for specifying `field:value` in the query. It probably wasn't forgotten, but rather left out for simplicity, but since we are using this QP implementation more and more (mostly through Elasticsearch) we thought it would be useful to have it in. Seems like this is not too hard to pull off and I'll be happy to contribute a patch for it.

was:
SimpleQueryString lacks support for the `field:` operator for creating queries which operate on fields other than the default field. Seems like one can either get the parsed query to operate on a single field, or on ALL defined fields (+ weights). No support for specifying `field:value` in the query. It probably wasn't forgotten, but rather left out for simplicity, but since we are using this QP implementation more and more (mostly through Elasticsearch) we thought it would be useful to have it in. Seems like this is not too hard to pull off and I'll be happy to contribute a patch for it.
[jira] [Commented] (LUCENE-8565) SimpleQueryString to support field filtering (aka Add field:text operator)
[ https://issues.apache.org/jira/browse/LUCENE-8565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16686301#comment-16686301 ]

Itamar Syn-Hershko commented on LUCENE-8565:

PR submitted on GitHub: https://github.com/apache/lucene-solr/pull/498 - reviews appreciated.
[jira] [Updated] (LUCENE-8565) SimpleQueryString to support field filtering (aka Add field:text operator)
[ https://issues.apache.org/jira/browse/LUCENE-8565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Itamar Syn-Hershko updated LUCENE-8565:

Description:
SimpleQueryString lacks support for the `field:` operator for creating queries which operate on fields other than the default field. Seems like one can either get the parsed query to operate on a single field, or on ALL defined fields (+ weights). No support for specifying `field:value` in the query. It probably wasn't forgotten, but rather left out for simplicity, but since we are using this QP implementation more and more (mostly through Elasticsearch) we thought it would be useful to have it in. Seems like this is not too hard to pull off and I'll be happy to contribute a patch for it.

was:
SimpleQueryString lacks support for the `field:` operator for creating queries which operate on fields other than the default field. Seems like one can either get the parsed query to operate on a single field, or on ALL defined fields (+ weights). No support for specifying `field:value` in the query. It probably wasn't forgotten, but rather left out for simplicity, but since we are using this QP implementation more and more (mostly through Elasticsearch) we thought it would be Seems like this is not too hard to pull off and I'll be happy to contribute a patch for it.
[jira] [Created] (LUCENE-8565) SimpleQueryString to support field filtering (aka Add field:text operator)
Itamar Syn-Hershko created LUCENE-8565:

Summary: SimpleQueryString to support field filtering (aka Add field:text operator)
Key: LUCENE-8565
URL: https://issues.apache.org/jira/browse/LUCENE-8565
Project: Lucene - Core
Issue Type: Improvement
Components: core/queryparser
Reporter: Itamar Syn-Hershko

SimpleQueryString lacks support for the `field:` operator for creating queries which operate on fields other than the default field. Seems like one can either get the parsed query to operate on a single field, or on ALL defined fields (+ weights). No support for specifying `field:value` in the query.

It probably wasn't forgotten, but rather left out for simplicity, but since we are using this QP implementation more and more (mostly through Elasticsearch) we thought it would be

Seems like this is not too hard to pull off and I'll be happy to contribute a patch for it.
[jira] [Commented] (LUCENE-6302) Adding Date Math support to Lucene Expressions module
[ https://issues.apache.org/jira/browse/LUCENE-6302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14338677#comment-14338677 ]

Itamar Syn-Hershko commented on LUCENE-6302:

Sent a PR for the latter: https://github.com/apache/lucene-solr/pull/129

Adding Date Math support to Lucene Expressions module
-

Key: LUCENE-6302
URL: https://issues.apache.org/jira/browse/LUCENE-6302
Project: Lucene - Core
Issue Type: Improvement
Components: modules/expressions
Affects Versions: 4.10.3
Reporter: Itamar Syn-Hershko

Lucene Expressions are great, but they don't allow for date math. More specifically, they don't allow inferring date parts from a numeric representation of a date stamp, nor do they allow parsing string representations into dates. Some of the features requested here are easy to implement via a ValueSource implementation (and potentially minor changes to the lexer definition); some are more involved. I'll be happy if we could get half of those in, and will be happy to work on a PR for the parts we can agree on.

The items we would be happy to have:
- A now() function (with or without TZ support) to return the current date/time value as a numeric long, which we could use against indexed datetime fields (which are in fact numerics).
- Parsing methods - to allow expressing datetimes as strings, and/or reading them from stored fields and parsing them from there. Parse errors would render a value of zero.
- Given a numeric value, allow specifying that it is a date value and then inferring date parts - e.g. Date(1424963520).Year == 2015, or Date(now()).Year - Date(1424963520).Year. Basically, methods which return numerics but internally create and use Date objects.
[jira] [Commented] (LUCENE-6302) Adding Date Math support to Lucene Expressions module
[ https://issues.apache.org/jira/browse/LUCENE-6302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14338563#comment-14338563 ]

Itamar Syn-Hershko commented on LUCENE-6302:

I actually expected the main objection would be to adding date parsing methods :) Maybe it would make sense to explain the use cases this is trying to solve.

We are using Elasticsearch Kibana, and since the latest version switched to using Lucene Expressions (from Groovy), we found ourselves blocked by what we can do with Kibana's scripted fields. For example, given a user's DOB, how can we do aggregations on their age? Or compute how many years (or days) have passed between two given dates? Yes, we can subtract the epochs (except that it doesn't seem to work: https://github.com/elasticsearch/elasticsearch/issues/9884), but translating the result into days, hours or years is even uglier using an expression.

I think introducing ValueSources to do this should be enough, but if changing the lexer is the preferred way I can go and do that as well. With regards to syntax - I'm not locked on any preferred syntax. Either way, it seems like adding a now() function is the easiest change, and I can send a PR with this change alone to start with.
[jira] [Created] (LUCENE-6302) Adding Date Math support to Lucene Expressions module
Itamar Syn-Hershko created LUCENE-6302:

Summary: Adding Date Math support to Lucene Expressions module
Key: LUCENE-6302
URL: https://issues.apache.org/jira/browse/LUCENE-6302
Project: Lucene - Core
Issue Type: Improvement
Components: modules/expressions
Affects Versions: 4.10.3
Reporter: Itamar Syn-Hershko

Lucene Expressions are great, but they don't allow for date math. More specifically, they don't allow inferring date parts from a numeric representation of a date stamp, nor do they allow parsing string representations into dates. Some of the features requested here are easy to implement via a ValueSource implementation (and potentially minor changes to the lexer definition); some are more involved. I'll be happy if we could get half of those in, and will be happy to work on a PR for the parts we can agree on.

The items we would be happy to have:
- A now() function (with or without TZ support) to return the current date/time value as a numeric long, which we could use against indexed datetime fields (which are in fact numerics).
- Parsing methods - to allow expressing datetimes as strings, and/or reading them from stored fields and parsing them from there. Parse errors would render a value of zero.
- Given a numeric value, allow specifying that it is a date value and then inferring date parts - e.g. Date(1424963520).Year == 2015, or Date(now()).Year - Date(1424963520).Year. Basically, methods which return numerics but internally create and use Date objects.
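The Date(1424963520).Year == 2015 example from the description can be sanity-checked with plain java.time; a ValueSource implementation of the proposed functions would wrap essentially this computation. The yearOf helper name and the choice of UTC are assumptions for illustration, not part of the proposal:

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;

// Sketch of the arithmetic the proposed expression functions would perform.
// now() corresponds to the proposed now() function; yearOf(x) corresponds to
// the hypothetical Date(x).Year accessor.
public class DateMathDemo {
    // Extract the year date-part from an epoch-seconds value (UTC assumed).
    static int yearOf(long epochSeconds) {
        return ZonedDateTime
                .ofInstant(Instant.ofEpochSecond(epochSeconds), ZoneOffset.UTC)
                .getYear();
    }

    static long now() {
        return Instant.now().getEpochSecond();
    }

    public static void main(String[] args) {
        // The JIRA example: Date(1424963520).Year == 2015
        System.out.println(yearOf(1424963520L)); // prints 2015
        // The "age in years" style arithmetic the Kibana use case needs:
        System.out.println(yearOf(now()) - yearOf(1424963520L));
    }
}
```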
Re: FSDirectory and creating directory
Thanks guys, we will mimic the current behavior and ignore the comment. Mike, I did promise to find bugs!

--
Itamar Syn-Hershko
http://code972.com | @synhershko <https://twitter.com/synhershko>
Freelance Developer & Consultant
Lucene.NET committer and PMC member

On Wed, Feb 4, 2015 at 11:20 AM, Uwe Schindler <u...@thetaphi.de> wrote:

> Hi Mike,
>
> This is why I ask here! So I think we should fix this before the release of 5.0! Maybe Robert has an explanation why he does the createDirectories() in the ctor. In any case I will now commit the removal of the bogus comment in the 4.10 branch.
>
> Uwe
> -
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
>
> -----Original Message-----
> From: Michael McCandless [mailto:luc...@mikemccandless.com]
> Sent: Wednesday, February 04, 2015 10:07 AM
> To: Lucene/Solr dev
> Cc: d...@lucenenet.apache.org
> Subject: Re: FSDirectory and creating directory
>
> In the past we considered this (mkdir when creating FSDir) a bug:
> https://issues.apache.org/jira/browse/LUCENE-1464
>
> Mike McCandless
> http://blog.mikemccandless.com
>
> On Wed, Feb 4, 2015 at 4:03 AM, Uwe Schindler <uschind...@apache.org> wrote:
>
> Hi,
>
> on the Lucene.NET mailing list there were some issues with porting over Lucene 4.8's FSDirectory class to .NET. In fact, the following comment on a method caused confusion:
>
>   // returns the canonical version of the directory, creating it if it doesn't exist.
>   private static File getCanonicalPath(File file) throws IOException {
>     return new File(file.getCanonicalPath());
>   }
>
> In fact, the comment is not correct (and the whole method is useless - one could call file.getCanonicalFile() to do the same). According to the Javadocs and my tests, this method does *not* create the directory. If the directory does not exist, it just returns a synthetic canonical name (modifying only known parts of the path). We should maybe fix this comment and remove this method in 4.10.x (if we get a further bugfix release). We also have a test that validates that a directory is not created by FSDirectory's ctor (a side effect of some IndexWriter test).
>
> Nevertheless, in Lucene 5 we changed the behavior of the FSDirectory ctor with NIO.2:
>
>   protected FSDirectory(Path path, LockFactory lockFactory) throws IOException {
>     super(lockFactory);
>     Files.createDirectories(path); // create directory, if it doesn't exist
>     directory = path.toRealPath();
>   }
>
> The question is now: do we really intend to create the directory in Lucene 5? What about opening an IndexReader on a non-existent directory on a read-only filesystem? I know that Robert added this to make path.toRealPath() work correctly. I just want to discuss this before we release 5.0. To me it sounds wrong to create the directory in the constructor...
>
> Uwe
> -
> Uwe Schindler
> uschind...@apache.org
> Apache Lucene PMC Member / Committer
> Bremen, Germany
> http://lucene.apache.org/
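The behavior under discussion can be demonstrated with plain NIO.2, no Lucene required: Path.toRealPath() fails on a directory that does not exist yet, which is presumably why the 5.0 constructor calls Files.createDirectories(path) first. The class and method names below are just for this demo:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Demonstrates why resolving a real path requires the directory to exist:
// toRealPath() resolves against the actual filesystem and throws
// NoSuchFileException (an IOException) when the path is missing, unlike the
// old File.getCanonicalFile(), which happily fabricates a canonical name.
public class RealPathDemo {
    static boolean realPathFails(Path p) {
        try {
            p.toRealPath();
            return false;
        } catch (IOException e) { // NoSuchFileException for a missing path
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        Path missing = Path.of("no-such-dir-xyz-12345");
        System.out.println(realPathFails(missing)); // true: nothing to resolve

        Path created = Files.createTempDirectory("fsdir-demo");
        System.out.println(realPathFails(created)); // false: resolves fine
    }
}
```

This is exactly the trade-off in the thread: create the directory eagerly so toRealPath() succeeds, or tolerate the failure case (e.g. a read-only filesystem) and resolve lazily.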
Re: FSDirectory and creating directory
Rob, what is the intended behavior, and what is the reasoning behind it? Doesn't this affect only attempts to open a non-existent index directory - and whether or not there will be an empty folder left behind?

--
Itamar Syn-Hershko
http://code972.com | @synhershko <https://twitter.com/synhershko>
Freelance Developer & Consultant
Lucene.NET committer and PMC member

On Wed, Feb 4, 2015 at 2:45 PM, Robert Muir <rcm...@gmail.com> wrote:

> Personally, I am completely against changing this for 5.0. This is the worst possible thing you can do; it will trickle into more bugs in lockfactory etc. Please don't make this last-minute risky change. It has no benefits and will only cause bugs.
>
> On Wed, Feb 4, 2015 at 7:44 AM, Robert Muir <rcm...@gmail.com> wrote:
>
> On Wed, Feb 4, 2015 at 4:03 AM, Uwe Schindler <uschind...@apache.org> wrote:
>
> > The question is now: do we really intend to create the directory in Lucene 5? What about opening an IndexReader on a non-existent directory on a read-only filesystem? I know that Robert added this to make path.toRealPath() work correctly. I just want to discuss this before we release 5.0. To me it sounds wrong to create the directory in the constructor...
>
> Please don't call this a bug until you understand why the change was made. Please read the behavior of getCanonicalPath and understand exactly why and how it fails: it's this nonexistent case.
[jira] [Commented] (LUCENE-6103) StandardTokenizer doesn't tokenize word:word
[ https://issues.apache.org/jira/browse/LUCENE-6103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14241214#comment-14241214 ]

Itamar Syn-Hershko commented on LUCENE-6103:

Maybe out of scope for this ticket, but how do we go about #2? Will be happy to take this discussion offline as well.

StandardTokenizer doesn't tokenize word:word
-

Key: LUCENE-6103
URL: https://issues.apache.org/jira/browse/LUCENE-6103
Project: Lucene - Core
Issue Type: Bug
Components: modules/analysis
Affects Versions: 4.9
Reporter: Itamar Syn-Hershko
Assignee: Steve Rowe

StandardTokenizer (and as a result most default analyzers) will not tokenize word:word, and will preserve it as one token. This can be easily seen using Elasticsearch's analyze API:

localhost:9200/_analyze?tokenizer=standard&text=word%20word:word

If this is the intended behavior, then why? I can't really see the logic behind it. If not, I'll be happy to join the effort of fixing this.
[jira] [Commented] (LUCENE-6103) StandardTokenizer doesn't tokenize word:word
[ https://issues.apache.org/jira/browse/LUCENE-6103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14241306#comment-14241306 ]

Itamar Syn-Hershko commented on LUCENE-6103:

Sent them a request. I'll buy Robert beers if that could help push this forward!
[jira] [Created] (LUCENE-6103) StandardTokenizer doesn't tokenizer word:word
Itamar Syn-Hershko created LUCENE-6103:

Summary: StandardTokenizer doesn't tokenizer word:word
Key: LUCENE-6103
URL: https://issues.apache.org/jira/browse/LUCENE-6103
Project: Lucene - Core
Issue Type: Bug
Components: modules/analysis
Affects Versions: 4.9
Reporter: Itamar Syn-Hershko

StandardTokenizer (and as a result most default analyzers) will not tokenize word:word, and will preserve it as one token. This can be easily seen using Elasticsearch's analyze API:

localhost:9200/_analyze?tokenizer=standard&text=word%20word:word

If this is the intended behavior, then why? I can't really see the logic behind it. If not, I'll be happy to join the effort of fixing this.
[jira] [Updated] (LUCENE-6103) StandardTokenizer doesn't tokenize word:word
[ https://issues.apache.org/jira/browse/LUCENE-6103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Itamar Syn-Hershko updated LUCENE-6103:

Summary: StandardTokenizer doesn't tokenize word:word
(was: StandardTokenizer doesn't tokenizer word:word)
[jira] [Commented] (LUCENE-5997) StandardFilter redundant
[ https://issues.apache.org/jira/browse/LUCENE-5997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14239697#comment-14239697 ]

Itamar Syn-Hershko commented on LUCENE-5997:

Sounds good!

StandardFilter redundant
-

Key: LUCENE-5997
URL: https://issues.apache.org/jira/browse/LUCENE-5997
Project: Lucene - Core
Issue Type: Bug
Components: core/search
Affects Versions: 4.10.1
Reporter: Itamar Syn-Hershko
Priority: Trivial

Any reason why StandardFilter is still around? It's just a no-op class now:

  @Override
  public final boolean incrementToken() throws IOException {
    return input.incrementToken(); // TODO: add some niceties for the new grammar
  }

https://github.com/apache/lucene-solr/blob/trunk/lucene/analysis/common/src/java/org/apache/lucene/analysis/standard/StandardFilter.java
[jira] [Commented] (LUCENE-5723) Performance improvements for FastCharStream
[ https://issues.apache.org/jira/browse/LUCENE-5723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14239728#comment-14239728 ]

Itamar Syn-Hershko commented on LUCENE-5723:

Reported as https://java.net/jira/browse/JAVACC-285

Performance improvements for FastCharStream
---

Key: LUCENE-5723
URL: https://issues.apache.org/jira/browse/LUCENE-5723
Project: Lucene - Core
Issue Type: Improvement
Components: core/queryparser
Reporter: Itamar Syn-Hershko
Priority: Minor

Hello from the .NET land,

A user of ours has identified an optimization opportunity; although minor, I think it points to a valid principle - we should avoid using exceptions to control flow when possible. Here are the original ticket and the commits to our codebase. If this looks valid to you too, I can go ahead and prepare a PR.

https://issues.apache.org/jira/browse/LUCENENET-541
https://github.com/apache/lucene.net/commit/ac8c9fa809110ddb180bf7b2ce93e86270b39ff6
https://git-wip-us.apache.org/repos/asf?p=lucenenet.git;a=blobdiff;f=src/core/QueryParser/QueryParserTokenManager.cs;h=ec09c8e451f7a7d1572fbdce4c7598e362526a7c;hp=17583d20f660fdb6e4aa86105c7574383f965ebe;hb=41ebbc2d;hpb=ac8c9fa809110ddb180bf7b2ce93e86270b39ff6
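The shape of the optimization is easy to show in miniature. FastCharStream's readChar() has historically signalled end-of-input by throwing IOException, which the generated token manager catches on every token; the alternative is an in-band sentinel check, which Reader.read() already provides. The class below is an illustrative sketch, not the actual Lucene/Lucene.NET code:

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Exception-free end-of-input handling: Reader.read() returns -1 at EOF,
// so the hot loop tests a sentinel instead of constructing and catching an
// exception per stream (the cost the JAVACC-285 report is about).
public class SentinelReadDemo {
    static final int EOF = -1;

    // Count characters using the return-value sentinel; no exception is
    // ever thrown on the (extremely common) end-of-input path.
    static int countChars(String s) {
        try (Reader in = new StringReader(s)) {
            int n = 0;
            while (in.read() != EOF) {
                n++;
            }
            return n;
        } catch (IOException e) {
            // StringReader only throws if used after close; unreachable here.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countChars("abc")); // prints 3
    }
}
```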
[jira] [Commented] (LUCENE-6103) StandardTokenizer doesn't tokenize word:word
[ https://issues.apache.org/jira/browse/LUCENE-6103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14239784#comment-14239784 ] Itamar Syn-Hershko commented on LUCENE-6103: Yes, I figured it would come down to some Unicode rules. Can you give a rationale for this, mainly out of curiosity? I'm not a Unicode expert, but I'd assume that just like you wouldn't want English words to not break on Hebrew Punctuation Gershayim (e.g. TestWord is actually 2 tokens and מנכלים is one), maybe this rule is meant for specific scenarios and not for the general use case? On another note, any type of Gershayim should be preserved within Hebrew words, not only U+05F4. This is mainly because the keyboards and editors in use produce the standard character in most cases. I had a chat with Robert a while back where he said that's the case, I'm just making sure you didn't follow the specs to the letter in that regard... StandardTokenizer doesn't tokenize word:word Key: LUCENE-6103 URL: https://issues.apache.org/jira/browse/LUCENE-6103 Project: Lucene - Core Issue Type: Bug Components: modules/analysis Affects Versions: 4.9 Reporter: Itamar Syn-Hershko Assignee: Steve Rowe StandardTokenizer (and as a result most default analyzers) will not tokenize word:word and will preserve it as one token. This can be easily seen using Elasticsearch's analyze API: localhost:9200/_analyze?tokenizer=standard&text=word%20word:word If this is the intended behavior, then why? I can't really see the logic behind it. If not, I'll be happy to join in the effort of fixing this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (LUCENE-6103) StandardTokenizer doesn't tokenize word:word
[ https://issues.apache.org/jira/browse/LUCENE-6103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14240090#comment-14240090 ] Itamar Syn-Hershko commented on LUCENE-6103: Good stuff, thanks Steve. I'm going to see if the rest of the UAX is good for us, and if so see if I can comply with the 6.2.5 version of the specs. It's a good thing StandardTokenizer is no longer English-centric, but I cannot imagine what use the colon has, especially since in most cases it is not something reasonable :) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (LUCENE-6103) StandardTokenizer doesn't tokenize word:word
[ https://issues.apache.org/jira/browse/LUCENE-6103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14240133#comment-14240133 ] Itamar Syn-Hershko commented on LUCENE-6103: Ok, so I did some homework. In Swedish, the colon is a way to shorten words. So c:a is in fact cirka, which means approximately. I guess it can be thought of like English acronyms, only apparently it's way less commonly used in Swedish (my source says very, very seldom used; old style and not used in modern Swedish at all). Not only is it hardly used, apparently it's only legal in 3-letter combinations (c:a but not c:ka). And also, the effects it has are quite severe at the moment - 2 words with a colon in between and no space will be output as one token even though it's 100% not applicable to Swedish, since each word has more than 2 characters. I'm not aiming to change the Unicode standards, that's way beyond my limited powers, but: 1. Given the above, does it really make sense to use this tokenizer in all language-specific analyzers as well? e.g. https://github.com/apache/lucene-solr/blob/lucene_solr_4_9_1/lucene/analysis/common/src/java/org/apache/lucene/analysis/en/EnglishAnalyzer.java#L105 I'd think for language-specific analyzers we'd want tokenizers aimed at that language with limited support for others. So, in this case, the colon would always be considered a tokenizing char. 2. Can we change the JFlex definition to at least limit the effects of this, e.g. only support colon as MidLetter if the overall token length == 3, so c:a is a valid token and word:word is not?
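The heuristic in point 2 can be sketched with plain JDK string handling (illustrative only - the real change would live in the JFlex grammar, and the names here are made up): treat the colon as word-internal only when the whole candidate token is three characters long, as in the Swedish abbreviation c:a.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch of the proposed heuristic, not the actual StandardTokenizer grammar:
// keep ':' inside a token only for 3-character "c:a" style abbreviations,
// and treat it as a token break everywhere else.
public class ColonHeuristicDemo {
    static List<String> tokenize(String text) {
        List<String> out = new ArrayList<>();
        for (String candidate : text.split("\\s+")) {
            if (candidate.isEmpty()) continue;
            if (candidate.length() == 3 && candidate.charAt(1) == ':') {
                // Swedish-style abbreviation such as "c:a" stays whole
                out.add(candidate);
            } else {
                // everything else splits on the colon, so "word:word" becomes two tokens
                out.addAll(Arrays.asList(candidate.split(":")));
            }
        }
        return out;
    }
}
```

Under this rule c:a survives as one token while word:word is split, which is the behavior asked for above.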
[jira] [Commented] (LUCENE-6103) StandardTokenizer doesn't tokenize word:word
[ https://issues.apache.org/jira/browse/LUCENE-6103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14240392#comment-14240392 ] Itamar Syn-Hershko commented on LUCENE-6103: 0. You mean it implements UAX#29 version 6.3 :) 1. I'll likely be sending a PR for #1 sometime soon. Would you recommend using UAX#29 minus specific non-English tweaks, or falling back to ClassicTokenizer which is English-specific, or something else? 2. Here's the thing: the standard is wrong, or buggy. Ask any Swede and they will tell you, and any non-Swedish corpus wouldn't care. And basically this is a bug in every Lucene-based system today because of the word:word scenario; it's a bit of an edge case, but I bet I can find multiple occurrences in every big enough system. What can we do about that? We already solved this using char filters, converting colons to commas. It feels a bit hacky though, and again - this _is_ a flaw in Lucene's analysis even though it conforms to a Unicode standard. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
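The char-filter workaround mentioned in the comment can be sketched with plain JDK string handling (a simulation of the idea, not Lucene's actual MappingCharFilter class): rewrite colons to a breaking character before tokenization so the word-break rules never see them.

```java
// Simulation of the char-filter workaround described above, using only the
// JDK: map ':' to a space ahead of tokenization so "word:word" can never be
// glued into a single token by the MidLetter rule.
public class ColonCharFilterDemo {
    // stand-in for a pre-tokenization char filter
    static String filter(String text) {
        return text.replace(':', ' ');
    }

    // stand-in for a whitespace-driven tokenizer running after the filter
    static String[] tokenize(String text) {
        return filter(text).trim().split("\\s+");
    }
}
```

The trade-off noted above applies here too: the mapping also destroys legitimate colon-internal tokens such as c:a, which is why it feels hacky.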
JFlex, tokenization, and custom token exceptions
Hey all, I posted this question also to the JFlex[1] list as it seems a more appropriate place, but I thought I should raise this here as well. I'm looking for ways to use Lucene's tokenizers but preserve some custom tokens defined by the user. For example, use StandardTokenizer but preserve C++, C# and i-phone as whole tokens. The gotcha here is I want that list to be loaded at runtime, and not compiled into the tokenizer - mainly because it will change over time. The problem is there's no real way of doing this currently. I would have implemented this myself, but JFlex doesn't seem to support it (other than defining new macros and regenerating the Java pieces, recompiling etc). I discussed this with Rob Muir a couple of months back and he seemed interested; I will be happy to see if there's interest in pursuing this, or get any new ideas on how to enable this more easily on the JFlex layer or otherwise. I'll be happy to take this on, but every approach I'm looking at currently has some significant flaws. Cheers, [1] http://sourceforge.net/p/jflex/mailman/jflex-users/?viewmonth=201411 -- Itamar Syn-Hershko http://code972.com | @synhershko https://twitter.com/synhershko Freelance Developer Consultant Author of RavenDB in Action http://manning.com/synhershko/
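One possible approach to the runtime-loaded token list (a sketch under assumptions - this is not an existing Lucene or JFlex facility, and all names are illustrative): swap the protected tokens for opaque placeholders before tokenization, then restore them afterwards. It has one of the flaws alluded to above, namely that a placeholder could collide with real input text.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical workaround, not a Lucene API: protect runtime-defined tokens
// like "C++" by replacing them with placeholders a word-break tokenizer will
// keep whole, then mapping the placeholders back after tokenization.
public class ProtectedTokensDemo {
    static List<String> tokenize(String text, List<String> protectedTokens) {
        Map<String, String> placeholders = new HashMap<>();
        int i = 0;
        for (String t : protectedTokens) {
            // NOTE: a real implementation must guarantee the placeholder
            // cannot occur in the input; this sketch does not.
            String key = "PROTECTED" + (i++);
            placeholders.put(key, t);
            text = text.replace(t, key);
        }
        List<String> out = new ArrayList<>();
        for (String tok : text.split("[^A-Za-z0-9]+")) { // stand-in for StandardTokenizer
            if (tok.isEmpty()) continue;
            out.add(placeholders.getOrDefault(tok, tok));
        }
        return out;
    }
}
```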
[jira] [Created] (LUCENE-5997) StandardFilter redundant
Itamar Syn-Hershko created LUCENE-5997: -- Summary: StandardFilter redundant Key: LUCENE-5997 URL: https://issues.apache.org/jira/browse/LUCENE-5997 Project: Lucene - Core Issue Type: Bug Components: core/search Affects Versions: 4.10.1 Reporter: Itamar Syn-Hershko Priority: Trivial Any reason why StandardFilter is still around? It's just a no-op class now: @Override public final boolean incrementToken() throws IOException { return input.incrementToken(); // TODO: add some niceties for the new grammar } https://github.com/apache/lucene-solr/blob/trunk/lucene/analysis/common/src/java/org/apache/lucene/analysis/standard/StandardFilter.java -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (LUCENE-4601) ivy availability check isn't quite right
[ https://issues.apache.org/jira/browse/LUCENE-4601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14035885#comment-14035885 ] Itamar Syn-Hershko commented on LUCENE-4601: May not be directly related, but I just tried running this: http://wiki.apache.org/lucene-java/HowtoConfigureIntelliJ on OSX Mavericks, with ant and ivy both installed via homebrew. Ivy was not found by ant and idea even when I placed a manually downloaded jar locally myself. I had to run ivy-bootstrap to get off the ground - maybe it's worth adding that to the docs. ivy availability check isn't quite right Key: LUCENE-4601 URL: https://issues.apache.org/jira/browse/LUCENE-4601 Project: Lucene - Core Issue Type: Bug Components: general/build Reporter: Robert Muir Fix For: 4.1, 5.0 Attachments: LUCENE-4601.patch remove ivy from your .ant/lib but load it up on a build file like so: You have to lie to lucene's build, overriding ivy.available, because for some reason the detection is wrong and will tell you ivy is not available, when it actually is. I tried changing the detector to use available classname=some.ivy.class and this didn't work either... so I don't actually know what the correct fix is. {noformat} <project name="test" default="test" basedir="."> <path id="ivy.lib.path"> <fileset dir="/Users/rmuir" includes="ivy-2.2.0.jar" /> </path> <taskdef resource="org/apache/ivy/ant/antlib.xml" uri="antlib:org.apache.ivy.ant" classpathref="ivy.lib.path" /> <target name="test"> <subant target="test" inheritAll="false" inheritRefs="false" failonerror="true"> <fileset dir="lucene-trunk/lucene" includes="build.xml"/> <!-- lie --> <property name="ivy.available" value="true"/> </subant> </target> </project> {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (LUCENE-2841) CommonGramsFilter improvements
[ https://issues.apache.org/jira/browse/LUCENE-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14035978#comment-14035978 ] Itamar Syn-Hershko commented on LUCENE-2841: Can anyone review and comment? CommonGramsFilter improvements -- Key: LUCENE-2841 URL: https://issues.apache.org/jira/browse/LUCENE-2841 Project: Lucene - Core Issue Type: Improvement Components: modules/analysis Affects Versions: 3.1, 4.0-ALPHA Reporter: Steve Rowe Priority: Minor Fix For: 4.9, 5.0 Attachments: commit-6402a55.patch Currently CommonGramsFilter expects users to remove the common words around which output token ngrams are formed, by appending a StopFilter to the analysis pipeline. This is inefficient in two ways: captureState() is called on (trailing) stopwords, and then the whole stream has to be re-examined by the following StopFilter. The current ctor should be deprecated, and another ctor added with a boolean option controlling whether the common words should be output as unigrams. If common words *are* configured to be output as unigrams, captureState() will still need to be called, as it is now. If the common words are *not* configured to be output as unigrams, rather than calling captureState() for the trailing token in each output token ngram, the term text, position and offset can be maintained in the same way as they are now for the leading token: using a System.arraycopy()'d term buffer and a few ints for positionIncrement and offset. The user then no longer would need to append a StopFilter to the analysis chain. An example illustrating both possibilities should also be added. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (LUCENE-5723) Performance improvements for FastCharStream
Itamar Syn-Hershko created LUCENE-5723: -- Summary: Performance improvements for FastCharStream Key: LUCENE-5723 URL: https://issues.apache.org/jira/browse/LUCENE-5723 Project: Lucene - Core Issue Type: Improvement Components: core/queryparser Reporter: Itamar Syn-Hershko Priority: Minor Hello from the .NET land, A user of ours has identified an optimization opportunity; although minor, I think it points to a valid principle - we should avoid using exceptions for controlling flow when possible. Here's the original ticket + commits to our codebase. If this looks valid to you too, I can go ahead and prepare a PR. https://issues.apache.org/jira/browse/LUCENENET-541 https://github.com/apache/lucene.net/commit/ac8c9fa809110ddb180bf7b2ce93e86270b39ff6 https://git-wip-us.apache.org/repos/asf?p=lucenenet.git;a=blobdiff;f=src/core/QueryParser/QueryParserTokenManager.cs;h=ec09c8e451f7a7d1572fbdce4c7598e362526a7c;hp=17583d20f660fdb6e4aa86105c7574383f965ebe;hb=41ebbc2d;hpb=ac8c9fa809110ddb180bf7b2ce93e86270b39ff6 -- This message was sent by Atlassian JIRA (v6.2#6252)
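The principle behind the proposed change - preferring a sentinel return value over an exception on the end-of-input path - can be sketched with plain JDK readers (an illustration of the idea only, not the actual FastCharStream patch):

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Illustration of avoiding exceptions for flow control: signal end-of-input
// with a sentinel value instead of constructing and unwinding an exception
// every time the stream is exhausted.
public class SentinelEofDemo {
    static final int EOF = -1;

    // Exception-based style: the caller learns about EOF by paying for an
    // IOException on what is really an ordinary, expected condition.
    static char readOrThrow(Reader r) throws IOException {
        int c = r.read();
        if (c == EOF) throw new IOException("read past eof");
        return (char) c;
    }

    // Sentinel-based style: EOF is a normal return value the caller checks;
    // java.io.Reader already returns -1 at end of stream.
    static int readOrSentinel(Reader r) throws IOException {
        return r.read();
    }

    // Consume a string with the sentinel style; no exception is ever raised
    // on the normal end-of-input path.
    static int countChars(String s) {
        try {
            Reader r = new StringReader(s);
            int n = 0;
            while (readOrSentinel(r) != EOF) n++;
            return n;
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }
}
```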
Re: ICUFoldingFilter obsolete?
This makes sense, thanks Rob -- Itamar Syn-Hershko http://code972.com | @synhershko https://twitter.com/synhershko Freelance Developer Consultant Author of RavenDB in Action http://manning.com/synhershko/ On Sun, Mar 2, 2014 at 3:54 PM, Robert Muir rcm...@gmail.com wrote: I use it too, it's fine. It's just not really standardized, and never was :) that UTR had that status when I wrote it! On Sun, Mar 2, 2014 at 8:52 AM, Shawn Heisey s...@elyograg.org wrote: On 3/2/2014 6:37 AM, Robert Muir wrote: It was always this way. I don't think such kinds of normalization should be standards either (what this stuff is doing is heuristical in nature). I use ICUFoldingFilterFactory in my Solr schema, with the idea that it's a smart and single-pass way to fold and lowercase. Is support from IBM and Lucene expected to continue, or should I be looking for another solution? Thanks, Shawn
ICUFoldingFilter obsolete?
Hi all, I may have missed the train on this, but what is the status of ICUFoldingFilter? Documentation suggests it follows foldings specified in UTR#30 ( http://lucene.apache.org/core/4_6_1/analyzers-icu/org/apache/lucene/analysis/icu/ICUFoldingFilter.html), but UTR#30 is a draft that was later withdrawn ( http://www.unicode.org/reports/tr30/). I'm not up to date with the latest and greatest in the Unicode world so I'm not sure why it was withdrawn, but given the delicacy of term normalization I suppose this is worth revisiting? Thanks, -- Itamar Syn-Hershko http://code972.com | @synhershko https://twitter.com/synhershko Freelance Developer Consultant Author of RavenDB in Action http://manning.com/synhershko/
[jira] [Created] (LUCENE-5358) Code cleanup on KStemmer
Itamar Syn-Hershko created LUCENE-5358: -- Summary: Code cleanup on KStemmer Key: LUCENE-5358 URL: https://issues.apache.org/jira/browse/LUCENE-5358 Project: Lucene - Core Issue Type: Improvement Components: modules/analysis Affects Versions: 4.6, 4.5.1, 4.5, 3.0 Reporter: Itamar Syn-Hershko Priority: Minor This affects all versions with KStemmer in them The code of KStemmer needs some intensive cleanup, just to give you some idea on something that immediately popped up: https://github.com/apache/lucene-solr/blob/trunk/lucene/analysis/common/src/java/org/apache/lucene/analysis/en/KStemmer.java#L283-286 I'll be happy to do this myself, just wanted to check in advance to see if this is something you'd consider accepting in -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-5011) MemoryIndex and FVH don't play along with multi-value fields
Itamar Syn-Hershko created LUCENE-5011: -- Summary: MemoryIndex and FVH don't play along with multi-value fields Key: LUCENE-5011 URL: https://issues.apache.org/jira/browse/LUCENE-5011 Project: Lucene - Core Issue Type: Bug Affects Versions: 4.3 Reporter: Itamar Syn-Hershko When multi-value fields are indexed to a MemoryIndex, positions are computed correctly on search but the start and end offsets and the values array index aren't correct. Comparing the same execution path for IndexReader on a Directory impl and MemoryIndex (same document, same query, same analyzer, different Index impl), the difference first shows in FieldTermStack.java line 125: termList.add( new TermInfo( term, dpEnum.startOffset(), dpEnum.endOffset(), pos, weight ) ); dpEnum.startOffset() and dpEnum.endOffset don't match between implementations. This looks like a bug in MemoryIndex, which doesn't seem to handle tokenized multi-value fields all too well when positions and offsets are required. I should also mention we are using an Analyzer which outputs several tokens at a position (a la SynonymFilter), but I don't believe this is related. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5011) MemoryIndex and FVH don't play along with multi-value fields
[ https://issues.apache.org/jira/browse/LUCENE-5011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13662950#comment-13662950 ] Itamar Syn-Hershko commented on LUCENE-5011: The actual test case we have now is very tightly coupled with ElasticSearch and our custom analysis chain; it may take me some time to decouple it into a stand-alone Lucene test. Alternatively, I'll be happy to work this out with you via Skype using our existing test case. -- This message is automatically generated by JIRA.
[jira] [Created] (LUCENE-4673) TermQuery.toString() doesn't play nicely with whitespace
Itamar Syn-Hershko created LUCENE-4673: -- Summary: TermQuery.toString() doesn't play nicely with whitespace Key: LUCENE-4673 URL: https://issues.apache.org/jira/browse/LUCENE-4673 Project: Lucene - Core Issue Type: Bug Components: core/search Affects Versions: 3.6.2, 4.0-BETA, 4.1 Reporter: Itamar Syn-Hershko A TermQuery where term.text() contains whitespace outputs an incorrect string representation: field:foo bar instead of field:"foo bar". A correct representation is one that could be parsed again into the correct Query object (using the correct analyzer, yes, but still). This may not be so critical, but in our system we use Lucene's QP to parse and then pre-process and optimize user queries. To do that we use Query.toString on some clauses to rebuild the query string. This can be easily resolved by always adding quote marks before and after the term text in TermQuery.toString. Testing to see if they are required or not is too much work, and TermQuery is ignorant of quote marks anyway. Some other scenarios which could benefit from this change are places where escaped characters are used, such as URLs. -- This message is automatically generated by JIRA.
[jira] [Commented] (LUCENE-4673) TermQuery.toString() doesn't play nicely with whitespace
[ https://issues.apache.org/jira/browse/LUCENE-4673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13548874#comment-13548874 ] Itamar Syn-Hershko commented on LUCENE-4673: I figured as much, yet we would definitely like to have this behavior built-in. Are there any plans for such an interface to perform a proper Query -> String conversion? -- This message is automatically generated by JIRA.
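The always-quote fix proposed in the report can be sketched as follows (illustrative only; this is not the actual TermQuery.toString implementation, and the method name is made up):

```java
// Sketch of the proposed behavior: always wrap the term text in quote marks
// so a term containing whitespace round-trips unambiguously through the
// query parser, instead of printing field:foo bar.
public class TermToStringDemo {
    static String termToString(String field, String termText) {
        return field + ":\"" + termText + "\"";
    }
}
```

As the report notes, testing whether the quotes are actually required would be extra work for no benefit, since unconditional quoting is always parseable.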
[jira] [Commented] (LUCENE-2841) CommonGramsFilter improvements
[ https://issues.apache.org/jira/browse/LUCENE-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13539310#comment-13539310 ] Itamar Syn-Hershko commented on LUCENE-2841: Attached is a patch to fix this, including tests. There is no regression, and the new behavior when keepOrig is set to true is as described in the comments here. The only thing I wasn't sure about was CommonGramsQueryFilter - should it be deprecated, or how should it be made to work with this change? -- This message is automatically generated by JIRA.
Re: pro coding style
In the past git had bad tooling; that is not the case today. I've been using git also without github screens - and while they definitely add a lot, it is still ten times more usable than SVN. As I told the Lucene.NET mailing list, you should all watch the following video and give git a few days of your time before continuing with this discussion: http://www.youtube.com/watch?v=4XpnKHJAok8 Also, Apache mirrors to github, so basically you work against github all the time. On Fri, Nov 30, 2012 at 4:15 PM, Robert Muir rcm...@gmail.com wrote: On Fri, Nov 30, 2012 at 9:10 AM, Mark Miller markrmil...@gmail.com wrote: On Nov 30, 2012, at 8:56 AM, Robert Muir rcm...@gmail.com wrote: but git by itself, is pretty unusable. Given the number of committers that eat some pain to use git when developing lucene/solr, and have no github or pull requests, I'm not sure that's a common thought :) Sure, some people might disagree with me. I'm more than willing to eat some pain if it makes contributions easier. I just feel like a lot of what makes github successful is unfortunately actually in github and not git. It's like if your development team is screaming for linux machines. You have to be careful how to interpret that. If you hand them a bunch of machines with just linux kernels, they probably won't be productive. When they scream for linux they want a userland with a shell, compiler, X-windows, editor and so on too.
[jira] [Commented] (LUCENE-4208) Spatial distance relevancy should use score of 1/distance
[ https://issues.apache.org/jira/browse/LUCENE-4208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13451430#comment-13451430 ] Itamar Syn-Hershko commented on LUCENE-4208: What's the status of this? Are query results being properly sorted based on distance? Spatial distance relevancy should use score of 1/distance - Key: LUCENE-4208 URL: https://issues.apache.org/jira/browse/LUCENE-4208 Project: Lucene - Core Issue Type: New Feature Components: modules/spatial Reporter: David Smiley Fix For: 4.0 The SpatialStrategy.makeQuery() at the moment uses the distance as the score (although some strategies -- TwoDoubles if I recall might not do anything, which would be a bug). The distance is a poor value to use as the score because the score should be related to relevancy, and the distance itself is inversely related to that. A score of 1/distance would be nice. Another alternative is earthCircumference/2 - distance, although I like 1/distance better. Maybe use a different constant than 1. Credit: this is Chris Male's idea. -- This message is automatically generated by JIRA.
[jira] [Commented] (LUCENE-4186) Lucene spatial's distErrPct is treated as a fraction, not a percent.
[ https://issues.apache.org/jira/browse/LUCENE-4186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13447037#comment-13447037 ] Itamar Syn-Hershko commented on LUCENE-4186: distErrPct makes sense to me - it makes more sense to talk about the expected error rate rather than the actual precision given. Hence the name Distance Error Percentage makes perfect sense, although it is tough to make an acronym of... And while at it, throw in a bug fix: SpatialArgs.toString should multiply distPrecision by 100, not divide it. Lucene spatial's distErrPct is treated as a fraction, not a percent. -- Key: LUCENE-4186 URL: https://issues.apache.org/jira/browse/LUCENE-4186 Project: Lucene - Core Issue Type: Bug Components: modules/spatial Reporter: David Smiley Assignee: David Smiley Priority: Critical Fix For: 4.0 The distance-error-percent of a query shape in Lucene spatial is, in a nutshell, the percent of the shape's area that is an error epsilon when considering search detail at its edges. The default is 2.5%, for reference. However, as configured, it is read in as a fraction: {code:xml} <fieldType name="location_2d_trie" class="solr.SpatialRecursivePrefixTreeFieldType" distErrPct="0.025" maxDetailDist="0.001" /> {code} -- This message is automatically generated by JIRA.
[jira] [Commented] (LUCENE-4342) Issues with prefix tree's Distance Error Percentage
[ https://issues.apache.org/jira/browse/LUCENE-4342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13445807#comment-13445807 ] Itamar Syn-Hershko commented on LUCENE-4342: I can confirm this is fixed now. Thanks for the fast turnaround! -- This message is automatically generated by JIRA.
[jira] [Created] (LUCENE-4342) Issues with prefix tree's Distance Error Percentage
Itamar Syn-Hershko created LUCENE-4342: -- Summary: Issues with prefix tree's Distance Error Percentage Key: LUCENE-4342 URL: https://issues.apache.org/jira/browse/LUCENE-4342 Project: Lucene - Core Issue Type: Bug Components: modules/spatial Affects Versions: 4.0-BETA, 4.0-ALPHA Reporter: Itamar Syn-Hershko Attachments: unnamed.patch See attached patch for a failing test. Basically, it's a simple point-and-radius scenario that works great as long as args.setDistPrecision(0.0); is called. Once the default precision is used (2.5%), it doesn't work as expected. The distance between the 2 points in the patch is 35.75 KM. Taking into account the 2.5% error, the effective radius without false negatives/positives should be around 34.8 KM. This test fails with a radius of 33 KM.
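The arithmetic in the report can be checked directly. This is a sketch of the reporter's numbers only, not code from the attached patch: shrinking the 35.75 KM distance by the default 2.5% error fraction gives the ~34.8 KM effective radius mentioned above.

```java
// Sketch of the arithmetic in the bug report (not code from the patch):
// with the default distErrPct of 2.5%, the largest radius guaranteed to
// avoid false positives at the shape's edge shrinks by that fraction.
public class DistErrMath {
    public static void main(String[] args) {
        double distanceKm = 35.75;  // distance between the two test points
        double distErrPct = 0.025;  // default 2.5% error, as a fraction
        double effectiveRadiusKm = distanceKm * (1 - distErrPct);
        System.out.printf("%.2f KM%n", effectiveRadiusKm); // ~34.86 KM
    }
}
```

A 33 KM query radius sits well inside that bound, which is why the failing test was surprising.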
[jira] [Updated] (LUCENE-4342) Issues with prefix tree's Distance Error Percentage
[ https://issues.apache.org/jira/browse/LUCENE-4342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Itamar Syn-Hershko updated LUCENE-4342: --- Attachment: unnamed.patch A failing test Issues with prefix tree's Distance Error Percentage Key: LUCENE-4342 URL: https://issues.apache.org/jira/browse/LUCENE-4342 Project: Lucene - Core Issue Type: Bug Components: modules/spatial Affects Versions: 4.0-ALPHA, 4.0-BETA Reporter: Itamar Syn-Hershko Attachments: unnamed.patch See attached patch for a failing test. Basically, it's a simple point-and-radius scenario that works great as long as args.setDistPrecision(0.0); is called. Once the default precision is used (2.5%), it doesn't work as expected. The distance between the 2 points in the patch is 35.75 KM. Taking into account the 2.5% error, the effective radius without false negatives/positives should be around 34.8 KM. This test fails with a radius of 33 KM.
Re: svn commit: r1375282 - /incubator/lucene.net/trunk/src/core/Util/Parameter.cs
This will probably require releasing the core again as well as a new RC... The spatial module was updated, still doing some integration tests, will send more updates soon

On Tue, Aug 21, 2012 at 1:14 AM, synhers...@apache.org wrote:

Author: synhershko Date: Mon Aug 20 22:14:01 2012 New Revision: 1375282 URL: http://svn.apache.org/viewvc?rev=1375282&view=rev Log: Fixing a possible NRE which can be thrown during a race condition on accessing allParameters. This is not an air-tight solution, as an ArgumentException can still be thrown. I don't care much about doing this within a lock as it will never be a bottleneck. https://groups.google.com/group/ravendb/browse_thread/thread/a5cf07e80f70c856

Modified: incubator/lucene.net/trunk/src/core/Util/Parameter.cs
URL: http://svn.apache.org/viewvc/incubator/lucene.net/trunk/src/core/Util/Parameter.cs?rev=1375282&r1=1375281&r2=1375282&view=diff
==============================================================================
--- incubator/lucene.net/trunk/src/core/Util/Parameter.cs (original)
+++ incubator/lucene.net/trunk/src/core/Util/Parameter.cs Mon Aug 20 22:14:01 2012
@@ -39,11 +39,13 @@ namespace Lucene.Net.Util
             // typesafe enum pattern, no public constructor
             this.name = name;
             string key = MakeKey(name);
-
-            if (allParameters.ContainsKey(key))
-                throw new System.ArgumentException("Parameter name " + key + " already used!");
-
-            allParameters[key] = this;
+
+            lock (allParameters)
+            {
+                if (allParameters.ContainsKey(key))
+                    throw new System.ArgumentException("Parameter name " + key + " already used!");
+                allParameters[key] = this;
+            }
         }

         private string MakeKey(string name)
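The C# change in that commit is small enough to mirror in a self-contained Java sketch (names assumed, not the Lucene.NET source): the typesafe-enum constructor does a check-then-put on a shared map, and without a lock two threads registering at once can race.

```java
import java.util.HashMap;
import java.util.Map;

// Java sketch of the pattern the commit fixes (names assumed): each
// typesafe-enum instance registers itself in a shared map, so the
// check-then-put must be guarded by a lock to avoid the race.
class Parameter {
    private static final Map<String, Parameter> allParameters = new HashMap<>();
    private final String name;

    Parameter(String name) {
        this.name = name;
        String key = makeKey(name);
        synchronized (allParameters) { // guard the check-then-put race
            if (allParameters.containsKey(key))
                throw new IllegalArgumentException("Parameter name " + key + " already used!");
            allParameters.put(key, this);
        }
    }

    private String makeKey(String name) {
        return getClass().getName() + " " + name;
    }
}
```

As the commit message notes, the lock only removes the NRE/corruption risk; a duplicate name still throws, by design.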
Re: svn commit: r1375282 - /incubator/lucene.net/trunk/src/core/Util/Parameter.cs
That won't work, the Occur flags need to be statically and publicly available. Since the entire point of that Parameter class is to make the enum serializable, which is in fact the case with C# (while it is not in Java 5), I just removed it and made Occur a native enum again. All core tests pass (aside from 2 in TestOpenBitSet and TestWeakDictionaryBehavior, but they aren't related to this change). Commit details: http://svn.apache.org/viewvc?view=revision&revision=1375296 On Tue, Aug 21, 2012 at 1:21 AM, Oren Eini (Ayende Rahien) aye...@ayende.com wrote: Instead of doing it this way, do NOT create Occur using separate static fields. Merge Parameter into Occur (only used there) and create the entire dictionary once. Otherwise, you run the risk of the ArgumentException. If that happens, because this is raised from the static ctor, you'll have killed the entire app domain. On Tue, Aug 21, 2012 at 1:19 AM, Itamar Syn-Hershko ita...@code972.com wrote: This will probably require releasing the core again as well as a new RC... The spatial module was updated, still doing some integration tests, will send more updates soon On Tue, Aug 21, 2012 at 1:14 AM, synhers...@apache.org wrote: Author: synhershko Date: Mon Aug 20 22:14:01 2012 New Revision: 1375282 URL: http://svn.apache.org/viewvc?rev=1375282&view=rev Log: Fixing a possible NRE which can be thrown during a race condition on accessing allParameters. This is not an air-tight solution, as an ArgumentException can still be thrown. I don't care much about doing this within a lock as it will never be a bottleneck.
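For contrast, the native-enum replacement described in the reply above needs no registration map at all. A Java-flavored sketch follows (the actual commit is C#; names assumed): native enums serialize by constant name out of the box, which is the property the Parameter class existed to provide.

```java
// Sketch: Occur as a native enum instead of the typesafe-enum Parameter
// class. No shared map, no race, and serialization works by constant name.
enum Occur {
    MUST, SHOULD, MUST_NOT;

    public static void main(String[] args) {
        // valueOf/name round-trip replaces the hand-rolled key lookup
        System.out.println(Occur.valueOf("MUST"));
    }
}
```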
Re: Outstanding issues for 3.0.3
Nowadays git works just great on Windows, and it's much easier to work with than Hg On Wed, Aug 1, 2012 at 9:41 PM, Zachary Gramana zgram...@feature23.com wrote: On Aug 1, 2012, at 12:51 PM, Itamar Syn-Hershko wrote: And for heaven's sake, can we move to git when graduating? Given that we're a .NET-focused community, and many of us are likely primarily using Windows as both our primary development and deployment platforms, I'd suggest looking at Mercurial before committing to git. Either way, +1 for any DVCS.
Re: Outstanding issues for 3.0.3
The point is to make the code better, not to satisfy R# :) The main benefit of this process is marking fields as readonly, finding code paths with stupid behavior and moving simple aggregations to use LINQ. I don't apply the LINQ syntax to non-trivial operations, to make it easier to keep track of the Java version. My thoughts on the points you raised inline On Thu, Aug 2, 2012 at 6:53 PM, Zachary Gramana zgram...@gmail.com wrote: I would like to pitch into this effort and put my ReSharper license to use. I pulled down trunk, picked a yellow item at random, and started to dig in. I quickly generated more questions than answers, and realized I needed to stop munging code and consult the wiki and list archives. After digging through both, I'm still not entirely certain about what the style guidelines are for 3.x onward. I also noted this[1] discussion regarding some other guidelines, but I didn't see it make it beyond the proposal stage. [1] http://mail-archives.apache.org/mod_mbox/lucene-lucene-net-dev/201112.mbox/%3ccajtrbsrdbzkocwln6d6ywhzn2fno91mko1acrp-pflx62du...@mail.gmail.com%3E Here are some of the things Re# is catching that I'm unsure of: 1) Usage of this prefix when not required. this.blah = blah; - required this. this.aBlah = blah; - optional this, which Re# doesn't like. I'm assuming consistency wins here, and 'this.' stays, but wanted to double check. Doesn't really matter IMO. I just hit Alt-Enter when I have it in focus, otherwise I ignore that. 2) Using different conventions for fields and parameters\local vars. blah vs. _blah Combined with 1, Re# wants (and I'm personally accustomed to): _blah = blah; However, that seems to violate the adopted style. I think we should stick to the Java naming conventions in the private parts (minus the function casings) as much as possible. The main reason is the ability to apply patches from Java Lucene and support future ports more easily. This is why I kept variable names untouched.
3) Full qualification of type names. Re# wants to remove redundant namespace qualifiers. Leave them or remove them? Same Alt-Enter argument as above... 4) Removing unreferenced classes. Should I remove non-public unreferenced classes? The ones I've come across so far are private. It's .NET, not C++, but I still usually remove them, not really sure why tho... 5) var vs. explicit I know this has been brought up before, but not sure of the final disposition. FWIW, I prefer var. There are some non-Re# issues I came across as well that look like artifacts of code generation: I move to var because it *might* help in the future when the API changes, and it doesn't really affect anything now 6) Weird param names. Param1 vs. directory I assume it's okay to replace 'Param1' with a descriptive name like 'directory'. Yes. Also var names like out_Renamed to @out. This one is important. 7) Field names that follow local variable naming conventions. Lots of issues related to private vars with names like i, j, k, etc. It feels like the right thing to do is to change the scope so that they go back to being local vars instead of fields. However, this requires a much more significant refactoring, and I didn't want to assume it was okay to do that. See above, I don't think we should touch those. If these questions have already been answered elsewhere and I missed the documentation/FAQ/developer guide, then I apologize and would appreciate the links. Alternatively, if someone has a Re# rule config that they are willing to post somewhere, I would be glad to use it. - Zack On Jul 27, 2012, at 12:00 PM, Itamar Syn-Hershko wrote: The cleanup consists mainly of going file by file with ReSharper and trying to get them as green as possible. Making a lot of fields readonly, removing unused vars and stuff like that. There are still loads of files left. I was also hoping to get to updating the spatial module with some recent updates, and to also support polygon searches. 
But that may take a bit more time, so it's really up to you guys (or we can open a vote for it).
Re: Outstanding issues for 3.0.3
Prescott - we could make an RC and push it to NuGet as a PreRelease, to get real feedback. On Thu, Aug 2, 2012 at 7:13 PM, Prescott Nasser geobmx...@hotmail.com wrote: I don't think we ever fully adopted the style guidelines, probably not a terrible discussion to have. As for this release, I think that by lazy consensus we should branch the trunk at the end of this weekend (say Monday), and begin the process of cutting a release. - my $.02 below 1) Usage of this prefix when not required. this.blah = blah; - required this. this.aBlah = blah; - optional this, which Re# doesn't like. I'm assuming consistency wins here, and 'this.' stays, but wanted to double check. I'd err on the side of consistency 2) Using different conventions for fields and parameters\local vars. blah vs. _blah Combined with 1, Re# wants (and I'm personally accustomed to): _blah = blah; For private variables _ is ok, for anything else, don't use _ as it's not CLR compliant However, that seems to violate the adopted style. 3) Full qualification of type names. Re# wants to remove redundant namespace qualifiers. Leave them or remove them? I try to remove them 4) Removing unreferenced classes. Should I remove non-public unreferenced classes? The ones I've come across so far are private. I'm not sure I understand - are you saying we have classes that are never used in random places? If so, I think before removing them we should have a conversation; what are they, why are they there, etc. - I'm hoping there aren't too many of these.. 5) var vs. explicit I know this has been brought up before, but not sure of the final disposition. FWIW, I prefer var. I use var when it's plainly obvious what the object is: var obj = new MyClass(). 
I usually use explicit when it's an object returned from some function that makes it unclear what the return value is: var items = search.GetResults(); vs IList<SearchResult> items = search.GetResults(); // preferred There are some non-Re# issues I came across as well that look like artifacts of code generation: 6) Weird param names. Param1 vs. directory I assume it's okay to replace 'Param1' with a descriptive name like 'directory'. Weird - I think a rename is OK for this release (since we're ticking up a full version number), but I believe changing param names can potentially break code. That said, I don't really think we need to change the names to push the 3.0.3 release out, and if it does in fact cause breaking changes, I'd be a little careful about how we do it going forward to 3.6. 7) Field names that follow local variable naming conventions. Lots of issues related to private vars with names like i, j, k, etc. It feels like the right thing to do is to change the scope so that they go back to being local vars instead of fields. However, this requires a much more significant refactoring, and I didn't want to assume it was okay to do that. I'd avoid this for now - a lot of this is a carry-over from the Java version, and if we rename all those, it starts to get a bit confusing if we have to compare Java to C# and these are all changed around. If these questions have already been answered elsewhere and I missed the documentation/FAQ/developer guide, then I apologize and would appreciate the links. Alternatively, if someone has a Re# rule config that they are willing to post somewhere, I would be glad to use it. I think we talked about Re#'s rules at one point, I'll try to dig that conversation up and see where it landed. It's probably a good idea for us to build rules though. - Zack On Jul 27, 2012, at 12:00 PM, Itamar Syn-Hershko wrote: The cleanup consists mainly of going file by file with ReSharper and trying to get them as green as possible. 
Making a lot of fields readonly, removing unused vars and stuff like that. There are still loads of files left. I was also hoping to get to updating the spatial module with some recent updates, and to also support polygon searches. But that may take a bit more time, so it's really up to you guys (or we can open a vote for it).
Re: Lucene Nuget
Yes, with the due release. I, for one, always mistake one for the other. On Wed, Aug 1, 2012 at 4:09 AM, Prescott Nasser geobmx...@hotmail.com wrote: There are two Lucene packages on NuGet that are deprecated. With some updates NuGet made a while ago, we have the ability to remove those packages. Do we want to?
Re: Outstanding issues for 3.0.3
+1 from me too, then On Wed, Aug 1, 2012 at 7:42 PM, Prescott Nasser geobmx...@hotmail.com wrote: Spatial could be something cool to look forward to in 3.6 IMO. I'm good with tagging what we have, and I'd like to take a week to allow the community to test the tagged code against their stuff before cutting release binaries. +1 to going now. Date: Wed, 1 Aug 2012 19:31:45 +0300 Subject: Re: Outstanding issues for 3.0.3 From: ita...@code972.com To: lucene-net-dev@lucene.apache.org I agree. What about the spatial stuff? you guys want to wait for it? On Wed, Aug 1, 2012 at 7:19 PM, Christopher Currens currens.ch...@gmail.com wrote: I think that while it would be nice to get it done, it's a fairly large effort, and we might be better off with doing a release. The tests are massively changed between 3.0.3 and 3.6, so I think a lot of it will get cleaned up anyway during the port. Also, a little while back, I did clean up a lot of the test code to use Assert.Throws and to remove unnecessary variables, though that might have only been in catch statements. Either way, I think we just might be ready as it is. I am eager to start working on porting 3.6. Thanks, Christopher On Wed, Aug 1, 2012 at 9:14 AM, Itamar Syn-Hershko ita...@code972.com wrote: I still have plenty to go on, but on second thought we could do that work just the same when we work towards 3.6, so I won't hold you off anymore Up to Chris - he wanted to do some tests cleanup Also, I'll be updating the Spatial contrib during the next week or so with polygon support. I think we should hold off the release so we can provide that as well, but I suggest we take a vote on it, don't let me hold you off. On Wed, Aug 1, 2012 at 6:58 PM, Prescott Nasser geobmx...@hotmail.com wrote: Just wanted to check in - where do we feel like we stand? What is left to do - is there anything I can help with specifically? I'll have some spare cycles this weekend. 
I want to really make a push to get this ready to roll and not let it languish ~P Date: Sat, 28 Jul 2012 20:38:10 +0300 Subject: Re: Outstanding issues for 3.0.3 From: ita...@code972.com To: lucene-net-dev@lucene.apache.org Go ahead with contrib and tests, I'll resume with core and coordinate further later On Jul 27, 2012 7:04 PM, Christopher Currens currens.ch...@gmail.com wrote: I've got ReSharper and can help with that if you'd like to coordinate it. I can take one or some of the contrib projects or part of the main library, or *shudder* any of the test libraries. The code has needed some cleaning up for a while, and some of the clean up work is an optimization at some levels, so I'm definitely okay with spending some time doing that. I'm okay with waiting longer as long as something is getting done. Thanks, Christopher On Fri, Jul 27, 2012 at 9:00 AM, Itamar Syn-Hershko ita...@code972.com wrote: The cleanup consists mainly of going file by file with ReSharper and trying to get them as green as possible. Making a lot of fields readonly, removing unused vars and stuff like that. There are still loads of files left. I was also hoping to get to updating the spatial module with some recent updates, and to also support polygon searches. But that may take a bit more time, so it's really up to you guys (or we can open a vote for it). On Fri, Jul 27, 2012 at 6:35 PM, Christopher Currens currens.ch...@gmail.com wrote: Itamar, Where do we stand on the clean up now? Is there anything in particular that you're doing that you'd like help with? I have some free time today and am eager to get this version released. Thanks, Christopher On Sat, Jul 21, 2012 at 1:02 PM, Prescott Nasser geobmx...@hotmail.com wrote: Alright, I'll hold off a bit. 
Date: Sat, 21 Jul 2012 22:59:32 +0300 Subject: Re: Outstanding issues for 3.0.3 From: ita...@code972.com To: lucene-net-u...@lucene.apache.org CC: lucene-net-dev@lucene.apache.org Actually there was some clean up work I started doing and would want to complete, and also sign off on the suspected corruption issue we
Re: Outstanding issues for 3.0.3
Yes, we could also release a 3.0.10 or something with the improved spatial module. Or I can race Prescott's week and get it in before it ends :) And for heaven's sake, can we move to git when graduating? A live crash course to all committers is on me. On Wed, Aug 1, 2012 at 7:42 PM, Christopher Currens currens.ch...@gmail.com wrote: Ah, I did overlook that. I imagine that the move from 3.0.3 to 3.6 will realistically take a while, so if we can't get spatial stuff out before then, would it take until 3.6 to be able to release new functionality into the spatial contrib project? Along those lines, I propose that we move 3.0.3 into a new branch instead of just tagging the release and merging in 3.6. That way, during the time it takes to port 3.6, we can still do any critical bug fixes and features like these and release new versions. At least then, people won't be waiting for months for bug fixes. If we did that, then it also might not be critical to get the spatial stuff out with this release, since we could get out a new release in a few weeks with updated spatial libraries...not that I'm against waiting for it now. It was just a suggestion on how we can move forward with the project. Thoughts either way on this? On Wed, Aug 1, 2012 at 9:31 AM, Itamar Syn-Hershko ita...@code972.com wrote: I agree What about the spatial stuff? you guys want to wait for it? On Wed, Aug 1, 2012 at 7:19 PM, Christopher Currens currens.ch...@gmail.com wrote: I think that while it would be nice to get it done, it's a fairly large effort, and we might be better off with doing a release. The tests are massively changed between 3.0.3 and 3.6, so I think a lot of it will get cleaned up anyway during the port. Also, a little while back, I did clean up a lot of the test code to use Assert.Throws and to remove unnecessary variables, though that might have only been in catch statements. Either way, I think we just might be ready as it is. I am eager to start working on porting 3.6. 
Re: Outstanding issues for 3.0.3
On that note, see git-flow http://nvie.com/posts/a-successful-git-branching-model/ :) On Wed, Aug 1, 2012 at 7:49 PM, Prescott Nasser geobmx...@hotmail.com wrote: That's probably not a bad idea - we should probably move to a structure like that anyway going forward, so that it's easier to manage bug fixes and minor updates in between the big work Date: Wed, 1 Aug 2012 09:42:40 -0700 Subject: Re: Outstanding issues for 3.0.3 From: currens.ch...@gmail.com To: lucene-net-dev@lucene.apache.org Ah, I did overlook that. I imagine that the move from 3.0.3 to 3.6 will realistically take a while, so if we can't get spatial stuff out before then, would it take until 3.6 to be able to release new functionality into the spatial contrib project? Along those lines, I propose that we move 3.0.3 into a new branch instead of just tagging the release and merging in 3.6. That way, during the time it takes to port 3.6, we can still do any critical bug fixes and features like these and release new versions. At least then, people won't be waiting for months for bug fixes. If we did that, then it also might not be critical to get the spatial stuff out with this release, since we could get out a new release in a few weeks with updated spatial libraries...not that I'm against waiting for it now. It was just a suggestion on how we can move forward with the project. Thoughts either way on this? On Wed, Aug 1, 2012 at 9:31 AM, Itamar Syn-Hershko ita...@code972.com wrote: I agree. What about the spatial stuff? you guys want to wait for it? On Wed, Aug 1, 2012 at 7:19 PM, Christopher Currens currens.ch...@gmail.com wrote: I think that while it would be nice to get it done, it's a fairly large effort, and we might be better off with doing a release. The tests are massively changed between 3.0.3 and 3.6, so I think a lot of it will get cleaned up anyway during the port. 
Re: Outstanding issues for 3.0.3
I agree What about the spatial stuff? you guys want to wait for it? On Wed, Aug 1, 2012 at 7:19 PM, Christopher Currens currens.ch...@gmail.com wrote: I think that while it would be nice to get it done, it's a fairly large effort, and we might be better off with doing a release. The tests are massively changed between 3.0.3 and 3.6, so I think a lot of it will get cleaned up anyway during the port. Also, a little while back, I did clean up a lot of the test code to use Assert.Throws and to remove unnecessary variables, though that might have only been in catch statements. Either way, I think we just might be ready as it is. I am eager to start working on porting 3.6. Thanks, Christopher On Wed, Aug 1, 2012 at 9:14 AM, Itamar Syn-Hershko ita...@code972.com wrote: I still have plenty to go on, but on a second thought we could do that work just the same when we work towards 3.6, so I won't hold you off anymore Up to Chris - he wanted to do some tests cleanup Also, I'll be updating the Spatial contrib during the next week or so with polygon support. I think we should hold off the release so we can provide that as well, but I suggest we will take a vote on it, don't let me hold you off. On Wed, Aug 1, 2012 at 6:58 PM, Prescott Nasser geobmx...@hotmail.com wrote: Just wanted to check in - where do we feel like we stand? What is left to do - is there anything I can help with specifically? I'll have some spare cycles this weekend. I want to really make a push to get this ready to roll and not let it languish ~P Date: Sat, 28 Jul 2012 20:38:10 +0300 Subject: Re: Outstanding issues for 3.0.3 From: ita...@code972.com To: lucene-net-...@lucene.apache.org Go ahead with contrib and tests, ill resume with core and coordinate further later On Jul 27, 2012 7:04 PM, Christopher Currens currens.ch...@gmail.com wrote: I've got resharper and can help with that if you'd like to coordinate it. 
I can take one or some of the contrib projects or part of the main library, or *shudder* any of the test libraries. The code has needed some cleaning up for a while, and some of the clean up work is an optimization at some levels, so I'm definitely okay with spending some time doing that. I'm okay with waiting longer as long as something is getting done. Thanks, Christopher On Fri, Jul 27, 2012 at 9:00 AM, Itamar Syn-Hershko ita...@code972.com wrote: The cleanup consists mainly of going file by file with ReSharper and trying to get them as green as possible. Making a lot of fields readonly, removing unused vars and stuff like that. There are still loads of files left. I was also hoping to get to updating the spatial module with some recent updates, and to also support polygon searches. But that may take a bit more time, so it's really up to you guys (or we can open a vote for it). On Fri, Jul 27, 2012 at 6:35 PM, Christopher Currens currens.ch...@gmail.com wrote: Itamar, Where do we stand on the clean up now? Is there anything in particular that you're doing that you'd like help with? I have some free time today and am eager to get this version released. Thanks, Christopher On Sat, Jul 21, 2012 at 1:02 PM, Prescott Nasser geobmx...@hotmail.com wrote: Alright, I'll hold off a bit. Date: Sat, 21 Jul 2012 22:59:32 +0300 Subject: Re: Outstanding issues for 3.0.3 From: ita...@code972.com To: lucene-net-u...@lucene.apache.org CC: lucene-net-...@lucene.apache.org Actually there was some clean up work I started doing and would want to complete, and also sign off on the suspected corruption issue we raised. I'm afraid I won't have much time this week to properly do all that, but I'll keep you posted. On Sat, Jul 21, 2012 at 10:20 PM, Prescott Nasser geobmx...@hotmail.com wrote: Alright, latest patch fixed what could be done with the CLS issues at present. With that, I think we are ready to roll with a release. 
If people could please take some time to run all the tests, as well as whatever other tests they might run. We've had some issues with test failures that only happen on some systems, so I want to make sure we have those bases covered. Unless there is anything else that should be done, I'll leave everyone a week to run their tests.
Re: Outstanding issues for 3.0.3
Go ahead with contrib and tests, I'll resume with core and coordinate further later. On Jul 27, 2012 7:04 PM, Christopher Currens currens.ch...@gmail.com wrote: I've got ReSharper and can help with that if you'd like to coordinate it. I can take one or some of the contrib projects or part of the main library, or *shudder* any of the test libraries. The code has needed some cleaning up for a while, and some of the clean-up work is an optimization at some levels, so I'm definitely okay with spending some time doing that. I'm okay with waiting longer as long as something is getting done. Thanks, Christopher On Fri, Jul 27, 2012 at 9:00 AM, Itamar Syn-Hershko ita...@code972.com wrote: The cleanup consists mainly of going file by file with ReSharper and trying to get them as green as possible. Making a lot of fields readonly, removing unused vars and stuff like that. There are still loads of files left. I was also hoping to get to updating the spatial module with some recent updates, and to also support polygon searches. But that may take a bit more time, so it's really up to you guys (or we can open a vote for it). On Fri, Jul 27, 2012 at 6:35 PM, Christopher Currens currens.ch...@gmail.com wrote: Itamar, Where do we stand on the clean up now? Is there anything in particular that you're doing that you'd like help with? I have some free time today and am eager to get this version released. Thanks, Christopher On Sat, Jul 21, 2012 at 1:02 PM, Prescott Nasser geobmx...@hotmail.com wrote: Alright, I'll hold off a bit. Date: Sat, 21 Jul 2012 22:59:32 +0300 Subject: Re: Outstanding issues for 3.0.3 From: ita...@code972.com To: lucene-net-u...@lucene.apache.org CC: lucene-net-dev@lucene.apache.org Actually there was some clean-up work I started doing and would want to complete, and also sign off on the suspected corruption issue we raised. I'm afraid I won't have much time this week to properly do all that, but I'll keep you posted.
On Sat, Jul 21, 2012 at 10:20 PM, Prescott Nasser geobmx...@hotmail.com wrote: Alright, the latest patch fixed what could be done with the CLS issues at present. With that, I think we are ready to roll with a release. If people could please take some time to run all the tests, as well as whatever other tests they might run. We've had some issues with test failures that only happen on some systems, so I want to make sure we have those bases covered. Unless there is anything else that should be done, I'll leave everyone a week to run their tests. Next Saturday I will tag the trunk and cut a release with both 3.5 and 4.0 binaries. Great work everyone. ~P Date: Mon, 9 Jul 2012 18:02:30 -0700 Subject: Re: Outstanding issues for 3.0.3 From: currens.ch...@gmail.com To: lucene-net-dev@lucene.apache.org I can set a different build target, but I can't set the actual framework to 3.5 without doing it for all build configurations. On top of that, 3.5 needs System.Core to be referenced, which is done automatically in .NET 4 (I'm not sure if MSBuild v4 does it automatically?). I did kinda get it working by putting a TargetFrameworkVersion tag of 4.0 in Debug and Release configurations and 3.5 in Debug 3.5 and Release 3.5 configurations, but that's a little...well, difficult to maintain by hand, since Visual Studio doesn't allow you to set different framework versions per configuration, and Visual Studio seemed to be having trouble with references, since both frameworks were being referenced. On Mon, Jul 9, 2012 at 5:57 PM, Prescott Nasser geobmx...@hotmail.com wrote: What do you mean doesn't work at the project level? I created a different build target NET35 and then we had Debug and Release still, that seemed to work for me. But I feel like I'm missing something in your explanation. Good work though!
Date: Mon, 9 Jul 2012 17:51:36 -0700 Subject: Re: Outstanding issues for 3.0.3 From: currens.ch...@gmail.com To: lucene-net-dev@lucene.apache.org I've got it working, compiling and all tests passing... The only caveat is that I'm not sure the best way to multi-target. It doesn't really work at the project level, so you'd have to create two separate projects, one for .NET 4 and the other for 3.5. To aid me, I wrote a small tool that creates copies of all of the 4.0 projects and solutions to work against
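For readers hitting the same multi-targeting wall, the per-configuration hack Christopher describes amounts to roughly the following hand-edited MSBuild fragment (a sketch only: the property names are real MSBuild, the configuration names are the ones mentioned in the thread, and Visual Studio's UI won't maintain this for you):

```xml
<!-- Sketch of per-configuration framework targeting in a pre-SDK-style
     .csproj. Visual Studio does not expose this in the UI, so it has to
     be maintained by hand, as noted in the thread. -->
<PropertyGroup Condition=" '$(Configuration)' == 'Debug' Or '$(Configuration)' == 'Release' ">
  <TargetFrameworkVersion>v4.0</TargetFrameworkVersion>
</PropertyGroup>
<PropertyGroup Condition=" '$(Configuration)' == 'Debug 3.5' Or '$(Configuration)' == 'Release 3.5' ">
  <TargetFrameworkVersion>v3.5</TargetFrameworkVersion>
</PropertyGroup>
<!-- .NET 3.5 needs System.Core referenced explicitly; .NET 4 adds it automatically. -->
<ItemGroup Condition=" '$(TargetFrameworkVersion)' == 'v3.5' ">
  <Reference Include="System.Core" />
</ItemGroup>
```

As the thread notes, this is fragile precisely because the IDE assumes one framework per project, which is why the eventual workaround was a tool that clones the projects instead.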
Re: Outstanding issues for 3.0.3
in the community. I would love to see that make it into 3.0.3, and would be able to pick up where anyone had left off or take part of it, if they don't have time to work on it. In regards to LUCENENET-446, I agree that it is pretty much complete. I think I've looked several times at it to confirm most/all methods have been converted, so this week I'll do a final check and close it out. Thanks, Christopher On Sun, Jul 8, 2012 at 12:28 PM, Simon Svensson si...@devhost.se wrote: The tests that failed when using culture=sv-se seem fixed. On 2012-07-08 20:44, Itamar Syn-Hershko wrote: What's the status on the failing tests we had? On Sun, Jul 8, 2012 at 9:02 PM, Prescott Nasser geobmx...@hotmail.com wrote: Three issues left that I see: Fixing the build output - I did some work, but I'm good on this; we can move the rest of the work to 3.6: https://issues.apache.org/jira/browse/LUCENENET-456 CLS Compliance: https://issues.apache.org/jira/browse/LUCENENET-446. Are we ok with this as-is for now? There are still a good number of issues left; some we can't really fix (sbyte and volatile are out of scope imo). In a similar vein, our own code uses some obsolete methods and we have a lot of 'variable declared but never used' warnings (mentally, I treat most warnings as errors). GetX/SetX: https://issues.apache.org/jira/browse/LUCENENET-470. I think much of this has been removed; there are probably some pieces left (and we have a difference of opinion in the group as well). I really think the only outstanding issue is the CLS compliance one, the rest can be moved to 3.6. With CLS compliance we have to ask if we've done enough for that so far, or if more is needed. I personally would like to see us make any API changes now, with the 3.0.3 release, but if we are comfortable with it, let's roll. What are your thoughts?
~P From: thowar...@gmail.com Date: Mon, 25 Jun 2012 10:34:37 -0700 Subject: Re: Outstanding issues for 3.0.3 To: lucene-net-dev@lucene.apache.org Assuming we're talking about the packaging/filesystem structure in the releases, the structure is a little of both (ours vs Apache's)... Basically, I went through most of the Apache projects to see how they packaged releases and developed a structure that was very similar but encompassed everything we needed. So, it's informed by the organically emergent structures that ASF uses. -T On Mon, Jun 25, 2012 at 7:32 AM, Prescott Nasser geobmx...@hotmail.com wrote: I have no idea why I thought we were using NAnt. I think it's just our release structure. I figured a little out this weekend, splitting the XML and .dll files into separate directories. The documentation you have on the wiki was actually pretty helpful. Whatever more you can add would be great ~P Date: Mon, 25 Jun 2012 10:04:21 -0400 Subject: Re: Outstanding issues for 3.0.3 From: mhern...@wickedsoftware.net To: lucene-net-dev@lucene.apache.org On Sat, Jun 23, 2012 at 1:38 AM, Prescott Nasser geobmx...@hotmail.com wrote: -- Task 470, a non-serious one, is listed only because it's mostly done and just needs a few loose ends tied up. I'll hopefully have time to take care of that this weekend. How many GetX/SetX are left? I did a quick search for 'public * Get*()' Most
Re: Outstanding issues for 3.0.3
What's the status on the failing tests we had? On Sun, Jul 8, 2012 at 9:02 PM, Prescott Nasser geobmx...@hotmail.com wrote: Three issues left that I see: Fixing the build output - I did some work, but I'm good on this; we can move the rest of the work to 3.6: https://issues.apache.org/jira/browse/LUCENENET-456 CLS Compliance: https://issues.apache.org/jira/browse/LUCENENET-446. Are we ok with this as-is for now? There are still a good number of issues left; some we can't really fix (sbyte and volatile are out of scope imo). In a similar vein, our own code uses some obsolete methods and we have a lot of 'variable declared but never used' warnings (mentally, I treat most warnings as errors). GetX/SetX: https://issues.apache.org/jira/browse/LUCENENET-470. I think much of this has been removed; there are probably some pieces left (and we have a difference of opinion in the group as well). I really think the only outstanding issue is the CLS compliance one, the rest can be moved to 3.6. With CLS compliance we have to ask if we've done enough for that so far, or if more is needed. I personally would like to see us make any API changes now, with the 3.0.3 release, but if we are comfortable with it, let's roll. What are your thoughts? ~P From: thowar...@gmail.com Date: Mon, 25 Jun 2012 10:34:37 -0700 Subject: Re: Outstanding issues for 3.0.3 To: lucene-net-dev@lucene.apache.org Assuming we're talking about the packaging/filesystem structure in the releases, the structure is a little of both (ours vs Apache's)... Basically, I went through most of the Apache projects to see how they packaged releases and developed a structure that was very similar but encompassed everything we needed. So, it's informed by the organically emergent structures that ASF uses. -T On Mon, Jun 25, 2012 at 7:32 AM, Prescott Nasser geobmx...@hotmail.com wrote: I have no idea why I thought we were using NAnt. I think it's just our release structure.
I figured a little out this weekend, splitting the XML and .dll files into separate directories. The documentation you have on the wiki was actually pretty helpful. Whatever more you can add would be great ~P Date: Mon, 25 Jun 2012 10:04:21 -0400 Subject: Re: Outstanding issues for 3.0.3 From: mhern...@wickedsoftware.net To: lucene-net-dev@lucene.apache.org On Sat, Jun 23, 2012 at 1:38 AM, Prescott Nasser geobmx...@hotmail.com wrote: -- Task 470, a non-serious one, is listed only because it's mostly done and just needs a few loose ends tied up. I'll hopefully have time to take care of that this weekend. How many GetX/SetX are left? I did a quick search for 'public * Get*()' Most of them looked to be actual methods - perhaps a few to replace -- Task 446 (CLS Compliance) is important, but there's no way we can get this done quickly. The current state of this issue is that all of the names of public members are now compliant. There are a few things that aren't: the use of sbyte (particularly those related to the FieldCache) and some conflicts with *protected or internal* fields (some with public members). Opinions on this one will be appreciated the most. My opinion is that we should draw a line on the amount of CLS compliance to have in this release, and push the rest into 3.5. I count roughly 53 CLS compliance issues. The sbyte stuff will run into trouble when you do bit shifting (I ran into this issue when trying to do this for 2.9.4). I'd like to see if we can't get rid of the easier stuff (internal/protected stuff). I would not try getting rid of sbyte or volatile for this release. It's going to take some serious consideration to get rid of those -- Improvement 337 - Are we going to add this code (not present in java) to the core library? I'd skip it and re-evaluate the community desire for this in 3.5. -- Improvement 456 - This is related to builds being output in Apache's release format. Do we want to do this for this release?
I looked into this last weekend - I'm terrible with NAnt, so I didn't get anywhere. It would be nice to have, but I don't think I'll figure it out. If Michael has some time to maybe make the adjustment, he knows these scripts best. If not I'm going to look into it, but I don't call this a show stopper - either we have it or we don't when the rest is done. With some Flo Rida and espresso shots, anything is possible. Did we switch to NAnt? I saw the jira ticket for this. Is there an official Apache release structure, or is this just *our* Apache release structure that we are using? Can I take the latest release and use that to model the structure you
Re: [VOTE] Apache Lucene.Net ready for graduation?
+1 for graduation. I still think graduation should be in sync with the 3.0.3 release and a press release on work towards 3.6 and 4.0 releases. On Sun, Jul 8, 2012 at 8:44 PM, Prescott Nasser geobmx...@hotmail.com wrote: Hey All, This is the first step for graduation for the Apache Lucene.Net project (incubating of course..). We're taking a vote for the Lucene.Net community to see if the community is ready to govern itself as a top level project. Here is a short list of our accomplishments which I believe make us ready for graduation: - Released 2.9.4 - Released 2.9.4g (Generics version) - Created a new website, with a new logo (a 99designs contest graciously supported by stackoverflow) - Added two new committers bringing our total to 9. - Preparing for 3.0.3 release within the next couple of weeks - Started work on 3.5 release. This is the process we will follow: - Community vote (this email). All votes count, there is no non-binding / binding status for this - We will propose a resolution for review ( https://cwiki.apache.org/confluence/display/LUCENENET/Graduation+-+Resolution+Template ) - We will call a vote on the resolution in general @ incubator - A Board resolution will be submitted. As a community, if you would please vote: [1] Ready for graduation [-1] Not ready because... I know I speak for all the developers on this project when I say we appreciate (and will continue to appreciate) everyone's contributions via the mailing list and jira. ~Prescott
Re: svn commit: r1353075 - /incubator/lucene.net/branches/Lucene.Net_3_5/
Why 3.5 and not 3.6? In my opinion we should skip all versions in between 3.0.3 and 3.6, and just port 3.6 after we released 3.0.3. Lucene 4 will probably be released by the time we are done, and then we could move on to porting it. On Sat, Jun 23, 2012 at 9:35 AM, pnas...@apache.org wrote: Author: pnasser Date: Sat Jun 23 06:35:44 2012 New Revision: 1353075 URL: http://svn.apache.org/viewvc?rev=1353075view=rev Log: Branching for 3.5 Added: incubator/lucene.net/branches/Lucene.Net_3_5/ (props changed) - copied from r1353074, incubator/lucene.net/trunk/ Propchange: incubator/lucene.net/branches/Lucene.Net_3_5/ -- --- svn:mergeinfo (added) +++ svn:mergeinfo Sat Jun 23 06:35:44 2012 @@ -0,0 +1,2 @@ +/incubator/lucene.net/branches/Lucene.Net.3.0.3/trunk:1199075-1294851* +/incubator/lucene.net/trunk:1199072-1294798*
Re: Endian types
To add to this - Lucene 4x is still being worked on in the Java front. We'd rather put effort into porting v3.6 and start on v4 once there is an official Java release. Thanks for your efforts! On Wed, Jun 20, 2012 at 6:19 PM, Prescott Nasser geobmx...@hotmail.com wrote: How much are you trying to port? I've got it on my roadmap to work with Sharpen to try and get most of it auto-ported. Any porting help is of course appreciated and welcome - but if you have some time and are so inclined, we could use more people helping on the Sharpen front. From: Oren Eini (Ayende Rahien) Sent: 6/20/2012 7:52 AM To: lucene-net-...@lucene.apache.org Subject: Re: Endian types I would assume that you would have to match the Java behavior, if only to make sure that the index format matched. On Wed, Jun 20, 2012 at 5:47 PM, Kim Christensen k...@dubex.dk wrote: Hi all, I was looking into porting some Lucene 4x code, and ran into the issue about Big-Endian and Little-Endian. What is the standpoint on this? Always Big-Endian as Java does it? Regards, Kim
[jira] [Commented] (LUCENENET-495) Use of DateTime.Now causes huge amount of System.Globalization.DaylightTime object allocations
[ https://issues.apache.org/jira/browse/LUCENENET-495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13393629#comment-13393629 ] Itamar Syn-Hershko commented on LUCENENET-495: -- 1. IMO, if there is a thread safety bug, it needs to be fixed 2. Why do we have AddIfNotContains(Hashtable, object), and why are we not using ConcurrentDictionary? Use of DateTime.Now causes huge amount of System.Globalization.DaylightTime object allocations -- Key: LUCENENET-495 URL: https://issues.apache.org/jira/browse/LUCENENET-495 Project: Lucene.Net Issue Type: Bug Components: Lucene.Net Core Affects Versions: Lucene.Net 2.9.4, Lucene.Net 3.0.3 Reporter: Christopher Currens Assignee: Christopher Currens Priority: Critical Fix For: Lucene.Net 3.0.3 This issue mostly just affects RAMDirectory. However, RAMFile and RAMOutputStream are used in other (all?) directory implementations, including FSDirectory types. In RAMOutputStream, the file last modified property for the RAMFile is updated when the stream is flushed. It's calculated using {{DateTime.Now.Ticks / TimeSpan.TicksPerMillisecond}}. I've read before that Microsoft has regretted making DateTime.Now a property instead of a method, and after seeing what it's doing, I'm starting to understand why. DateTime.Now returns local time. In order for it to calculate that, it has to get the UTC offset for the machine, which requires the creation of a _class_, System.Globalization.DaylightTime. This is bad for performance. Using code to write 10,000 small documents to an index (4KB sizes), it created 1,570,157 of these DaylightTime classes, a total of 62MB of extra memory... clearly RAMOutputStream.Flush() is called a lot. A fix I'd like to propose is to change the RAMFile to store the LastModified date as UTC instead of local. DateTime.UtcNow doesn't create any additional objects and is very fast. For this small benchmark, the performance increase is 31%.
I've set it to convert to local time when {{RAMDirectory.LastModified(string name)}} is called, to make sure it has the same behavior (tests fail otherwise). Are there any other side-effects to making this change? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
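The proposed fix - store UTC internally, convert to local time only when the last-modified value is read back - can be sketched roughly like this (illustrative C# only; the class and member names here are hypothetical, not the actual RAMFile patch):

```csharp
using System;

// Minimal sketch of the idea in LUCENENET-495: DateTime.UtcNow avoids the
// DaylightTime allocations that DateTime.Now incurs, so record UTC ticks on
// every flush and convert to local time only at the rarely-called read side.
// "FileStamp" and its members are illustrative names, not Lucene.Net code.
class FileStamp
{
    private long lastModifiedUtcTicks;

    // Called on every flush - must be cheap, so use UtcNow.
    public void Touch()
    {
        lastModifiedUtcTicks = DateTime.UtcNow.Ticks;
    }

    // Called rarely (e.g. from Directory.LastModified) - convert to local
    // time here to preserve the old DateTime.Now-based behavior.
    public long LastModifiedLocalMillis
    {
        get
        {
            var utc = new DateTime(lastModifiedUtcTicks, DateTimeKind.Utc);
            return utc.ToLocalTime().Ticks / TimeSpan.TicksPerMillisecond;
        }
    }
}
```

The design point is that the hot path (flush) does no allocation, while the conversion cost is paid only where the old local-time semantics are actually observed.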
[jira] [Commented] (LUCENENET-495) Use of DateTime.Now causes huge amount of System.Globalization.DaylightTime object allocations
[ https://issues.apache.org/jira/browse/LUCENENET-495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13393633#comment-13393633 ] Itamar Syn-Hershko commented on LUCENENET-495: -- Makes sense
[jira] [Commented] (LUCENENET-495) Use of DateTime.Now causes huge amount of System.Globalization.DaylightTime object allocations
[ https://issues.apache.org/jira/browse/LUCENENET-495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13393363#comment-13393363 ] Itamar Syn-Hershko commented on LUCENENET-495: -- +1 Take a look at DateTimeOffset as well - this becomes the standard for .NET 4+
Re: Lets talk graduation
+1 for releasing after graduation, then. With some careful PR and our sponsorship offer, we can get the project flying. There's still some work to do anyway. On Fri, Jun 15, 2012 at 1:59 PM, Stefan Bodewig bode...@apache.org wrote: On 2012-06-14, Christopher Currens wrote: I've gone back and forth on whether I think we're ready for graduation or not. I had always felt like we weren't because the project isn't as active as I'd like it to be. However, I think I've been looking at it wrong. We've got a good enough process and we *have* made progress. Absolutely, and I think you are ready to graduate as well. As a response to Itamar: Lucene.Net could get more exposure by becoming a top level project. In particular you could craft a press release together with the ASF's PR folks to celebrate the re-birth. The sponsoring offer is a great thing, IMHO. I'm up for starting this process, but I don't want it to take any time away from getting 3.0.3 released. Understood. OTOH if you graduate first then 3.0.3 would be an official Apache release and wouldn't have to wear the incubating tag. Your call. If you want to do the 3.0.3 release first, I don't think that will be much of a delay, as it seems to be around the corner anyway. Stefan
Releasing 3.0.3
Where do we stand with this? I want to push to a 3.0.3 release, what items are still pending? Itamar.
Re: Lets talk graduation
IMHO, whatever brings more attention to the project, and I'm not sure graduation is what this project needs right now. In the end it's just semantics. I'd focus those efforts on getting more work done and having more frequent releases. Hence our proposition to sponsor dev, which still stands. On Thu, Jun 14, 2012 at 6:24 PM, Prescott Nasser geobmx...@hotmail.com wrote: I think with the addition of two new committers we've made some progress in community growth. I think we'll have 3.0.3 out the door soon - are there any other items we think we need to address before looking to graduate? ~P
Re: Releasing 3.0.3
Ok, and is the code in 100% compliance with the 3.0.3 Java code? I'll be spending some time on fixing the index corruption issue, and it is probably best for Chris to wrap up the work he has started. Anyone else on board to close some tickets? On Thu, Jun 14, 2012 at 6:19 PM, Prescott Nasser geobmx...@hotmail.com wrote: Agreed - JIRA for 3.0.3: https://issues.apache.org/jira/browse/LUCENENET/fixforversion/12316215#selectedTab=com.atlassian.jira.plugin.system.project%3Aversion-issues-panel We should evaluate all of these - fix them, mark as won't fix, or move them to another release version. I think the biggest hold-up currently is https://issues.apache.org/jira/browse/LUCENENET-484. Chris has made a huge dent, but there are two test cases that are still listed as failing (I can't even duplicate those failures to know where to start). Also, we should look at all the other jira tickets and make updates where appropriate ~P Date: Thu, 14 Jun 2012 13:21:04 +0300 Subject: Releasing 3.0.3 From: ita...@code972.com To: lucene-net-dev@lucene.apache.org Where do we stand with this? I want to push to a 3.0.3 release, what items are still pending? Itamar.
Re: Corrupt index
I'm quite certain this shouldn't happen even when Commit wasn't called. Mike, can you comment on that? On Thu, Jun 14, 2012 at 8:03 PM, Christopher Currens currens.ch...@gmail.com wrote: Well, the only thing I see is that there is no place where writer.Commit() is called in the delegate assigned to corpusReader.OnDocument. I know that lucene is very transactional, and at least in 3.x, the writer will never auto-commit to the index. You can write millions of documents, but if commit is never called, those documents aren't actually part of the index. Committing isn't a cheap operation, so you definitely don't want to do it on every document. You can test it yourself with this (naive) solution. Right below the writer.SetUseCompoundFile(false) line, add int numDocsAdded = 0;. At the end of the corpusReader.OnDocument delegate add: // Example only. I wouldn't suggest committing this often if(++numDocsAdded % 5 == 0) { writer.Commit(); } I had the application crash for real on this file: http://dumps.wikimedia.org/gawiktionary/20120613/gawiktionary-20120613-pages-meta-history.xml.bz2 , about 20% into the operation. Without the commit, the index is empty. Add it in, and I get 755 files in the index after it crashes. Thanks, Christopher On Wed, Jun 13, 2012 at 6:13 PM, Itamar Syn-Hershko ita...@code972.com wrote: Yes, reproduced on the first try. See attached program - I referenced it to current trunk. On Thu, Jun 14, 2012 at 3:54 AM, Itamar Syn-Hershko ita...@code972.com wrote: Christopher, I used the IndexBuilder app from here https://github.com/synhershko/Talks/tree/master/LuceneNeatThings with an 8.5GB Wikipedia dump. After running for 2.5 days I had to forcefully close it (infinite loop in the wiki-markdown parser at 92%, go figure), and the 40-something GB index I had by then was unusable. I then was able to reproduce this. Please note I now added a few safeguards you might want to remove to make sure the app really crashes on process kill.
I'll try to come up with a better way to reproduce this - hopefully Mike will be able to suggest better ways than a manual process kill... On Thu, Jun 14, 2012 at 1:41 AM, Christopher Currens currens.ch...@gmail.com wrote: Mike, The codebase for lucene.net should be almost identical to java's 3.0.3 release, and LUCENE-1044 is included in that. Itamar, are you committing the index regularly? I only ask because I can't reproduce it myself by forcibly terminating the process while it's indexing. I've tried both 3.0.3 and 2.9.4. If I don't commit at all and terminate the process (even with 10,000 4K documents created), there will be no documents in the index when I open it in Luke, which I expect. If I commit at 10,000 documents, and terminate it a few thousand documents after that, the index has the first ten thousand that were committed. I've even terminated it *while* a second commit was taking place, and it still had all of the documents I expected. It may be that I'm not trying to reproduce it correctly. Do you have a minimal amount of code that can reproduce it? Thanks, Christopher On Wed, Jun 13, 2012 at 9:31 AM, Michael McCandless luc...@mikemccandless.com wrote: Hi Itamar, One quick question: does Lucene.Net include the fixes done for LUCENE-1044 (to fsync files on commit)? Those are very important for an index to be intact after OS/JVM crash or power loss. More responses below: On Tue, Jun 12, 2012 at 8:20 PM, Itamar Syn-Hershko ita...@code972.com wrote: I'm a Lucene.Net committer, and there is a chance we have a bug in our FSDirectory implementation that causes indexes to get corrupted when indexing is cut while the IW is still open. As it stems from some retroactive fixes you made, I'd appreciate your feedback. Correct me if I'm wrong, but by design Lucene should be able to recover rather quickly from power failures or app crashes. Since existing segment files are read-only, only new segments that are still being written can get corrupted.
Hence, recovering from worst-case scenarios is done by simply removing the write.lock file. The worst that could happen then is having the last segment damaged, and that can be fixed by removing those files, possibly by running CheckIndex on the index. You shouldn't even have to run CheckIndex ... because (as of LUCENE-1044) we now fsync all segment files before writing the new segments_N file, and then removing old segments_N files (and any segments that are no longer referenced). You do have to remove the write.lock if you aren't using NativeFSLockFactory (but this has been the default lock impl for a while now). Last week I have been playing with rather large indexes and crashed my app while it was indexing. I wasn't able
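The commit semantics discussed in this thread - nothing added since the last Commit() survives a crash - suggest a periodic-commit pattern like the following (a sketch against the 3.0.3-era Lucene.Net API as I understand it, not code from the thread; the batch size is an arbitrary example value):

```csharp
using System.Collections.Generic;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Store;

// Sketch: documents added between commits are lost if the process dies,
// so commit every N documents to bound the amount of work at risk.
class PeriodicCommitExample
{
    static void IndexAll(IEnumerable<Document> docs, string indexPath)
    {
        var dir = FSDirectory.Open(new System.IO.DirectoryInfo(indexPath));
        var analyzer = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_30);
        var writer = new IndexWriter(dir, analyzer, IndexWriter.MaxFieldLength.UNLIMITED);
        try
        {
            int added = 0;
            foreach (var doc in docs)
            {
                writer.AddDocument(doc);
                // Without this, a crash discards everything since the last
                // commit; committing too often is expensive. 5000 is just an
                // example trade-off, not a recommended value.
                if (++added % 5000 == 0)
                    writer.Commit();
            }
            writer.Commit(); // flush the final partial batch
        }
        finally
        {
            writer.Dispose(); // release the write.lock
        }
    }
}
```

This is the same trade-off Christopher describes: commit frequency bounds data loss on a crash at the price of indexing throughput.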
Re: Releasing 3.0.3
Sorry, misread your question. This can be easily done with xUnit, using Theories. On Thu, Jun 14, 2012 at 9:26 PM, Itamar Syn-Hershko ita...@code972.com wrote: Something like: Thread.CurrentThread.CurrentCulture = cultureInfo; Thread.CurrentThread.CurrentUICulture = cultureInfo; And setting it back later when the test is done. You can easily do this with an IDisposable like this: using(new TemporaryCulture(culture)){ ... } On Thu, Jun 14, 2012 at 9:10 PM, Simon Svensson si...@devhost.se wrote: I've been thinking about LUCENENET-493 (Make Lucene.Net culture insensitive). It's easy to fix the code, and verify it on my machine (running CurrentCulture=sv-SE), but there are no tests to confirm the changes. I've been looking for ways to build test cases for different cultures, like the overridden runBare method used originally in the java code, but NUnit does not seem to have any such abilities within the tests themselves. 1) It is possible to build NUnit addins that could execute every test [with special annotation?] once for every culture. ReSharper supports NUnit addins, provided they are manually placed in the correct folder under the ReSharper application folder. 2) We could rewrite culture-sensitive tests into a method that holds the logic, and several test methods with [SetCulture(...)], but this requires knowledge about which tests are culture-sensitive. We could also rewrite every method into a foreach-loop, executing the test logic with every culture. 3) Change unit testing framework. Any thoughts? On 2012-06-14 17:58, Prescott Nasser wrote: I'm going to try and review some of them - looking at the 3.5 ticket atm. The code should be in compliance with 3.0.3. We might want to do some spot checking of various parts of the code. I'm not sure about the tests. Also, we should probably run some code coverage tools to see how much coverage we have.
~P Date: Thu, 14 Jun 2012 18:37:12 +0300 Subject: Re: Releasing 3.0.3 From: ita...@code972.com To: lucene-net-dev@lucene.apache.org Ok, and is the code in 100% compliance with the 3.0.3 Java code? I'll be spending some time on fixing the index corruption issue, and it is probably best for Chris to wrap up the work he has started. Anyone else on board to close some tickets? On Thu, Jun 14, 2012 at 6:19 PM, Prescott Nasser geobmx...@hotmail.com wrote: Agreed - JIRA for 3.0.3: https://issues.apache.org/jira/browse/LUCENENET/fixforversion/12316215#selectedTab=com.atlassian.jira.plugin.system.project%3Aversion-issues-panel We should evaluate all of these - fix them, mark as won't fix, or move them to another release version. I think the biggest hold up currently is: https://issues.apache.org/jira/browse/LUCENENET-484. Chris has made a huge dent, but there are two test cases that are still listed as failing (I can't even duplicate those failures to know where to start). Also we should look at all the other jira tickets and make updates where appropriate ~P Date: Thu, 14 Jun 2012 13:21:04 +0300 Subject: Releasing 3.0.3 From: ita...@code972.com To: lucene-net-dev@lucene.apache.org Where do we stand with this? I want to push to a 3.0.3 release, what items are still pending? Itamar.
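The set-and-restore pattern sketched above with an IDisposable can be illustrated outside .NET as well. Below is a minimal Java analogue, assuming nothing beyond the JDK; the class name TemporaryLocale is my own, mirroring the hypothetical TemporaryCulture from the thread:

```java
import java.util.Locale;

// Hypothetical helper (not part of Lucene.Net or NUnit): the Java analogue of the
// IDisposable TemporaryCulture idea. Swaps the default Locale and restores it when
// the try-with-resources block exits, even if the test body throws.
public class TemporaryLocale implements AutoCloseable {
    private final Locale previous;

    public TemporaryLocale(Locale locale) {
        this.previous = Locale.getDefault();
        Locale.setDefault(locale);
    }

    @Override
    public void close() {
        Locale.setDefault(previous);
    }

    public static void main(String[] args) {
        Locale original = Locale.getDefault();
        try (TemporaryLocale tl = new TemporaryLocale(new Locale("sv", "SE"))) {
            // Culture-sensitive test logic would run here under sv-SE.
            System.out.println(Locale.getDefault());
        }
        // The previous default is restored on exit.
        System.out.println(Locale.getDefault().equals(original));
    }
}
```

A test body wrapped this way cannot leak a changed culture into later tests, which is the whole point of the IDisposable variant suggested above.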
Re: Corrupt index
I can confirm 2.9.4 had autoCommit, but it is gone in 3.0.3 already, so Lucene.Net doesn't have autoCommit. So I don't have autoCommit set to true, but I can clearly see a segments_1 file there along with the other files. If that helps, it always keeps the name segments_1 with 32 bytes, never changes. And again, if I kill the process and try to open the index with Luke 3.3, the index folder is being wiped out. Not sure what to make of all that. On Fri, Jun 15, 2012 at 3:21 AM, Michael McCandless luc...@mikemccandless.com wrote: Hmm, OK: in 2.9.4 / 3.0.x, if you open IW on a new directory, it will make a zero-segment commit. This was changed/fixed in 3.1 with LUCENE-2386. In 2.9.x (not 3.0.x) there is still an autoCommit parameter, defaulting to false, but if you set it to true then IndexWriter will periodically commit. Seeing segment files created and merged is definitely expected, but it's not expected to see segments_N files unless you pass autoCommit=true. Mike McCandless http://blog.mikemccandless.com On Thu, Jun 14, 2012 at 8:14 PM, Itamar Syn-Hershko ita...@code972.com wrote: Not what I'm seeing. I actually see a lot of segments created and merged while it operates. Expected? Reminding you, this is 2.9.4 / 3.0.3 On Fri, Jun 15, 2012 at 3:10 AM, Michael McCandless luc...@mikemccandless.com wrote: Right: Lucene never autocommits anymore ... If you create a new index, add a bunch of docs, and things crash before you have a chance to commit, then there is no index (not even a 0 doc one) in that directory. Mike McCandless http://blog.mikemccandless.com On Thu, Jun 14, 2012 at 1:41 PM, Itamar Syn-Hershko ita...@code972.com wrote: I'm quite certain this shouldn't happen even when Commit wasn't called. Mike, can you comment on that? On Thu, Jun 14, 2012 at 8:03 PM, Christopher Currens currens.ch...@gmail.com wrote: Well, the only thing I see is that there is no place where writer.Commit() is called in the delegate assigned to corpusReader.OnDocument. 
I know that lucene is very transactional, and at least in 3.x, the writer will never auto commit to the index. You can write millions of documents, but if commit is never called, those documents aren't actually part of the index. Committing isn't a cheap operation, so you definitely don't want to do it on every document. You can test it yourself with this (naive) solution. Right below the writer.SetUseCompoundFile(false) line, add int numDocsAdded = 0;. At the end of the corpusReader.OnDocument delegate add: // Example only. I wouldn't suggest committing this often if(++numDocsAdded % 5 == 0) { writer.Commit(); } I had the application crash for real on this file: http://dumps.wikimedia.org/gawiktionary/20120613/gawiktionary-20120613-pages-meta-history.xml.bz2 , about 20% into the operation. Without the commit, the index is empty. Add it in, and I get 755 files in the index after it crashes. Thanks, Christopher On Wed, Jun 13, 2012 at 6:13 PM, Itamar Syn-Hershko ita...@code972.com wrote: Yes, reproduced on first try. See attached program - I referenced it to current trunk. On Thu, Jun 14, 2012 at 3:54 AM, Itamar Syn-Hershko ita...@code972.com wrote: Christopher, I used the IndexBuilder app from here https://github.com/synhershko/Talks/tree/master/LuceneNeatThings with an 8.5GB wikipedia dump. After running for 2.5 days I had to forcefully close it (infinite loop in the wiki-markdown parser at 92%, go figure), and the 40-something GB index I had by then was unusable. I then was able to reproduce this. Please note I now added a few safe-guards you might want to remove to make sure the app really crashes on process kill. I'll try to come up with a better way to reproduce this - hopefully Mike will be able to suggest better ways than manual process kill... On Thu, Jun 14, 2012 at 1:41 AM, Christopher Currens currens.ch...@gmail.com wrote: Mike, The codebase for lucene.net should be almost identical to java's 3.0.3 release, and LUCENE-1044 is included in that. 
Itamar, are you committing the index regularly? I only ask because I can't reproduce it myself by forcibly terminating the process while it's indexing. I've tried both 3.0.3 and 2.9.4. If I don't commit at all and terminate the process (even with 10,000 4K documents created), there will be no documents in the index when I open it in luke, which I expect. If I commit at 10,000 documents, and terminate it a few thousand after that, the index has the first ten thousand that were committed. I've even terminated it *while* a second commit was taking
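The commit cadence Christopher suggests boils down to a counter modulo a batch size. Here is a minimal standalone sketch of just that policy; the Lucene IndexWriter is replaced by a plain Runnable callback (my own stand-in, so the logic runs without Lucene on the classpath):

```java
// A sketch of the periodic-commit policy from the snippet quoted above.
// The real code calls writer.Commit() on a Lucene IndexWriter; here "commit" is
// a hypothetical callback so the batching logic can be shown in isolation.
public class PeriodicCommitter {
    private final int batchSize;
    private final Runnable commit;   // stands in for writer.Commit()
    private int numDocsAdded = 0;

    public PeriodicCommitter(int batchSize, Runnable commit) {
        this.batchSize = batchSize;
        this.commit = commit;
    }

    public void docAdded() {
        // Equivalent to: if (++numDocsAdded % 5 == 0) { writer.Commit(); }
        if (++numDocsAdded % batchSize == 0) {
            commit.run();
        }
    }

    public static void main(String[] args) {
        int[] commits = {0};
        PeriodicCommitter pc = new PeriodicCommitter(5, () -> commits[0]++);
        for (int i = 0; i < 17; i++) {
            pc.docAdded();   // e.g. invoked from the OnDocument handler
        }
        // 17 docs with a batch size of 5 -> commits after docs 5, 10 and 15.
        System.out.println(commits[0]);
    }
}
```

In practice the batch size would be far larger than 5 (the thread itself flags 5 as far too frequent), since each commit fsyncs files and is therefore expensive.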
Re: Corrupt index
Christopher, I used the IndexBuilder app from here https://github.com/synhershko/Talks/tree/master/LuceneNeatThings with an 8.5GB wikipedia dump. After running for 2.5 days I had to forcefully close it (infinite loop in the wiki-markdown parser at 92%, go figure), and the 40-something GB index I had by then was unusable. I then was able to reproduce this. Please note I now added a few safe-guards you might want to remove to make sure the app really crashes on process kill. I'll try to come up with a better way to reproduce this - hopefully Mike will be able to suggest better ways than manual process kill... On Thu, Jun 14, 2012 at 1:41 AM, Christopher Currens currens.ch...@gmail.com wrote: Mike, The codebase for lucene.net should be almost identical to java's 3.0.3 release, and LUCENE-1044 is included in that. Itamar, are you committing the index regularly? I only ask because I can't reproduce it myself by forcibly terminating the process while it's indexing. I've tried both 3.0.3 and 2.9.4. If I don't commit at all and terminate the process (even with 10,000 4K documents created), there will be no documents in the index when I open it in luke, which I expect. If I commit at 10,000 documents, and terminate it a few thousand after that, the index has the first ten thousand that were committed. I've even terminated it *while* a second commit was taking place, and it still had all of the documents I expected. It may be that I'm not trying to reproduce it correctly. Do you have a minimal amount of code that can reproduce it? Thanks, Christopher On Wed, Jun 13, 2012 at 9:31 AM, Michael McCandless luc...@mikemccandless.com wrote: Hi Itamar, One quick question: does Lucene.Net include the fixes done for LUCENE-1044 (to fsync files on commit)? Those are very important for an index to be intact after OS/JVM crash or power loss. 
More responses below: On Tue, Jun 12, 2012 at 8:20 PM, Itamar Syn-Hershko ita...@code972.com wrote: I'm a Lucene.Net committer, and there is a chance we have a bug in our FSDirectory implementation that causes indexes to get corrupted when indexing is cut while the IW is still open. As it stems from some retroactive fixes you made, I'd appreciate your feedback. Correct me if I'm wrong, but by design Lucene should be able to recover rather quickly from power failures or app crashes. Since existing segment files are read only, only new segments that are still being written can get corrupted. Hence, recovering from worst-case scenarios is done by simply removing the write.lock file. The worst that could happen then is having the last segment damaged, and that can be fixed by removing those files, possibly by running CheckIndex on the index. You shouldn't even have to run CheckIndex ... because (as of LUCENE-1044) we now fsync all segment files before writing the new segments_N file, and then removing old segments_N files (and any segments that are no longer referenced). You do have to remove the write.lock if you aren't using NativeFSLockFactory (but this has been the default lock impl for a while now). Last week I have been playing with rather large indexes and crashed my app while it was indexing. I wasn't able to open the index, and Luke was even kind enough to wipe the index folder clean even though I opened it in read-only mode. I re-ran this, and after another crash running CheckIndex revealed nothing - the index was detected to be an empty one. I am not entirely sure what could be the cause for this, but I suspect it has been corrupted by the crash. Had no commit completed (no segments file written)? If you don't fsync then all sorts of crazy things are possible... 
I've been looking at these: https://issues.apache.org/jira/browse/LUCENE-3418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel https://issues.apache.org/jira/browse/LUCENE-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel (And LUCENE-1044 before that ... it was LUCENE-1044 that LUCENE-2328 broke...). And it seems like this is what I was experiencing. Mike and Mark will probably be able to tell if this is what they saw or not, but as far as I can tell this is not expected behavior for a Lucene index. Definitely not expected behavior: assuming nothing is flipping bits, then on OS/JVM crash or power loss your index should be fine, just reverted to the last successful commit. What I'm looking for at the moment is some advice on what FSDirectory implementation to use to make sure no corruption can happen. The 3.4 version (which is where LUCENE-3418 was committed) seems to handle a lot of things the 3.0 doesn't, but on the other hand the issue LUCENE-3418 fixes was introduced by changes made to the 3.0 codebase. Hopefully it's just that you are missing fsync! Also
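The commit ordering quoted above (fsync the new segment files first, only then write the new segments_N, and only then delete the old one) can be sketched with java.nio. This is an illustration of the ordering, not Lucene's actual on-disk protocol; the file names are made up:

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Sketch of the LUCENE-1044 ordering: durably write segment data, then publish
// the new segments_N, then remove the old generation. A crash at any earlier
// step leaves the previous commit point fully intact. File names are illustrative.
public class CommitOrdering {
    static void fsync(Path p) throws IOException {
        try (FileChannel ch = FileChannel.open(p, StandardOpenOption.WRITE)) {
            ch.force(true);  // flush file contents and metadata to stable storage
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("idx");
        Path seg = dir.resolve("_1.dat");
        Path oldGen = dir.resolve("segments_1");
        Path newGen = dir.resolve("segments_2");
        Files.write(oldGen, "gen1".getBytes());   // the previous commit point

        // 1) write and fsync the new segment files BEFORE publishing them
        Files.write(seg, "segment data".getBytes());
        fsync(seg);

        // 2) publish the commit point: write and fsync the new segments_N
        Files.write(newGen, "gen2 -> _1".getBytes());
        fsync(newGen);

        // 3) only now is it safe to delete the previous segments_N
        Files.delete(oldGen);

        System.out.println(Files.exists(newGen) && !Files.exists(oldGen));
    }
}
```

The bug discussed in LUCENE-3418 was precisely that step 1's fsync was silently not reaching the disk, which voids the safety argument for steps 2 and 3.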
Corrupt index
Hi Java devs, I'm a Lucene.Net committer, and there is a chance we have a bug in our FSDirectory implementation that causes indexes to get corrupted when indexing is cut while the IW is still open. As it stems from some retroactive fixes you made, I'd appreciate your feedback. Correct me if I'm wrong, but by design Lucene should be able to recover rather quickly from power failures or app crashes. Since existing segment files are read only, only new segments that are still being written can get corrupted. Hence, recovering from worst-case scenarios is done by simply removing the write.lock file. The worst that could happen then is having the last segment damaged, and that can be fixed by removing those files, possibly by running CheckIndex on the index. Last week I have been playing with rather large indexes and crashed my app while it was indexing. I wasn't able to open the index, and Luke was even kind enough to wipe the index folder clean even though I opened it in read-only mode. I re-ran this, and after another crash running CheckIndex revealed nothing - the index was detected to be an empty one. I am not entirely sure what could be the cause for this, but I suspect it has been corrupted by the crash. I've been looking at these: https://issues.apache.org/jira/browse/LUCENE-3418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel https://issues.apache.org/jira/browse/LUCENE-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel And it seems like this is what I was experiencing. Mike and Mark will probably be able to tell if this is what they saw or not, but as far as I can tell this is not expected behavior for a Lucene index. What I'm looking for at the moment is some advice on what FSDirectory implementation to use to make sure no corruption can happen. 
The 3.4 version (which is where LUCENE-3418 was committed) seems to handle a lot of things the 3.0 doesn't, but on the other hand the issue LUCENE-3418 fixes was introduced by changes made to the 3.0 codebase. Also, is there any test in the suite checking for those scenarios? Will appreciate any help on this, Itamar.
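The recovery contract the thread keeps returning to (a crash loses at most the work since the last commit, and the index reverts to the last successful commit) can be modeled with a toy, non-Lucene sketch; ToyIndex is entirely my own illustration, not Lucene code:

```java
import java.util.ArrayList;
import java.util.List;

// A toy model of Lucene's transactional contract: documents buffered after the
// last commit are lost on a crash, while every committed document survives.
public class ToyIndex {
    final List<String> committed = new ArrayList<>();
    private final List<String> pending = new ArrayList<>();

    void addDocument(String doc) { pending.add(doc); }

    void commit() {                    // durable point, like IndexWriter.commit()
        committed.addAll(pending);
        pending.clear();
    }

    void crash() { pending.clear(); }  // uncommitted work vanishes

    public static void main(String[] args) {
        ToyIndex idx = new ToyIndex();
        idx.addDocument("doc1");
        idx.addDocument("doc2");
        idx.commit();                  // these two are now durable
        idx.addDocument("doc3");
        idx.crash();                   // process killed before the next commit
        // The index reverts to the last successful commit.
        System.out.println(idx.committed);
    }
}
```

The corruption reported in this thread is exactly a violation of this model: after a crash the index did not revert to the last commit but became unreadable.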
Re: Corrupt index
Mike, On Wed, Jun 13, 2012 at 7:31 PM, Michael McCandless luc...@mikemccandless.com wrote: Hi Itamar, One quick question: does Lucene.Net include the fixes done for LUCENE-1044 (to fsync files on commit)? Those are very important for an index to be intact after OS/JVM crash or power loss. Definitely, as Christopher noted we are about to release a 3.0.3 compatible version, which is a line-by-line port of the Java version. You shouldn't even have to run CheckIndex ... because (as of LUCENE-1044) we now fsync all segment files before writing the new segments_N file, and then removing old segments_N files (and any segments that are no longer referenced). You do have to remove the write.lock if you aren't using NativeFSLockFactory (but this has been the default lock impl for a while now). Somewhat unrelated to this thread, but what should I expect to see? From time to time we do see write.lock present after an app-crash or power failure. Also, what are the steps that are expected to be performed in such cases? Last week I have been playing with rather large indexes and crashed my app while it was indexing. I wasn't able to open the index, and Luke was even kind enough to wipe the index folder clean even though I opened it in read-only mode. I re-ran this, and after another crash running CheckIndex revealed nothing - the index was detected to be an empty one. I am not entirely sure what could be the cause for this, but I suspect it has been corrupted by the crash. Had no commit completed (no segments file written)? If you don't fsync then all sorts of crazy things are possible... Ok, so we do have fsync since LUCENE-1044 is present, and there were segments present from previous commits. Any idea what went wrong? 
I've been looking at these: https://issues.apache.org/jira/browse/LUCENE-3418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel https://issues.apache.org/jira/browse/LUCENE-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel (And LUCENE-1044 before that ... it was LUCENE-1044 that LUCENE-2328 broke...). So 2328 broke 1044, and this was fixed only in 3.4, right? So 2328 made it to a 3.0.x release while the fix for it (3418) was only released in 3.4. Am I right? If this is the case, 2328 probably made its way to Lucene.Net since we are using the released sources for porting, and we now need to apply 3418 in the current version. Does it make sense to just port FSDirectory from 3.4 to 3.0.3? Or were there API or other changes that will make our life miserable if we do that? And it seems like this is what I was experiencing. Mike and Mark will probably be able to tell if this is what they saw or not, but as far as I can tell this is not expected behavior for a Lucene index. Definitely not expected behavior: assuming nothing is flipping bits, then on OS/JVM crash or power loss your index should be fine, just reverted to the last successful commit. What I suspected. Will try to reproduce reliably - any recommendations? Not really feeling like reinventing the wheel here... MockDirectoryWrapper wasn't ported yet as it only appears in 3.4, and as you said it won't really help here anyway What I'm looking for at the moment is some advice on what FSDirectory implementation to use to make sure no corruption can happen. The 3.4 version (which is where LUCENE-3418 was committed) seems to handle a lot of things the 3.0 doesn't, but on the other hand the issue LUCENE-3418 fixes was introduced by changes made to the 3.0 codebase. Hopefully it's just that you are missing fsync! Also, is there any test in the suite checking for those scenarios? 
Our test framework has a sneaky MockDirectoryWrapper that, after a test finishes, goes and corrupts any unsync'd files and then verifies the index is still OK... it's good because it'll catch any times we are missing calls to sync, but it's not low level enough such that if FSDir is failing to actually call fsync (that was the bug in LUCENE-3418) then it won't catch that... Mike McCandless http://blog.mikemccandless.com - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
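Mike's description of MockDirectoryWrapper (track which files were sync'd; after the test, corrupt everything that wasn't and verify the index still opens) can be modeled with a toy, non-Lucene sketch; CrashyDirectory is my own illustration of the idea, not the real class:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// A toy model of the MockDirectoryWrapper testing idea: remember which files were
// sync'd, and on a simulated crash corrupt every unsync'd file, so a test can
// verify the committed state still reads back intact.
public class CrashyDirectory {
    final Map<String, String> files = new HashMap<>();
    private final Set<String> synced = new HashSet<>();

    void write(String name, String data) { files.put(name, data); synced.remove(name); }
    void sync(String name) { synced.add(name); }

    // Simulated OS crash or power loss: any file never fsync'd has undefined contents.
    void crash() {
        for (String name : files.keySet()) {
            if (!synced.contains(name)) files.put(name, "<garbage>");
        }
    }

    public static void main(String[] args) {
        CrashyDirectory dir = new CrashyDirectory();
        dir.write("_1.dat", "segment data");
        dir.sync("_1.dat");            // fsync'd before the commit point is published
        dir.write("segments_2", "commit -> _1");
        dir.sync("segments_2");
        dir.write("_2.dat", "in-flight segment");  // never sync'd
        dir.crash();
        // Committed, sync'd files survive; only the unsync'd file is corrupted.
        System.out.println(dir.files.get("segments_2"));
        System.out.println(dir.files.get("_2.dat"));
    }
}
```

As Mike notes, a wrapper at this level catches missing *calls* to sync, but not an FSDirectory whose sync implementation silently fails to reach the disk, which was the LUCENE-3418 bug.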
Re: Corrupt index
Christopher, I used the IndexBuilder app from here https://github.com/synhershko/Talks/tree/master/LuceneNeatThings with a 8.5GB wikipedia dump. After running for 2.5 days I had to forcefully close it (infinite loop in the wiki-markdown parser at 92%, go figure), and the 40-something GB index I had by then was unusable. I then was able to reproduce this Please note I now added a few safe-guards you might want to remove to make sure the app really crashes on process kill. I'll try to come up with a better way to reproduce this - hopefully Mike will be able to suggest better ways than manual process kill... On Thu, Jun 14, 2012 at 1:41 AM, Christopher Currens currens.ch...@gmail.com wrote: Mike, The codebase for lucene.net should be almost identical to java's 3.0.3 release, and LUCENE-1044 is included in that. Itamar, are you committing the index regularly? I only ask because I can't reproduce it myself by forcibly terminating the process while it's indexing. I've tried both 3.0.3 and 2.9.4. If I don't commit at all and terminate the process (even with a 10,000 4K documents created), there will be no documents in the index when I open it in luke, which I expect. If I commit at 10,000 documents, and terminate it a few thousand after that, the index has the first ten thousand that were committed. I've even terminated it *while* a second commit was taking place, and it still had all of the documents I expected. It may be that I'm not trying to reproducing it correctly. Do you have a minimal amount of code that can reproduce it? Thanks, Christopher On Wed, Jun 13, 2012 at 9:31 AM, Michael McCandless luc...@mikemccandless.com wrote: Hi Itamar, One quick question: does Lucene.Net include the fixes done for LUCENE-1044 (to fsync files on commit)? Those are very important for an index to be intact after OS/JVM crash or power loss. 
More responses below: On Tue, Jun 12, 2012 at 8:20 PM, Itamar Syn-Hershko ita...@code972.com wrote: I'm a Lucene.Net committer, and there is a chance we have a bug in our FSDirectory implementation that causes indexes to get corrupted when indexing is cut while the IW is still open. As it stems from some retroactive fixes you made, I'd appreciate your feedback. Correct me if I'm wrong, but by design Lucene should be able to recover rather quickly from power failures or app crashes. Since existing segment files are read-only, only new segments that are still being written can get corrupted. Hence, recovering from worst-case scenarios is done by simply removing the write.lock file. The worst that could happen then is having the last segment damaged, and that can be fixed by removing those files, possibly by running CheckIndex on the index. You shouldn't even have to run CheckIndex ... because (as of LUCENE-1044) we now fsync all segment files before writing the new segments_N file, and then removing old segments_N files (and any segments that are no longer referenced). You do have to remove the write.lock if you aren't using NativeFSLockFactory (but this has been the default lock impl for a while now). Last week I was playing with rather large indexes and crashed my app while it was indexing. I wasn't able to open the index, and Luke was even kind enough to wipe the index folder clean even though I opened it in read-only mode. I re-ran this, and after another crash running CheckIndex revealed nothing - the index was detected to be an empty one. I am not entirely sure what could be the cause for this, but I suspect it has been corrupted by the crash. Had no commit completed (no segments file written)? If you don't fsync then all sorts of crazy things are possible... 
I've been looking at these: https://issues.apache.org/jira/browse/LUCENE-3418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel https://issues.apache.org/jira/browse/LUCENE-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel (And LUCENE-1044 before that ... it was LUCENE-1044 that LUCENE-2328 broke...). And it seems like this is what I was experiencing. Mike and Mark will probably be able to tell if this is what they saw or not, but as far as I can tell this is not an expected behavior of a Lucene index. Definitely not expected behavior: assuming nothing is flipping bits, then on OS/JVM crash or power loss your index should be fine, just reverted to the last successful commit. What I'm looking for at the moment is some advice on what FSDirectory implementation to use to make sure no corruption can happen. The 3.4 version (which is where LUCENE-3418 was committed to) seems to handle a lot of things the 3.0 doesn't, but on the other hand LUCENE-3418 was introduced by changes made to the 3.0 codebase. Hopefully it's just that you are missing fsync! Also
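The durability rule Mike describes (from LUCENE-1044) is about ordering: every new segment file is fsync'd before the new segments_N commit point is written, so a crash before the commit point is durable simply leaves the previous commit intact. A stand-alone illustrative sketch of that write-then-sync-then-publish pattern in Java — not Lucene's actual code, and the file names are made up:

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

// Illustrative sketch of the LUCENE-1044 commit rule (not Lucene source):
// fsync every new segment file first, and only then write + fsync the
// segments_N file that makes them visible. If the process dies before
// segments_N is durable, readers still see the previous commit point.
public class CommitSketch {
    static boolean writeDurably(File f, byte[] data) {
        try (FileOutputStream out = new FileOutputStream(f)) {
            out.write(data);
            out.getFD().sync(); // fsync: force the bytes to stable storage
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        File dir = new File(System.getProperty("java.io.tmpdir"), "commit-sketch");
        dir.mkdirs();
        // Step 1: write and fsync all new segment files first...
        boolean seg = writeDurably(new File(dir, "_0.cfs"), "segment data".getBytes());
        // Step 2: ...only then publish the commit point that references them.
        boolean commit = seg && writeDurably(new File(dir, "segments_1"), "_0.cfs".getBytes());
        System.out.println(commit ? "committed" : "failed");
    }
}
```

The point of the ordering is exactly what the thread discusses: skipping the sync in step 1 (the LUCENE-2328/LUCENE-3418 regression) can leave a segments_N that references files whose bytes never reached disk.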
Re: Corrupt index
Yes, reproduced on the first try. See attached program - I referenced it against current trunk. On Thu, Jun 14, 2012 at 3:54 AM, Itamar Syn-Hershko ita...@code972.com wrote: [...]
[jira] [Commented] (LUCENENET-438) replace java doc notation with ms style xml comments notation.
[ https://issues.apache.org/jira/browse/LUCENENET-438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13293983#comment-13293983 ] Itamar Syn-Hershko commented on LUCENENET-438: -- This should be made by a tool, really replace java doc notation with ms style xml comments notation. Key: LUCENENET-438 URL: https://issues.apache.org/jira/browse/LUCENENET-438 Project: Lucene.Net Issue Type: Improvement Components: Lucene.Net Contrib, Lucene.Net Core Affects Versions: Lucene.Net 2.9.4g Environment: all Reporter: michael herndon Labels: documentation, There are a ton of JavaDoc-style notations inside the xml code comments, i.e. {@link #IncrementToken} These need to use the ms xml code comment style if there is an existing equivalent. I'm not assigning this one. If you come across this on code you are working on, please take an extra few minutes to fix up the comments. If you need help, grab me on #lucene.net on irc or michaelherndon on skype. Just let me know who you are and what help you need. A guide for code documentation, it includes a table that shows JavaDoc and XML doc comment equivalents: https://cwiki.apache.org/confluence/display/LUCENENET/Documenting+Lucene.Net -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
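The bulk conversion the comment asks for ("should be made by a tool, really") is largely mechanical. A minimal illustrative sketch in Java — a hypothetical helper, not an official Lucene.Net tool — that rewrites the most common `{@link #Member}` form into the C# XML doc `<see cref="..."/>` equivalent (the guide linked in the issue has the full equivalence table):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical one-rule converter: turns JavaDoc "{@link #Member}" or
// "{@link Member}" into the C# XML doc comment form <see cref="Member"/>.
// A real tool would cover the whole JavaDoc-to-XML equivalence table.
public class DocConvert {
    static String convert(String comment) {
        Pattern p = Pattern.compile("\\{@link\\s+#?(\\w+)\\}");
        Matcher m = p.matcher(comment);
        return m.replaceAll("<see cref=\"$1\"/>");
    }

    public static void main(String[] args) {
        System.out.println(convert("See {@link #IncrementToken} for details."));
    }
}
```

Qualified links such as `{@link Package.Class#member(args)}` would need additional rules mapping them onto cref syntax, which is why a table-driven tool beats doing this by hand.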
Sponsoring porting work
Hi devs, We are looking to sponsor porting work, to help keep up the pace of development and help Lucene.Net stay closer to Java Lucene. Unfortunately the amount of work I can put into this is very limited, and being up to speed with Lucene is important to us, hence the idea to offer sponsorship. I'm not entirely sure how these things work under the Apache umbrella, but I'd imagine there isn't a real issue doing that. All work will be handed back to the project under the ASL, of course. I'd appreciate any guidance if needed. In the meantime, interested parties are welcome to contact me privately. Itamar.
Re: EOLs in Code
Yes, seems like all of them. Will look into it. On Sat, Jun 2, 2012 at 9:25 PM, Stefan Bodewig bode...@apache.org wrote: On 2012-06-02, Itamar Syn-Hershko wrote: I'm using git-svn, with auto-crlf set to true I'm not familiar enough with git-svn. auto-crlf will cover the git side but I don't think it sets the eol-style property in svn. - I think this will cover it but let me know if my commits are bad... I think some of the files I touched with https://svn.apache.org/viewvc?view=revisionrevision=1344562 are from your commit. Thanks Stefan
New Spatial module checked in
I was finally able to get git and svn to talk to one another, and pushed my recent changes into trunk. The new Spatial contrib bears the non-standard version of 2.9.9, on purpose. It also contains Spatial4n in binary form, mimicking the way it works in Java Lucene. The few tests that are present pass, but when run in a chain I get the following failure - haven't had time to track it down: Test 'Lucene.Net.Contrib.Spatial.Test.Prefix.TestRecursivePrefixTreeStrategy.BaseRecursivePrefixTreeStrategyTestCase.testFilterWithVariableScanLevel' failed: Lucene.Net.Store.AlreadyClosedException : this IndexReader is closed Index\IndexReader.cs(204,0): at Lucene.Net.Index.IndexReader.EnsureOpen() Index\DirectoryReader.cs(497,0): at Lucene.Net.Index.DirectoryReader.DoReopen(Boolean openReadOnly, IndexCommit commit) Index\DirectoryReader.cs(462,0): at Lucene.Net.Index.DirectoryReader.Reopen() SpatialTestCase.cs(111,0): at Lucene.Net.Contrib.Spatial.Test.SpatialTestCase.commit() SpatialTestCase.cs(94,0): at Lucene.Net.Contrib.Spatial.Test.SpatialTestCase.addDocumentsAndCommit(List`1 documents) StrategyTestCase.cs(67,0): at Lucene.Net.Contrib.Spatial.Test.StrategyTestCase`1.getAddAndVerifyIndexedDocuments(String testDataFile) Prefix\BaseRecursivePrefixTreeStrategyTestCase.cs(53,0): at Lucene.Net.Contrib.Spatial.Test.Prefix.BaseRecursivePrefixTreeStrategyTestCase.testFilterWithVariableScanLevel() Ideas welcome.
Re: Welcome Simon Svensson as a new committer
Welcome! On Thu, May 24, 2012 at 9:40 PM, Digy digyd...@gmail.com wrote: Welcome Simon DIGY -Original Message- From: Prescott Nasser [mailto:geobmx...@hotmail.com] Sent: Thursday, May 24, 2012 10:06 AM To: lucene-net-dev@lucene.apache.org; lucene-net-u...@lucene.apache.org Subject: Welcome Simon Svensson as a new committer Hey All, Our roster is growing a bit, I'd like to welcome Simon as a new committer. Simon has been quite active on the user mailing list helping answer community questions; he also maintains a C# port of the lucene-hunspell project (java: http://code.google.com/p/lucene-hunspell/, Simon's C# port: https://github.com/sisve/Lucene.Net.Analysis.Hunspell), which is commonly used for spell checking (but has a wide array of purposes). Please join me in welcoming Simon to the team, ~Prescott
Re: Welcome Itamar Syn-Hershko as a new committer
Thanks guys On Wed, May 23, 2012 at 1:14 AM, zoolette gaufre...@gmail.com wrote: Welcome in Itamar ! 2012/5/22 Prescott Nasser geobmx...@hotmail.com Hey all, I'd like to officially welcome Itamar as a new committer. I know the community appreciates the work you've been doing with the Spatial contrib project and the past help you've provided on the mailing lists. Please join me in welcoming Itamar, ~Prescott
[jira] [Commented] (LUCENENET-483) Spatial Search skipping records when one location is close to origin, another one is away and radius is wider
[ https://issues.apache.org/jira/browse/LUCENENET-483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13280126#comment-13280126 ] Itamar Syn-Hershko commented on LUCENENET-483: -- That's not the newest spatial module. Get it from here https://github.com/synhershko/lucene.net/tree/spatial2trunk or the 2.9.4-compatible version here https://github.com/synhershko/lucene.net/tree/spatial

Spatial Search skipping records when one location is close to origin, another one is away and radius is wider - Key: LUCENENET-483 URL: https://issues.apache.org/jira/browse/LUCENENET-483 Project: Lucene.Net Issue Type: Bug Components: Lucene.Net Contrib Affects Versions: Lucene.Net 2.9.4, Lucene.Net 2.9.4g Environment: .Net framework 4.0 Reporter: Aleksandar Panov Labels: lucene, spatialsearch Fix For: Lucene.Net 3.0.3

Running a spatial query against two locations, where one is less than a mile from the origin, the other is 24 miles away, and the radius is wider (52 miles), returns only one result. Running the query with a slightly wider radius (53.8 miles) returns 2 results. IMPORTANT UPDATE: the problem can't be reproduced in Java using the original Lucene spatial (2.9.4 version) library.

{code}
// Origin
private double _lat = 42.350153;
private double _lng = -71.061667;
private const string LatField = "lat";
private const string LngField = "lng";

// Locations
AddPoint(writer, "Location 1", 42.0, -71.0);   // 24 miles away from origin
AddPoint(writer, "Location 2", 42.35, -71.06); // less than a mile

[TestMethod]
public void TestAntiM()
{
    _directory = new RAMDirectory();
    var writer = new IndexWriter(_directory, new WhitespaceAnalyzer(), true, IndexWriter.MaxFieldLength.UNLIMITED);
    SetUpPlotter(2, 15);
    AddData(writer);
    _searcher = new IndexSearcher(_directory, true);
    //const double miles = 53.8; // Correct. Returns 2 Locations.
    const double miles = 52;     // Incorrect. Returns 1 Location.

    Console.WriteLine("testAntiM");

    // create a distance query
    var dq = new DistanceQueryBuilder(_lat, _lng, miles, LatField, LngField, CartesianTierPlotter.DefaltFieldPrefix, true);
    Console.WriteLine(dq);

    // create a term query to search against all documents
    Query tq = new TermQuery(new Term("metafile", "doc"));
    var dsort = new DistanceFieldComparatorSource(dq.DistanceFilter);
    Sort sort = new Sort(new SortField("foo", dsort, false));

    // Perform the search, using the term query, the distance filter, and the distance sort
    TopDocs hits = _searcher.Search(tq, dq.Filter, 1000, sort);
    int results = hits.TotalHits;
    ScoreDoc[] scoreDocs = hits.ScoreDocs;

    // Get a list of distances
    Dictionary<int, double> distances = dq.DistanceFilter.Distances;
    Console.WriteLine("Distance Filter filtered: " + distances.Count);
    Console.WriteLine("Results: " + results);
    Console.WriteLine("=");
    Console.WriteLine("Distances should be 2 " + distances.Count);
    Console.WriteLine("Results should be 2 " + results);
    Assert.AreEqual(2, distances.Count); // fixed a store of only needed distances
    Assert.AreEqual(2, results);
}
{code}
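As a sanity check on the numbers in this report (illustrative Java, not part of the issue's test case): a quick haversine computation puts Location 1 roughly 24 miles from the origin and Location 2 well under a mile, matching the comments in the reproduction code, so a 52-mile radius should indeed return both points.

```java
public class DistanceCheck {
    // Great-circle distance in miles between two lat/lng points (haversine formula).
    static double miles(double lat1, double lng1, double lat2, double lng2) {
        double r = 3958.8; // mean Earth radius in miles
        double dLat = Math.toRadians(lat2 - lat1);
        double dLng = Math.toRadians(lng2 - lng1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * Math.sin(dLng / 2) * Math.sin(dLng / 2);
        return 2 * r * Math.asin(Math.sqrt(a));
    }

    public static void main(String[] args) {
        // Origin and the two locations from the LUCENENET-483 report
        double d1 = miles(42.350153, -71.061667, 42.0, -71.0);   // reported as 24 miles away
        double d2 = miles(42.350153, -71.061667, 42.35, -71.06); // reported as less than a mile
        System.out.printf("d1=%.1f d2=%.2f%n", d1, d2);
    }
}
```

This makes the bug concrete: both true distances are inside the 52-mile radius, so the filter dropping one of them points at the cartesian-tier grid boxing rather than the distance math itself.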
[jira] [Commented] (LUCENENET-483) Spatial Search skipping records when one location is close to origin, another one is away and radius is wider
[ https://issues.apache.org/jira/browse/LUCENENET-483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13280266#comment-13280266 ] Itamar Syn-Hershko commented on LUCENENET-483: -- This code isn't mine :) Try using the code from the spatial branch instead, this is what I'm using. The DLLs I linked to above are compiled that way.
[jira] [Commented] (LUCENENET-483) Spatial Search skipping records when one location is close to origin, another one is away and radius is wider
[ https://issues.apache.org/jira/browse/LUCENENET-483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13280552#comment-13280552 ] Itamar Syn-Hershko commented on LUCENENET-483: -- Great. I updated the Lucene.Net.Contrib.Spatial.dll to be of version 2.9.9 to avoid future confusion. This is a unique version number indicating a non-standard issued contrib - Java Lucene will only have this module in version 4.0.
[jira] [Commented] (LUCENENET-462) Spatial Search skipping records with small radius e.g. 1 mile
[ https://issues.apache.org/jira/browse/LUCENENET-462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13279642#comment-13279642 ] Itamar Syn-Hershko commented on LUCENENET-462: -- This is now fixed with the new spatial module https://issues.apache.org/jira/browse/LUCENENET-489

Spatial Search skipping records with small radius e.g. 1 mile - Key: LUCENENET-462 URL: https://issues.apache.org/jira/browse/LUCENENET-462 Project: Lucene.Net Issue Type: Bug Components: Lucene.Net Contrib Affects Versions: Lucene.Net 2.9.4 Environment: .Net framework 4.0 Reporter: Mark Rodseth Labels: lucene, spatialsearch

Running a spatial query against a list of locations all within 1 mile of a location returns correct results for 2 miles, but incorrect results for 1 mile. For the one mile query, only 2 of the 8 rows are returned. Locations Test below:

{code}
// Origin
private double _lat = 51.508129;
private double _lng = -0.128005;
private const string LatField = "lat";
private const string LngField = "lng";

// Locations
AddPoint(writer, "Location 1", 51.5073802128877, -0.124669075012207);
AddPoint(writer, "Location 2", 51.5091, -0.1235);
AddPoint(writer, "Location 3", 51.5093, -0.1232);
AddPoint(writer, "Location 4", 51.5112531582845, -0.12509822845459);
AddPoint(writer, "Location 5", 51.5107, -0.123);
AddPoint(writer, "Location 6", 51.512, -0.1246);
AddPoint(writer, "Location 8", 51.5088760101322, -0.143165588378906);
AddPoint(writer, "Location 9", 51.5087958793819, -0.143508911132813);
{code}

{code}
[Test]
public void TestAntiM()
{
    _searcher = new IndexSearcher(_directory, true);
    const double miles = 1.0; // Bug? Only returns 2 locations. Should return 8.
    // const double miles = 2.0; // Correct. Returns 8 Locations.

    Console.WriteLine("testAntiM");

    // create a distance query
    var dq = new DistanceQueryBuilder(_lat, _lng, miles, LatField, LngField, CartesianTierPlotter.DefaltFieldPrefix, true);
    Console.WriteLine(dq);

    // create a term query to search against all documents
    Query tq = new TermQuery(new Term("metafile", "doc"));
    var dsort = new DistanceFieldComparatorSource(dq.DistanceFilter);
    Sort sort = new Sort(new SortField("foo", dsort, false));

    // Perform the search, using the term query, the distance filter, and the distance sort
    TopDocs hits = _searcher.Search(tq, dq.Filter, 1000, sort);
    int results = hits.totalHits;
    ScoreDoc[] scoreDocs = hits.scoreDocs;

    // Get a list of distances
    Dictionary<int, double> distances = dq.DistanceFilter.Distances;
    Console.WriteLine("Distance Filter filtered: " + distances.Count);
    Console.WriteLine("Results: " + results);
    Console.WriteLine("=");
    Console.WriteLine("Distances should be 8 " + distances.Count);
    Console.WriteLine("Results should be 8 " + results);
    Assert.AreEqual(8, distances.Count); // fixed a store of only needed distances
    Assert.AreEqual(8, results);
}
{code}
[jira] [Commented] (LUCENENET-483) Spatial Search skipping records when one location is close to origin, another one is away and radius is wider
[ https://issues.apache.org/jira/browse/LUCENENET-483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13279643#comment-13279643 ] Itamar Syn-Hershko commented on LUCENENET-483: -- This should be fixed with the new spatial module, can you check? https://issues.apache.org/jira/browse/LUCENENET-489
[jira] [Commented] (SOLR-3304) Add Solr support for the new Lucene spatial module
[ https://issues.apache.org/jira/browse/SOLR-3304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13279609#comment-13279609 ] Itamar Syn-Hershko commented on SOLR-3304: -- In continuation to the discussion on the spatial4j list, +1 for having all the tests with actual spatial logic reside in the Lucene spatial module, and have the Solr tests rely on that Add Solr support for the new Lucene spatial module -- Key: SOLR-3304 URL: https://issues.apache.org/jira/browse/SOLR-3304 Project: Solr Issue Type: New Feature Affects Versions: 4.0 Reporter: Bill Bell Assignee: David Smiley Labels: spatial Attachments: SOLR-3304_Solr_fields_for_Lucene_spatial_module.patch Get the Solr spatial module integrated with the lucene spatial module.
Re: Spatial4n
What are you trying to do - to work on it or to incorporate my changes? I'm not done yet - everything was ported but there's some nasty failing test I'm hunting down atm. You should be able to commit all my changes back to SVN with git-svn, but you can also get the latest sources from here as a zipball https://github.com/synhershko/lucene.net/zipball/spatial There are some very good git tutorials - worth reading. Check github's, for example. Basically you just do git clone git://github.com/synhershko/lucene.net.git and git checkout spatial and you are done. You'll never want to go back to SVN :) On Thu, May 17, 2012 at 8:52 PM, Prescott Nasser geobmx...@hotmail.com wrote: Itamar - I'm terrible with git, the last two weekends I tried cutting out and making a patch of the work you've done with spatial with no luck (I do like learning new things so I wanted to give it a shot before reaching out). Do you know how to do that? Or some way to see all the changed files / lines in git? Sorry, I'm slow this month ;) ~P From: geobmx...@hotmail.com To: lucene-net-...@lucene.apache.org Subject: RE: Spatial4n Date: Thu, 3 May 2012 20:13:50 -0700 I'll try to give you a hand this weekend great work ~P Date: Fri, 4 May 2012 05:48:51 +0300 Subject: Re: Spatial4n From: ita...@code972.com To: lucene-net-...@lucene.apache.org Status update: The Spatial4j project is completely ported to .NET, including tests, all of which are green. It is available from https://github.com/synhershko/Spatial4n The Lucene spatial module which takes a dependency on spatial4j is also ported now: https://github.com/synhershko/lucene.net/tree/spatial . I had to hack around quite a lot there, and created many compatibility classes and methods, since that module was originally written for the Lucene 4 API. There is only one issue in FixedBitSet preventing it from compiling, I'll take a look at it sometime soon (or if any of you can have a look, that'd be great...) I'm now working on porting the spatial test suite. 
As before, any help will be appreciated. Itamar. On Thu, Apr 26, 2012 at 6:45 PM, Itamar Syn-Hershko ita...@code972.com wrote: Hi again, I completed the port of the external Spatial library, and now am moving to porting the Lucene integration. The library, Spatial4n, is under ASL2 and can be found here https://github.com/synhershko/Spatial4n Anyone who can chip in and help port the tests, that would greatly help. There are not so many :) Itamar.
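The git workflow discussed in the thread above - get the branch, see every changed file/line, and turn the branch's work into patch files - can be sketched as follows. This is a self-contained illustration: a throwaway local repo stands in for the real clone (the real commands would be `git clone git://github.com/synhershko/lucene.net.git` followed by `git checkout spatial`), and the file name, branch contents, and commit messages are made up for the example.

```shell
# Self-contained sketch: a local throwaway repo stands in for the real clone.
set -e
work=$(mktemp -d)
cd "$work"
git init -q repo && cd repo
git config user.email demo@example.com && git config user.name Demo
echo "original" > Spatial.cs
git add . && git commit -qm "initial import"
base=$(git rev-parse --abbrev-ref HEAD)   # master or main, depending on git version
git checkout -qb spatial                  # the feature branch, like 'git checkout spatial'
echo "ported from spatial4j" >> Spatial.cs
git commit -qam "port spatial module"
# See all changed files/lines on the branch relative to the base branch:
git diff "$base"..spatial --stat
# Export the branch's work as patch files that can be applied elsewhere:
git format-patch -q "$base" -o patches
ls patches
```

For a real clone the same two inspection commands apply unchanged: `git diff master..spatial --stat` answers "which files/lines changed", and `git format-patch master` produces one mail-ready patch per commit.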
Re: including external code under apache 2.0
ICLA signed and sent On Mon, Apr 30, 2012 at 11:27 AM, Stefan Bodewig bode...@apache.org wrote: On 2012-04-28, Itamar Syn-Hershko wrote: That mail from Stephan got lost in my inbox, so I never followed up on that. I guess now would be a good chance to tie up all loose ends. How do I do the ICLA? In addition to what Troy said, you can also fill in the text form and PGP-sign it when you send it by email. See http://www.apache.org/licenses/#clas Stefan
Re: Spatial4n
No, but let me know what I need to do. On Sat, Apr 28, 2012 at 1:20 AM, Prescott Nasser geobmx...@hotmail.com wrote: Itamar, have you filed an ICLA? If so we are good to go on this, and I'll put this in place of the current spatial code in contrib From: geobmx...@hotmail.com To: lucene-net-dev@lucene.apache.org Subject: RE: Spatial4n Date: Thu, 26 Apr 2012 16:46:05 -0700 Hey Stefan - can you confirm that porting Spatial4n is ok to include in our contrib? It is also under the Apache 2.0 license, but we wanted to be 100% sure. ~P Date: Thu, 26 Apr 2012 18:45:48 +0300 Subject: Spatial4n From: ita...@code972.com To: lucene-net-dev@lucene.apache.org Hi again, I completed the port of the external Spatial library, and now am moving to porting the Lucene integration. The library, Spatial4n, is under ASL2 and can be found here https://github.com/synhershko/Spatial4n Anyone who can chip in and help port the tests, that would greatly help. There are not so many :) Itamar.
[jira] [Commented] (LUCENENET-484) Some possibly major tests intermittently fail
[ https://issues.apache.org/jira/browse/LUCENENET-484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13264399#comment-13264399 ] Itamar Syn-Hershko commented on LUCENENET-484: -- That's probably a matter of things not being cleaned up properly in some tests? (didn't actually look at the tests, just the immediate thing that comes to mind) Some possibly major tests intermittently fail -- Key: LUCENENET-484 URL: https://issues.apache.org/jira/browse/LUCENENET-484 Project: Lucene.Net Issue Type: Bug Components: Lucene.Net Core, Lucene.Net Test Affects Versions: Lucene.Net 3.0.3 Reporter: Christopher Currens Fix For: Lucene.Net 3.0.3 These tests will fail intermittently in Debug or Release mode, in the core test suite: # -Lucene.Net.Index:- #- -TestConcurrentMergeScheduler.TestFlushExceptions- # Lucene.Net.Store: #- TestLockFactory.TestStressLocks # Lucene.Net.Search: #- TestSort.TestParallelMultiSort # Lucene.Net.Util: #- TestFieldCacheSanityChecker.TestInsanity1 #- TestFieldCacheSanityChecker.TestInsanity2 #- (It's possible all of the insanity tests fail at one point or another) # Lucene.Net.Support #- TestWeakHashTableMultiThreadAccess.Test TestWeakHashTableMultiThreadAccess should be fine to remove along with the WeakHashTable in the Support namespace, since it's been replaced with WeakDictionary. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
Spatial4n
Hi again, I completed the port of the external Spatial library, and now am moving to porting the Lucene integration. The library, Spatial4n, is under ASL2 and can be found here https://github.com/synhershko/Spatial4n Anyone who can chip in and help port the tests, that would greatly help. There are not so many :) Itamar.
Re: Spatial contrib bug fixing
Great. The actual library lives outside of Lucene ( https://github.com/spatial4j/spatial4j ) and only some integration classes are within the Lucene project itself. I linked to the (long) discussions about this in my previous message. I will be following that approach with this port, and really hope there will be no API differences I won't be able to overcome. I'm going to start doing this sometime tomorrow, but my main efforts will be on Thursday. I can certainly use any help in dividing work etc - please, anyone who can join on Thursday for live collaboration or later chip in on the discussion. I'll keep you posted. On Wed, Apr 25, 2012 at 12:00 AM, Christopher Currens currens.ch...@gmail.com wrote: Yes, the contrib is a MESS. I've been favoring complete re-implementations over porting changes, since contrib has been in such a poor, overlooked state for so long. I'm not opposed to porting LSP over the Spatial contrib project in Java, though it might pose some porting challenges both now, since Lucene versions are different, and as Lucene.NET evolves. It also might not; I'm not familiar with the LSP code. Contrib is just that, contributed software that is not part of the core library, and there will be projects in Java we can't port over. In fact, I think there are .NET specific contrib projects that aren't in Java. Either way, my point is that I am happy and willing to have LSP included if that's going to wind up being better than Spatial. I think we can use all the help and contributions we can get in Lucene.NET. Of course, we'd need to look and see what is possible with porting over LSP (not sure if it relies on any version-specific features that may not yet be in 3.0.3). So, I say let's go for it, and if you need any help/want to divide work between other committers, we can arrange that, and create issues for it, that is, if the other committers don't object to this.
On Tue, Apr 24, 2012 at 1:45 PM, Itamar Syn-Hershko ita...@code972.com wrote: Thanks for your reply. Aside from the original port which had many divergences from Java, the only other issue applied to spatial is LUCENENET-431, which would be easy to include. That is not correct. LUCENENET-431 was committed, but some fixes from Java Lucene 3.0.3 are in as well. The whole thing is a mess. The reason for this mess is the amount of bugs in the original Java implementation of Spatial. This is also why it has been deprecated in 3.6: https://issues.apache.org/jira/browse/LUCENE-2599 I think the best route at this point is to port LSP aka Spatial4j to .NET and start using it as the Spatial module for Lucene.NET https://issues.apache.org/jira/browse/LUCENE-3795 This is a Java Lucene 4 feature, but the current spatial implementation is pretty unusable. I'm going to start looking into this, and would definitely appreciate your input. Itamar.
Re: guestimation on -pre nuget package.
If it is known to be stable for actual use, we, the RavenDB dev team, will update to it in a branch and provide feedback. A -Pre nuget package released for every RC can definitely help here. On Tue, Apr 24, 2012 at 5:25 PM, Michael Herndon mhern...@wickedsoftware.net wrote: Do you all think we're at a point to do a -pre nuget package that users can tinker with and provide feedback on? The -pre flag means that it is only meant to be a pre-release in order to get feedback. We might get more feedback if we package the binaries. Those that pushed the last package, what do you think is the amount of effort / time it will take to get something like this done? (I'm asking so that I can block off enough time in my schedule to do this.) I'm guessing it shouldn't be as rigorous as a typical Apache release as it's meant just to package alpha/beta binaries, not an official RTW. - michael.
Re: Spatial contrib bug fixing
Uhm.. I was referring to the .NET port, which I can see DIGY ported. Nevermind, I will get it from the original commit. @Prescott any idea re "CartesianPolyFilterBuilder.GetBoxShape() is not an exact port" - do you remember why? On Tue, Apr 24, 2012 at 12:26 AM, Christopher Currens currens.ch...@gmail.com wrote: It's in a weird place. And for the 3.0.3 version, it's easiest to find the code in the tags, rather than branches. http://svn.apache.org/viewvc/lucene/java/tags/lucene_3_0_3/contrib/misc/src/java/org/apache/lucene/misc/ On Mon, Apr 23, 2012 at 2:20 PM, Prescott Nasser geobmx...@hotmail.com wrote: I'm having trouble finding chained filter in the java lucene svn... http://svn.apache.org/viewvc/lucene/dev/branches/branch_3x/lucene/contrib/?pathrev=990167 Am I looking around in the wrong place? Date: Mon, 23 Apr 2012 11:19:51 +0300 Subject: Re: Spatial contrib bug fixing From: ita...@code972.com To: lucene-net-...@lucene.apache.org One more thing - what's the deal with ChainedFilter? I can see a commit by DIGY on 7/7/2011 but it seems to have been removed since? On Mon, Apr 23, 2012 at 11:06 AM, Itamar Syn-Hershko ita...@code972.com wrote: For starters - CartesianPolyFilterBuilder.GetBoxShape() is not an exact port - do you remember why? Anyway, if it was never fully ported as you say, maybe I'll just go ahead and complete that. For your reference, here are 2 failing tests which pass in Java Lucene (I can send the java file) - https://github.com/synhershko/lucene.net/commit/234da7eca7cb08be5a0c2a7375ffc3f4a03bfd92 On Mon, Apr 23, 2012 at 1:39 AM, Prescott Nasser geobmx...@hotmail.com wrote: I think that was a while ago, and I don't even recall if I fully ported it or just put up the start. I had some other stuff to deal with the last few months, so my memory is a bit lacking. I'll review the code; meanwhile ask whatever questions you have - let's get this fixed up.
~P Date: Sun, 22 Apr 2012 22:10:27 +0300 Subject: Spatial contrib bug fixing From: ita...@code972.com To: lucene-net-...@lucene.apache.org Hi all, We encountered several bugs with the Spatial contrib, and the ones we tested with Java Lucene worked there (with 2.9.4). There are about 3 open tickets in the Jira bug tracker on similar issues. I'm now sitting with the ultimate goal of fixing this once and for all, but some code parts are commented out in favor of implementations that are not line-by-line ports, without a comment giving the reasons. I was wondering if there's anyone who could answer a few questions there, instead of me changing things back and forth? Git history (I use the Git mirror, yes) tells me Prescott Nasser is behind porting this - maybe he will have the answers? Cheers, Itamar.
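The message above relies on git history to find who is behind a port. A minimal sketch of that lookup follows; a tiny local repo stands in for the real Git mirror, and the file name, author, and commit message are illustrative stand-ins for the actual history.

```shell
# Sketch: use git history to answer "who ported this file?".
# A throwaway local repo stands in for the real mirror; names are illustrative.
set -e
work=$(mktemp -d)
cd "$work"
git init -q
git config user.name "Prescott Nasser" && git config user.email p@example.com
echo "port" > CartesianPolyFilterBuilder.cs
git add . && git commit -qm "port spatial contrib"
# List author and subject of every commit touching the file:
git log --format='%an: %s' -- CartesianPolyFilterBuilder.cs
# Line-by-line attribution for the same file:
git blame --line-porcelain CartesianPolyFilterBuilder.cs | grep '^author '
```

Against the real mirror, the same `git log --format='%an: %s' -- <path>` and `git blame <path>` commands identify the porter and the commits that diverged from a line-by-line port.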
Re: Spatial contrib bug fixing
Thanks for your reply. Aside from the original port which had many divergences from Java, the only other issue applied to spatial is LUCENENET-431, which would be easy to include. That is not correct. LUCENENET-431 was committed, but some fixes from Java Lucene 3.0.3 are in as well. The whole thing is a mess. The reason for this mess is the amount of bugs in the original Java implementation of Spatial. This is also why it has been deprecated in 3.6: https://issues.apache.org/jira/browse/LUCENE-2599 I think the best route at this point is to port LSP aka Spatial4j to .NET and start using it as the Spatial module for Lucene.NET https://issues.apache.org/jira/browse/LUCENE-3795 This is a Java Lucene 4 feature, but the current spatial implementation is pretty unusable. I'm going to start looking into this, and would definitely appreciate your input. Itamar.
Re: Spatial contrib bug fixing
For starters - CartesianPolyFilterBuilder.GetBoxShape() is not an exact port - do you remember why? Anyway, if it was never fully ported as you say, maybe I'll just go ahead and complete that. For your reference, here are 2 failing tests which pass in Java Lucene (I can send the java file) - https://github.com/synhershko/lucene.net/commit/234da7eca7cb08be5a0c2a7375ffc3f4a03bfd92 On Mon, Apr 23, 2012 at 1:39 AM, Prescott Nasser geobmx...@hotmail.com wrote: I think that was a while ago, and I don't even recall if I fully ported it or just put up the start. I had some other stuff to deal with the last few months, so my memory is a bit lacking. I'll review the code; meanwhile ask whatever questions you have - let's get this fixed up. ~P Date: Sun, 22 Apr 2012 22:10:27 +0300 Subject: Spatial contrib bug fixing From: ita...@code972.com To: lucene-net-dev@lucene.apache.org Hi all, We encountered several bugs with the Spatial contrib, and the ones we tested with Java Lucene worked there (with 2.9.4). There are about 3 open tickets in the Jira bug tracker on similar issues. I'm now sitting with the ultimate goal of fixing this once and for all, but some code parts are commented out in favor of implementations that are not line-by-line ports, without a comment giving the reasons. I was wondering if there's anyone who could answer a few questions there, instead of me changing things back and forth? Git history (I use the Git mirror, yes) tells me Prescott Nasser is behind porting this - maybe he will have the answers? Cheers, Itamar.