Re: Down to 5

2009-10-12 Thread Anders Melchiorsen
It would be a bit of a shame if the HTML stripper in 1.4 turned out to
be useless for most real situations.

I have added a new version of the patch, with test cases for the stuff
that I am fixing. I also think that the new code is a bit easier to
follow (no longer reusing existing state variables), so hopefully it
will be even easier to review.


Anders.



Yonik Seeley yo...@lucidimagination.com writes:

 Unfortunately we don't have good unit tests for this, so it's
 difficult for people to tell if we've avoided regressions while making
 progress.

 But since it's a bug fix... it is technically possible for it to be
 included after the code freeze deadline I think?

 -Yonik

 On Fri, Oct 9, 2009 at 5:20 PM, Anders Melchiorsen
 m...@spoon.kalibalik.dk wrote:
 Yonik Seeley yo...@lucidimagination.com writes:

 One further issue... should we commit the changes to the HTMLStripReader?
 https://issues.apache.org/jira/browse/SOLR-1394

 As the reporter of that bug, I would obviously like to see a fix
 included in 1.4.

 It would be one thing to have the patch declared faulty, but having it
 miss the window due to being ignored bothers me a bit.

 Is there anything that I can do to help it along?

 Anders.


Re: Down to 5

2009-10-12 Thread Grant Ingersoll

Yep, working on it now.

So, let's call it pencils down.  I'll put up one either tonight or  
tomorrow morning.


On Oct 12, 2009, at 5:14 PM, Yonik Seeley wrote:

On Fri, Oct 9, 2009 at 11:21 AM, Grant Ingersoll gsing...@apache.org wrote:
Realistically speaking, I can do an RC on Monday afternoon.  So, how about
we say Pencils down at 12 noon EDT on Monday and then I can create an RC
that afternoon.

Zero issues... If you have the time today, I think everyone's ready for that RC!


-Yonik
http://www.lucidimagination.com


-Grant

On Oct 9, 2009, at 10:13 AM, Shalin Shekhar Mangar wrote:

On Fri, Oct 9, 2009 at 6:08 PM, Koji Sekiguchi k...@r.email.ne.jp wrote:

Hi Shalin,

What about FastVectorHighlighter?
https://issues.apache.org/jira/browse/SOLR-1268

If we're targeting an RC this week, I'd like to push it to 1.5
because there are no patches. But perhaps you think
13 votes is considerable?

No, that is fine. We can push it to 1.5 unless it is very easy to make it
work with Solr. It affects a relatively small number of people. The applications
mentioned in the comments (Blacklight etc.) can choose to release with
patched Solr versions, I guess.

The only common case which needs this feature is highlighting n-gram fields
(for auto-complete).

--
Regards,
Shalin Shekhar Mangar.








Re: Down to 5

2009-10-09 Thread Shalin Shekhar Mangar
On Fri, Oct 9, 2009 at 4:17 AM, Yonik Seeley yo...@lucidimagination.com wrote:

 On Sun, Oct 4, 2009 at 6:13 PM, Grant Ingersoll gsing...@apache.org
 wrote:
  Coming along:
 
 https://issues.apache.org/jira/secure/BrowseVersion.jspa?id=12310230&versionId=12313351&showOpenIssuesOnly=true
 
  If we can finish these up this week, I can generate RCs next week.

 Down to 4!  We're still planning on a code freeze at the end of this
 week, right?  I think all of the targeted issues are just waiting for
 the final commits?

 SOLR-1458   Java Replication error: NullPointerException SEVERE:
 SnapPull failed on 2009-09-22 nightly
 SOLR-1449   solrconfig.xml syntax to add classpath elements from
 outside of instanceDir
 SOLR-1497   Remove solrjs from svn -- point docs to AJAX Solr
 SOLR-1475   Java-based replication doesn't properly reserve its commit
 point during backups

 Other issues:
  - Do we just bag chinese for now and force people to write their own
 factories? SOLR-1336
  - Does the Lucene 2.9 bugfix branch have anything warranting upgrading to
 it?


What about FastVectorHighlighter?

https://issues.apache.org/jira/browse/SOLR-1268
-- 
Regards,
Shalin Shekhar Mangar.


Re: Down to 5

2009-10-09 Thread Koji Sekiguchi

Hi Shalin,

 What about FastVectorHighlighter?
 https://issues.apache.org/jira/browse/SOLR-1268

If we're targeting an RC this week, I'd like to push it to 1.5
because there are no patches. But perhaps you think
13 votes is considerable?

Koji




Re: Down to 5

2009-10-09 Thread Shalin Shekhar Mangar
On Fri, Oct 9, 2009 at 6:08 PM, Koji Sekiguchi k...@r.email.ne.jp wrote:

 Hi Shalin,

  What about FastVectorHighlighter?
  https://issues.apache.org/jira/browse/SOLR-1268

 If we're targeting an RC this week, I'd like to push it to 1.5
 because there are no patches. But perhaps you think
 13 votes is considerable?


No, that is fine. We can push it to 1.5 unless it is very easy to make it
work with Solr. It affects a relatively small number of people. The applications
mentioned in the comments (Blacklight etc.) can choose to release with
patched Solr versions, I guess.

The only common case which needs this feature is highlighting n-gram fields
(for auto-complete).

-- 
Regards,
Shalin Shekhar Mangar.


Re: Down to 5

2009-10-09 Thread Grant Ingersoll
Realistically speaking, I can do an RC on Monday afternoon.  So, how  
about we say Pencils down at 12 noon EDT on Monday and then I can  
create an RC that afternoon.


-Grant

On Oct 9, 2009, at 10:13 AM, Shalin Shekhar Mangar wrote:

On Fri, Oct 9, 2009 at 6:08 PM, Koji Sekiguchi k...@r.email.ne.jp wrote:

Hi Shalin,

What about FastVectorHighlighter?
https://issues.apache.org/jira/browse/SOLR-1268

If we're targeting an RC this week, I'd like to push it to 1.5
because there are no patches. But perhaps you think
13 votes is considerable?

No, that is fine. We can push it to 1.5 unless it is very easy to make it
work with Solr. It affects a relatively small number of people. The applications
mentioned in the comments (Blacklight etc.) can choose to release with
patched Solr versions, I guess.

The only common case which needs this feature is highlighting n-gram fields
(for auto-complete).

--
Regards,
Shalin Shekhar Mangar.





Re: Down to 5

2009-10-09 Thread Anders Melchiorsen
Yonik Seeley yo...@lucidimagination.com writes:

 One further issue... should we commit the changes to the HTMLStripReader?
 https://issues.apache.org/jira/browse/SOLR-1394

As the reporter of that bug, I would obviously like to see a fix
included in 1.4.

It would be one thing to have the patch declared faulty, but having it
miss the window due to being ignored bothers me a bit.

Is there anything that I can do to help it along?


Anders.


Re: Down to 5

2009-10-09 Thread Yonik Seeley
Unfortunately we don't have good unit tests for this, so it's
difficult for people to tell if we've avoided regressions while making
progress.

But since it's a bug fix... it is technically possible for it to be
included after the code freeze deadline I think?

-Yonik

On Fri, Oct 9, 2009 at 5:20 PM, Anders Melchiorsen
m...@spoon.kalibalik.dk wrote:
 Yonik Seeley yo...@lucidimagination.com writes:

 One further issue... should we commit the changes to the HTMLStripReader?
 https://issues.apache.org/jira/browse/SOLR-1394

 As the reporter of that bug, I would obviously like to see a fix
 included in 1.4.

 It would be one thing to have the patch declared faulty, but having it
 miss the window due to being ignored bothers me a bit.

 Is there anything that I can do to help it along?

 Anders.


Re: Down to 5

2009-10-08 Thread Yonik Seeley
On Sun, Oct 4, 2009 at 6:13 PM, Grant Ingersoll gsing...@apache.org wrote:
 Coming along:
  https://issues.apache.org/jira/secure/BrowseVersion.jspa?id=12310230&versionId=12313351&showOpenIssuesOnly=true

 If we can finish these up this week, I can generate RCs next week.

Down to 4!  We're still planning on a code freeze at the end of this
week, right?  I think all of the targeted issues are just waiting for
the final commits?

SOLR-1458   Java Replication error: NullPointerException SEVERE:
SnapPull failed on 2009-09-22 nightly
SOLR-1449   solrconfig.xml syntax to add classpath elements from
outside of instanceDir
SOLR-1497   Remove solrjs from svn -- point docs to AJAX Solr
SOLR-1475   Java-based replication doesn't properly reserve its commit
point during backups

Other issues:
 - Do we just bag chinese for now and force people to write their own
factories? SOLR-1336
 - Does the Lucene 2.9 bugfix branch have anything warranting upgrading to it?

-Yonik
http://www.lucidimagination.com


Re: Down to 5

2009-10-08 Thread Yonik Seeley
One further issue... should we commit the changes to the HTMLStripReader?
https://issues.apache.org/jira/browse/SOLR-1394

-Yonik
http://www.lucidimagination.com

On Thu, Oct 8, 2009 at 6:47 PM, Yonik Seeley yo...@lucidimagination.com wrote:
 On Sun, Oct 4, 2009 at 6:13 PM, Grant Ingersoll gsing...@apache.org wrote:
 Coming along:
  https://issues.apache.org/jira/secure/BrowseVersion.jspa?id=12310230&versionId=12313351&showOpenIssuesOnly=true

 If we can finish these up this week, I can generate RCs next week.

 Down to 4!  We're still planning on a code freeze at the end of this
 week, right?  I think all of the targeted issues are just waiting for
 the final commits?

 SOLR-1458        Java Replication error: NullPointerException SEVERE:
 SnapPull failed on 2009-09-22 nightly
 SOLR-1449       solrconfig.xml syntax to add classpath elements from
 outside of instanceDir
 SOLR-1497       Remove solrjs from svn -- point docs to AJAX Solr
 SOLR-1475       Java-based replication doesn't properly reserve its commit
 point during backups

 Other issues:
  - Do we just bag chinese for now and force people to write their own
 factories? SOLR-1336
  - Does the Lucene 2.9 bugfix branch have anything warranting upgrading to it?

 -Yonik
 http://www.lucidimagination.com



Re: Down to 5

2009-10-08 Thread Jason Rutherglen
I haven't had any time to verify this; we're not currently
recording the HTML that failed, which I'd need in order to reproduce
the problem in a test case. When I do, the test should be fairly
comprehensive, though I'm not sure I could fit it all into an actual
unit test unless the HTML lived in files, which probably shouldn't be
in a patch.

To confirm, though: I am still seeing multiple errors from this
bug. Perhaps I'll work on it over the weekend?

On Thu, Oct 8, 2009 at 3:54 PM, Yonik Seeley yo...@lucidimagination.com wrote:
 One further issue... should we commit the changes to the HTMLStripReader?
 https://issues.apache.org/jira/browse/SOLR-1394

 -Yonik
 http://www.lucidimagination.com

 On Thu, Oct 8, 2009 at 6:47 PM, Yonik Seeley yo...@lucidimagination.com 
 wrote:
 On Sun, Oct 4, 2009 at 6:13 PM, Grant Ingersoll gsing...@apache.org wrote:
 Coming along:
  https://issues.apache.org/jira/secure/BrowseVersion.jspa?id=12310230&versionId=12313351&showOpenIssuesOnly=true

 If we can finish these up this week, I can generate RCs next week.

 Down to 4!  We're still planning on a code freeze at the end of this
 week, right?  I think all of the targeted issues are just waiting for
 the final commits?

 SOLR-1458        Java Replication error: NullPointerException SEVERE:
 SnapPull failed on 2009-09-22 nightly
 SOLR-1449       solrconfig.xml syntax to add classpath elements from
 outside of instanceDir
 SOLR-1497       Remove solrjs from svn -- point docs to AJAX Solr
 SOLR-1475       Java-based replication doesn't properly reserve its commit
 point during backups

 Other issues:
  - Do we just bag chinese for now and force people to write their own
 factories? SOLR-1336
  - Does the Lucene 2.9 bugfix branch have anything warranting upgrading to 
 it?

 -Yonik
 http://www.lucidimagination.com




Re: Down to 5

2009-10-05 Thread Grant Ingersoll


On Oct 5, 2009, at 12:23 AM, Yonik Seeley wrote:

On Sun, Oct 4, 2009 at 10:53 PM, Israel Ekpo israele...@gmail.com wrote:
Smaller precisionStep values (specified in bits) will lead to more tokens
indexed per value, slightly larger index size, and faster range queries

It also states that for faster range queries, consider the
tint/tfloat/tlong/tdouble types.

Now, the tint/tfloat/tlong/tdouble have a precisionStep of 8 while the
int/float/long/double types have a precisionStep of 0

precisionStep of 0 means don't do precision steps at all (i.e. no
acceleration of range queries).
I'll take a shot at clearing this up in the example schema.xml.


It seems like we should have tint4 and tint8 instead of just 8, no?  My
impression of the Lucene javadocs is that a step of 4 for ints/floats
was more appropriate.  Or am I missing something? (I haven't followed
the numerics stuff that closely.)


-Grant


Re: Down to 5

2009-10-05 Thread Grant Ingersoll


On Oct 4, 2009, at 10:51 PM, Chris Hostetter wrote:



: Subject: Down to 5

I reopened SOLR-1448 because it seems perfectly reasonable to me ..
looking for a reply.


I commented on the issue.



I'm also just waiting on someone else to test/review SOLR-1449 ...
otherwise it should be pushed to 1.5.



I haven't had a chance to look.


Down to 5

2009-10-04 Thread Grant Ingersoll

Coming along:  
https://issues.apache.org/jira/secure/BrowseVersion.jspa?id=12310230&versionId=12313351&showOpenIssuesOnly=true

If we can finish these up this week, I can generate RCs next week.

Thoughts?

-Grant


Re: Down to 5

2009-10-04 Thread Koji Sekiguchi

+1.

Grant Ingersoll wrote:
Coming along:  
https://issues.apache.org/jira/secure/BrowseVersion.jspa?id=12310230&versionId=12313351&showOpenIssuesOnly=true



If we can finish these up this week, I can generate RCs next week.

Thoughts?

-Grant





Re: Down to 5

2009-10-04 Thread Chris Hostetter

: Subject: Down to 5

I reopened SOLR-1448 because it seems perfectly reasonable to me .. 
looking for a reply.

I'm also just waiting on someone else to test/review SOLR-1449 ... 
otherwise it should be pushed to 1.5.



-Hoss



Re: Down to 5

2009-10-04 Thread Israel Ekpo
On Sun, Oct 4, 2009 at 9:51 PM, Chris Hostetter hossman_luc...@fucit.org wrote:


 : Subject: Down to 5

 I reopened SOLR-1448 because it seems perfectly reasonable to me ..
 looking for a reply.

 I'm also just waiting on someone else to test/review SOLR-1449 ...
 otherwise it should be pushed to 1.5.



 -Hoss



Hi Grant,

I was just looking at the documentation bug SOLR-1483 and I found the
following comments in the schema file.

Numeric field types that index each value at various levels of precision to
accelerate range queries when the number of values between the range
endpoints is large

Smaller precisionStep values (specified in bits) will lead to more tokens
indexed per value, slightly larger index size, and faster range queries

It also states that for faster range queries, consider the
tint/tfloat/tlong/tdouble types.

Now, the tint/tfloat/tlong/tdouble have a precisionStep of 8 while the
int/float/long/double types have a precisionStep of 0

From these comments, it seems like the int/float/long/double types, with
their smaller precisionStep values, should lead to more tokens indexed per
value, slightly larger index size, and faster range queries.

So maybe we should recommend the int/float/long/double types over the
tint/tfloat/tlong/tdouble types for faster range queries.

If all we need to do is to rewrite the documentation, I can come up with a
re-write of the comments in the schema file and submit the patch so that
this issue can be closed.

So if you want to assign this one to me, that would be fine too.

-- 
Good Enough is not good enough.
To give anything less than your best is to sacrifice the gift.
Quality First. Measure Twice. Cut Once.


Re: Down to 5

2009-10-04 Thread Yonik Seeley
On Sun, Oct 4, 2009 at 10:53 PM, Israel Ekpo israele...@gmail.com wrote:
 Smaller precisionStep values (specified in bits) will lead to more tokens
 indexed per value, slightly larger index size, and faster range queries

 It also states that for faster range queries, consider the
 tint/tfloat/tlong/tdouble types.

 Now, the tint/tfloat/tlong/tdouble have a precisionStep of 8 while the
 int/float/long/double types have a precisionStep of 0

precisionStep of 0 means don't do precision steps at all (i.e. no
acceleration of range queries).
I'll take a shot at clearing this up in the example schema.xml.

-Yonik
http://www.lucidimagination.com
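
For readers following the precisionStep discussion above, here is a rough
sketch in Python of why a smaller step means more tokens per value, and why
a step of 0 means no extra tokens at all. It is not Lucene's or Solr's
actual code: the function name trie_terms is made up for illustration, it
handles non-negative values only, and it ignores how the real trie encoding
stores its terms. It assumes one term is indexed per shift of precisionStep
bits, and that a step of 0 (or one at least as wide as the value) leaves
only the single full-precision term, as described in the message above.

```python
def trie_terms(value, precision_step, bits=32):
    """Return the (shift, prefix) pairs indexed for a non-negative value.

    Hypothetical helper for illustration only; not a Lucene or Solr API.
    """
    # Assumption: a step of 0 (or >= bits) disables precision steps,
    # so only the full-precision term is indexed.
    if precision_step <= 0 or precision_step >= bits:
        return [(0, value)]
    terms = []
    shift = 0
    while shift < bits:
        # Each term keeps the high-order bits that remain after dropping
        # the lowest `shift` bits, i.e. a coarser view of the value.
        terms.append((shift, value >> shift))
        shift += precision_step
    return terms


if __name__ == "__main__":
    for step in (0, 8, 4):
        print("precisionStep=%d -> %d term(s) per value"
              % (step, len(trie_terms(1234567, step))))
```

Under these assumptions a 32-bit value yields 1 term with a step of 0,
4 terms with a step of 8, and 8 terms with a step of 4. The coarser terms
are what let a range query cover a wide range with few terms, which is why
the tint/tfloat/tlong/tdouble types are the ones suggested for faster range
queries.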