Re: Revisions to Incubator proposal

2010-07-02 Thread Nathan Kurz
On Thu, Jul 1, 2010 at 10:08 AM, Marvin Humphrey mar...@rectangular.com wrote:
 There is a possibility that this proposal won't be accepted.  But if that
 happens, we have a contingency plan: leave Apache, yet continue to operate
 according to many of the same procedures and values, possibly returning with a
 new proposal at some later date.  We should seek to exploit any critiques that
 we receive during the proposal process to help guide us, regardless of the
 outcome -- just as we have attempted to make best use of the Lucene PMC's 
 advice.

 Having slept on the matter, here are some thoughts on the current proposal.

 The first half of the Rationale seems fine...

    There is great hunger for a search engine library in the mode of Lucene
    which is accessible from various dynamic languages, and for one accessible
    from pure C. Individuals naturally wish to code in their language of
    choice. Organizations which do not have significant Java expertise may not
    want to support Java strictly for the sake of running a Lucene instance.
    Native applications can be launched much more quickly than JVMs. Lucy will
    meet all these demands.

I like this, but might stop after the first two sentences.  I think
they're the stronger reasons.  If you wanted to go further, I'd
mention the ways in which C has traditionally been the language of
Unix server programming, and allows one to take full advantage of the
machine's potential.

I think this is essentially what made Apache a great web server:  a
brilliant architecture combined with the clarity of C.  Can you
picture writing mod_php, mod_perl, or mod_python in Java?  It's
certainly not the only way to go, but fills a niche that Lucene never
will.

    We acknowledge that Apache seems like a natural home for Lucy given that
    it is also the home of Lucene, and speculate that this may have been on
    the minds of the Lucene PMC when Lucy was green-lighted as a sub-project.
    More importantly, though, the Lucy development community strongly believes
    that The Apache Way is right for Lucy.

I agree that this part is somewhat weak. And while I really like
Apache, I'm not sure it's the only way for Lucy.  For example, I also
really like the SQLite way.  While Lucy certainly can take advantage
of the Apache infrastructure, I'm not sure it really needs it.  What
it needs is more people writing code for it, and the best way to
achieve this is probably to get more technically proficient users.
If the code is clear (which it is) and the architecture is graspable
(needs improvement?) some significant percentage of these users will
contribute.

 The one thing I don't think we've done well (and this is my fault) is handle
 releases and backwards compatibility.  Father Chrysostomos put in an awful lot
 of work creating a subclassable Highlighter, but later releases of KS broke
 back compat on him.  Nevertheless, I think we have learned from how that
 played out, and that the backwards compatibility policy we arrived at last
 year goes a long way towards solving those problems.

I think you're doing fine on backwards compatibility, and if anything
you're spending too much time on it.  Instead of worrying about not
breaking things that are old, spend the effort making it easier to
write things that are new.

Let's make the dogfood tasty, and start writing 'extensions' instead
of trying to include everything in core.  If it's easy enough and
useful enough, someone will port the old to the new.  In fact, that
would be an excellent project for a beginner to familiarize themselves
with the code base.

 Lastly, it would be nice to cover our contingency plan of growing the
 community and coming back with a bigger committer list at some later date.
 However, I think that may arise naturally during the discussion, and it's
 probably too big a topic to squeeze in.

I don't know if you need to discuss this.  And once we have more
developers, do we really need to come back?   I mean, I really like
Apache, but Trac plus taking the best points of the Incubator approach
might offer 90% of the benefit with a lot less overhead.  Which
doesn't mean Lucy wouldn't benefit from being included, but I wouldn't
precommit to returning at a later date if they don't want us.

--nate


[Lucy Wiki] Update of LucyIncubatorProposal by Nathan Kurz

2010-07-02 Thread Apache Wiki
Dear Wiki user,

You have subscribed to a wiki page or wiki category on Lucy Wiki for change 
notification.

The LucyIncubatorProposal page has been changed by NathanKurz.
http://wiki.apache.org/lucy/LucyIncubatorProposal?action=diff&rev1=35&rev2=36

--

  === Core Developers ===
   * Marvin Humphrey is the project founder of KinoSearch, and co-founded the 
existing Lucy sub-project.  He is presently employed by Eventful, Inc.
   * Peter Karman has contributed to several open source projects since 2001, 
including being a committer at http://swish-e.org/ (a search engine), 
http://code.google.com/p/rose/ (an ORM) and http://catalyst.perl.org/ (web 
framework).  He is employed by American Public Media.
-  * Nathan Kurz has participated in numerous open source projects and has been 
a KinoSearch committer since 2007.  He is currently Chief Flavor Engineer of 
Scream Sorbet, and writes software in his copious free time.
+  * Nathan Kurz is excited by the intersection of search and recommendations, 
and has been a KinoSearch committer since 2007.  As the owner of Scream Sorbet 
(http://screamsorbet.com), he divides his time between code and fruit. 
  
  === Alignment ===
  One Apache value which is particularly cherished by the Lucy community is 
codebase transparency.  We have developed institutions which enable us to 
measure and maximize usability (see [http://wiki.apache.org/lucy/BrainLog]), 
and we feel strongly that the bindings for Lucy must present APIs and 
documentation which are idiomatic to the host language culture so that end 
users can consume our work as easily as possible.


[jira] Created: (LUCENE-2523) if index is too old you should hit an exception saying so

2010-07-02 Thread Michael McCandless (JIRA)
if index is too old you should hit an exception saying so
-

 Key: LUCENE-2523
 URL: https://issues.apache.org/jira/browse/LUCENE-2523
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Reporter: Michael McCandless
Priority: Minor
 Fix For: 4.0


If you create an index in 2.3.x (I used demo's IndexFiles) and then try to read 
it in 4.0.x (I used CheckIndex), you hit a confusing exception like this:
{noformat}
java.io.IOException: read past EOF
at 
org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:154)
at 
org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:39)
at 
org.apache.lucene.store.ChecksumIndexInput.readByte(ChecksumIndexInput.java:40)
at org.apache.lucene.store.DataInput.readInt(DataInput.java:76)
at org.apache.lucene.index.SegmentInfo.<init>(SegmentInfo.java:171)
at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:230)
at org.apache.lucene.index.SegmentInfos$1.doBody(SegmentInfos.java:269)
at 
org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:649)
at 
org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:484)
at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:265)
at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:308)
at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:287)
at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:930)
{noformat}

I think instead we should throw an IndexTooOldException or something like that?
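A hedged sketch of the fail-fast idea: read the format header first and reject unsupported formats with a descriptive exception, rather than letting the parser run off the end of the file. All names and constants below (FormatCheck, readFormat, MIN_SUPPORTED_FORMAT, the more-negative-is-newer convention) are illustrative assumptions, not Lucene's actual code.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

// Illustrative sketch only: validate the index format header up front so an
// unsupported (too-old) index fails with a clear message instead of a
// confusing "read past EOF" deep inside SegmentInfos parsing.
class FormatCheck {

    // Hypothetical constant; Lucene's real format numbers differ.
    // Convention assumed here: formats grow more negative over time,
    // so a format closer to zero is older.
    static final int MIN_SUPPORTED_FORMAT = -9;  // oldest format this reader accepts

    static class IndexTooOldException extends IOException {
        IndexTooOldException(String msg) { super(msg); }
    }

    // Read the leading format int and reject formats older than we support.
    static int readFormat(DataInputStream in) throws IOException {
        int format = in.readInt();
        if (format > MIN_SUPPORTED_FORMAT) {  // closer to zero == older
            throw new IndexTooOldException(
                "Index format " + format + " is too old (oldest supported: "
                + MIN_SUPPORTED_FORMAT + "); re-index with a current release");
        }
        return format;
    }

    // Helper for building a big-endian header, as DataInputStream expects.
    static DataInputStream headerWithFormat(int format) {
        byte[] b = new byte[] {
            (byte) (format >>> 24), (byte) (format >>> 16),
            (byte) (format >>> 8), (byte) format
        };
        return new DataInputStream(new ByteArrayInputStream(b));
    }
}
```

The point is only that the version check happens before any other parsing, so the error names the actual problem.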

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Build failed in Hudson: Solr-trunk #1195

2010-07-02 Thread Apache Hudson Server
See http://hudson.zones.apache.org/hudson/job/Solr-trunk/1195/changes

Changes:

[yonik] formatting: make consistent with other clauses

[gsingers] fix typo in javadoc

[gsingers] LUCENE-1810: added LATENT field selector option

--
[...truncated 9858 lines...]
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 9.208 sec
[junit] 
[junit] Testsuite: org.apache.solr.handler.SpellCheckerRequestHandlerTest
[junit] Tests run: 5, Failures: 0, Errors: 0, Time elapsed: 66.886 sec
[junit] 
[junit] Testsuite: org.apache.solr.handler.StandardRequestHandlerTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 7.217 sec
[junit] 
[junit] Testsuite: org.apache.solr.handler.TestCSVLoader
[junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 13.609 sec
[junit] 
[junit] Testsuite: org.apache.solr.handler.TestReplicationHandler
[junit] Tests run: 7, Failures: 0, Errors: 0, Time elapsed: 98.945 sec
[junit] 
[junit] - Standard Error -
[junit] Jul 2, 2010 9:29:39 AM org.apache.solr.handler.SnapPuller 
fetchLatestIndex
[junit] SEVERE: Master at: http://localhost:55735/solr/replication is not 
available. Index fetch failed. Exception: Connection refused
[junit] Jul 2, 2010 9:29:40 AM org.apache.solr.handler.SnapPuller 
fetchLatestIndex
[junit] SEVERE: Master at: http://localhost:55735/solr/replication is not 
available. Index fetch failed. Exception: Connection refused
[junit] Jul 2, 2010 9:29:57 AM org.apache.solr.handler.SnapPuller 
fetchLatestIndex
[junit] SEVERE: Master at: http://localhost:55757/solr/replication is not 
available. Index fetch failed. Exception: Connection refused
[junit] Jul 2, 2010 9:30:41 AM org.apache.solr.handler.SnapPuller 
fetchLatestIndex
[junit] SEVERE: Master at: http://localhost:55809/solr/replication is not 
available. Index fetch failed. Exception: Connection refused
[junit] Jul 2, 2010 9:30:51 AM org.apache.solr.handler.SnapPuller 
fetchLatestIndex
[junit] SEVERE: Master at: http://localhost:55831/solr/replication is not 
available. Index fetch failed. Exception: Connection refused
[junit] Jul 2, 2010 9:30:52 AM org.apache.solr.handler.SnapPuller 
fetchLatestIndex
[junit] SEVERE: Master at: http://localhost:55831/solr/replication is not 
available. Index fetch failed. Exception: Connection refused
[junit] Jul 2, 2010 9:30:52 AM org.apache.solr.handler.SnapPuller 
fetchLatestIndex
[junit] SEVERE: Master at: http://localhost:55822/solr/replication is not 
available. Index fetch failed. Exception: Connection refused
[junit] Jul 2, 2010 9:30:53 AM org.apache.solr.handler.SnapPuller 
fetchLatestIndex
[junit] SEVERE: Master at: http://localhost:55831/solr/replication is not 
available. Index fetch failed. Exception: Connection refused
[junit] Jul 2, 2010 9:30:54 AM org.apache.solr.handler.SnapPuller 
fetchLatestIndex
[junit] SEVERE: Master at: http://localhost:55831/solr/replication is not 
available. Index fetch failed. Exception: Connection refused
[junit] Jul 2, 2010 9:30:55 AM org.apache.solr.handler.SnapPuller 
fetchLatestIndex
[junit] SEVERE: Master at: http://localhost:55831/solr/replication is not 
available. Index fetch failed. Exception: Connection refused
[junit] Jul 2, 2010 9:30:56 AM org.apache.solr.handler.SnapPuller 
fetchLatestIndex
[junit] SEVERE: Master at: http://localhost:55831/solr/replication is not 
available. Index fetch failed. Exception: Connection refused
[junit] Jul 2, 2010 9:30:57 AM org.apache.solr.handler.SnapPuller 
fetchLatestIndex
[junit] SEVERE: Master at: http://localhost:55831/solr/replication is not 
available. Index fetch failed. Exception: Connection refused
[junit] Jul 2, 2010 9:30:59 AM org.apache.solr.handler.SnapPuller 
fetchLatestIndex
[junit] SEVERE: Master at: http://localhost:55831/solr/replication is not 
available. Index fetch failed. Exception: Connection refused
[junit] -  ---
[junit] Testsuite: org.apache.solr.handler.XmlUpdateRequestHandlerTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 6 sec
[junit] 
[junit] Testsuite: org.apache.solr.handler.admin.LukeRequestHandlerTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 12.173 sec
[junit] 
[junit] Testsuite: org.apache.solr.handler.admin.SystemInfoHandlerTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.5 sec
[junit] 
[junit] Testsuite: 
org.apache.solr.handler.component.DistributedSpellCheckComponentTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 58.226 sec
[junit] 
[junit] Testsuite: 
org.apache.solr.handler.component.DistributedTermsComponentTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 49.52 sec
[junit] 
[junit] Testsuite: 

[jira] Commented: (SOLR-1144) replication hang

2010-07-02 Thread Toby Cole (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884644#action_12884644
 ] 

Toby Cole commented on SOLR-1144:
-

Just over a year since it was first spotted, I'm consistently getting the same 
symptoms as this bug.
We've got a single master, with two slaves polling it, both slaves have stalled 
at exactly the same point in the replication.

Here's the relevant section of the replication handler's 'details' response:
Node A
{code:xml}
  <str name="numFilesDownloaded">18</str>
  <str name="replicationStartTime">Fri Jul 02 10:40:00 BST 2010</str>
  <str name="timeElapsed">6683s</str>
  <str name="currentFile">_9du.prx</str>
  <str name="currentFileSize">8.17 MB</str>
  <str name="currentFileSizeDownloaded">8.17 MB</str>
  <str name="currentFileSizePercent">100.0</str>
  <str name="bytesDownloaded">40.55 MB</str>
  <str name="totalPercent">0.0</str>
  <str name="timeRemaining">8290722s</str>
  <str name="downloadSpeed">6.21 KB</str>
{code}

Node B
{code:xml}
  <str name="numFilesDownloaded">18</str>
  <str name="replicationStartTime">Fri Jul 02 10:40:00 BST 2010</str>
  <str name="timeElapsed">6752s</str>
  <str name="currentFile">_9du.prx</str>
  <str name="currentFileSize">8.17 MB</str>
  <str name="currentFileSizeDownloaded">8.17 MB</str>
  <str name="currentFileSizePercent">100.0</str>
  <str name="bytesDownloaded">40.55 MB</str>
  <str name="totalPercent">0.0</str>
  <str name="timeRemaining">8376322s</str>
  <str name="downloadSpeed">6.15 KB</str>
{code}

 replication hang
 

 Key: SOLR-1144
 URL: https://issues.apache.org/jira/browse/SOLR-1144
 Project: Solr
  Issue Type: Bug
Reporter: Yonik Seeley
Assignee: Noble Paul
 Fix For: 1.4


 It seems that replication can sometimes hang.
 http://www.lucidimagination.com/search/document/403305a3fda18599




Re: [Lucy] Revisions to Incubator proposal

2010-07-02 Thread Peter Karman
Marvin Humphrey wrote on 7/1/10 12:08 PM:

 ... but I'm dissatisfied with the second half:
 
 We acknowledge that Apache seems like a natural home for Lucy given that
 it is also the home of Lucene, and speculate that this may have been on
 the minds of the Lucene PMC when Lucy was green-lighted as a sub-project.
 More importantly, though, the Lucy development community strongly believes
 that The Apache Way is right for Lucy. 
 
 First, this passage only asserts that we believe in The Apache Way rather than
 demonstrating our understanding of it.  We should show, not tell.  Second,
 we should purge the PMC mind reading.  Who knows what they were thinking! :)
 Third, I don't want to leave in any mention of Lucy belonging at Apache 
 because
 Lucene is there, too.  That's Lucy sponging off the Lucene brand, and it's not
 a benefit to Apache.  We should just leave that unstated and stand on our
 merits.

+1 to all your points. Nuke that second half.


 The Community section does a fine job of identifying our challenges and
 presenting a plan:
 
 Lucy currently has a small community, most members of which originated in 
 the
 KinoSearch community.
 
 Lucy's chief challenge is growing its community, which it hopes to achieve
 through efforts in two areas: reaching a 1.0 release, and actively 
 reaching
 out to its target audience, users and developers in the dynamic language
 communities who want a fast, scalable full-text search solution in their
 native language. 
 
 Still, I think we deserve a little more credit.  We've taken a lot of flak
 regarding the size of the Lucy community, but you know, if you consider how
 the *KinoSearch* community has operated over the years, we haven't done so
 bad.

+1 for mentioning KS, since *that* is the code that is being donated.

 The one thing I don't think we've done well (and this is my fault) is handle
 releases and backwards compatibility.

I agree with what Nate said on this. The perfect is the enemy of the good.


 Lastly, it would be nice to cover our contingency plan of growing the
 community and coming back with a bigger committer list at some later date.
 However, I think that may arise naturally during the discussion, and it's
 probably too big a topic to squeeze in.
 

+1 on Nate's comments here.

I'm afk most of the day today (Friday), Marvin. I'm fine with the proposal as it's
written at the moment; it looks like you've already addressed most of the points
above in your edits from last night.

cheers on a hard week's work!

pek


-- 
Peter Karman  .  http://peknet.com/  .  pe...@peknet.com


[jira] Updated: (SOLR-1144) replication hang

2010-07-02 Thread Toby Cole (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Toby Cole updated SOLR-1144:


Attachment: stacktrace-master.txt
stacktrace-slave-1.txt
stacktrace-slave-2.txt

Adding stacktraces for both slave instances and the master instance.

 replication hang
 

 Key: SOLR-1144
 URL: https://issues.apache.org/jira/browse/SOLR-1144
 Project: Solr
  Issue Type: Bug
Reporter: Yonik Seeley
Assignee: Noble Paul
 Fix For: 1.4

 Attachments: stacktrace-master.txt, stacktrace-slave-1.txt, 
 stacktrace-slave-2.txt


 It seems that replication can sometimes hang.
 http://www.lucidimagination.com/search/document/403305a3fda18599




[jira] Issue Comment Edited: (SOLR-1144) replication hang

2010-07-02 Thread Toby Cole (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884672#action_12884672
 ] 

Toby Cole edited comment on SOLR-1144 at 7/2/10 9:37 AM:
-

Adding stacktraces for both slave instances and the master instance.
These stack traces are from a reproduction of the original problem, so the 
timestamps will not match up with the XML from the replication-handler 
previously posted.

  was (Author: tub):
Adding stacktraces for both slave instances and the master instance.
  



[jira] Commented: (SOLR-1144) replication hang

2010-07-02 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884690#action_12884690
 ] 

Yonik Seeley commented on SOLR-1144:


Thanks for the stack traces Toby!

Interesting... seems like the commit in the slave blocked...
{code}
at 
org.apache.solr.common.util.ConcurrentLRUCache.getLatestAccessedItems(ConcurrentLRUCache.java:276)
{code}

So perhaps another thread locked, but didn't unlock the lock?

SOLR-1538 did fix something that could possibly lead to a deadlock, but it's 
super unlikely (a very small object allocation would have to fail at just the 
right spot).  Still, if this is easy enough to reproduce, could you try Solr 
1.4.1 and see if it's fixed?  (and if it hangs again, be sure to get stack 
traces... they are super helpful!)
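The failure mode Yonik speculates about -- a thread that locks but never unlocks -- can be shown with a minimal sketch. This is hypothetical code, not Solr's actual ConcurrentLRUCache (the CacheSweeper name and methods are made up): unless unlock() sits in a finally block, an exception thrown mid-sweep (say, a failed allocation) leaves the lock held forever, and every later caller blocks -- which presents exactly as a hang.

```java
import java.util.concurrent.locks.ReentrantLock;

// Minimal sketch of a leaked lock. Names are illustrative only.
class CacheSweeper {
    private final ReentrantLock markAndSweepLock = new ReentrantLock();

    // Fragile version: an exception in sweep() skips unlock(), leaking the lock.
    void sweepFragile(Runnable sweep) {
        markAndSweepLock.lock();
        sweep.run();              // if this throws, unlock() never runs
        markAndSweepLock.unlock();
    }

    // Safe version: finally guarantees the lock is released on every path.
    void sweepSafe(Runnable sweep) {
        markAndSweepLock.lock();
        try {
            sweep.run();
        } finally {
            markAndSweepLock.unlock();
        }
    }

    boolean isStillHeld() {
        return markAndSweepLock.isLocked();
    }
}
```

With the fragile variant, a single rare exception is enough to stall all subsequent commits, matching the "very small object allocation would have to fail at just the right spot" description.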




[jira] Commented: (SOLR-1144) replication hang

2010-07-02 Thread Toby Cole (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884693#action_12884693
 ] 

Toby Cole commented on SOLR-1144:
-

Oh yes, should have mentioned... we're already on Solr 1.4.1 in production as 
of yesterday (we don't hang about y'know ;) ).




Re: Core Developer bios

2010-07-02 Thread Marvin Humphrey
On Wed, Jun 30, 2010 at 05:01:45PM -0700, Marvin Humphrey wrote:
 On Wed, Jun 30, 2010 at 11:47:42PM -, Apache Wiki wrote:
  +  * Nathan Kurz has participated in numerous open source projects and has 
  been a KinoSearch committer since 2007.  He is currently Chief Flavor 
  Engineer of Scream Sorbet, and writes software in his copious free time.
 
 Nate, I didn't know what to write to describe your crazy bio, and I don't
 think this does enough to convey your serious C and comp-sci chops.  
 
 I also don't think Chief Flavor Engineer is your title at Scream, though IMO
 it ought to be.
 
 Can you please edit?

Heh, I hear from Nate via private email that he likes the existing bio.  So be
it!

Marvin Humphrey



[Lucy Wiki] Update of LucyIncubatorProposal by Marvin Humphrey

2010-07-02 Thread Apache Wiki
Dear Wiki user,

You have subscribed to a wiki page or wiki category on Lucy Wiki for change 
notification.

The LucyIncubatorProposal page has been changed by MarvinHumphrey.
The comment on this change is: Change Rationale to mention C's 
interoperability, fine-grained control.
http://wiki.apache.org/lucy/LucyIncubatorProposal?action=diff&rev1=33&rev2=34

--

  Since the Lucene PMC will not be responsible for Lucy much longer, it is more 
appropriate for the software grant to take place within the context of the 
Incubator than the Lucene TLP.  As none of our current members have Apache PMC 
experience, we also seek to take advantage of the Incubator environment to 
prepare ourselves for responsible self-governance.
  
  == Rationale ==
- There is great hunger for a search engine library in the mode of Lucene which 
is accessible from various dynamic languages, and for one accessible from pure 
C.  Individuals naturally wish to code in their language of choice.  
Organizations which do not have significant Java expertise may not want to 
support Java strictly for the sake of running a Lucene installation.  Native 
applications may be launched much more quickly than JVMs.  Lucy will meet all 
these demands.
+ There is great hunger for a search engine library in the mode of Lucene which 
is accessible from various dynamic languages, and for one accessible from pure 
C.  Individuals naturally wish to code in their language of choice.  
Organizations which do not have significant Java expertise may not want to 
support Java strictly for the sake of running a Lucene installation.  
Developers may want to take advantage of C's interoperability and fine-grained 
control.  Lucy will meet all these demands.
  
  Apache is a natural home for our project given the way it has always 
operated: user-driven innovation, security as a requirement, lively and amiable 
mailing list discussions, strength through diversity, and so on.  We feel 
comfortable here, and we believe that we will become exemplary Apache citizens.
  


[jira] Commented: (SOLR-1144) replication hang

2010-07-02 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884717#action_12884717
 ] 

Yonik Seeley commented on SOLR-1144:


The odd thing is that the line numbers in the stack traces don't match up for 
either 1.4.0 or 1.4.1
Specifically ConcurrentLRUCache.java:276 is in the middle of markAndSweep() in 
both versions (as opposed to getLatestAccessedItems() which your stack trace 
would suggest).

Are these stack traces from stock 1.4.0 or 1.4.1?  If so, does anyone have a 
clue why the line numbers would be off?





[jira] Commented: (SOLR-1144) replication hang

2010-07-02 Thread Toby Cole (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884719#action_12884719
 ] 

Toby Cole commented on SOLR-1144:
-

I know exactly why the line numbers would be off. I just remembered we're using 
a custom war package so we can add our own plugins in (yes, I know we can use 
solr.home/lib, but we've not got round to that yet).

The only classes we're overriding from solr are ConcurrentLRUCache and 
FastLRUCache. This was from pre solr 1.4, when the cache implementations were 
slowing faceting right down.
I have a feeling if I remove those overridden classes and use the new 
(bug-free) ones, the hang may stop.

I'll give it a go now, sorry in advance if it was my oversight that is causing 
this bug to re-appear.
T




Re: Revisions to Incubator proposal

2010-07-02 Thread Marvin Humphrey
On Thu, Jul 01, 2010 at 11:44:46PM -0700, Nathan Kurz wrote:
     There is great hunger for a search engine library in the mode of Lucene
     which is accessible from various dynamic languages, and for one 
  accessible
     from pure C. Individuals naturally wish to code in their language of
     choice. Organizations which do not have significant Java expertise may 
  not
     want to support Java strictly for the sake of running a Lucene instance.
     Native applications can be launched much more quickly than JVMs. Lucy 
  will
     meet all these demands.
 
 I like this, but might stop after the first two sentences.  I think
 they're the stronger reasons.  If you wanted to go further, I'd
 mention the ways in which C has traditionally been the language of
 Unix server programming, and allows one to take full advantage of the
 machine's potential.

OK, I went with this for today:

  Developers may want to take advantage of C's interoperability and
  fine-grained control. 

I do think it's important to at least touch on the advantages of natively
compiled code in the Rationale, and not limit ourselves to the
language-loyalty angle.

 While Lucy certainly can take advantage of the Apache infrastructure, I'm
 not sure it really needs it.  

Acknowledged.  However, the more successful Lucy gets, the more useful it is
to have the support systems and institutions of Apache protecting it and
instilling confidence.  And it's not just about backups, uptime, and fending
off crackers -- it's things like stable governance, legal protection
mechanisms, encouraging contributions from developers sponsored by major
corporations, and so on.

 What it needs is more people writing code for it, and the best way to
 achieve this is probably to get more technically proficient users.

I agree.  That is our most pressing need, so we should prioritize accordingly.

 I think you're doing fine on backwards compatibility, and if anything
 you're spending too much time on it.  Instead of worrying about not
 breaking things that are old, spend the effort making it easier to
 write things that are new.

I believe that the new backwards compatibility policy properly privileges
active developers, while still offering strong stability guarantees to those
who require them.  It's not perfect, but I think it strikes a good balance.  

 Let's make the dogfood tasty, and start writing 'extensions' instead
 of trying to include everything in core.  

A public C API will facilitate extensions.  Had one been available,
ProximityQuery would have been written to be a separate CPAN distro.

I think separating ProximityQuery from core would be a good concrete goal to
help us focus while we design the C API.

 I don't know if you need to discuss this.  And once we have more
 developers, do we really need to come back?   I mean, I really like
 Apache, but Trac plus taking the best points of the Incubator approach
 might offer 90% of the benefit with a lot less overhead.  Which
 doesn't mean Lucy wouldn't benefit from being included, but I wouldn't
 precommit to returning at a later date if they don't want us.

Your response helps to clarify the point that I really wanted to get in there,
which is that a long overdue release is imminent.  That's now covered in the
revised Community section.

OK, I only made that one minor content change, so I think we have adequate
consensus.  I'm going to perform a QA pass and maybe tweak some phrasings so
that the word community doesn't appear eleventy-billion times.  Then I'll
send it off to gene...@lucene.

Thanks, folks -- this has worked out well!

Marvin Humphrey



Re: Trouble updating Solr website

2010-07-02 Thread Chris Hostetter

:  Like I said, on Mahout we went w/ a simple landing page and everything else 
in Confluence, as inspired by OFBiz.
: 
: 
: Sounds good to me; any objections?  Any tips on how to get us started?

FWIW: Ryan actually set up a SOLR Confluence wiki a long time ago with the 
plan of it replacing the main solr site...

https://cwiki.apache.org/SOLRxSITE/

While i have no particular love for forrest, i would like to suggest the 
following things be considered before moving forward...

1) now may be a good time to consider consolidating the entire Lucene TLP 
website into a single update model (ie: instead of having /java, /solr, 
/pylucene, /lucene.net, etc... all be managed with distinct wikis or 
publishing flows, let's just have one site with one edit model).

2) docs that ship as part of the releases can (and probably should) be 
separated out and managed independently -- either as distinct wikis, or 
via some other means -- and still be available online, linked from the main 
site.  /java already has this working well for the per-release lucene-java 
documentation, where both the site and the docs are in forrest -- we 
can follow this model for all of the individual releases independently of 
how the files themselves are generated.

3) if we do move to confluence we should think carefully in advance about 
how we want to deal with the ACLs ... what should be editable by anyone 
with a wiki account, vs. what should be editable only by people with a CLA 
on file, vs. what should be editable only by committers -- the first two in 
particular will heavily impact what can be packaged up in releases.



-Hoss


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (SOLR-461) Highlighting TokenStream Truncation capability

2010-07-02 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12884806#action_12884806
 ] 

Uwe Schindler commented on SOLR-461:


The TokenFilter for limiting Token count is already in Lucene 3.x and trunk: 
LimitTokenCountFilter, plus a wrapping Analyzer that adds this filter on top of 
any existing Analyzer. I think somebody already added it to Solr, too; I don't 
know the issue number, but I have seen it.
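For readers unfamiliar with the filter, its contract is simple. Here is a conceptual Python sketch of the idea (not Lucene's actual Java implementation, which operates on a TokenStream):

```python
def limit_token_count(tokens, max_token_count):
    """Pass through at most max_token_count tokens, then stop -- the idea
    behind Lucene's LimitTokenCountFilter (conceptual sketch only)."""
    for count, token in enumerate(tokens):
        if count >= max_token_count:
            return  # downstream consumers just see an exhausted stream
        yield token
```

A "wrapping analyzer" then simply applies this filter on top of whatever token stream the underlying analyzer produces, so any existing analyzer gets the limit for free.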

 Highlighting TokenStream Truncation capability
 --

 Key: SOLR-461
 URL: https://issues.apache.org/jira/browse/SOLR-461
 Project: Solr
  Issue Type: Improvement
  Components: highlighter
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Attachments: SOLR-461.patch


 It is sometimes the case when generating snippets that one need not 
 fragment/analyze the whole document (especially for large documents) in order 
 to show meaningful snippet highlights. 
 Patch to follow that adds a counting TokenFilter that returns null after X 
 number of Tokens have been seen.  This filter will then be hooked into the 
 SolrHighlighter and configurable via solrconfig.xml.  The default value will 
 be Integer.MAX_VALUE or, I suppose, it could be set to whatever Max Field 
 Length is set to, as well.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.





[jira] Created: (SOLR-1981) solr.xml should fail to load if multiple cores with the same name

2010-07-02 Thread Hoss Man (JIRA)
solr.xml should fail to load if multiple cores with the same name
-

 Key: SOLR-1981
 URL: https://issues.apache.org/jira/browse/SOLR-1981
 Project: Solr
  Issue Type: Improvement
Reporter: Hoss Man
 Attachments: SOLR-1981.patch

As noted in email a while back...

http://search.lucidimagination.com/search/document/674bf5dfbbb349bc/multiple_cores_w_same_name_in_solr_xml

{quote}
but there is currently no assertion that every core have a name, or that the
names be unique, before the SolrCore is constructed ... it's not until the core
is registered that an error will be generated if the core name is null, or that
the previous core with an identical name will be close()ed.
{quote}

I think we should fail fast if solr.xml specifies the same name more than once.
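The fail-fast check itself is trivial; a conceptual Python sketch (hypothetical function name, not Solr's actual solr.xml loader):

```python
def validate_core_names(core_names):
    """Raise immediately if any core name is missing or duplicated, instead
    of silently close()ing the earlier core at registration time.
    Conceptual sketch only; the real check belongs in Solr's solr.xml loader."""
    seen = set()
    for name in core_names:
        if not name:
            raise ValueError("solr.xml: every core must have a name")
        if name in seen:
            raise ValueError("solr.xml: duplicate core name %r" % name)
        seen.add(name)
```

Running this over the parsed core list before any SolrCore is constructed turns a silent data-loss hazard into an immediate startup error.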




[jira] Updated: (LUCENE-2522) add simple japanese tokenizer, based on tinysegmenter

2010-07-02 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-2522:


Attachment: LUCENE-2522.patch

i refactored the TinySegmenterConstants to use ints/switch statements instead 
of all the hashmaps.

this creates a larger .java file, but it's a smaller .class, and scoring no 
longer has to create 24 strings per character.
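The general shape of that refactor -- replacing per-character string-keyed hashmap lookups with a small integer char class plus direct indexing -- can be sketched in Python (the ranges and score values below are illustrative only, not TinySegmenter's real tables):

```python
# Map each character once to a small int class instead of building a lookup
# string for it; a Java switch on the int compiles to a compact jump table.
KATAKANA, HIRAGANA, OTHER = 0, 1, 2

def char_class(ch):
    """Classify a character by code-point range (illustrative ranges only)."""
    code = ord(ch)
    if 0x30A0 <= code <= 0x30FF:
        return KATAKANA
    if 0x3040 <= code <= 0x309F:
        return HIRAGANA
    return OTHER

# Scores indexed directly by the int class -- no per-character string
# allocation and no hashmap lookup. Values are made up for the sketch.
SCORES = [120, 80, 0]

def score(ch):
    return SCORES[char_class(ch)]
```

The source grows (explicit branches instead of table literals), but the compiled artifact shrinks and the hot loop allocates nothing.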


 add simple japanese tokenizer, based on tinysegmenter
 -

 Key: LUCENE-2522
 URL: https://issues.apache.org/jira/browse/LUCENE-2522
 Project: Lucene - Java
  Issue Type: New Feature
  Components: contrib/analyzers
Reporter: Robert Muir
Priority: Minor
 Attachments: LUCENE-2522.patch, LUCENE-2522.patch


 TinySegmenter (http://www.chasen.org/~taku/software/TinySegmenter/) is a tiny 
 japanese segmenter.
 It was ported to java/lucene by Kohei TAKETA k-...@void.in, 
 and is under friendly license terms (BSD; some files explicitly disclaim 
 copyright to the source code, giving a blessing instead).
 Koji knows the author, and has already contacted him about incorporating it into lucene:
 {noformat}
 I've contacted Takeda-san, who is the creator of the Java version of
 TinySegmenter. He said he is happy if his program is part of Lucene.
 He is a co-author of my book about Solr published in Japan, BTW. ;-)
 {noformat}




[jira] Resolved: (SOLR-1949) overwrite document fails if Solr index is not optimized

2010-07-02 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man resolved SOLR-1949.


Resolution: Not A Problem

 overwrite document fails if Solr index is not optimized
 ---

 Key: SOLR-1949
 URL: https://issues.apache.org/jira/browse/SOLR-1949
 Project: Solr
  Issue Type: Bug
  Components: update
Affects Versions: 1.4
 Environment: linux centos
Reporter: Miguel B.

 Scenario:
 - Solr 1.4 with multicore
 - We have a set of 5,000 source documents that we want to index.
 - We send this set to Solr via the SolrJ API and the documents are added 
 correctly. We have field ID as a string and as uniqueKey, so the update 
 operation overwrites documents with the same ID. The result is 4,500 unique 
 documents in Solr. All documents also have an indexed field that contains the 
 source repository of each document; we need it because we want to index 
 other sources as well.
 - After the add operation, we send an optimize.
  
 All works fine at this point.  The Solr core has 4,500 documents (and 
 4,500 max documents too).
  
 Now these 5,000 source documents are updated by users, and a set of them 
 (suppose 1,000) are deleted. So now we want to update our Solr index with 
 these changes (unfortunately our repository doesn't support an incremental 
 approach). The operations are:
  
  - On the Solr index, delete documents by query (by the field that contains 
 the document's source repository). We use the deleteByQuery and commit SolrJ 
 operations.
  - At this point the Solr core has 0 documents (but 4,500 max documents -- 
 important!).
  - Now we add the new version of the source documents (4,000) to Solr. 
 Remember that the documents don't have unique identifiers; suppose that 3,000 
 items are unique. So when the add operation finishes (after the commit is 
 sent) the Solr index should have 3,000 unique items.
  
 But the result isn't 3,000 unique items; we obtain random results: 3000, 
 2980, 2976, etc. It's a serious problem because we lose documents.
 We have a workaround: just after the delete operation, we send an optimize 
 to Solr (maxDocuments is updated). After this, we send the new documents. 
 This way the result is always correct.
 In our tests, we can see that this issue occurs only when the new documents 
 overwrite documents that already existed in Solr.
 Thanks!!




[jira] Commented: (SOLR-1949) overwrite document fails if Solr index is not optimized

2010-07-02 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12884849#action_12884849
 ] 

Hoss Man commented on SOLR-1949:


1) In the future, please don't use jira to ask questions about odd behavior you 
are seeing -- that is what the solr-user mailing list is for.  As a general 
rule, you should only open a Bug issue in Jira if you have already asked a 
question on the solr-user mailing list and have verified that there isn't a 
mistake/misunderstanding in your config.

2) your initial report is inconsistent and makes no sense...

bq. We have field ID as string and uniqueKey, so the update operation overwite 
documents with the same ID. 
...
bq. Remember that documents don't have unique identifiers

...if you do follow up on solr-user, please be more explicit and clarify 
exactly what you are doing (showing us your schema.xml and some sample 
documents is pretty much a necessity for making sense of problems like this 
that don't produce an error message)

 overwrite document fails if Solr index is not optimized
 ---

 Key: SOLR-1949
 URL: https://issues.apache.org/jira/browse/SOLR-1949
 Project: Solr
  Issue Type: Bug
  Components: update
Affects Versions: 1.4
 Environment: linux centos
Reporter: Miguel B.





[jira] Commented: (SOLR-1982) Leading wildcard queries work for all fields if ReversedWildcardFilterFactory is used for any field

2010-07-02 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12884847#action_12884847
 ] 

Hoss Man commented on SOLR-1982:


The behavior comes from the fact that during initialization, 
SolrQueryParser.checkAllowLeadingWildcards calls setAllowLeadingWildcard(true); 
if any field type uses ReversedWildcardFilterFactory.

Then in getWildcardQuery, it checks the specific field type before calling 
ReverseStringFilter.reverse, but then, regardless of field type, it delegates to 
super.getWildcardQuery, which will allow the leading wildcard for *all* fields 
based on the previous call to setAllowLeadingWildcard(true).

I'm not really sure what the intention was for fields that don't use 
ReversedWildcardFilterFactory.  Either leading wildcards should *only* be 
allowed for field types that use ReversedWildcardFilterFactory, or the QParser 
should have a config option to control it for other fields -- but as it stands 
the behavior makes no sense whatsoever.
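For readers unfamiliar with the trick: ReversedWildcardFilterFactory indexes each token reversed, so that a leading-wildcard query can be rewritten as an efficient trailing-wildcard query against the reversed terms. A conceptual Python sketch of that rewrite (not the actual SolrQueryParser code):

```python
def rewrite_leading_wildcard(term):
    """Rewrite a leading-wildcard term like '*foo' into 'oof*', to be run
    against a field whose tokens were indexed reversed; other terms pass
    through unchanged. Conceptual sketch only -- the point of this issue is
    that Solr should apply the rewrite (and permit leading wildcards) only
    for field types that actually use ReversedWildcardFilterFactory."""
    if term.startswith("*") and not term.endswith("*"):
        return term[1:][::-1] + "*"
    return term
```

Because the rewrite is per-field-type, the leading-wildcard *permission* should be gated per field type as well, rather than flipped globally at parser initialization.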

 Leading wildcard queries work for all fields if 
 ReversedWildcardFilterFactory is used for any field
 ---

 Key: SOLR-1982
 URL: https://issues.apache.org/jira/browse/SOLR-1982
 Project: Solr
  Issue Type: Bug
Affects Versions: 1.4, 1.4.1
Reporter: Hoss Man

 As noted on the mailing list...
 http://search.lucidimagination.com/search/document/8064e6877f49e4c4/leading_wildcard_query_strangeness
 ...SolrQueryParser supports leading wildcard queries for *any* field, as long 
 as at least one field type exists in the schema.xml which uses 
 ReversedWildcardFilterFactory -- even if that field type is never used.
 This is extremely confusing, and most likely indicates a bug in how 
 SolrQueryParser deals with ReversedWildcardFilterFactory.




[jira] Updated: (SOLR-1283) Mark Invalid error on indexing

2010-07-02 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated SOLR-1283:
---

Fix Version/s: 3.1
   4.0

we have a patch that *seems* to work, so we should definitely try to get this 
into the next release ... i'm hoping someone more familiar with the code can 
sanity check it soon.

 Mark Invalid error on indexing
 --

 Key: SOLR-1283
 URL: https://issues.apache.org/jira/browse/SOLR-1283
 Project: Solr
  Issue Type: Bug
Affects Versions: 1.3
 Environment: Ubuntu 8.04, Sun Java 6
Reporter: solrize
 Fix For: 3.1, 4.0

 Attachments: SOLR-1283.modules.patch, SOLR-1283.patch


 When indexing large (1 megabyte) documents I get a lot of exceptions with 
 stack traces like the below.  It happens both in the Solr 1.3 release and in 
 the July 9 1.4 nightly.  I believe this to NOT be the same issue as SOLR-42.  
 I found some further discussion on solr-user: 
 http://www.nabble.com/IOException:-Mark-invalid-while-analyzing-HTML-td17052153.html
  
 In that discussion, Grant asked the original poster to open a Jira issue, but 
 I didn't see one so I'm opening one; please feel free to merge or close if 
 it's redundant. 
 My stack trace follows.
 Jul 15, 2009 8:36:42 AM org.apache.solr.core.SolrCore execute
 INFO: [] webapp=/solr path=/update params={} status=500 QTime=3 
 Jul 15, 2009 8:36:42 AM org.apache.solr.common.SolrException log
 SEVERE: java.io.IOException: Mark invalid
 at java.io.BufferedReader.reset(BufferedReader.java:485)
 at 
 org.apache.solr.analysis.HTMLStripReader.restoreState(HTMLStripReader.java:171)
 at 
 org.apache.solr.analysis.HTMLStripReader.read(HTMLStripReader.java:728)
 at 
 org.apache.solr.analysis.HTMLStripReader.read(HTMLStripReader.java:742)
 at java.io.Reader.read(Reader.java:123)
 at 
 org.apache.lucene.analysis.CharTokenizer.next(CharTokenizer.java:108)
 at org.apache.lucene.analysis.StopFilter.next(StopFilter.java:178)
 at 
 org.apache.lucene.analysis.standard.StandardFilter.next(StandardFilter.java:84)
 at 
 org.apache.lucene.analysis.LowerCaseFilter.next(LowerCaseFilter.java:53)
 at 
 org.apache.solr.analysis.WordDelimiterFilter.next(WordDelimiterFilter.java:347)
 at 
 org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:159)
 at 
 org.apache.lucene.index.DocFieldConsumersPerField.processFields(DocFieldConsumersPerField.java:36)
 at 
 org.apache.lucene.index.DocFieldProcessorPerThread.processDocument(DocFieldProcessorPerThread.java:234)
 at 
 org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:765)
 at 
 org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:748)
   at 
 org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2512)
   at 
 org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2484)
   at 
 org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:240)
   at 
 org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:61)
   at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:140)
   at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
   at 
 org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
   at 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1292)
   at 
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
   at 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
   at 
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
   at 
 org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
   at 
 org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
   at 
 org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
   at 
 org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
   at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
   at 
 org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
   at 
 org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
   at 
 org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
   at org.mortbay.jetty.Server.handle(Server.java:285)
   at 
 org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
   at 
 

Re: [VOTE] RC2 Release Solr 1.4.1

2010-07-02 Thread Chris Hostetter
: Is there any chance for the patch posted on issue SOLR-1283 (
: https://issues.apache.org/jira/browse/SOLR-1283) to be integrated in the
: forthcoming release ?
: It appears this old issue is still present in 1.4.1 rc2.

Julien:

I'm afraid it's definitely too late for it to make it into 1.4.1 -- by the 
looks of it, it was already too late when you sent your email (i think 
timing-wise the release was already official, there just hadn't been an 
announcement yet while miller was waiting for it to hit the mirrors).

Sometimes patches slip through the cracks, and that seems to be what 
happened with your patch in SOLR-1283 -- i'm not familiar enough with that 
code to feel comfortable committing it myself, but i've flagged it in Jira in 
hopes that someone who is might take a look.


-Hoss





[jira] Updated: (SOLR-1283) Mark Invalid error on indexing

2010-07-02 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated SOLR-1283:
---

Attachment: SOLR-1283.modules.patch

Updates the patch to trunk (where the charfilter stuff has been refactored into 
the new top-level modules directory).

I'm not familiar with the HTMLStripCharFilter stuff, so i can't say whether the 
fix is correct (no idea whether peek should be incrementing that counter -- 
that's why even private methods should have javadocs), but the test certainly 
looks valid to me.
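The underlying contract is worth spelling out: java.io.BufferedReader's mark(readAheadLimit) only guarantees that reset() will work if no more than readAheadLimit characters are consumed in between, so a peek that reads ahead without being counted can silently blow past the limit and produce exactly this "Mark invalid" IOException. A minimal Python model of that contract (a toy, not the actual HTMLStripReader/CharFilter code):

```python
class MarkableReader:
    """Toy model of java.io.BufferedReader's mark()/reset() contract."""

    def __init__(self, text):
        self.text = text
        self.pos = 0
        self.mark_pos = None
        self.limit = 0

    def mark(self, read_ahead_limit):
        self.mark_pos = self.pos
        self.limit = read_ahead_limit

    def read(self):
        if self.pos >= len(self.text):
            return ""
        ch = self.text[self.pos]
        self.pos += 1
        # Reading past the promised look-ahead invalidates the mark; a
        # peek() that forgets to count its reads triggers exactly this.
        if self.mark_pos is not None and self.pos - self.mark_pos > self.limit:
            self.mark_pos = None
        return ch

    def reset(self):
        if self.mark_pos is None:
            raise IOError("Mark invalid")
        self.pos = self.mark_pos
```

If the fix makes peek() count toward the look-ahead budget, the mark stays within its limit and reset() succeeds, which matches the shape of the attached test.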

 Mark Invalid error on indexing
 --

 Key: SOLR-1283
 URL: https://issues.apache.org/jira/browse/SOLR-1283
 Project: Solr
  Issue Type: Bug
Affects Versions: 1.3
 Environment: Ubuntu 8.04, Sun Java 6
Reporter: solrize
 Attachments: SOLR-1283.modules.patch, SOLR-1283.patch



[jira] Created: (SOLR-1983) snappuller fails when modifiedConfFiles is not empty and full copy of index is needed

2010-07-02 Thread Hoss Man (JIRA)
snappuller fails when modifiedConfFiles is not empty and full copy of index is 
needed
-

 Key: SOLR-1983
 URL: https://issues.apache.org/jira/browse/SOLR-1983
 Project: Solr
  Issue Type: Bug
  Components: replication (java)
Reporter: Noble Paul
Assignee: Noble Paul
 Fix For: 1.4


see the mail thread. http://markmail.org/thread/orxyqftqrsqvrt5w





[jira] Commented: (SOLR-1983) snappuller fails when modifiedConfFiles is not empty and full copy of index is needed

2010-07-02 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12884867#action_12884867
 ] 

Hoss Man commented on SOLR-1983:


cloned from SOLR-1264, where a new attachment was recently added addressing a 
continuation of the original problem in a specific situation.

Cloned rather than reopened because SOLR-1264 is already closed and part of Solr 
1.4 -- for tracking in CHANGES.txt we need a new issue number -- see comments 
in SOLR-1264 for more details



 snappuller fails when modifiedConfFiles is not empty and full copy of index 
 is needed
 -

 Key: SOLR-1983
 URL: https://issues.apache.org/jira/browse/SOLR-1983
 Project: Solr
  Issue Type: Bug
  Components: replication (java)
Reporter: Noble Paul
Assignee: Noble Paul
 Fix For: 1.4


 see the mail thread. http://markmail.org/thread/orxyqftqrsqvrt5w




[jira] Updated: (SOLR-1556) TermVectorComponents should provide good error messages when fieldtype isn't compatible with requested options

2010-07-02 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll updated SOLR-1556:
--

Attachment: SOLR-1556.patch

Ready to go.  Going to commit this weekend.

 TermVectorComponents should provide good error messages when fieldtype isn't 
 compatible with requested options
 --

 Key: SOLR-1556
 URL: https://issues.apache.org/jira/browse/SOLR-1556
 Project: Solr
  Issue Type: Bug
  Components: search
Affects Versions: 1.4
Reporter: Hoss Man
Assignee: Grant Ingersoll
 Fix For: Next

 Attachments: SOLR-1556.patch, SOLR-1556.patch


 As noted by grant on the email list, asking TermVectorComponent for things 
 like termVectors, positions, and offsets can't produce meaningful results 
 unless the field in question has the corresponding schema option set to true 
 -- but the behavior of TVC when they are not true is confusing to users. 
 We should make TVC return a meaningful error if it's asked to return a 
 certain type of info for a field that it can't deal with -- something making 
 it clear what properties of the schema need to be changed to make it work...
 http://old.nabble.com/Re%3A-TermVectorComponent-%3A-Required---Optional-Parameters-p26181454.html
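The requested fix amounts to validating each requested option against the field's schema flags and naming the missing flag in the error message. A Python sketch of that shape (the option and flag names below approximate Solr's tv.* parameters and term-vector schema attributes; the real check would live inside TermVectorComponent):

```python
# Approximate mapping from requested term-vector options to the schema
# attribute each one requires (names are approximations of Solr's).
REQUIRED_SCHEMA_FLAG = {
    "tv.tf": "termVectors",
    "tv.positions": "termPositions",
    "tv.offsets": "termOffsets",
}

def check_term_vector_request(field_flags, requested_options):
    """Fail with a message that names the schema property to flip, instead
    of silently returning nothing. Conceptual sketch only."""
    for option in requested_options:
        flag = REQUIRED_SCHEMA_FLAG.get(option)
        if flag and not field_flags.get(flag, False):
            raise ValueError(
                'cannot honor %s: the field must declare %s="true" in '
                "schema.xml" % (option, flag))
```

The payoff is entirely in the error text: the user immediately learns which schema attribute to change rather than guessing why the response is empty.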




Hudson build is back to normal : Lucene-3.x #59

2010-07-02 Thread Apache Hudson Server
See http://hudson.zones.apache.org/hudson/job/Lucene-3.x/59/






Hudson build is back to normal : Lucene-trunk #1232

2010-07-02 Thread Apache Hudson Server
See http://hudson.zones.apache.org/hudson/job/Lucene-trunk/1232/changes


