Re: do the Java and Python garbage collectors talk to each other, with JCC?
Andi Vajda va...@apache.org wrote: On Tue, 24 Aug 2010, Bill Janssen wrote: I'm starting to see traces like the following in my UpLib (OS X 10.5.8, 32-bit Python 2.5, Java 6, JCC-2.6, PyLucene-2.9.3) that indicate an out-of-memory issue. I spawn a lot of short-lived threads in Python, and each of them is attached to Java, and detached after the run method returns. I've run test programs that do nothing but repeatedly start new threads that then invoke pylucene to index a document, and see no problems. I'm trying to come up with a hypothesis for this. One of the things I'm wondering is if my Python memory space is approaching the limit, does PyLucene arrange for the Java garbage collector to invoke the Python garbage collector if it can't allocate memory? No, not that I know of. The only fancy exchange between the Python world and the Java world is for 'extensions' of Java classes in Python. These are in a deadly embrace since they keep track of each other. A proxy object and some weak reference tricks do their work to resolve this cleanly. But this assumes the ref count on the Python side becomes 0 or that the finalize() method on the Java side is invoked (for which there is no guarantee according to the spec). I don't think that's the issue. I'm keeping an eye on the refs via _dumpRefs(), and they seem OK, no matter how many new threads I create. As I understand it, the Java GC allocates two blocks of memory (heap and stack) immediately when creating a new thread, and does its own allocations to the thread from within these blocks -- the JVM GC works exclusively within this allocated heap block. These blocks are returned to the system when the thread exits. The Python GC, in contrast, works globally, allocating memory blocks as needed and returning them to the system when possible, asynchronously respective to thread creation and completion. 
What I think's happening is that Java is attempting to create a thread, and fails because the system (malloc) can't allocate a large enough heap block. The weak-ref'ed allocations that could be freed are on the Python side of the world, not the Java side. I wonder if it would be possible to add a hook somehow to the Java GC that would call into Python and have Python run its GC, too. Though I'm not sure the Java GC is being called at all, so perhaps this hook would have to be in the part of the Java VM that calls malloc, the thread creation code. Note that the thread being unsuccessfully started isn't mine; it's being started by Java. It is generally better practice to pool threads and to reuse them instead of allocating them for short-lived tasks. Sure, but tell that to the Lucene folks. They're the ones starting a new thread here. Of course, now and then one needs to start a new thread. I have personally no confidence in the JNI thread detaching mechanism... If it works, great but... As an aside, here is what I found out about using Java-created threads in Python: When Java creates a thread, Python is not being told about it and the Python VM considers this thread dummy, that is, without a thread state object. In other words, Python doesn't have a documented 'attachCurrentThread()' call. Instead, a Python thread state object is allocated at every call entering the Python VM from the Java VM running on such a dummy thread and is freed upon return. The buggy side effect of this is that you lose your thread-local storage between such calls and pay an extra thread state allocation cost for every such call into Python when the GIL is acquired. A workaround for this is to create and increment this thread state object's ref count when the Java thread is first created and to decrement it upon thread completion. This is what the PythonVM.acquire/releaseThreadState() methods are for in jcc.cpp. 
The PythonVM class is used when embedding a Python VM in a Java VM as when running Python code in a Tomcat process, for example. Maybe these methods should move elsewhere if they have potential uses outside this scenario... Yes, that sounds useful. Bill Andi.. Bill thr1730: Running document rippers raised the following exception: thr1730: Traceback (most recent call last): thr1730:File /local/share/UpLib-1.7.9/code/uplib/newFolder.py, line 282, in _run_rippers thr1730: ripper.rip(folderpath, id) thr1730:File /local/share/UpLib-1.7.9/code/uplib/createIndexEntry.py, line 187, in rip thr1730: index_folder(location, self.repository().index_path()) thr1730:File /local/share/UpLib-1.7.9/code/uplib/createIndexEntry.py, line 82, in index_folder thr1730: c.index(folder, doc_id) thr1730:File /local/share/UpLib-1.7.9/code/uplib/indexing.py, line 813, in index thr1730: self.reopen() thr1730:File /local/share/UpLib-1.7.9/code/uplib/indexing.py, line 635, in reopen thr1730: self.current_writer.flush() thr1730:
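Andi's advice above — pool and reuse threads instead of allocating one per short-lived task — can be sketched with a plain `ExecutorService`. This is an illustrative sketch, not UpLib or PyLucene code: the task body is a placeholder for real indexing work, and the point is that thread setup (and any JNI attach/detach) happens once per long-lived worker rather than once per task. The same shape applies on the Python side via `concurrent.futures.ThreadPoolExecutor`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PooledIndexing {
    // Submit n short-lived tasks to a small pool of long-lived workers.
    // The task body is a placeholder for real indexing work; each worker
    // thread would attach to the JVM once, on first use, not per task.
    static List<String> runTasks(int n) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (int i = 0; i < n; i++) {
                final int docId = i;
                futures.add(pool.submit(() -> "indexed " + docId));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : futures) {
                results.add(f.get());   // collected in submit order
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runTasks(10));
    }
}
```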
Hudson build is back to normal : Solr-3.x #85
See https://hudson.apache.org/hudson/job/Solr-3.x/85/changes - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (SOLR-2079) Expose HttpServletRequest object from SolrQueryRequest object
[ https://issues.apache.org/jira/browse/SOLR-2079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12902359#action_12902359 ] Jan Høydahl commented on SOLR-2079: --- I have been using SolrParams to convey metadata from frontends to the middleware layer, and I think it has worked really well. In addition, you get it included in the query logs! As for load balancers, most have an option to convey the client's IP in the X-Forwarded-For header. What if the dispatchFilter added all HTTP headers to the SolrQueryRequest context? Then we could map explicitly in the requestHandler config how to use them:
{code:xml}
<lst name="invariants">
  <str name="_http_remote-ip">$HTTP_HEADERS(X-Forwarded-For, Remote-Address)</str>
</lst>
{code}
This would mean that if the HTTP header X-Forwarded-For exists in the context, it will be mapped to the param _http_remote-ip; if not, it will use Remote-Address. In this way each application can choose whether to pollute the SolrParams with headers or not, and can choose the naming as well as whether it should be invariant or default. Expose HttpServletRequest object from SolrQueryRequest object - Key: SOLR-2079 URL: https://issues.apache.org/jira/browse/SOLR-2079 Project: Solr Issue Type: Improvement Components: Response Writers, search Reporter: Chris A. Mattmann Fix For: 3.1 Attachments: SOLR-2079.Quach.Mattmann.082310.patch.txt This patch adds the HttpServletRequest object to the SolrQueryRequest object. The HttpServletRequest object is needed to obtain the client's IP address for geotargetting, and is part of the patches from W. Quach and C. Mattmann. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: Should analysis.jsp honor maxFieldLength
What about an option to override this on a per field-type and/or per field basis? Then the global setting could still be the default:
<fieldType name="text" class="solr.TextField" positionIncrementGap="100" maxLength="10"/>
OR
<field name="teaser" type="text" indexed="true" stored="true" maxLength="10"/>
-- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Training in Europe - www.solrtraining.com On 24. aug. 2010, at 20.56, Eric Pugh wrote: I did always think that the global maxFieldLength was odd. In one project I have, 10,000 is fine except for 1 field that I would like to bump up to 100,000, and there isn't (as far as I know) a way to do that. Is there any real negative effect to swapping to maxFieldLength of 100,000 (with the caveat that the auto truncation won't be working!)? The filter approach that you pointed out does make sense; the only worry I have is that it might make building analyzers more complex. One of the things I treasure about Solr is how many decisions it makes for you out of the box that are right so very often, and therefore how simple it is. If every user needs to think about maxFieldLength from day one, then that might make life more complex. Eric On Aug 24, 2010, at 2:44 PM, Robert Muir wrote: On Tue, Aug 24, 2010 at 2:29 PM, Eric Pugh ep...@opensourceconnections.com wrote: I created a patch file at https://issues.apache.org/jira/browse/SOLR-2086. I went with the simplest approach since I didn't want to confuse things by having extra filters being added to what the user created. However, either approach would work! One idea here was that this maxFieldLength might be going away: see https://issues.apache.org/jira/browse/LUCENE-2295 for more information (though I notice it's still not listed as deprecated?). But for now it's worth mentioning: the filter is more flexible; for example, it supports per-field configuration (and of course if you use the filter instead, which you can do now, it will automatically work in analysis.jsp).
-- Robert Muir rcm...@gmail.com - Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com Co-Author: Solr 1.4 Enterprise Search Server available from http://www.packtpub.com/solr-1-4-enterprise-search-server Free/Busy: http://tinyurl.com/eric-cal
[jira] Commented: (LUCENE-2095) Document not guaranteed to be found after write and commit
[ https://issues.apache.org/jira/browse/LUCENE-2095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12902371#action_12902371 ] vijaykumarraja.grandhi commented on LUCENE-2095: I am currently using Lucene.Net version 2.9.2. We have upgraded from v1.9.0 to 2.9.2, and we now want to use threading, but I am stuck on a lock. How can I overcome these locks? Can anyone provide a .NET code sample? Thank you in advance. Document not guaranteed to be found after write and commit -- Key: LUCENE-2095 URL: https://issues.apache.org/jira/browse/LUCENE-2095 Project: Lucene - Java Issue Type: Bug Components: Index Affects Versions: 2.4.1, 2.9.1 Environment: Linux 64bit Reporter: Sanne Grinovero Assignee: Michael McCandless Fix For: 2.9.2, 3.0.1, 4.0 Attachments: LUCENE-2095.patch, lucene-stresstest.patch after same email on developer list: I developed a stress test to assert that a new document containing a specific term X is always found after a commit on the IndexWriter. This works most of the time, but it fails under load on rare occasions. I'm testing with 40 threads, both with a SerialMergeScheduler and a ConcurrentMergeScheduler, all sharing a common IndexWriter. The attached testcase uses a RAMDirectory only, but I verified that an FSDirectory behaves in the same way, so I don't believe it's the Directory implementation or the MergeScheduler. This test is slow, so I don't consider it a functional or unit test. It might give false positives: it doesn't always fail; sorry, I couldn't find out how to make it more likely to happen, besides scheduling it to run for a longer time. I tested this to affect versions 2.4.1 and 2.9.1.
Re: Lucene Test Failure: org.apache.lucene.search.TestCachingWrapperFilter.testEnforceDeletions (from TestCachingWrapperFilter)
OK I just cut this test over to SMS, and took steps to make sure the reader is not GC'd. It seems to be passing now ;) Mike On Tue, Aug 24, 2010 at 6:46 PM, Michael McCandless luc...@mikemccandless.com wrote: Hmm so cms.sync() wasn't it -- I just saw it fail again. Uwe you are right -- we are failing to keep a hard ref to the old reader, for this one assert. Yet if I try to keep a ref, I still see it sometimes fail... still digging... Mike On Tue, Aug 24, 2010 at 5:49 PM, Michael McCandless luc...@mikemccandless.com wrote: Yeah the key should still have a hard ref. The key is either the SegmentReader instance, or its CoreReader instance. The test holds a hard ref to the parent reader, which then references the subs. I think it may instead be due to CMS, ie, we reopen the reader before a merge completes, then the merge completes, then the next reopen (which assumes there will be no changes) sees the completed merge as a change. I'll try inserting CMS.sync() into the test... Mike On Tue, Aug 24, 2010 at 5:44 PM, Uwe Schindler u...@thetaphi.de wrote: Right, but has the key any refs? This was my only explanation for the bug. My problem is that I had no time to look closely into the test, and I did not completely understand the new deletion modes and what the test tries to do. This changed since 3.0, when I modified the filter the last time (at ApacheCon US). - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Michael McCandless [mailto:luc...@mikemccandless.com] Sent: Tuesday, August 24, 2010 11:38 PM To: dev@lucene.apache.org Subject: Re: Lucene Test Failure: org.apache.lucene.search.TestCachingWrapperFilter.testEnforceDeletions (from TestCachingWrapperFilter) Wait -- it's a WeakHashMap right? Entries should not be removed unless the key no longer has any hard refs? Mike On Tue, Aug 24, 2010 at 5:34 PM, Uwe Schindler u...@thetaphi.de wrote: We had the same on hudson a few days ago.
The problem is an overly active GC (if the GC is very active, it removes the entry from the cache, and then this error occurs). This is a bug in the test. To test this correctly we can either: - during the test, replace the WeakHashMap by a conventional HashMap (the map is package private, maybe we replace it in the test) - hold a reference to the cache entry during the test (that is, the DocIdSet) Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Mark Miller [mailto:markrmil...@gmail.com] Sent: Tuesday, August 24, 2010 10:33 PM To: dev@lucene.apache.org Subject: Lucene Test Failure: org.apache.lucene.search.TestCachingWrapperFilter.testEnforceDeletions (from TestCachingWrapperFilter) Error Message expected:<2> but was:<3> Stacktrace junit.framework.AssertionFailedError: expected:<2> but was:<3> at org.apache.lucene.search.TestCachingWrapperFilter.testEnforceDeletions(TestCachingWrapperFilter.java:228) at org.apache.lucene.util.LuceneTestCase.runBare(LuceneTestCase.java:380) at org.apache.lucene.util.LuceneTestCase.run(LuceneTestCase.java:372) Standard Output NOTE: random codec of testcase 'testEnforceDeletions' was: PreFlex NOTE: random locale of testcase 'testEnforceDeletions' was: zh_CN NOTE: random timezone of testcase 'testEnforceDeletions' was: Etc/GMT+4 NOTE: random seed of testcase 'testEnforceDeletions' was: -46038615367376670
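Uwe's point about the WeakHashMap-backed cache can be demonstrated in isolation (no Lucene involved): an entry survives exactly as long as something holds a hard reference to its key, and once the key is only weakly reachable the GC may reclaim the entry at any time — which is why a test asserting on cache hits must pin the reader. A minimal sketch:

```java
import java.util.Map;
import java.util.WeakHashMap;

public class WeakCacheDemo {
    // An entry in a WeakHashMap lives exactly as long as its key is
    // strongly reachable; this mirrors why TestCachingWrapperFilter
    // must keep a hard ref to the reader used as the cache key.
    static boolean survivesWhileReferenced() {
        Map<Object, String> cache = new WeakHashMap<>();
        Object reader = new Object();        // stands in for the SegmentReader key
        cache.put(reader, "cached DocIdSet");

        System.gc();                         // must NOT touch this entry: the key is live
        boolean present = cache.containsKey(reader);

        reader = null;                       // drop the only hard reference
        System.gc();                         // the entry is now merely *eligible* for
                                             // removal, at a time of the GC's choosing,
                                             // so nothing can be asserted about it here
        return present;
    }

    public static void main(String[] args) {
        System.out.println("survived while referenced: " + survivesWhileReferenced());
    }
}
```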
[jira] Updated: (LUCENE-2598) allow tests to use different Directory impls
[ https://issues.apache.org/jira/browse/LUCENE-2598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-2598: Attachment: LUCENE-2598.patch ok, here is the previous patch, except random is now enabled by default. (but most of the time uses ramdirectory so the tests are still generally quick) allow tests to use different Directory impls Key: LUCENE-2598 URL: https://issues.apache.org/jira/browse/LUCENE-2598 Project: Lucene - Java Issue Type: Test Components: Build Affects Versions: 3.1, 4.0 Reporter: Robert Muir Assignee: Robert Muir Fix For: 3.1, 4.0 Attachments: LUCENE-2598.patch, LUCENE-2598.patch, LUCENE-2598.patch, LUCENE-2598.patch, LUCENE-2598.patch, LUCENE-2598.patch, LUCENE-2598.patch, LUCENE-2598.patch, LUCENE-2598.patch, LUCENE-2598.patch Now that all tests use MockRAMDirectory instead of RAMDirectory, they are all picky like windows and force our tests to close readers etc before closing the directory. I think we should do the following: # change new MockRAMDIrectory() in tests to .newDirectory(random) # LuceneTestCase[J4] tracks if all dirs are closed at tearDown and also cleans up temp dirs like solr. # factor out the Mockish stuff from MockRAMDirectory into MockDirectoryWrapper # allow a -Dtests.directoryImpl or simpler to specify the default Directory to use for tests: default being random i think theres a chance we might find some bugs that havent yet surfaced because they are easier to trigger with FSDir Furthermore, this would be beneficial to Directory-implementors as they could run the entire testsuite against their Directory impl, just like codec-implementors can do now. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
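The selection logic described in the issue — random by default, biased toward RAMDirectory so the suite stays fast, and overridable via a system property — might look roughly like the sketch below. The method name and the weighting are illustrative guesses, not the committed patch; only the `tests.directoryImpl` property name comes from the issue text.

```java
import java.util.List;
import java.util.Random;

public class DirectoryPicker {
    // Sketch of the LUCENE-2598 idea, not the actual implementation:
    // pick a Directory impl for a test run, mostly RAMDirectory for speed,
    // occasionally a filesystem-backed impl to surface FSDir-only bugs.
    static String pickDirectoryImpl(Random random) {
        String forced = System.getProperty("tests.directoryImpl", "random");
        if (!forced.equals("random")) {
            return forced;   // -Dtests.directoryImpl pins a specific impl
        }
        if (random.nextInt(10) < 7) {
            return "RAMDirectory";   // 70%: keep the suite quick
        }
        List<String> fsImpls = List.of("SimpleFSDirectory", "NIOFSDirectory", "MMapDirectory");
        return fsImpls.get(random.nextInt(fsImpls.size()));
    }

    public static void main(String[] args) {
        System.out.println(pickDirectoryImpl(new Random()));
    }
}
```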
[jira] Assigned: (LUCENE-2590) Enable access to the freq information in a Query's sub-scorers
[ https://issues.apache.org/jira/browse/LUCENE-2590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer reassigned LUCENE-2590: --- Assignee: Simon Willnauer (was: Michael McCandless) Enable access to the freq information in a Query's sub-scorers -- Key: LUCENE-2590 URL: https://issues.apache.org/jira/browse/LUCENE-2590 Project: Lucene - Java Issue Type: Improvement Components: Search Reporter: Michael McCandless Assignee: Simon Willnauer Attachments: LUCENE-2590.patch, LUCENE-2590.patch, LUCENE-2590.patch, LUCENE-2590.patch The ability to gather more details than just the score, of how a given doc matches the current query, has come up a number of times on the user's lists. (most recently in the thread Query Match Count by Ryan McV on java-user). EG if you have a simple TermQuery foo, on each hit you'd like to know how many times foo occurred in that doc; or a BooleanQuery +foo +bar, being able to separately see the freq of foo and bar for the current hit. Lucene doesn't make this possible today, which is a shame because Lucene in fact does compute exactly this information; it's just not accessible from the Collector. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2590) Enable access to the freq information in a Query's sub-scorers
[ https://issues.apache.org/jira/browse/LUCENE-2590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12902379#action_12902379 ] Simon Willnauer commented on LUCENE-2590: - bq. Oh I see we can't quite have Scorer impl this because it doesn't know the query. But maybe we can factor out a common method, that the subclass passed the query to? I had the same idea in a previous iteration, but since a Scorer doesn't know which Query it scores for, I cannot make that call there. One way of doing it would be to add the scorer's {{Weight}} as a protected final member; since {{Weight}} already has a {{#getQuery()}} method, we can easily access it, or throw an UnsupportedOperationException if the weight is null (force it via the ctor and have a default one which sets it to null). Since most of the scorers know their {{Weight}} anyway and would need to call the visitor, we can also factor it out. bq. Also, we are missing some scorers (SpanScorer, ConstantScoreQuery.ConstantScorer, probably others), but if we do the super approach, we'd get these for free (I think?). Most of them would then come for free, though! Thoughts?
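The shape being discussed — a scorer that can report per-hit frequency, with composites forwarding a visitor to each clause so that a BooleanQuery "+foo +bar" reports foo's and bar's freqs separately — can be sketched as below. All names here are illustrative stand-ins, not the actual LUCENE-2590 patch API.

```java
import java.util.ArrayList;
import java.util.List;

public class SubScorerSketch {
    interface FreqVisitor { void visit(String query, float freq); }

    // Leaf scorer: knows its query (via its weight, conceptually) and
    // the term frequency for the current document.
    static class TermScorerSketch {
        final String query; final float freq;
        TermScorerSketch(String query, float freq) { this.query = query; this.freq = freq; }
        void accept(FreqVisitor v) { v.visit(query, freq); }
    }

    // Composite scorer: forwards the visitor to each sub-scorer, so each
    // clause of a BooleanQuery reports its own frequency.
    static class BooleanScorerSketch {
        final List<TermScorerSketch> subs;
        BooleanScorerSketch(List<TermScorerSketch> subs) { this.subs = subs; }
        void accept(FreqVisitor v) { for (TermScorerSketch s : subs) s.accept(v); }
    }

    static List<String> report() {
        BooleanScorerSketch bs = new BooleanScorerSketch(List.of(
                new TermScorerSketch("foo", 3f), new TermScorerSketch("bar", 5f)));
        List<String> out = new ArrayList<>();
        bs.accept((q, f) -> out.add(q + "=" + f));
        return out;
    }

    public static void main(String[] args) {
        System.out.println(report());   // one entry per clause, e.g. foo=3.0, bar=5.0
    }
}
```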
[jira] Updated: (SOLR-2031) QueryComponent's default query parser should be configurable from solrconfig.xml
[ https://issues.apache.org/jira/browse/SOLR-2031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer updated SOLR-2031: -- Attachment: SOLR-2031.patch updated patch to trunk - if nobody objects I'd commit that in a day or two QueryComponent's default query parser should be configurable from solrconfig.xml Key: SOLR-2031 URL: https://issues.apache.org/jira/browse/SOLR-2031 Project: Solr Issue Type: Improvement Components: SearchComponents - other Reporter: Karl Wright Assignee: Simon Willnauer Priority: Minor Attachments: SOLR-2031.patch, SOLR-2031.patch, SOLR-2031.patch In a multi-lucene-query environment, QueryComponent's way of selecting a default query parser must include solrconfig.xml support to be useful. It can't just get the default query parser from the request arguments. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
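A sketch of the kind of configuration being discussed — choosing the default query parser in solrconfig.xml rather than per request. The element names follow the stock requestHandler defaults-block convention; whether the patch uses exactly this shape is not shown in the thread.

```xml
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- query parser used when the request carries no defType parameter -->
    <str name="defType">dismax</str>
  </lst>
</requestHandler>
```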
[jira] Resolved: (LUCENE-2616) FastVectorHighlighter: out of alignment when the first value is empty in multiValued field
[ https://issues.apache.org/jira/browse/LUCENE-2616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi resolved LUCENE-2616. Fix Version/s: 3.1 4.0 Resolution: Fixed trunk: Committed revision 989035. branch_3x: Committed revision 989056. FastVectorHighlighter: out of alignment when the first value is empty in multiValued field -- Key: LUCENE-2616 URL: https://issues.apache.org/jira/browse/LUCENE-2616 Project: Lucene - Java Issue Type: Bug Components: contrib/highlighter Affects Versions: 2.9.3 Reporter: Koji Sekiguchi Assignee: Koji Sekiguchi Priority: Trivial Fix For: 3.1, 4.0 Attachments: LUCENE-2616.patch -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2095) Document not guaranteed to be found after write and commit
[ https://issues.apache.org/jira/browse/LUCENE-2095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12902410#action_12902410 ] vijaykumarraja.grandhi commented on LUCENE-2095: Please help me; all of my leads are slowly drying up.
[jira] Issue Comment Edited: (LUCENE-2095) Document not guaranteed to be found after write and commit
[ https://issues.apache.org/jira/browse/LUCENE-2095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12902410#action_12902410 ] vijaykumarraja.grandhi edited comment on LUCENE-2095 at 8/25/10 8:41 AM: - Please help me; all of my leads are slowly drying up. I am failing to get multi-threading working with Lucene: it deadlocks, and I always see a write.lock file inside the index folder. was (Author: gvkraj23): Please help me. Slowly all my trails are getting dried out.
[jira] Commented: (LUCENE-2598) allow tests to use different Directory impls
[ https://issues.apache.org/jira/browse/LUCENE-2598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12902433#action_12902433 ] Robert Muir commented on LUCENE-2598: - the fixes to NIOFS and MMap are committed in revision 989030. On Windows all tests pass with all directory impls, but the default is still RAMDirectory, at least until we verify that Mac OS X and Linux are OK with random.
[jira] Commented: (SOLR-2088) contrib/extraction fails on a turkish computer
[ https://issues.apache.org/jira/browse/SOLR-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12902453#action_12902453 ] Mark Miller commented on SOLR-2088: --- I'm running into this on my hudson box - more info: Stacktrace junit.framework.AssertionFailedError: query failed XPath: //*[@numFound='1'] xml response was: <?xml version="1.0" encoding="UTF-8"?> <response> <lst name="responseHeader"><int name="status">0</int><int name="QTime">3</int></lst><result name="response" numFound="0" start="0"/> </response> request was: start=0&q=title:Welcome&qt=standard&rows=20&version=2.2 at org.apache.solr.SolrTestCaseJ4.assertQ(SolrTestCaseJ4.java:320) at org.apache.solr.SolrTestCaseJ4.assertQ(SolrTestCaseJ4.java:310) at org.apache.solr.handler.ExtractingRequestHandlerTest.testExtraction(ExtractingRequestHandlerTest.java:83) Standard Output NOTE: random codec of testcase 'testExtraction' was: MockSep NOTE: random locale of testcase 'testExtraction' was: tr NOTE: random timezone of testcase 'testExtraction' was: Africa/Dar_es_Salaam Standard Error 25.Ağu.2010 08:51:38 org.apache.solr.common.SolrException log SEVERE: org.apache.solr.common.SolrException: ERROR:unknown field 'a' at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:321) at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:60) at org.apache.solr.handler.extraction.ExtractingDocumentLoader.doAdd(ExtractingDocumentLoader.java:120) at org.apache.solr.handler.extraction.ExtractingDocumentLoader.addDoc(ExtractingDocumentLoader.java:125) at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:195) at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1323) at org.apache.solr.util.TestHarness.queryAndResponse(TestHarness.java:334) at org.apache.solr.handler.ExtractingRequestHandlerTest.loadLocal(ExtractingRequestHandlerTest.java:361) at org.apache.solr.handler.ExtractingRequestHandlerTest.testDefaultField(ExtractingRequestHandlerTest.java:149) contrib/extraction fails on a turkish computer -- Key: SOLR-2088 URL: https://issues.apache.org/jira/browse/SOLR-2088 Project: Solr Issue Type: Bug Components: contrib - Solr Cell (Tika extraction) Reporter: Robert Muir Fix For: 3.1, 4.0 reproduce with: ant test -Dtests.locale=tr_TR
{noformat}
test: [junit] Running org.apache.solr.handler.ExtractingRequestHandlerTest [junit] xml response was: <?xml version="1.0" encoding="UTF-8"?> [junit] <response> [junit] <lst name="responseHeader"><int name="status">0</int><int name="QTime">5</int></lst> <result name="response" numFound="0" start="0"/> [junit] </response> [junit] [junit] request was: start=0&q=title:Welcome&qt=standard&rows=20&version=2.2 [junit] Tests run: 8, Failures: 1, Errors: 0, Time elapsed: 3.968 sec [junit] Test org.apache.solr.handler.ExtractingRequestHandlerTest FAILED BUILD FAILED
{noformat}
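Failures pinned to `-Dtests.locale=tr_TR` are very often the classic "Turkish i" problem: case conversion done in the default locale instead of an invariant one. Whether that is the exact root cause of SOLR-2088 isn't shown in this thread, but the hazard itself is easy to demonstrate:

```java
import java.util.Locale;

public class TurkishI {
    public static void main(String[] args) {
        Locale turkish = new Locale("tr");
        // Turkish case rules: uppercase 'I' lowercases to dotless 'ı' (U+0131),
        // so default-locale lowercasing of ASCII names breaks under tr_TR.
        System.out.println("TITLE".toLowerCase(turkish));      // tıtle (dotless i)
        System.out.println("TITLE".toLowerCase(Locale.ROOT));  // title
    }
}
```

The usual fix is to pass `Locale.ROOT` (or an equivalent invariant locale) to any `toLowerCase`/`toUpperCase` call on programmatic identifiers such as field names.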
[jira] Commented: (SOLR-1566) Allow components to add fields to outgoing documents
[ https://issues.apache.org/jira/browse/SOLR-1566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12902456#action_12902456 ] Grant Ingersoll commented on SOLR-1566: --- I think we all have generally worked around the same issues here, between this and SOLR-1298. I guess we just need to pick some names and work it out. One thing about this last patch (and mine, I think) is that perhaps we should just put the augmenter on the Request. That way, you don't have to add the response in a bunch of places. Besides, in my mind anyway, you are requesting augmentation via the Augmenter provided. Also, I'm not sure why StdAugmenter is instantiated in SolrCore. Wouldn't we want to allow for that to be driven by some user implementations? Perhaps, since there are a few of us w/ eyes on this, we should first try to tackle the ResponseWriter mess. Allow components to add fields to outgoing documents Key: SOLR-1566 URL: https://issues.apache.org/jira/browse/SOLR-1566 Project: Solr Issue Type: New Feature Components: search Reporter: Noble Paul Assignee: Grant Ingersoll Fix For: Next Attachments: SOLR-1566-gsi.patch, SOLR-1566-rm.patch, SOLR-1566.patch, SOLR-1566.patch, SOLR-1566.patch, SOLR-1566.patch Currently it is not possible for components to add fields to outgoing documents which are not in the stored fields of the document. This makes it cumbersome to add computed fields/metadata. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Resolved: (LUCENE-2412) Architecture Diagrams needed for Lucene, Solr and Nutch
[ https://issues.apache.org/jira/browse/LUCENE-2412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll resolved LUCENE-2412. - Resolution: Fixed Architecture Diagrams needed for Lucene, Solr and Nutch --- Key: LUCENE-2412 URL: https://issues.apache.org/jira/browse/LUCENE-2412 Project: Lucene - Java Issue Type: Task Components: Other Reporter: Grant Ingersoll Assignee: Grant Ingersoll Attachments: arch.pdf, LIA2_01_04.pdf, NutchArch.pdf, solr-arch.pdf -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (SOLR-2059) Allow customizing how WordDelimiterFilter tokenizes text.
[ https://issues.apache.org/jira/browse/SOLR-2059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12902588#action_12902588 ] Peter Karich commented on SOLR-2059: Robert, thanks for this work! I have a different application for this patch: in a twitter search, # and @ shouldn't be removed. Instead I will handle them like ALPHA, I think. Would you mind updating the patch for the latest version of the trunk? I get a problem with WordDelimiterIterator at line 254 if I use https://svn.apache.org/repos/asf/lucene/dev/trunk/solr, and a missing-file problem (line 37) with http://svn.apache.org/repos/asf/solr Allow customizing how WordDelimiterFilter tokenizes text. - Key: SOLR-2059 URL: https://issues.apache.org/jira/browse/SOLR-2059 Project: Solr Issue Type: New Feature Components: Schema and Analysis Reporter: Robert Muir Priority: Minor Fix For: 3.1, 4.0 Attachments: SOLR-2059.patch By default, WordDelimiterFilter assigns 'types' to each character (computed from Unicode Properties). Based on these types and the options provided, it splits and concatenates text. In some circumstances, you might need to tweak the behavior of how this works. It seems the filter already had this in mind, since you can pass in a custom byte[] type table. But it's not exposed in the factory. I think you should be able to customize the defaults with a configuration file: {noformat} # A customized type mapping for WordDelimiterFilterFactory # the allowable types are: LOWER, UPPER, ALPHA, DIGIT, ALPHANUM, SUBWORD_DELIM # # the default for any character without a mapping is always computed from # Unicode character properties # Map the $, %, '.', and ',' characters to DIGIT # This might be useful for financial data. $ = DIGIT % = DIGIT . = DIGIT \u002C = DIGIT {noformat} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. 
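The mapping format proposed above is simple enough to sketch a parser for. The following is an illustrative Python sketch of the stated `char = TYPE` format (with '#' comments and \uXXXX escapes), not Solr's actual parser — the real factory would presumably resolve the type names to WordDelimiterFilter's byte constants. Note one limitation of this sketch: a literal '#' (as in the twitter use case) would need the \u0023 escape, since a bare '#' starts a comment.

```python
def parse_type_map(text):
    """Parse a WordDelimiterFilterFactory-style char type mapping.

    Sketch only: lines are `char = TYPE`, '#' starts a comment,
    and the left-hand side may use a \\uXXXX escape.
    """
    mapping = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith('#'):
            continue  # skip blanks and comment lines
        lhs, sep, rhs = line.partition('=')
        if not sep:
            continue  # not a mapping line
        ch, typ = lhs.strip(), rhs.strip()
        if ch.startswith('\\u') and len(ch) == 6:
            ch = chr(int(ch[2:], 16))  # decode e.g. \u002C -> ','
        mapping[ch] = typ
    return mapping
```

Running it over the example file from the issue would map '$', '%', '.', and ',' to DIGIT.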
[jira] Commented: (SOLR-2059) Allow customizing how WordDelimiterFilter tokenizes text.
[ https://issues.apache.org/jira/browse/SOLR-2059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12902593#action_12902593 ] Robert Muir commented on SOLR-2059: --- Hi Peter: that's a great example. My use case wasn't actually this example either; I was just trying to give a good general one. What do you think of the file format, is it ok for describing these categories? This format/parser is just stolen from MappingCharFilterFactory; it seemed unambiguous and is already in use. As far as applying the patch, you need to apply it to https://svn.apache.org/repos/asf/lucene/dev/trunk, not https://svn.apache.org/repos/asf/lucene/dev/trunk/solr. This is because it has to modify a file in modules, too. Allow customizing how WordDelimiterFilter tokenizes text. - Key: SOLR-2059 URL: https://issues.apache.org/jira/browse/SOLR-2059 Project: Solr Issue Type: New Feature Components: Schema and Analysis Reporter: Robert Muir Priority: Minor Fix For: 3.1, 4.0 Attachments: SOLR-2059.patch By default, WordDelimiterFilter assigns 'types' to each character (computed from Unicode Properties). Based on these types and the options provided, it splits and concatenates text. In some circumstances, you might need to tweak the behavior of how this works. It seems the filter already had this in mind, since you can pass in a custom byte[] type table. But it's not exposed in the factory. I think you should be able to customize the defaults with a configuration file: {noformat} # A customized type mapping for WordDelimiterFilterFactory # the allowable types are: LOWER, UPPER, ALPHA, DIGIT, ALPHANUM, SUBWORD_DELIM # # the default for any character without a mapping is always computed from # Unicode character properties # Map the $, %, '.', and ',' characters to DIGIT # This might be useful for financial data. $ = DIGIT % = DIGIT . = DIGIT \u002C = DIGIT {noformat} -- This message is automatically generated by JIRA. 
[jira] Commented: (SOLR-2059) Allow customizing how WordDelimiterFilter tokenizes text.
[ https://issues.apache.org/jira/browse/SOLR-2059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12902600#action_12902600 ] Peter Karich commented on SOLR-2059: Oops, my mistake ... this helped! What do you think of the file format, is it ok for describing these categories? I think it is ok. I even had a simpler patch before stumbling over yours: handleAsChar=@# which is now more powerful IMHO: @ = ALPHA # = ALPHA Allow customizing how WordDelimiterFilter tokenizes text. - Key: SOLR-2059 URL: https://issues.apache.org/jira/browse/SOLR-2059 Project: Solr Issue Type: New Feature Components: Schema and Analysis Reporter: Robert Muir Priority: Minor Fix For: 3.1, 4.0 Attachments: SOLR-2059.patch By default, WordDelimiterFilter assigns 'types' to each character (computed from Unicode Properties). Based on these types and the options provided, it splits and concatenates text. In some circumstances, you might need to tweak the behavior of how this works. It seems the filter already had this in mind, since you can pass in a custom byte[] type table. But it's not exposed in the factory. I think you should be able to customize the defaults with a configuration file: {noformat} # A customized type mapping for WordDelimiterFilterFactory # the allowable types are: LOWER, UPPER, ALPHA, DIGIT, ALPHANUM, SUBWORD_DELIM # # the default for any character without a mapping is always computed from # Unicode character properties # Map the $, %, '.', and ',' characters to DIGIT # This might be useful for financial data. $ = DIGIT % = DIGIT . = DIGIT \u002C = DIGIT {noformat} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Issue Comment Edited: (SOLR-2059) Allow customizing how WordDelimiterFilter tokenizes text.
[ https://issues.apache.org/jira/browse/SOLR-2059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12902600#action_12902600 ] Peter Karich edited comment on SOLR-2059 at 8/25/10 3:46 PM: - Oops, my mistake ... this helped! What do you think of the file format, is it ok for describing these categories? I think it is ok. I even had a simpler patch before stumbling over yours: handleAsChar=@# which is now more powerful IMHO: {code} @ = ALPHA # = ALPHA {code} was (Author: peathal): Oops, my mistake ... this helped! What do you think of the file format, is it ok for describing these categories? I think it is ok. I even had a simpler patch before stumbling over yours: handleAsChar=@# which is now more powerful IMHO: @ = ALPHA # = ALPHA Allow customizing how WordDelimiterFilter tokenizes text. - Key: SOLR-2059 URL: https://issues.apache.org/jira/browse/SOLR-2059 Project: Solr Issue Type: New Feature Components: Schema and Analysis Reporter: Robert Muir Priority: Minor Fix For: 3.1, 4.0 Attachments: SOLR-2059.patch By default, WordDelimiterFilter assigns 'types' to each character (computed from Unicode Properties). Based on these types and the options provided, it splits and concatenates text. In some circumstances, you might need to tweak the behavior of how this works. It seems the filter already had this in mind, since you can pass in a custom byte[] type table. But it's not exposed in the factory. I think you should be able to customize the defaults with a configuration file: {noformat} # A customized type mapping for WordDelimiterFilterFactory # the allowable types are: LOWER, UPPER, ALPHA, DIGIT, ALPHANUM, SUBWORD_DELIM # # the default for any character without a mapping is always computed from # Unicode character properties # Map the $, %, '.', and ',' characters to DIGIT # This might be useful for financial data. $ = DIGIT % = DIGIT . = DIGIT \u002C = DIGIT {noformat} -- This message is automatically generated by JIRA. 
[jira] Updated: (LUCENE-2239) Revise NIOFSDirectory and its usage due to NIO limitations on Thread.interrupt
[ https://issues.apache.org/jira/browse/LUCENE-2239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer updated LUCENE-2239: Attachment: LUCENE-2239.patch Here is a new patch that adds the essential information to the NIOFSDirectory and MMapDirectory. I wonder if we should refer to this issue in the doc; IMO a link is not necessary. I removed the TestCase from the previous patch since it was only there to reproduce the problem in isolation. Revise NIOFSDirectory and its usage due to NIO limitations on Thread.interrupt -- Key: LUCENE-2239 URL: https://issues.apache.org/jira/browse/LUCENE-2239 Project: Lucene - Java Issue Type: Task Components: Store Affects Versions: 2.4, 2.4.1, 2.9, 2.9.1, 3.0 Reporter: Simon Willnauer Assignee: Simon Willnauer Attachments: LUCENE-2239.patch, LUCENE-2239.patch I created this issue as a spin-off from http://mail-archives.apache.org/mod_mbox/lucene-java-dev/201001.mbox/%3cf18c9dde1001280051w4af2bc50u1cfd55f85e509...@mail.gmail.com%3e We should decide what to do with NIOFSDirectory, whether we want to keep it as the default on non-Windows platforms, and how we want to document this. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (SOLR-2089) Faceting: order term ords before converting to values
[ https://issues.apache.org/jira/browse/SOLR-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12902623#action_12902623 ] Yonik Seeley commented on SOLR-2089: Results: docs=10M, docs matching query=1M, facet on field of 100,000 unique terms, facet.method=fc (multivalued)
|facet.limit|ms to facet trunk|ms to facet patch|
|100|63|63|
|1000|228|191|
|5000|722|307|
|1|1033|316|
So a decent speedup when facet.limit is very high. It will also help when facet.limit is high relative to the number of unique terms (since the speedup is due to ordering the term ords and not having to seek as often). I plan on committing soon if there are no objections. Faceting: order term ords before converting to values - Key: SOLR-2089 URL: https://issues.apache.org/jira/browse/SOLR-2089 Project: Solr Issue Type: Improvement Affects Versions: 1.4 Reporter: Yonik Seeley Attachments: SOLR-2089.patch We should be able to speed up multi-valued faceting that sorts by count and returns many values by first sorting the term ords before converting them to a string. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
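The reason ordering the ords helps can be illustrated outside Solr: the top-N entries arrive in count order, but resolving each ord to its term in that order seeks the term dictionary at random positions, while resolving in ascending ord order is a single forward scan. A toy Python model of the idea (illustrative names, not Solr's code):

```python
def resolve_top_terms(top, lookup):
    """top: list of (ord, count) pairs in count order.

    Resolve ords to terms in ascending ord order (sequential,
    seek-friendly access to the term dictionary), then return
    (term, count) pairs back in the original count order.
    """
    resolved = {}
    for ord_ in sorted(o for o, _ in top):
        resolved[ord_] = lookup(ord_)  # forward-only scan over ords
    return [(resolved[o], c) for o, c in top]
```

The output is identical to resolving in count order; only the access pattern to the term dictionary changes, which is where the reported speedup comes from.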
[jira] Updated: (SOLR-1986) Allow users to define multiple subfield types in AbstractSubTypeFieldType
[ https://issues.apache.org/jira/browse/SOLR-1986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Joiner updated SOLR-1986: Attachment: AbstractMultiSubTypeFieldType.patch Since the reason people seemed to object to the patch in the mailing list was that the AbstractSubTypeFieldType was not originally intended to be used for multiple different types, I made it a separate class. Also, the subFieldType parameter now works, and the created subFields are prepended with subtype_ so as to allow dynamicFields to be used to simulate multiValued fields. Allow users to define multiple subfield types in AbstractSubTypeFieldType - Key: SOLR-1986 URL: https://issues.apache.org/jira/browse/SOLR-1986 Project: Solr Issue Type: Improvement Components: Schema and Analysis Reporter: Mark Allan Priority: Minor Attachments: AbstractMultiSubTypeFieldType.patch, multiSubType.patch Original Estimate: 48h Remaining Estimate: 48h A few small changes to the AbstractSubTypeFieldType class to allow users to define distinct field types for each subfield. This enables us to define complex data types in the schema. For example, we have our own subclass of the CoordinateFieldType called TemporalCoverage where we store a start and end date for an event but now we can store a name for the event as well. <fieldType name="temporal" class="uk.ac.edina.solr.schema.TemporalCoverage" dimension="3" subFieldSuffix="_ti,_ti,_s"/> In this example, the start and end dates get stored as trie-coded integer subfields and the description as a string subfield. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2239) Revise NIOFSDirectory and its usage due to NIO limitations on Thread.interrupt
[ https://issues.apache.org/jira/browse/LUCENE-2239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12902641#action_12902641 ] Simon Willnauer commented on LUCENE-2239: - Good point Robert - instead of duplicating documentation we could recommend that users read the implementation-specific documentation before using FSDirectory#open(). Something like this: Currently this returns {@link NIOFSDirectory} on non-Windows JREs and {@link SimpleFSDirectory} on Windows. Since these directory implementations have slightly different behavior and limitations, it is recommended to consult the implementation-specific documentation for the platform your application is running on. simon Revise NIOFSDirectory and its usage due to NIO limitations on Thread.interrupt -- Key: LUCENE-2239 URL: https://issues.apache.org/jira/browse/LUCENE-2239 Project: Lucene - Java Issue Type: Task Components: Store Affects Versions: 2.4, 2.4.1, 2.9, 2.9.1, 3.0 Reporter: Simon Willnauer Assignee: Simon Willnauer Attachments: LUCENE-2239.patch, LUCENE-2239.patch I created this issue as a spin-off from http://mail-archives.apache.org/mod_mbox/lucene-java-dev/201001.mbox/%3cf18c9dde1001280051w4af2bc50u1cfd55f85e509...@mail.gmail.com%3e We should decide what to do with NIOFSDirectory, whether we want to keep it as the default on non-Windows platforms, and how we want to document this. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (SOLR-1301) Solr + Hadoop
[ https://issues.apache.org/jira/browse/SOLR-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12902645#action_12902645 ] Daniel Ivan Pizarro commented on SOLR-1301: --- I'm getting the following error: java.lang.IllegalStateException: Failed to initialize record writer for , attempt_local_0001_r_00_0 Where can I find instructions to run the CSV uploader? (The readme file says "Please read the original patch readme for details on the CSV bulk uploader.", and I can't find that readme file.) Solr + Hadoop - Key: SOLR-1301 URL: https://issues.apache.org/jira/browse/SOLR-1301 Project: Solr Issue Type: Improvement Affects Versions: 1.4 Reporter: Andrzej Bialecki Fix For: Next Attachments: commons-logging-1.0.4.jar, commons-logging-api-1.0.4.jar, hadoop-0.19.1-core.jar, hadoop-0.20.1-core.jar, hadoop.patch, log4j-1.2.15.jar, README.txt, SOLR-1301-hadoop-0-20.patch, SOLR-1301-hadoop-0-20.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SolrRecordWriter.java This patch contains a contrib module that provides distributed indexing (using Hadoop) to Solr EmbeddedSolrServer. The idea behind this module is twofold: * provide an API that is familiar to Hadoop developers, i.e. that of OutputFormat * avoid unnecessary export and (de)serialization of data maintained on HDFS. SolrOutputFormat consumes data produced by reduce tasks directly, without storing it in intermediate files. Furthermore, by using an EmbeddedSolrServer, the indexing task is split into as many parts as there are reducers, and the data to be indexed is not sent over the network. Design -- Key/value pairs produced by reduce tasks are passed to SolrOutputFormat, which in turn uses SolrRecordWriter to write this data. 
SolrRecordWriter instantiates an EmbeddedSolrServer, and it also instantiates an implementation of SolrDocumentConverter, which is responsible for turning a Hadoop (key, value) pair into a SolrInputDocument. This data is then added to a batch, which is periodically submitted to EmbeddedSolrServer. When the reduce task completes and the OutputFormat is closed, SolrRecordWriter calls commit() and optimize() on the EmbeddedSolrServer. The API provides facilities to specify an arbitrary existing solr.home directory, from which the conf/ and lib/ files will be taken. This process results in the creation of as many partial Solr home directories as there were reduce tasks. The output shards are placed in the output directory on the default filesystem (e.g. HDFS). Such part-N directories can be used to run N shard servers. Additionally, users can specify the number of reduce tasks, in particular 1 reduce task, in which case the output will consist of a single shard. An example application is provided that processes large CSV files and uses this API. It uses custom CSV processing to avoid (de)serialization overhead. This patch relies on hadoop-core-0.19.1.jar - I attached the jar to this issue, you should put it in contrib/hadoop/lib. Note: the development of this patch was sponsored by an anonymous contributor and approved for release under Apache License. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
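The batching behavior described in the design — SolrRecordWriter accumulating converted documents and periodically submitting the batch to the EmbeddedSolrServer — follows a standard write-batching pattern, sketched here in Python. This is a hypothetical simplification, not the patch's code: `submit` stands in for the server's add call, and the names are illustrative.

```python
class BatchWriter:
    """Accumulate documents and flush them in fixed-size batches.

    Sketch of the pattern only: a real SolrRecordWriter would pass
    each batch to EmbeddedSolrServer and call commit()/optimize()
    when the OutputFormat is closed.
    """

    def __init__(self, submit, batch_size=100):
        self.submit = submit          # callback receiving a full batch
        self.batch_size = batch_size
        self.batch = []

    def write(self, doc):
        self.batch.append(doc)
        if len(self.batch) >= self.batch_size:
            self.flush()

    def flush(self):
        if self.batch:
            self.submit(list(self.batch))  # hand off a copy of the batch
            self.batch = []

    def close(self):
        self.flush()  # final partial batch; commit/optimize would follow
```

The batch size trades memory for per-submit overhead, which matters on the reducer side where many documents stream through one writer.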
[jira] Created: (SOLR-2090) Allow reader to be passed in SolrInputDocument.addField method
Allow reader to be passed in SolrInputDocument.addField method -- Key: SOLR-2090 URL: https://issues.apache.org/jira/browse/SOLR-2090 Project: Solr Issue Type: Improvement Components: clients - java Affects Versions: 1.4.1 Environment: Windows Vista 32 bit. JDK 1.6. Reporter: Bojan Vukojevic I am using SolrJ with embedded Solr server and some documents have a lot of text. Solr will be running on a small device with very limited memory. In my tests I cannot process more than 3MB of text (in a body) with 64MB heap. According to Java there is about 30MB free memory before I call server.add and with 5MB of text it runs out of memory. I sent an inquiry to a mailing list and was advised to create JIRA issue. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
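The requested Reader-based addField points at the usual streaming fix for this kind of out-of-memory failure: consume the body in bounded chunks instead of materializing it as one string, so peak memory is one chunk rather than the whole document. A Python sketch of the idea (not SolrJ's API; `sink` is a placeholder for whatever consumes the characters):

```python
import io

def consume_in_chunks(reader, sink, chunk_size=8192):
    """Drain a character stream into `sink` chunk by chunk.

    Peak memory held here is a single chunk, regardless of how
    large the body is — the point of accepting a Reader instead
    of a fully materialized string.
    """
    total = 0
    while True:
        chunk = reader.read(chunk_size)
        if not chunk:
            break  # end of stream
        sink(chunk)
        total += len(chunk)
    return total
```

With a 64MB heap, the difference between holding a 5MB body (plus copies made while building the request) and holding one 8KB chunk at a time is exactly the failure mode the reporter describes.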
Re: do the Java and Python garbage collectors talk to each other, with JCC?
On Wed, 25 Aug 2010, Bill Janssen wrote: Sure, but tell that to the Lucene folks. They're the ones starting a new thread here. Of course, now and then one needs to start a new thread. I forwarded your question to Mike McCandless (who is also a subscriber to this list) to see if he had something to say on this topic. Still, in your earlier message, you said: I spawn a lot of short-lived threads in Python, and each of them is attached to Java, and detached after the run method returns.. To which I'm suggesting that you pool these threads instead and reuse them. Andi..
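Andi's pooling suggestion can be sketched in pure Python: a fixed set of long-lived workers each attaches to the JVM once, drains tasks from a queue, and detaches only at shutdown, so the per-task attach/detach (and Java's per-thread heap/stack allocation) disappears. In this sketch, `attach`/`detach` are stand-ins for JCC's attachCurrentThread/detachCurrentThread and default to no-ops:

```python
import queue
import threading

def run_pool(tasks, size=4, attach=lambda: None, detach=lambda: None):
    """Run tasks on a small pool of long-lived worker threads.

    Sketch only: `attach`/`detach` stand in for JVM thread
    attachment (e.g. JCC's attachCurrentThread) and are called
    once per worker, not once per task.
    """
    jobs = queue.Queue()

    def worker():
        attach()  # attach to the JVM once, when the worker starts
        try:
            while True:
                job = jobs.get()
                if job is None:  # sentinel: shut this worker down
                    return
                job()
        finally:
            detach()  # detach once, at pool shutdown

    threads = [threading.Thread(target=worker) for _ in range(size)]
    for t in threads:
        t.start()
    for task in tasks:
        jobs.put(task)
    for _ in threads:
        jobs.put(None)  # one shutdown sentinel per worker
    for t in threads:
        t.join()
```

A long-running service would keep the pool alive across requests rather than joining it; the key property is the same either way — attach is called `size` times total, not once per task.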
Re: [jira] Commented: (LUCENE-2611) IntelliJ IDEA setup
Great, I'll give it a whirl when I see the notification come back through. Erick On Wed, Aug 25, 2010 at 7:24 AM, Steven Rowe (JIRA) j...@apache.org wrote: [ https://issues.apache.org/jira/browse/LUCENE-2611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12902459#action_12902459] Steven Rowe commented on LUCENE-2611: - Hi Erick, bq. I have to go in to each module and re-select project sdk on the dependencies tab, even though it looks like it's already selected! I removed a chunk of configuration from the *.iml files that sets this, I think - I'll post a patch shortly that puts the per-module project SDK inheritance back, and should hopefully address the problem you're seeing. IntelliJ IDEA setup --- Key: LUCENE-2611 URL: https://issues.apache.org/jira/browse/LUCENE-2611 Project: Lucene - Java Issue Type: New Feature Components: Build Affects Versions: 4.0 Reporter: Steven Rowe Priority: Minor Fix For: 4.0 Attachments: LUCENE-2611.patch Setting up Lucene/Solr in IntelliJ IDEA can be time-consuming. The attached patch adds a new top level directory {{dev-tools/}} with sub-dir {{idea/}} containing basic setup files for trunk, as well as a top-level ant target named idea that copies these files into the proper locations. This arrangement avoids the messiness attendant to in-place project configuration files directly checked into source control. The IDEA configuration includes modules for Lucene and Solr, each Lucene and Solr contrib, and each analysis module. A JUnit test run per module is included. Once {{ant idea}} has been run, the only configuration that must be performed manually is configuring the project-level JDK. If this patch is committed, Subversion svn:ignore properties should be added/modified to ignore the destination module files (*.iml) in each module's directory. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. 
[jira] Created: (LUCENE-2622) Random Test Failure org.apache.lucene.TestExternalCodecs.testPerFieldCodec (from TestExternalCodecs)
Random Test Failure org.apache.lucene.TestExternalCodecs.testPerFieldCodec (from TestExternalCodecs)
Key: LUCENE-2622 URL: https://issues.apache.org/jira/browse/LUCENE-2622 Project: Lucene - Java Issue Type: Test Reporter: Mark Miller Priority: Minor

Error Message

state.ord=54 startOrd=0 ir.isIndexTerm=true state.docFreq=1

Stacktrace

junit.framework.AssertionFailedError: state.ord=54 startOrd=0 ir.isIndexTerm=true state.docFreq=1
    at org.apache.lucene.index.codecs.standard.StandardTermsDictReader$FieldReader$SegmentTermsEnum.seek(StandardTermsDictReader.java:395)
    at org.apache.lucene.index.DocumentsWriter.applyDeletes(DocumentsWriter.java:1099)
    at org.apache.lucene.index.DocumentsWriter.applyDeletes(DocumentsWriter.java:1028)
    at org.apache.lucene.index.IndexWriter.applyDeletes(IndexWriter.java:4213)
    at org.apache.lucene.index.IndexWriter.doFlushInternal(IndexWriter.java:3381)
    at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3221)
    at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3211)
    at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:2345)
    at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:2323)
    at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:2293)
    at org.apache.lucene.TestExternalCodecs.testPerFieldCodec(TestExternalCodecs.java:645)
    at org.apache.lucene.util.LuceneTestCase.runBare(LuceneTestCase.java:381)
    at org.apache.lucene.util.LuceneTestCase.run(LuceneTestCase.java:373)

Standard Output

NOTE: random codec of testcase 'testPerFieldCodec' was: MockFixedIntBlock(blockSize=1327)
NOTE: random locale of testcase 'testPerFieldCodec' was: lt_LT
NOTE: random timezone of testcase 'testPerFieldCodec' was: Africa/Lusaka
NOTE: random seed of testcase 'testPerFieldCodec' was: 812019387131615618

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. 
[jira] Created: (LUCENE-2623) Random Test Failure org.apache.lucene.index.TestIndexWriter.testAddIndexesWithThreads (from TestIndexWriter)
Random Test Failure org.apache.lucene.index.TestIndexWriter.testAddIndexesWithThreads (from TestIndexWriter)
Key: LUCENE-2623 URL: https://issues.apache.org/jira/browse/LUCENE-2623 Project: Lucene - Java Issue Type: Bug Reporter: Mark Miller Priority: Minor

Error Message

expected:<3160> but was:<2752>

Stacktrace

junit.framework.AssertionFailedError: expected:<3160> but was:<2752>
    at org.apache.lucene.index.TestIndexWriter.testAddIndexesWithThreads(TestIndexWriter.java:3794)
    at org.apache.lucene.util.LuceneTestCase.runBare(LuceneTestCase.java:380)
    at org.apache.lucene.util.LuceneTestCase.run(LuceneTestCase.java:372)

Standard Output

java.lang.AssertionError: IndexFileDeleter doesn't know about file _8h.cfs
    at org.apache.lucene.index.IndexWriter.filesExist(IndexWriter.java:4284)
    at org.apache.lucene.index.IndexWriter.startCommit(IndexWriter.java:4331)
    at org.apache.lucene.index.IndexWriter.prepareCommit(IndexWriter.java:3088)
    at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:3161)
    at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:3132)
    at org.apache.lucene.index.TestIndexWriter$CommitAndAddIndexes.doBody(TestIndexWriter.java:3774)
    at org.apache.lucene.index.TestIndexWriter$RunAddIndexesThreads$1.run(TestIndexWriter.java:3710)

NOTE: random codec of testcase 'testAddIndexesWithThreads' was: MockSep
NOTE: random locale of testcase 'testAddIndexesWithThreads' was: ms_MY
NOTE: random timezone of testcase 'testAddIndexesWithThreads' was: Asia/Aqtau
NOTE: random seed of testcase 'testAddIndexesWithThreads' was: -5272061551011630291

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2611) IntelliJ IDEA setup
[ https://issues.apache.org/jira/browse/LUCENE-2611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12902700#action_12902700 ] Erick Erickson commented on LUCENE-2611: Steven: That worked like a champ, all I had to do was set the project-level JDK and then run tests. The only other anomaly (and it's not causing me any problems) is that on the project settings page, there are circular dependencies... 1. queries, misc, common, remote 2. solr, extraction FWIW Erick IntelliJ IDEA setup --- Key: LUCENE-2611 URL: https://issues.apache.org/jira/browse/LUCENE-2611 Project: Lucene - Java Issue Type: New Feature Components: Build Affects Versions: 4.0 Reporter: Steven Rowe Priority: Minor Fix For: 4.0 Attachments: LUCENE-2611.patch, LUCENE-2611.patch Setting up Lucene/Solr in IntelliJ IDEA can be time-consuming. The attached patch adds a new top level directory {{dev-tools/}} with sub-dir {{idea/}} containing basic setup files for trunk, as well as a top-level ant target named idea that copies these files into the proper locations. This arrangement avoids the messiness attendant to in-place project configuration files directly checked into source control. The IDEA configuration includes modules for Lucene and Solr, each Lucene and Solr contrib, and each analysis module. A JUnit test run per module is included. Once {{ant idea}} has been run, the only configuration that must be performed manually is configuring the project-level JDK. If this patch is committed, Subversion svn:ignore properties should be added/modified to ignore the destination module files (*.iml) in each module's directory. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2611) IntelliJ IDEA setup
[ https://issues.apache.org/jira/browse/LUCENE-2611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12902739#action_12902739 ] Robert Muir commented on LUCENE-2611: - bq. DIH and Solr unit test runs still don't fully pass for me, but all other modules' test runs pass. Can you provide any information on tests that are giving you trouble? We could re-open LUCENE-2398 also. really it would be nice if all tests worked from these IDEs. IntelliJ IDEA setup --- Key: LUCENE-2611 URL: https://issues.apache.org/jira/browse/LUCENE-2611 Project: Lucene - Java Issue Type: New Feature Components: Build Affects Versions: 4.0 Reporter: Steven Rowe Priority: Minor Fix For: 4.0 Attachments: LUCENE-2611.patch, LUCENE-2611.patch Setting up Lucene/Solr in IntelliJ IDEA can be time-consuming. The attached patch adds a new top level directory {{dev-tools/}} with sub-dir {{idea/}} containing basic setup files for trunk, as well as a top-level ant target named idea that copies these files into the proper locations. This arrangement avoids the messiness attendant to in-place project configuration files directly checked into source control. The IDEA configuration includes modules for Lucene and Solr, each Lucene and Solr contrib, and each analysis module. A JUnit test run per module is included. Once {{ant idea}} has been run, the only configuration that must be performed manually is configuring the project-level JDK. If this patch is committed, Subversion svn:ignore properties should be added/modified to ignore the destination module files (*.iml) in each module's directory. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Random Lucene Test Failure: org.apache.lucene.index.TestBackwardsCompatibility.testIndexOldIndex (from TestBackwardsCompatibility)
org.apache.lucene.index.TestBackwardsCompatibility.testIndexOldIndex (from TestBackwardsCompatibility)
Failing for the past 1 build (since #1353). Took 0.29 sec.

Error Message:
wrong doc count expected:<46> but was:<45>

Stacktrace:
junit.framework.AssertionFailedError: wrong doc count expected:<46> but was:<45>
	at org.apache.lucene.index.TestBackwardsCompatibility.changeIndexWithAdds(TestBackwardsCompatibility.java:388)
	at org.apache.lucene.index.TestBackwardsCompatibility.testIndexOldIndex(TestBackwardsCompatibility.java:287)
	at org.apache.lucene.util.LuceneTestCase.runBare(LuceneTestCase.java:381)
	at org.apache.lucene.util.LuceneTestCase.run(LuceneTestCase.java:373)

Standard Output:
NOTE: random codec of testcase 'testIndexOldIndex' was: MockVariableIntBlock(baseBlockSize=49)
NOTE: random locale of testcase 'testIndexOldIndex' was: es_AR
NOTE: random timezone of testcase 'testIndexOldIndex' was: Portugal
NOTE: random seed of testcase 'testIndexOldIndex' was: -724598633153762820
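Those NOTE lines are printed because Lucene's test harness randomizes the codec, locale, timezone, and random seed per test; recording the seed lets a developer replay the exact same "random" choices and reproduce the failure deterministically. A minimal sketch of the underlying idea using plain java.util.Random (not Lucene's actual test infrastructure; the class and method names here are illustrative):

```java
import java.util.Random;

public class SeedRepro {
    // A pseudo-random generator seeded with a fixed value always produces the
    // same sequence, so re-running a test with the seed recorded in the failure
    // report replays the same randomized decisions.
    static long firstValue(long seed) {
        return new Random(seed).nextLong();
    }

    public static void main(String[] args) {
        long seed = -724598633153762820L; // the seed from the failure report above
        // Same seed, same sequence: this always prints true.
        System.out.println(firstValue(seed) == firstValue(seed));
    }
}
```

This determinism is why a seemingly flaky randomized test is still debuggable: the printed seed is enough to recreate the failing run.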
[jira] Commented: (SOLR-2034) javabin should use UTF-8, not modified UTF-8
[ https://issues.apache.org/jira/browse/SOLR-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12902744#action_12902744 ]

Robert Muir commented on SOLR-2034:
-----------------------------------

If no one objects to the latest patch, I'd like to commit in a day or two.

javabin should use UTF-8, not modified UTF-8
--------------------------------------------
                Key: SOLR-2034
                URL: https://issues.apache.org/jira/browse/SOLR-2034
            Project: Solr
         Issue Type: Bug
           Reporter: Robert Muir
        Attachments: SOLR-2034.patch, SOLR-2034.patch, SOLR-2034.patch, SOLR-2034.patch

For better interoperability, javabin should use standard UTF-8 instead of modified UTF-8 (http://www.unicode.org/reports/tr26/).
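The interoperability concern is concrete: Java's DataOutputStream.writeUTF emits "modified UTF-8", which encodes U+0000 as two bytes (0xC0 0x80) and supplementary characters as a 6-byte encoded surrogate pair, whereas standard UTF-8 uses one byte and four bytes respectively, so the two wire formats disagree for exactly those characters. A small demonstration of the difference (class and method names are illustrative, not part of javabin):

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf8VsModifiedUtf8 {
    // Standard UTF-8: U+0000 -> 1 byte, supplementary chars -> 4 bytes.
    static byte[] standardUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Modified UTF-8 via writeUTF: U+0000 -> 2 bytes (0xC0 0x80),
    // supplementary chars -> two 3-byte encoded surrogates (6 bytes).
    static byte[] modifiedUtf8(String s) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            new DataOutputStream(bos).writeUTF(s);
            // writeUTF prefixes a 2-byte length; skip it for comparison.
            return Arrays.copyOfRange(bos.toByteArray(), 2, bos.size());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // U+0000 followed by U+1F600 (a supplementary character).
        String s = "\u0000" + new String(Character.toChars(0x1F600));
        System.out.println("standard UTF-8 bytes: " + standardUtf8(s).length); // 5
        System.out.println("modified UTF-8 bytes: " + modifiedUtf8(s).length); // 8
    }
}
```

A non-Java javabin client decoding with a standard UTF-8 library would mis-handle exactly these byte sequences, which is the motivation for the switch.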
[jira] Created: (LUCENE-2624) add new snowball languages
add new snowball languages
--------------------------
                Key: LUCENE-2624
                URL: https://issues.apache.org/jira/browse/LUCENE-2624
            Project: Lucene - Java
         Issue Type: New Feature
         Components: contrib/analyzers
           Reporter: Robert Muir
            Fix For: 3.1, 4.0
        Attachments: LUCENE-2624.patch

Snowball added new languages. This patch adds support for them.
http://snowball.tartarus.org/algorithms/armenian/stemmer.html
http://snowball.tartarus.org/algorithms/catalan/stemmer.html
http://snowball.tartarus.org/algorithms/basque/stemmer.html
[jira] Updated: (LUCENE-2624) add new snowball languages
[ https://issues.apache.org/jira/browse/LUCENE-2624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-2624:
--------------------------------
    Attachment: LUCENE-2624.patch