Re: Vote on merging dev of Lucene and Solr
+1

On Thu, Mar 4, 2010 at 6:32 PM, Mark Miller markrmil...@gmail.com wrote:
For those committers that don't follow the general mailing list, or don't follow it that closely, we are currently having a vote for committers:
http://search.lucidimagination.com/search/document/4722d3144c2e3a8b/vote_merge_lucene_solr_development

--
- Mark
http://www.lucidimagination.com

--
- Noble Paul | Systems Architect | AOL | http://aol.com

To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org
Re: Solr 1.5 or 2.0?
Option 3 looks best. But do we plan to remove anything we have not already marked as deprecated?

On Thu, Nov 19, 2009 at 8:10 PM, Uwe Schindler u...@thetaphi.de wrote:
We also had some (maybe helpful) opinions :-)

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

-Original Message-
From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley
Sent: Thursday, November 19, 2009 3:31 PM
To: java-dev@lucene.apache.org
Subject: Re: Solr 1.5 or 2.0?

Oops... of course I meant to post this in solr-dev.

-Yonik
http://www.lucidimagination.com

On Wed, Nov 18, 2009 at 8:53 PM, Yonik Seeley yo...@lucidimagination.com wrote:
What should the next version of Solr be? Options:
- have a Solr 1.5 with a Lucene 2.9.x
- have a Solr 1.5 with a Lucene 3.x, with weaker back compat given all of the removed Lucene deprecations from 2.9-3.0
- have a Solr 2.0 with a Lucene 3.x

-Yonik
http://www.lucidimagination.com

--
- Noble Paul | Principal Engineer | AOL | http://aol.com
Re: Solr 1.5 or 2.0?
On Fri, Nov 20, 2009 at 6:30 AM, Ryan McKinley ryan...@gmail.com wrote:

On Nov 19, 2009, at 3:34 PM, Mark Miller wrote:

Ryan McKinley wrote:
I would love to set goals that are ~3 months out so that we don't have another 1-year release cycle. For a 2.0 release where we could have more back-compatibility flexibility, I would love to see some work that may be too ambitious... In particular, the config spaghetti needs some attention. I don't see the need to increment Solr to 2.0 for the Lucene 3.0 change -- of course that needs to be noted, but incrementing the major number in Solr only makes sense if we are going to change *Solr* significantly.

Lucene major numbers don't work that way, and I don't think Solr needs to work that way by default. I think major numbers are better for indicating backwards-compat issues than major features, with the way these projects work. Which is why Yonik mentions 1.5 with weaker back compat -- it's not just the fact that we are going to Lucene 3.x, it's that Solr still relies on some of the APIs that won't be around in 3.x, and they are not all trivial to remove, or to remove while preserving back compat.

I confess I don't know the details of the changes that have not yet been integrated in Solr -- the only Lucene changes I am familiar with are what was required for Solr 1.4. The Lucene 2.x - 3.0 upgrade path seems independent of that to me. I would even argue that with Solr 1.4 we have already required many Lucene 3.0 changes -- all my custom Lucene stuff had to be reworked to work with Solr 1.4 (tokenizers, multi-reader filters).

Many -- but certainly not all.

Just my luck... I'm batting 1000 :) But that means my code can upgrade to 3.0 without an issue now!

In general, I wonder where the Solr back-compatibility contract applies (and to what degree). For Solr, I would rank the importance as:
#1 - the URL API syntax.
Client query parameters should change as little as possible.
#2 - configuration
#3 - Java APIs

Someone else would likely rank it differently -- not everyone using Solr even uses HTTP with it. Someone heavily involved in custom plugins might care more about that than config. As a dev, I just plainly rank them all as important and treat them on a case-by-case basis.

I think it is fair to suggest that people will have the most stable/consistent/seamless upgrade path if you stick to the HTTP API (and by extension most of the solrj API). I am not suggesting that the Java APIs are not important and that back-compatibility is not important. Solr has some APIs with a clear purpose, place, and intended use -- we need to take these very seriously. We also have lots of APIs that are half-baked and loosey-goosey. If a developer is working on the edges, I think it is fair to expect more hiccups in the upgrade path. With that in mind, I think 'solr 1.5 with lucene 3.x' makes the most sense.

Unless we see ourselves making serious changes to Solr that would warrant a major release bump, Solr 1.5 with Lucene 3.x is a good option. Solr 2.0 can have non-back-compat changes for Solr itself, e.g. removing the single-core option, changing configuration, REST API changes, etc.

What is a serious change that would warrant a bump, in your opinion?

For example:
- config overhaul: detangle the XML from the components, perhaps using Spring. (This is already done. No components read config from XML anymore -- SOLR-1198.)
- major URL request changes: perhaps we change things to be more RESTful -- perhaps let Jersey take care of the URL/request building: https://jersey.dev.java.net/
- perhaps OSGi support/control/configuration

Lucene has an explicit back-compatibility contract: http://wiki.apache.org/lucene-java/BackwardsCompatibility
I don't know if Solr has one...
If we make one, I would like it to focus on the URL syntax + configuration.

It's not nice to give people plugins and then not worry about back compat for them :)

I want to be nice. I just think that a different back-compatibility contract applies for Solr than for Lucene. It seems reasonable to consider the HTTP API, configs, and Java API independently. From my perspective, saying Solr 1.5 uses Lucene 3.0 implies everything a plugin developer using Lucene APIs needs to know about the changes.

To be clear, I am not against bumping to Solr 2.0 -- I just have high aspirations (yet little time) for what a 2.0 bump could mean for Solr.

ryan

--
- Noble Paul | Principal Engineer | AOL | http://aol.com
Re: Blob storage
On Fri, Dec 26, 2008 at 2:11 PM, Babak Farhang farh...@gmail.com wrote:
Most of all, I'm trying to communicate an *idea* which itself cannot be encumbered by any license, anyway. But if you want to incorporate some of this code into an asf project, I'd be happy to also release it under the apache license. Hope the license I chose for my project doesn't get in the way of this conversation..

It would be more useful if the user could store the data using his own id (a long). This forces me to have a mapping of key -> skwish_id elsewhere, which is a serious limitation. For retrieval it means I may need to do a lookup, and that can be costly if I have tens of millions of records. BTW, the license is a problem.

On Fri, Dec 26, 2008 at 12:46 AM, Noble Paul നോബിള് नोब्ळ् noble.p...@gmail.com wrote:
The license is GPL. It cannot be used directly in any Apache projects.

On Fri, Dec 26, 2008 at 12:47 PM, Babak Farhang farh...@gmail.com wrote:
I assume one could use Skwish instead of Lucene's normal stored fields to store/retrieve document data?

Exactly: instead of storing the field's value directly in Lucene, you could store it in skwish and then store its skwish id in the Lucene field instead. This works well for serving large streams (e.g. original document contents).

Have you run any threaded performance tests comparing the two?

No direct comps, yet.
-b

On Thu, Dec 25, 2008 at 5:22 AM, Michael McCandless luc...@mikemccandless.com wrote:
This looks interesting! I assume one could use Skwish instead of Lucene's normal stored fields to store/retrieve document data? Have you run any threaded performance tests comparing the two?

Mike

Babak Farhang farh...@gmail.com wrote:
Hi everyone, I've been working on a library called Skwish to complement indexes like Lucene, for blob storage and retrieval. This is nothing more than a structured implementation of storing all the files in one file and managing their offsets in another.
The idea is to provide a fast, concurrent, lock-free way to serve lots of files to lots of users. Hope you find it useful or interesting.

-Babak
http://skwish.sourceforge.net/

--
--Noble Paul
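The layout Babak describes -- every blob appended to one file, with the offsets kept in another -- can be sketched in plain Java. This is an in-memory stand-in, not the actual Skwish API; the class and method names are invented for illustration:

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

// In-memory sketch of the two-file layout: each blob is appended to a
// single data stream, and a separate offset list records where entries
// end, so entry i occupies bytes [offsets.get(i), offsets.get(i+1)).
class BlobStore {
    private final ByteArrayOutputStream data = new ByteArrayOutputStream();
    private final List<Integer> offsets = new ArrayList<>();

    BlobStore() {
        offsets.add(0); // sentinel: the first entry starts at byte 0
    }

    /** Appends a blob and returns its sequentially assigned id. */
    synchronized int add(byte[] blob) {
        data.write(blob, 0, blob.length);
        offsets.add(data.size());
        return offsets.size() - 2; // id of the entry just written
    }

    /** Retrieves a blob by id: two offset lookups, then one contiguous read. */
    synchronized byte[] get(int id) {
        int start = offsets.get(id);
        int end = offsets.get(id + 1);
        byte[] all = data.toByteArray(); // a file-backed store would seek instead of copying
        byte[] out = new byte[end - start];
        System.arraycopy(all, start, out, 0, out.length);
        return out;
    }
}
```

The id returned by add(...) is what one would put in a Lucene stored field in place of the blob itself, along the lines Babak suggests.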
Re: Blob storage
On Fri, Dec 26, 2008 at 10:05 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote:
Similar thoughts here. I don't have ML thread pointers nor JIRA issue pointers, but there has been discussion in this area before, and I believe the thinking was that what's needed is a general interface/abstraction/API for storing and loading field data to an external component, be that a BDB, an RDBMS, or something like Skwish. I *think* that often came up in the context of Document updates (as opposed to delete+add).

This is an area of interest for me as well -- SOLR-828.

I didn't look at Skwish, but I think this is the direction to explore, Babak, esp. if we can come up with something that lets one plug in other types of storage, as well as deal with the transaction-type stuff that Ian mentioned.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

- Original Message
From: Ian Holsman li...@holsman.net
To: java-dev@lucene.apache.org
Sent: Friday, December 26, 2008 5:40:36 AM
Subject: Re: Blob storage

Babak Farhang wrote:
Most of all, I'm trying to communicate an *idea* which itself cannot be encumbered by any license, anyway. But if you want to incorporate some of this code into an asf project, I'd be happy to also release it under the apache license. Hope the license I chose for my project doesn't get in the way of this conversation..

As an idea, let me offer some thoughts:
- There will be a trade-off where reading the info from a 2nd system would be slower than just a single call which has all the results, especially if you have to fetch a couple of these things.
- How is this different from BDB and a UUID? Couldn't you just store it using that?
- How are you going to deal with situations where the commit fails in Lucene? Does the client have to recognize this and roll back skwish?
- There will need to be some kind of reconciliation process to deal with inconsistencies where someone forgets to delete the skwish object when they have deleted the Lucene record.

On a positive note, it would shrink the index size and allow more records to fit in memory.

Regards
Ian

On Fri, Dec 26, 2008 at 12:46 AM, Noble Paul നോബിള് नोब्ळ् wrote:
The license is GPL. It cannot be used directly in any Apache projects.

On Fri, Dec 26, 2008 at 12:47 PM, Babak Farhang wrote:
I assume one could use Skwish instead of Lucene's normal stored fields to store/retrieve document data?

Exactly: instead of storing the field's value directly in Lucene, you could store it in skwish and then store its skwish id in the Lucene field instead. This works well for serving large streams (e.g. original document contents).

Have you run any threaded performance tests comparing the two?

No direct comps, yet.
-b

On Thu, Dec 25, 2008 at 5:22 AM, Michael McCandless wrote:
This looks interesting! I assume one could use Skwish instead of Lucene's normal stored fields to store/retrieve document data? Have you run any threaded performance tests comparing the two?

Mike

Babak Farhang wrote:
Hi everyone, I've been working on a library called Skwish to complement indexes like Lucene, for blob storage and retrieval. This is nothing more than a structured implementation of storing all the files in one file and managing their offsets in another. The idea is to provide a fast, concurrent, lock-free way to serve lots of files to lots of users. Hope you find it useful or interesting.
-Babak
http://skwish.sourceforge.net/

--
--Noble Paul
Re: Blob storage
The license is GPL. It cannot be used directly in any Apache projects.

On Fri, Dec 26, 2008 at 12:47 PM, Babak Farhang farh...@gmail.com wrote:
I assume one could use Skwish instead of Lucene's normal stored fields to store/retrieve document data?

Exactly: instead of storing the field's value directly in Lucene, you could store it in skwish and then store its skwish id in the Lucene field instead. This works well for serving large streams (e.g. original document contents).

Have you run any threaded performance tests comparing the two?

No direct comps, yet.
-b

On Thu, Dec 25, 2008 at 5:22 AM, Michael McCandless luc...@mikemccandless.com wrote:
This looks interesting! I assume one could use Skwish instead of Lucene's normal stored fields to store/retrieve document data? Have you run any threaded performance tests comparing the two?

Mike

Babak Farhang farh...@gmail.com wrote:
Hi everyone, I've been working on a library called Skwish to complement indexes like Lucene, for blob storage and retrieval. This is nothing more than a structured implementation of storing all the files in one file and managing their offsets in another. The idea is to provide a fast, concurrent, lock-free way to serve lots of files to lots of users. Hope you find it useful or interesting.

-Babak
http://skwish.sourceforge.net/

--
--Noble Paul
Re: Mark Miller as core Lucene committer
Congrats Mark. You have been a great contributor to the Solr community as well.

On Sat, Nov 22, 2008 at 7:09 AM, Michael Busch [EMAIL PROTECTED] wrote:
Welcome Mark! Good to have you on board!
-Michael

Grant Ingersoll wrote:
Please welcome Mark Miller as a core Lucene committer. For a while now, Mark has been a contrib committer and has recently stepped up his efforts in contributions to the core. In recognition, the PMC has voted to make him a core committer.

Cheers,
Grant

--
--Noble Paul
Re: Realtime Search for Social Networks Collaboration
Moving back to the RDBMS model will be a big step backwards, where we miss multivalued fields and arbitrary fields.

On Tue, Sep 9, 2008 at 4:17 AM, Jason Rutherglen [EMAIL PROTECTED] wrote:
Cool. I mention H2 because it does have some Lucene code in it, yes. Also, according to some benchmarks it's the fastest of the open source databases. I think it's possible to integrate realtime search for H2. I suppose there is no need to store the data in Lucene in this case? One loses the multiple values per field Lucene offers, and the schema becomes static. Perhaps it's a trade-off?

On Mon, Sep 8, 2008 at 6:17 PM, J. Delgado [EMAIL PROTECTED] wrote:
Yes, both Marcelo and I would be interested. We looked into H2 and it looks like something similar to Oracle's ODCI can be implemented. Plus, the primitive full-text implementation is based on Lucene. I say primitive because, looking at the code, I saw that one cannot define an Analyzer, and for each scan corresponding to a where clause a searcher is opened and closed instead of having a pool; plus it does not have any way to queue changes to reduce the use of the IndexWriter, etc. But it's open source and that is a great starting point!
-- Joaquin

On Mon, Sep 8, 2008 at 2:05 PM, Jason Rutherglen [EMAIL PROTECTED] wrote:
Perhaps an interesting project would be to integrate Ocean with H2 www.h2database.com to take advantage of both models. I'm not sure how exactly that would work, but it seems like it would not be too difficult. Perhaps this would solve being able to perform faster hierarchical queries, and perhaps other types of queries that Lucene is not capable of. Is this something, Joaquin, you are interested in collaborating on? I am definitely interested in it.

On Sun, Sep 7, 2008 at 4:04 AM, J.
Delgado [EMAIL PROTECTED] wrote:

On Sat, Sep 6, 2008 at 1:36 AM, Otis Gospodnetic [EMAIL PROTECTED] wrote:
Regarding real-time search and Solr, my feeling is the focus should be on first adding real-time search to Lucene, and then we'll figure out how to incorporate that into Solr later.

Otis, what do you mean exactly by adding real-time search to Lucene? Note that Lucene, being an indexing/search library (and not a full-blown search engine), is by definition real-time: once you add/write a document to the index it becomes immediately searchable, and if a document is logically deleted it is no longer returned in a search, though physical deletion happens during an index optimization.

Now, the problem of adding/deleting documents in bulk, as part of a transaction, and making these documents available for search immediately after the transaction is committed sounds more like a search engine problem (i.e. SOLR, Nutch, Ocean), especially if these transactions are known to be I/O expensive and thus are usually implemented as batched processes with some kind of sync mechanism, which makes them non-real-time.

For example, in my previous life, I designed and helped implement a quasi-realtime enterprise search engine using Lucene, having a set of multi-threaded indexers hitting a set of multiple indexes allocated across different search services which powered a broker-based distributed search interface. The most recent documents provided to the indexers were always added to the smaller in-memory (RAM) indexes, which usually could absorb the load of a bulk add transaction and later would be merged into larger disk-based indexes and then flushed to make them ready to absorb new fresh docs.
We even had further partitioning of the indexes that reflected time periods, with caps on size, for them to be merged into older, more archive-based indexes which were used less (yes, the search engine's default search was on data no more than 1 month old, though the user could open the time window by including archives).

As for SOLR and OCEAN, I would argue that these semi-structured search engines are becoming more and more like relational databases with full-text search capabilities (without the benefit of full relational algebra -- for example, joins are not possible using SOLR). Notice that real-time CRUD operations and transactionality are core DB concepts and have been studied and developed by database communities for quite a long time. There have been recent efforts on how to efficiently integrate Lucene into relational databases (see the Lucene JVM ORACLE integration: http://marceloochoa.blogspot.com/2007/09/running-lucene-inside-your-oracle-jvm.html).

I think we should seriously look at joining efforts with open-source database engine projects written in Java (see http://java-source.net/open-source/database-engines) in order to blend IR and ORM for once and for all.
-- Joaquin

I've read Jason's Wiki as well. Actually, I had to read it a number of times to understand bits and pieces of it. I have to admit there is still
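The RAM-plus-disk pattern Joaquin describes -- fresh documents searchable immediately from a small in-memory buffer that is later merged into larger on-disk segments -- can be reduced to a toy sketch. The names and the size-cap merge policy here are invented for illustration; this is not Lucene, Solr, or Ocean code:

```java
import java.util.ArrayList;
import java.util.List;

// Toy sketch of a quasi-realtime index: new docs land in a small
// in-memory segment and are searchable at once; when the segment
// reaches its cap, it is merged into the larger "disk" segment.
class QuasiRealtimeIndex {
    private final List<String> ramSegment = new ArrayList<>();
    private final List<String> diskSegment = new ArrayList<>();
    private final int ramCap;

    QuasiRealtimeIndex(int ramCap) {
        this.ramCap = ramCap;
    }

    synchronized void add(String doc) {
        ramSegment.add(doc);                // searchable right away
        if (ramSegment.size() >= ramCap) {  // absorb the burst, then merge
            diskSegment.addAll(ramSegment);
            ramSegment.clear();
        }
    }

    /** Searches both segments so just-added docs are always visible. */
    synchronized List<String> search(String term) {
        List<String> hits = new ArrayList<>();
        for (String d : diskSegment) if (d.contains(term)) hits.add(d);
        for (String d : ramSegment) if (d.contains(term)) hits.add(d);
        return hits;
    }
}
```

A real system would replace the string lists with Lucene RAM and disk indexes and do the merge on a background thread, but the visibility rule -- query both the fresh buffer and the merged store -- is the same.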
Re: ThreadLocal causing memory leak with J2EE applications
Why do you need to keep a strong reference? Why not a WeakReference?
--Noble

On Wed, Sep 10, 2008 at 12:27 AM, Chris Lu [EMAIL PROTECTED] wrote:
The problem should be similar to what's talked about in this discussion: http://lucene.markmail.org/message/keosgz2c2yjc7qre?q=ThreadLocal

There is a memory leak in Lucene search from LUCENE-1195 (svn r659602, May 23, 2008). This patch brings in a ThreadLocal cache to TermInfosReader.

It's usually recommended to keep the reader open and reuse it when possible. In a common J2EE application, the HTTP requests are usually handled by different threads. But since the cache is ThreadLocal, the cache is not really usable by other threads. What's worse, the cache cannot be cleared by another thread!

This leak is not so obvious usually. But my case is using RAMDirectory, having several hundred megabytes, so one un-released resource is obvious to me. Here is the reference tree:

org.apache.lucene.store.RAMDirectory
 |- directory of org.apache.lucene.store.RAMFile
    |- file of org.apache.lucene.store.RAMInputStream
       |- base of org.apache.lucene.index.CompoundFileReader$CSIndexInput
          |- input of org.apache.lucene.index.SegmentTermEnum
             |- value of java.lang.ThreadLocal$ThreadLocalMap$Entry

After I switched back to svn revision 659601, right before this patch was checked in, the memory leak is gone. Although my case is RAMDirectory, I believe this will affect disk-based indexes also.

-- Chris Lu
-
Instant Scalable Full-Text Search On Any Database/Application
site: http://www.dbsight.net
demo: http://search.dbsight.com
Lucene Database Search in 3 minutes: http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes
DBSight customer, a shopping comparison site, (anonymous per request) got 2.6 Million Euro funding!

--
--Noble Paul
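Chris's point -- that a ThreadLocal cache filled by one request thread can neither be reused nor cleared by another -- follows directly from ThreadLocal semantics, which a few lines can demonstrate. The byte[] here is just a stand-in for the per-thread TermInfosReader cache; the class name is invented for illustration:

```java
// Each thread gets its own value from a ThreadLocal, so a cache filled
// by one request thread is invisible to every other thread -- and it
// stays strongly reachable for as long as the owning thread is alive.
class ThreadLocalIsolation {
    static final ThreadLocal<byte[]> CACHE =
        ThreadLocal.withInitial(() -> new byte[1024]); // stand-in for the per-thread cache

    /** Returns the cache instance as observed from a freshly started thread. */
    static byte[] valueOnNewThread() {
        final byte[][] box = new byte[1][];
        Thread worker = new Thread(() -> box[0] = CACHE.get());
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return box[0];
    }
}
```

In a J2EE container the worker thread is pooled and never dies, which is exactly why the per-thread entry (and the hundreds of megabytes it references) never becomes collectable.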
Re: ThreadLocal causing memory leak with J2EE applications
When I look at the reference tree, that is the feeling I get. If you held a WeakReference it would get released:

 |- base of org.apache.lucene.index.CompoundFileReader$CSIndexInput
 |- input of org.apache.lucene.index.SegmentTermEnum
 |- value of java.lang.ThreadLocal$ThreadLocalMap$Entry

On Wed, Sep 10, 2008 at 8:39 PM, Chris Lu [EMAIL PROTECTED] wrote:
Does this make any difference? If intentionally closing the searcher and reader failed to release the memory, I cannot rely on some magic of the JVM to release it.

-- Chris Lu
-
Instant Scalable Full-Text Search On Any Database/Application
site: http://www.dbsight.net
demo: http://search.dbsight.com
Lucene Database Search in 3 minutes: http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes
DBSight customer, a shopping comparison site, (anonymous per request) got 2.6 Million Euro funding!

On Wed, Sep 10, 2008 at 4:03 AM, Noble Paul നോബിള് नोब्ळ् [EMAIL PROTECTED] wrote:
Why do you need to keep a strong reference? Why not a WeakReference?
--Noble

On Wed, Sep 10, 2008 at 12:27 AM, Chris Lu [EMAIL PROTECTED] wrote:
The problem should be similar to what's talked about in this discussion: http://lucene.markmail.org/message/keosgz2c2yjc7qre?q=ThreadLocal

There is a memory leak in Lucene search from LUCENE-1195 (svn r659602, May 23, 2008). This patch brings in a ThreadLocal cache to TermInfosReader. It's usually recommended to keep the reader open and reuse it when possible. In a common J2EE application, the HTTP requests are usually handled by different threads. But since the cache is ThreadLocal, the cache is not really usable by other threads. What's worse, the cache cannot be cleared by another thread! This leak is not so obvious usually. But my case is using RAMDirectory, having several hundred megabytes, so one un-released resource is obvious to me.
Here is the reference tree:

org.apache.lucene.store.RAMDirectory
 |- directory of org.apache.lucene.store.RAMFile
    |- file of org.apache.lucene.store.RAMInputStream
       |- base of org.apache.lucene.index.CompoundFileReader$CSIndexInput
          |- input of org.apache.lucene.index.SegmentTermEnum
             |- value of java.lang.ThreadLocal$ThreadLocalMap$Entry

After I switched back to svn revision 659601, right before this patch was checked in, the memory leak is gone. Although my case is RAMDirectory, I believe this will affect disk-based indexes also.

-- Chris Lu
-
Instant Scalable Full-Text Search On Any Database/Application
site: http://www.dbsight.net
demo: http://search.dbsight.com
Lucene Database Search in 3 minutes: http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes
DBSight customer, a shopping comparison site, (anonymous per request) got 2.6 Million Euro funding!

--
--Noble Paul
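Noble's WeakReference suggestion can be sketched as a thin wrapper (a hypothetical class, not the actual TermInfosReader patch): the ThreadLocal holds the cached value only weakly, so once the owner drops its strong reference -- e.g. when the reader is closed -- the GC is free to clear the per-thread entry instead of pinning it to a pooled thread forever.

```java
import java.lang.ref.WeakReference;

// A ThreadLocal cache that holds its value only through a WeakReference.
// While the owner keeps a strong reference to the value (reader open),
// get() returns it; once that strong reference is gone (reader closed),
// the GC may clear the entry, so hundreds of megabytes are no longer
// kept alive by an idle pooled thread.
class WeakThreadLocalCache<T> {
    private final ThreadLocal<WeakReference<T>> slot = new ThreadLocal<>();

    void set(T value) {
        slot.set(new WeakReference<>(value));
    }

    /** Returns the cached value, or null if never set or already collected. */
    T get() {
        WeakReference<T> ref = slot.get();
        return ref == null ? null : ref.get();
    }
}
```

When the GC actually clears the entry is up to the collector; the point, as in Noble's suggestion, is only that the cache no longer forces the value to stay reachable after the reader is closed.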