Re: Indexing a Date/DateTime/Time field in Lucene 4

2017-04-08 Thread KARTHIK SHIVAKUMAR
I converted the Date into milliseconds and stored the long value in the index; this
let me convert the value read back from the index into any date format later in the
output.
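
For reference, a minimal sketch of that approach against the Lucene 4.x API (field and
variable names are illustrative, not from the thread); it assumes an open Document doc,
an IndexWriter writer, and java.util.Date values date/from/to:

    // imports: org.apache.lucene.document.LongField, org.apache.lucene.document.Field,
    //          org.apache.lucene.search.NumericRangeQuery, org.apache.lucene.search.Query
    long millis = date.getTime();                         // Date -> epoch milliseconds
    doc.add(new LongField("created", millis, Field.Store.YES));
    writer.addDocument(doc);

    // At search time, convert the bounds the same way and use a numeric range query.
    Query q = NumericRangeQuery.newLongRange("created",
            from.getTime(), to.getTime(), true, true);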

On Wed, Apr 5, 2017 at 6:08 PM, Frederik Van Hoyweghen <
frederik.vanhoyweg...@chapoo.com> wrote:

> Hey everyone,
>
> I'm seeing some conflicting suggestions concerning the type of field to use
> for indexing a Date/DateTime/Time value.
>
> Some suggest conversion using DateTools.timeToString() and using a
> StringField,
> while others suggest using the long value of getTime() and using a
> LongField (this is supposed to perform better using NumericRangeQuery).
>
> What are your opinions on this?
>
> Kind regards,
> Frederik
>



-- 





*N.S.KARTHIK
R.M.S.COLONY
BEHIND BANK OF INDIA
R.M.V 2ND STAGE
BANGALORE
560094*


Re: Indexing and searching a DateTime range

2015-02-09 Thread KARTHIK SHIVAKUMAR
Hi

A long time ago I used to store datetimes as milliseconds.

TermRangeQuery worked in perfect condition with that setup.

Convert all datetime values to milliseconds and index them.

At search time, convert the datetime bounds to milliseconds again and use
TermRangeQuery.

With regards
Karthik
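
One caveat, offered here as a sketch rather than as what the poster did: a string-based
TermRangeQuery only behaves correctly if the indexed terms sort lexicographically, so a
fixed-width encoding such as DateTools output (or zero-padded millisecond strings) is
safer than raw Long.toString() values. Assuming Lucene 4.x and an illustrative field name:

    // imports: org.apache.lucene.document.DateTools, org.apache.lucene.document.StringField,
    //          org.apache.lucene.search.TermRangeQuery
    String term = DateTools.dateToString(date, DateTools.Resolution.MILLISECOND);
    doc.add(new StringField("timestamp", term, Field.Store.YES));

    // Encode the range bounds the same way at search time.
    Query q = TermRangeQuery.newStringRange("timestamp",
            DateTools.dateToString(from, DateTools.Resolution.MILLISECOND),
            DateTools.dateToString(to, DateTools.Resolution.MILLISECOND),
            true, true);
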
On Feb 9, 2015 1:24 PM, Gergely Nagy foge...@gmail.com wrote:

 Hi Lucene users,

 I am in the beginning of implementing a Lucene application which would
 supposedly search through some log files.

 One of the requirements is to return results between a time range. Let's
 say these are two lines in a series of log files:
 2015-02-08 00:02:06.852Z INFO...
 ...
 2015-02-08 18:02:04.012Z INFO...

 Now I need to search for these lines and return all the text in-between. I
 was using this demo application to build an index:

 http://lucene.apache.org/core/4_10_3/demo/src-html/org/apache/lucene/demo/IndexFiles.html

 After that my first thought was using a term range query like this:
 TermRangeQuery query = TermRangeQuery.newStringRange("contents",
 "2015-02-08 00:02:06.852Z", "2015-02-08 18:02:04.012Z", true, true);

 But for some reason this didn't return any results.

 Then I was Googling for a while how to solve this problem, but all the
 datetime examples I found are searching based on a much simpler field.
 Those examples usually use a field like this:
 doc.add(new LongField("modified", file.lastModified(), Field.Store.NO));

 So I was wondering, how can I index these log files to make a range query
 work on them? Any ideas? Maybe my approach is completely wrong. I am still
 new to Lucene so any help is appreciated.

 Thank you.

 Gergely Nagy



Re: Can some terms from analysis be silently dropped when indexing? Because I'm pretty sure I'm seeing that happen.

2014-08-25 Thread KARTHIK SHIVAKUMAR
 some terms from analysis be silently dropped when indexing

Then I presume the same terms also need to be exempted/dropped during the search
process.

Otherwise the results will not be as expected.




with regards

karthik
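
As a small sketch of that point (assuming a Lucene 4.x-era setup; the Version constant
and field name are illustrative): whatever analyzer drops or rewrites terms at index
time should also be handed to the query parser, so both sides drop the same terms.

    Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_44);
    IndexWriterConfig cfg = new IndexWriterConfig(Version.LUCENE_44, analyzer);
    IndexWriter writer = new IndexWriter(dir, cfg);
    // ... add documents ...

    // Use the same analyzer at search time so query terms are dropped/normalized identically.
    QueryParser parser = new QueryParser(Version.LUCENE_44, "contents", analyzer);
    Query q = parser.parse("the quick fox");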


On Mon, Aug 25, 2014 at 12:52 PM, Trejkaz trej...@trypticon.org wrote:

 It seems like nobody knows the answer, so I'm just going to file a bug.

 TX

 -
 To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
 For additional commands, e-mail: java-user-h...@lucene.apache.org




-- 





*N.S.KARTHIK
R.M.S.COLONY
BEHIND BANK OF INDIA
R.M.V 2ND STAGE
BANGALORE
560094*


Re: Re-indexing a particular field only without re-indexing the entire enclosing document in the index

2012-04-25 Thread KARTHIK SHIVAKUMAR
Hi

Update the index for the dynamic data.

I have done this in the past; it worked for me a long time ago.

All you need is a piece of code to search for and find the specific document
within the index (probably using a unique name for the document), then delete
it and insert the fresh document alone.

For a large set of docs, all of this needs to be done in a loop.




with regards
karthik
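
A minimal sketch of that delete-then-insert cycle, assuming a 3.x-or-later IndexWriter
and an illustrative unique-key field name ("docName"); IndexWriter.updateDocument does
the delete and the add in one call:

    // imports: org.apache.lucene.index.IndexWriter, org.apache.lucene.index.Term,
    //          org.apache.lucene.document.Document
    Term key = new Term("docName", uniqueName);      // the document's unique name
    writer.updateDocument(key, freshDocument);       // deletes old doc(s) with this term, adds the new one
    writer.commit();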


On Wed, Apr 25, 2012 at 12:37 PM, Torsten Krah 
tk...@fachschaft.imn.htwk-leipzig.de wrote:

 On Tuesday, 24.04.2012 at 21:57 +0530, KARTHIK SHIVAKUMAR wrote:
  A simple technique is to use an index update for the dynamic data
  column
 
  rather than re-indexing the whole document.

 Just for interest, how do you do that?




-- 
*N.S.KARTHIK
R.M.S.COLONY
BEHIND BANK OF INDIA
R.M.V 2ND STAGE
BANGALORE
560094*


Re: Re-indexing a particular field only without re-indexing the entire enclosing document in the index

2012-04-24 Thread KARTHIK SHIVAKUMAR
Hi

A simple technique is to use an index update for the dynamic data column

rather than re-indexing the whole document.




with regards
karthik

On Mon, Apr 23, 2012 at 9:01 PM, Jong Kim jong.luc...@gmail.com wrote:

 Hi,

 I'm sure that this is a very common use case and that probably hundreds of people
 have asked the same question in the past, but I haven't been able to find
 an exact answer to my question.

 I have a system where each document in the Lucene index comprises at
 least one field containing a very large number of terms (for example, entire
 text from the content of potentially very large text files) and another
 metadata field that is much smaller. The first field is rarely modified
 hence remains mostly static, while the second field is modified very
 frequently.

 Currently, I'm re-indexing the entire Lucene document whenever the value of
 the second field changes on the source side. Needless to say, this yields a
 very inefficient system, because a significant amount of system resources
 is wasted in effectively re-indexing what has not changed.

 Is there any good way to solve this design problem? Obviously, an
 alternative design would be to split the index into two, and maintain
 static (and large) data in one index and the other dynamic part in the
 other index. However, this approach is not acceptable due to our data
 pattern where the match on the first index yields very large result set,
 and filtering them against the second index is very inefficient due to high
 ratio of disjoint data. In other words, while the alternative approach
 significantly reduces the indexing-time overhead, resulting search is
 unacceptably expensive.

 Any design help would be highly appreciated.

 Thanks
 /Jong




-- 
*N.S.KARTHIK
R.M.S.COLONY
BEHIND BANK OF INDIA
R.M.V 2ND STAGE
BANGALORE
560094*


Re: lucene-3.0.3

2012-02-01 Thread KARTHIK SHIVAKUMAR
Hi

 lucene-3.0.3 can be used for searching a text from

Lucene's primary job is to do a text search,

whether the source is PDF/HTML/XML/MSWord/PPT/XLS.

You have to have plugin code to do 2 things (a sketch follows below):

1) Strip the text out of the source document (PDF/HTML/XML/MSWord/PPT/XLS)
2) Index this extracted text using Lucene

The resulting index can later be used for searching the required content.

;)
with regards
karthik
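
A minimal sketch of those two steps. The text-stripping part is outside Lucene; Apache
Tika is used here purely as an illustrative choice (an assumption on my part, not
something the thread prescribes), and the 3.x-era field API is assumed:

    // imports: org.apache.tika.Tika, org.apache.lucene.document.Document,
    //          org.apache.lucene.document.Field
    String text = new Tika().parseToString(new File("report.pdf"));   // step 1: strip the text
    Document doc = new Document();
    doc.add(new Field("contents", text, Field.Store.NO, Field.Index.ANALYZED));  // step 2: index it
    writer.addDocument(doc);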


On Wed, Feb 1, 2012 at 6:37 PM, Prasad KVSH prasad.kokep...@ness.com wrote:

 Hi,



 Can lucene-3.0.3 be used for searching text from PDF, xlsx, docx, doc,
 xls, msg, and TXT files? Do we have any common function to accomplish
 this? Please help me with this.



 Thanks

 Prasad






-- 
*N.S.KARTHIK
R.M.S.COLONY
BEHIND BANK OF INDIA
R.M.V 2ND STAGE
BANGALORE
560094*


Re: Can't get a hit

2011-12-30 Thread KARTHIK SHIVAKUMAR
Hi

My suggestion

You should have a common column which stores a unique identity for the data
being indexed.

e.g. Name + Date of Record

This helps in replacing duplicates with the latest version using a TermQuery
search/replace process.

It also helps in maintaining a unique record list without duplicates.


with regards
karthik
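
A sketch of how that unique-key idea usually plays out in code, plus the reader-side
detail that often explains missing hits: a reader only sees documents committed before
it was (re)opened. Field and variable names are illustrative, 3.x-era API assumed:

    // Replace any older copy carrying the same unique key, then commit.
    writer.updateDocument(new Term("recordKey", nameAndDate), doc);
    writer.commit();

    // A reader opened before the commit will not see the new document;
    // reopen it (3.x API) to pick up the latest commit before searching again.
    IndexReader newReader = reader.reopen();
    if (newReader != reader) {
        reader.close();
        reader = newReader;
    }
    IndexSearcher searcher = new IndexSearcher(newReader);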

On Thu, Dec 29, 2011 at 9:56 PM, Cheng zhoucheng2...@gmail.com wrote:

 Hi,

 I need to save a list of records into an index on hard drive. I keep a
 writer and a reader open till the end of the operation.

 My issue is that I need to compare each of the new records with each of the
 records that have been saved into the index. There are plenty of duplicate
 records in the original list.

 To my surprise, I can't find a hit for a duplicate record on the fly,
 although I call writer.commit() for every record that is being saved.

 However, if I intentionally stop the operation (with some of the records
 saved) and re-run the list of records, lots of hits occur.

 Please help!

 Thanks!




-- 
*N.S.KARTHIK
R.M.S.COLONY
BEHIND BANK OF INDIA
R.M.V 2ND STAGE
BANGALORE
560094*


Re: Lucene bangalore chapter

2011-12-11 Thread KARTHIK SHIVAKUMAR
Hi

I definitely think there is NONE.. ;)



with regards
karthik


On Tue, Dec 6, 2011 at 11:41 AM, Vinaya Kumar Thimmappa 
vthimma...@ariba.com wrote:

 is there a lucene Bangalore chapter ?


 -Vinaya


 -
 To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
 For additional commands, e-mail: java-user-h...@lucene.apache.org




-- 
*N.S.KARTHIK
R.M.S.COLONY
BEHIND BANK OF INDIA
R.M.V 2ND STAGE
BANGALORE
560094*


Re: tokenizing text using language analyzer but preserving stopwords if possible

2011-12-11 Thread KARTHIK SHIVAKUMAR
Hi

 tokenize the original foreign text into words

You need to identify the appropriate analyzer for the foreign language before
indexing; a sketch of driving it directly is below.


with regards
karthik
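
A small sketch of walking the tokens in document order with an analyzer that keeps
stopwords (pass an empty stop set), along the lines Avi suggests further down. Lucene
3.x-era API is assumed and the field name is illustrative:

    // imports: org.apache.lucene.analysis.standard.StandardAnalyzer,
    //          org.apache.lucene.analysis.CharArraySet, org.apache.lucene.analysis.TokenStream,
    //          org.apache.lucene.analysis.tokenattributes.CharTermAttribute,
    //          org.apache.lucene.util.Version, java.io.StringReader
    Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_35, CharArraySet.EMPTY_SET);
    TokenStream ts = analyzer.tokenStream("body", new StringReader(foreignText));
    CharTermAttribute termAtt = ts.addAttribute(CharTermAttribute.class);
    ts.reset();
    while (ts.incrementToken()) {
        String word = termAtt.toString();   // look the word up in the dictionary here, in order
    }
    ts.end();
    ts.close();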


On Wed, Dec 7, 2011 at 4:57 PM, Avi Rosenschein arosensch...@gmail.com wrote:

 On Wed, Dec 7, 2011 at 00:41, Ilya Zavorin izavo...@caci.com wrote:

  I need to implement a quick and dirty or poor man's translation of a
  foreign language document by looking up each word in a dictionary and
  replacing it with the English translation. So what I need is to tokenize
  the original foreign text into words and then access each word, look it
 up
  and get its translation. However, if possible, I also need to preserve
  non-words, i.e. stopwords so that I could replicate them in the output
  stream without translating. If the latter is not possible then I just
 need
  to preserve the order of the original words so that their translations
 have
  the same order in the output.
 
  Can I accomplish this using Lucene components? I presume I'd have to
 start
  by creating an analyzer for the foreign language, but then what? How do I
  (i) tokenize, (ii) access words in the correct order, (iii) also access
  non-words if possible?
 

 You can always use something like StandardAnalyzer for the specific
 language, with an empty stopword list (so that no words are treated as
 stopwords). A bit trickier might be dealing with punctuation - depending on
 the analyzer, you might be able to get these to parse as separate tokens.

 -- Avi


 
  Thanks much
 
 
  Ilya Zavorin
 
 
 




-- 
*N.S.KARTHIK
R.M.S.COLONY
BEHIND BANK OF INDIA
R.M.V 2ND STAGE
BANGALORE
560094*


Re: Lucene index inside of a web app?

2011-12-05 Thread KARTHIK SHIVAKUMAR
Hi

Check  http://tomcat.apache.org

80% of the web containers follow the same strategy.


web.xml is well explained at that URL.

By the way, which web container do you use?

with regards
karthik

On Fri, Dec 2, 2011 at 7:54 PM, okayndc bodymo...@gmail.com wrote:

 What would the web.xml look like?  I'm lost.

 On Thu, Dec 1, 2011 at 11:04 PM, KARTHIK SHIVAKUMAR
 nskarthi...@gmail.comwrote:

  Hi
 
   generated Lucene index
 
  What if u need to upgrade this with More docs
 
  Best approach is Inject the Real path of the Index ( c:/temp/Indexes )
  to
  the Web server Application via web.xml
 
  By this approach u can even achieve
 
  1) Load balancing of multiple Web servers pointing to same Index
 files
  2) Update /Delete /Re-index with out the Web application being
 interrupted
 
 
 
  with regards
  Karthik
 
  On Tue, Nov 29, 2011 at 12:25 AM, okayndc bodymo...@gmail.com wrote:
 
   Awesome.  Thanks guys!
  
   On Mon, Nov 28, 2011 at 12:19 PM, Uwe Schindler u...@thetaphi.de
 wrote:
  
You can store the index in WEB_INF directory, just use something:
ServletContext.getRealPath(/WEB-INF/data/myIndexName);
   
-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de
   
   
 -Original Message-
 From: Ian Lea [mailto:ian@gmail.com]
 Sent: Monday, November 28, 2011 6:11 PM
 To: java-user@lucene.apache.org
 Subject: Re: Lucene index inside of a web app?

 Using a static string is fine - it just wasn't clear from your
  original
post what it
 was.

 I usually use a full path read from a properties file so that I can
change
it
 without a recompile, have different settings on test/live/whatever
systems, etc.
 Works for me, but isn't the only way to do it.

 If you know where your app lives, you could use a full path
 pointing
  to
 somewhere within that tree, or you could use a partial path that
 the
   app
server
 will interpret relative to something.  Which is fine too - take
 your
   pick
of
 whatever works for you.


 --
 Ian.


 On Mon, Nov 28, 2011 at 4:40 PM, okayndc bodymo...@gmail.com
  wrote:
  Hi,
 
  Thanks for your response.  Yes, LUCENE_INDEX_DIRECTORY is a
 static
  string which contains the file system path of the index (for
  example,
 c:\\index).
   Is this good practice?  If not,  what should the full path to an
  index look like?
 
  Thanks
 
  On Mon, Nov 28, 2011 at 4:54 AM, Ian Lea ian@gmail.com
  wrote:
 
  What is LUCENE_INDEX_DIRECTORY?  Some static string in your app?
 
  Lucene knows nothing about your app, JSP, or what app server you
  are
  using.  It requires a file system path and it is up to you to
   provide
  that.  I always use a full path since I prefer to store indexes
  outside the app and it avoids complications with what the app
  server
  considers the default directory. But if you want to store it
  inside,
  without specifying full path, look at the docs for your app
  server.
 
 
  --
  Ian.
 
 
  On Sun, Nov 27, 2011 at 2:10 AM, okayndc bodymo...@gmail.com
   wrote:
   Hello,
  
   I want to store the generated Lucene index inside of my Java
   application, preferably within a folder where my JSP files are
   located.  I also want
  to
   be able to search from the index within the web app. I've been
   using the LUCENE_INDEX_DIRECTORY but, this is on a file system
   (currently my hard drive).  Should I continue to use
   LUCENE_INDEX_DIRECTORY if I want the Lucene index inside the
 app
   or
   use something else.  I was a bit confused about this.  Btw,
 the
Lucene index
 content comes from a database.
  
   Any help is appreciated
  
 
 
   -
  To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
  For additional commands, e-mail:
 java-user-h...@lucene.apache.org
 
 
 


 -
 To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
 For additional commands, e-mail: java-user-h...@lucene.apache.org
   
   
-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
   
   
  
 
 
 
  --
  *N.S.KARTHIK
  R.M.S.COLONY
  BEHIND BANK OF INDIA
  R.M.V 2ND STAGE
  BANGALORE
  560094*
 




-- 
*N.S.KARTHIK
R.M.S.COLONY
BEHIND BANK OF INDIA
R.M.V 2ND STAGE
BANGALORE
560094*


Re: lucene-core-3.3.0 not optimizing

2011-12-05 Thread KARTHIK SHIVAKUMAR
Hi

LUCENE-3454 http://issues.apache.org/jira/browse/LUCENE-3454:

So you mean the code has changed with this API...

Does anybody have a sample code snippet, or is there a sample to play around
with?


with regards
karthik
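
For reference, a sketch of what the API expects around the versions discussed here
(3.3 vs. 3.5), offered with the caveat that it simply restates the javadoc behaviour
Ian describes below:

    // Lucene 3.3: optimize() with no argument merges down to a single segment;
    // optimize(100) only guarantees at most 100 segments remain.
    INDEX_WRITER.optimize();
    INDEX_WRITER.commit();
    INDEX_WRITER.close();

    // Lucene 3.5.0 and later: optimize() is deprecated, use forceMerge instead.
    // INDEX_WRITER.forceMerge(1);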


On Fri, Dec 2, 2011 at 3:44 PM, Ian Lea ian@gmail.com wrote:

 Well, calling optimize(maxNumSegments) will (from the javadocs on
 recent releases) Optimize the index down to = maxNumSegments.  So
 optimize(100) won't get you down to 1 big file, unless you are using
 compound files perhaps.  Maybe it did something different 7 years ago
 but that seems very unlikely.

 In 3.5.0 all optimize() calls are deprecated anyway.  I suggest you
 read the release notes and the javadocs, upgrade to 3.5.0 and remove
 all optimize() calls altogether.


 --
 Ian.


 On Fri, Dec 2, 2011 at 9:58 AM, KARTHIK SHIVAKUMAR
 nskarthi...@gmail.com wrote:
  Hi
 
  I have used Index and Optimize   5+ Million XML docs  in Lucene 1.x7
  years ago,
 
  And this piece of IndexWriter.optimize used to Merger all the bits and
  pieces of the created into 1 big file.
 
  I have not tracked the API changes since 7 yearsand with
  lucene-core-3.3.0 ...on google  not able to  find the solutions Why this
 is
  happening.
 
 
  with regards
  karthik
 
  On Fri, Dec 2, 2011 at 12:37 PM, Simon Willnauer 
  simon.willna...@googlemail.com wrote:
 
  what do you understand when you say optimize? Unless you tell us what
  this code does in your case and what you'd expect it doing its
  impossible to give you any reasonable answer.
 
  simon
 
  On Fri, Dec 2, 2011 at 4:54 AM, KARTHIK SHIVAKUMAR
  nskarthi...@gmail.com wrote:
   Hi
  
   Spec
   O/s win os 7
   Jdk : 1.6.0_29
   Lucene  lucene-core-3.3.0
  
  
  
   Finally after Indexing successfully ,Why this Code does not optimize (
   sample code )
  
  INDEX_WRITER.optimize(100);
  INDEX_WRITER.commit();
  INDEX_WRITER.close();
  
  
   *N.S.KARTHIK
   R.M.S.COLONY
   BEHIND BANK OF INDIA
   R.M.V 2ND STAGE
   BANGALORE
   560094*
 
  -
  To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
  For additional commands, e-mail: java-user-h...@lucene.apache.org
 
 
 
 
  --
  *N.S.KARTHIK
  R.M.S.COLONY
  BEHIND BANK OF INDIA
  R.M.V 2ND STAGE
  BANGALORE
  560094*

 -
 To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
 For additional commands, e-mail: java-user-h...@lucene.apache.org




-- 
*N.S.KARTHIK
R.M.S.COLONY
BEHIND BANK OF INDIA
R.M.V 2ND STAGE
BANGALORE
560094*


Re: [JOB] Lucid Imagination is hiring

2011-12-05 Thread KARTHIK SHIVAKUMAR
Hi

Too bad, during a recession...

I am from INDIA ;(



with regards
karthik

On Mon, Dec 5, 2011 at 9:10 PM, Grant Ingersoll gsing...@apache.org wrote:

 Hi All,

 If you've wanted a full time job working on Lucene or Solr, we have two
 positions open that just might be of interest.  The job descriptions are
 below.   Interested candidates should submit their resumes off list to
 care...@lucidimagination.com.

 You can learn more on our website:
 http://www.lucidimagination.com/about/careers.

 Thanks,
 Grant

 -Open Source Software Engineer
 DESCRIPTION
 Lucid Imagination is looking for a software engineer to work on the open
 source Apache Solr and Lucene projects. As part of Lucid's open source
 team, you will help implement features and provide fixes for issues in the
 world's premier open source search server and library. You will also work
 closely with Lucid's research team and technical support team to enable
 both community and customer consumption of Solr and Lucene.

 REQUIREMENTS
• Strong interest in working on high performance and large scale
 problems.
• Understanding of debugging and performance testing in a highly
 concurrent systems.
• Core Java expertise.
• Experience writing unit tests and working with continuous
 integration tools.
• Willingness to participate in and contribute to a vibrant,
 fast-paced open source community.
• Strong interpersonal, written and verbal communication skills.
• Desire to learn and be a part of a startup.
• Degree in computer science or related field.
• Experience with Lucene, Solr, Hadoop and related NoSQL
 technologies is not required, but is considered a bonus.

 EXPERIENCE
 0-5 years programming experience in Java.

 SALARY
 Based on experience

 LOCATION
 Raleigh/Durham/Chapel Hill area (preferred)

 TRAVEL
 Minimal (occasional trips to California)

 - Senior Consultant
 DESCRIPTION
 Lucid Imagination is currently looking to hire a Senior Consultant to be
 part of our Professional Services team.

 REQUIREMENTS
• Experience working with Lucene and/or Solr required.
• Establish yourself as a credible, reliable, likable, genuine, and
 trustworthy advisor to your customers.
• Provide expert-level advisory services to a wide range of
 customers with varying degrees of technical knowledge.
• Clearly identify customer pain points, priorities, and success
 criteria at the onset of each engagement.
• Resolve complex search issues in and around the Lucene/Solr
 ecosystem.
• Document recommendations in the form of Best Practice Assessments.
• Identify opportunities to provide customers with additional value
 through follow-on products and/or services.
• Communicate high-value use cases and customer feedback to our
 Product Development and Engineering teams.
• Contribute to the open source community by donating needed bug
 fixes and improvements; answering message boards; documenting existing
 code; and blogging.
• Support Business Development through product demos and customer
 QA.
• Collaborate on internal Lucid projects.
• Develop training materials and deliver classroom training on
 occasion.

 EXPERIENCE
• BS or higher in Engineering or Computer Science preferred.
• 3 or more years of IT Consulting and/or Professional Services
 experience required.
• Some Java development experience.
• Some experience with common scripting languages
 (Perl/Python/Ruby).
• Exposure to other related open source projects (Mahout, Hadoop,
 Tika, etc.) a plus.
• Experience with other commercial and open source search
 technologies a plus.
• Enterprise Search, eCommerce, and/or Business Intelligence
 experience a plus.
• Experience working in a startup a plus.

 SALARY
 Based on experience

 LOCATION:
 San Francisco/Bay Area (preferred)

 TRAVEL:
 10-20%




-- 
*N.S.KARTHIK
R.M.S.COLONY
BEHIND BANK OF INDIA
R.M.V 2ND STAGE
BANGALORE
560094*


Re: Use multiple lucene indices

2011-12-05 Thread KARTHIK SHIVAKUMAR
hi

 would the memory usage go through the roof?

Yup.

My past experience got me into a pickle there...



with regards
karthik
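
For readers skimming the quoted design below, a sketch of the per-content document it
describes (an ID plus the binary start/stop positions), using 3.x-era field types;
field and variable names are illustrative:

    Document doc = new Document();
    doc.add(new Field("id", contentId, Field.Store.YES, Field.Index.NOT_ANALYZED));
    doc.add(new NumericField("start", Field.Store.YES, false).setLongValue(startOffset));
    doc.add(new NumericField("stop", Field.Store.YES, false).setLongValue(stopOffset));
    writer.addDocument(doc);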

On Mon, Dec 5, 2011 at 11:28 PM, Rui Wang rw...@ebi.ac.uk wrote:

 Hi All,

 We are planning to use lucene in our project, but not entirely sure about
 some of the design decisions were made. Below are the details, any
 comments/suggestions are more than welcome.

 The requirements of the project are below:

 1. We have  tens of thousands of files, their size ranging from 500M to a
 few terabytes, and majority of the contents in these files will not be
 accessed frequently.

 2. We are planning to keep less accessed contents outside of our database,
 store them on the file system.

 3. We also have code to get the binary position of these contents in the
 files. Using these binary positions, we can quickly retrieve the contents
 and convert them into our domain objects.

 We think Lucene provides a scalable solution for storing and indexing
 these binary positions, so the idea is that each piece of content in
 the files will be a document; each document will have at least an ID field to
 identify the content and a binary position field containing the starting and
 stop positions of the content. Having done some performance testing, it
 seems to us that Lucene is well capable of doing this.

 At the moment, we are planning to create one Lucene index per file, so if
 we have new files to be added to the system, we can simply generate a new
 index. The problem is do with searching, this approach means that we need
 to create an new IndexSearcher every time a file is accessed through our
 web service. We knew that it is rather expensive to open a new
 IndexSearcher, and are thinking of using some kind of pooling mechanism.
 Our questions are:

 1. Is this one index per file approach a viable solution? What do you
 think about pooling IndexSearcher?

 2. If we have many IndexSearchers opened at the same time, would the
 memory usage go through the roof? I couldn't find any document on how
 Lucene use allocate memory.

 Thank you very much for your help.

 Many thanks,
 Rui Wang
 -
 To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
 For additional commands, e-mail: java-user-h...@lucene.apache.org




-- 
*N.S.KARTHIK
R.M.S.COLONY
BEHIND BANK OF INDIA
R.M.V 2ND STAGE
BANGALORE
560094*


Re: lucene-core-3.3.0 not optimizing

2011-12-02 Thread KARTHIK SHIVAKUMAR
Hi

I indexed and optimized 5+ million XML docs in Lucene 1.x, 7
years ago,

and that call to IndexWriter.optimize used to merge all the bits and
pieces created into 1 big file.

I have not tracked the API changes in the 7 years since, and with
lucene-core-3.3.0 I am not able to find, on Google, a solution for why this is
happening.


with regards
karthik

On Fri, Dec 2, 2011 at 12:37 PM, Simon Willnauer 
simon.willna...@googlemail.com wrote:

 what do you understand when you say optimize? Unless you tell us what
 this code does in your case and what you'd expect it doing its
 impossible to give you any reasonable answer.

 simon

 On Fri, Dec 2, 2011 at 4:54 AM, KARTHIK SHIVAKUMAR
 nskarthi...@gmail.com wrote:
  Hi
 
  Spec
  O/s win os 7
  Jdk : 1.6.0_29
  Lucene  lucene-core-3.3.0
 
 
 
  Finally after Indexing successfully ,Why this Code does not optimize (
  sample code )
 
 INDEX_WRITER.optimize(100);
 INDEX_WRITER.commit();
 INDEX_WRITER.close();
 
 
  *N.S.KARTHIK
  R.M.S.COLONY
  BEHIND BANK OF INDIA
  R.M.V 2ND STAGE
  BANGALORE
  560094*

 -
 To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
 For additional commands, e-mail: java-user-h...@lucene.apache.org




-- 
*N.S.KARTHIK
R.M.S.COLONY
BEHIND BANK OF INDIA
R.M.V 2ND STAGE
BANGALORE
560094*


lucene-core-3.3.0 not optimizing

2011-12-01 Thread KARTHIK SHIVAKUMAR
Hi

Spec
O/s win os 7
Jdk : 1.6.0_29
Lucene  lucene-core-3.3.0



Finally, after indexing successfully, why does this code not optimize?
(sample code)

INDEX_WRITER.optimize(100);
INDEX_WRITER.commit();
INDEX_WRITER.close();


*N.S.KARTHIK
R.M.S.COLONY
BEHIND BANK OF INDIA
R.M.V 2ND STAGE
BANGALORE
560094*


Re: Lucene index inside of a web app?

2011-12-01 Thread KARTHIK SHIVAKUMAR
Hi

 generated Lucene index

What if you need to update this with more docs?

The best approach is to inject the real path of the index (e.g. c:/temp/Indexes) into
the web server application via web.xml.

With this approach you can even achieve:

1) Load balancing of multiple web servers pointing to the same index files
2) Update/delete/re-index without the web application being interrupted (a wiring
   sketch follows below)



with regards
Karthik
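
A minimal sketch of that wiring, assuming a standard servlet container; the context-param
name (indexDir) and the 3.x-era Lucene calls are illustrative, not something the thread
prescribes:

    // web.xml (illustrative):
    //   <context-param>
    //     <param-name>indexDir</param-name>
    //     <param-value>c:/temp/Indexes</param-value>
    //   </context-param>

    // In a servlet or ServletContextListener:
    String indexDir = getServletContext().getInitParameter("indexDir");
    Directory dir = FSDirectory.open(new File(indexDir));
    IndexSearcher searcher = new IndexSearcher(IndexReader.open(dir));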

On Tue, Nov 29, 2011 at 12:25 AM, okayndc bodymo...@gmail.com wrote:

 Awesome.  Thanks guys!

 On Mon, Nov 28, 2011 at 12:19 PM, Uwe Schindler u...@thetaphi.de wrote:

  You can store the index in WEB_INF directory, just use something:
  ServletContext.getRealPath(/WEB-INF/data/myIndexName);
 
  -
  Uwe Schindler
  H.-H.-Meier-Allee 63, D-28213 Bremen
  http://www.thetaphi.de
  eMail: u...@thetaphi.de
 
 
   -Original Message-
   From: Ian Lea [mailto:ian@gmail.com]
   Sent: Monday, November 28, 2011 6:11 PM
   To: java-user@lucene.apache.org
   Subject: Re: Lucene index inside of a web app?
  
   Using a static string is fine - it just wasn't clear from your original
  post what it
   was.
  
   I usually use a full path read from a properties file so that I can
  change
  it
   without a recompile, have different settings on test/live/whatever
  systems, etc.
   Works for me, but isn't the only way to do it.
  
   If you know where your app lives, you could use a full path pointing to
   somewhere within that tree, or you could use a partial path that the
 app
  server
   will interpret relative to something.  Which is fine too - take your
 pick
  of
   whatever works for you.
  
  
   --
   Ian.
  
  
   On Mon, Nov 28, 2011 at 4:40 PM, okayndc bodymo...@gmail.com wrote:
Hi,
   
Thanks for your response.  Yes, LUCENE_INDEX_DIRECTORY is a static
string which contains the file system path of the index (for example,
   c:\\index).
 Is this good practice?  If not,  what should the full path to an
index look like?
   
Thanks
   
On Mon, Nov 28, 2011 at 4:54 AM, Ian Lea ian@gmail.com wrote:
   
What is LUCENE_INDEX_DIRECTORY?  Some static string in your app?
   
Lucene knows nothing about your app, JSP, or what app server you are
using.  It requires a file system path and it is up to you to
 provide
that.  I always use a full path since I prefer to store indexes
outside the app and it avoids complications with what the app server
considers the default directory. But if you want to store it inside,
without specifying full path, look at the docs for your app server.
   
   
--
Ian.
   
   
On Sun, Nov 27, 2011 at 2:10 AM, okayndc bodymo...@gmail.com
 wrote:
 Hello,

 I want to store the generated Lucene index inside of my Java
 application, preferably within a folder where my JSP files are
 located.  I also want
to
 be able to search from the index within the web app. I've been
 using the LUCENE_INDEX_DIRECTORY but, this is on a file system
 (currently my hard drive).  Should I continue to use
 LUCENE_INDEX_DIRECTORY if I want the Lucene index inside the app
 or
 use something else.  I was a bit confused about this.  Btw, the
  Lucene index
   content comes from a database.

 Any help is appreciated

   
   
 -
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
   
   
   
  
   -
   To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
   For additional commands, e-mail: java-user-h...@lucene.apache.org
 
 
  -
  To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
  For additional commands, e-mail: java-user-h...@lucene.apache.org
 
 




-- 
*N.S.KARTHIK
R.M.S.COLONY
BEHIND BANK OF INDIA
R.M.V 2ND STAGE
BANGALORE
560094*


Re: Improving indexing speed

2011-11-17 Thread KARTHIK SHIVAKUMAR
Hi

The time to index a file depends on the type of document / data extractor.

My document types are usually XML, and every time 2+ million XMLs
are indexed the time taken is less than 5 minutes.




with regards
karthik
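
A sketch of the pattern the wiki link below recommends: one shared IndexWriter with
parsing done inside the worker threads, so extraction (often the real bottleneck) is
parallelized as well. The parseToDocument helper is hypothetical:

    // IndexWriter is thread-safe, so the workers can share a single instance.
    void indexAll(final IndexWriter writer, List<File> files) throws Exception {
        ExecutorService pool =
                Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
        for (final File f : files) {
            pool.submit(new Runnable() {
                public void run() {
                    try {
                        Document doc = parseToDocument(f);   // hypothetical per-file parser
                        writer.addDocument(doc);
                    } catch (Exception e) {
                        // log and continue
                    }
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
        writer.commit();
    }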

On Fri, Nov 11, 2011 at 1:17 AM, Ian Lea ian@gmail.com wrote:

 And how long does it take just to read and parse the files, without
 indexing them?  Often that is the problem - nothing to do with lucene.

 There is plenty of good advice in
 http://wiki.apache.org/lucene-java/ImproveIndexingSpeed.  A good match
 on the subject of your message!

 --
 Ian.


 On Thu, Nov 10, 2011 at 7:22 PM, Simon Willnauer
 simon.willna...@googlemail.com wrote:
  can you provide more information about your setup? things like how
  much time does it take to index you documents, how many docs do you
  index, what are your index writer settings, how many cores do you
  have, where do you read from and write to (disks). oh and what version
  of lucene are you using?
 
  thanks,
 
  simon
 
  On Thu, Nov 10, 2011 at 10:40 AM, antony jospeh
  antony.joseph.webm...@gmail.com wrote:
  Hi all,
 
  I have a large number of files in a directory that need to be indexed. All
  the files are in a specific format that needs to be parsed to extract
  information, and after that I have to index it.
  A single thread processes one file at a time, so I decided to use multiple
  threads: the main thread loops over the directory and passes each file into
  a pool of worker threads using a queue, all of which share the same index
  writer. However, there is no significant change in indexing speed.

  Any hints on what I am doing wrong, or any suggestions?
 
 
  Thanks
  Antony
 
 
  -
  To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
  For additional commands, e-mail: java-user-h...@lucene.apache.org
 
 

 -
 To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
 For additional commands, e-mail: java-user-h...@lucene.apache.org




-- 
*N.S.KARTHIK
R.M.S.COLONY
BEHIND BANK OF INDIA
R.M.V 2ND STAGE
BANGALORE
560094*


Re: Reopening an index reader still giving me deleted records ?

2011-11-08 Thread KARTHIK SHIVAKUMAR
Hi

BUT still find records that have been deleted and no longer exist in the
index

The Lucene 3.3.0 API has something like

org.apache.lucene.index.IndexCommit

Has this been used after deletion of the records? Please check.

The indexes may still be there and may pop up on a new search.



with regards
karthik

On Mon, Nov 7, 2011 at 7:06 PM, Paul Taylor paul_t...@fastmail.fm wrote:


 I build indexes from scratch every three hours in a separate process, then
 when they are built I replace the old indexes with these new ones in my
 search server.
 Then I tell the search to reload the indexes as follows:

 public void reloadIndex() throws CorruptIndexException, IOException {

if (indexSearcher != null) {
 IndexReader oldReader = indexSearcher.getIndexReader();
IndexReader newReader = oldReader.reopen();
if (oldReader != newReader) {
Similarity similarity = indexSearcher.getSimilarity();
indexSearcher = new IndexSearcher(newReader);
 indexSearcher.setSimilarity(similarity);
 this.setLastServerUpdatedDate();
oldReader.close();
}
}
}


 What I'm finding is that new searches are finding new records added BUT
 still find records that have been deleted and no longer exist in the index.
 This is in one query, thus must mean I'm getting NEW AND OLD results for my
 new index reader rather than still using the old one, so am I
 misunderstanding what reopen() does, or it not working correctly ?

 thanks Paul

 (This is the only other place where the reader is accessed:

 public Results searchLucene(String query, int offset, int limit) throws
 IOException, ParseException {

IndexSearcher searcher=null;
try {
searcher = getIndexSearcher();
 searcher.getIndexReader().incRef();
 TopDocs topdocs = searcher.search(parseQuery(query), offset
 + limit);
searchCount.incrementAndGet();
return processResults(searcher, topdocs, offset);
}
finally {
 searcher.getIndexReader().decRef();
}
}
 )




 -
 To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
 For additional commands, e-mail: java-user-h...@lucene.apache.org




-- 
*N.S.KARTHIK
R.M.S.COLONY
BEHIND BANK OF INDIA
R.M.V 2ND STAGE
BANGALORE
560094*


Re: No subsearcher in Lucene 3.3?

2011-09-02 Thread KARTHIK SHIVAKUMAR
HI

A long time ago I used to do the same...

I used to give each merged index a unique name, so at run time, if the query
result came from 1STMERGER, then the path relevant to 1STMERGER would be used.


That way you need not store a new column for this purpose.


1MERGER  =  /temp/MERGER1  -  the path used for the index search is C:/TEMP/MERGER1
2MERGER  =  /temp/MERGER2  -  the path used for the index search is D:/TEMP/MERGER2



Hope this helps.


WITH REGARDS
KARTHIK
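
Alongside the naming trick above, here is a hedged sketch of the doc-id arithmetic that
stands in for the old MultiSearcher.subsearcher() when searching one IndexSearcher over
a MultiReader (the approach discussed further down in this thread); reader and variable
names are illustrative:

    IndexReader[] collections = { reader2009, reader2010 };   // one reader per collection
    MultiReader multi = new MultiReader(collections);
    IndexSearcher searcher = new IndexSearcher(multi);
    TopDocs hits = searcher.search(query, 10);

    for (ScoreDoc sd : hits.scoreDocs) {
        // Walk the doc bases to find which top-level reader the hit came from.
        int which = 0, base = 0;
        while (which < collections.length
                && sd.doc >= base + collections[which].maxDoc()) {
            base += collections[which].maxDoc();
            which++;
        }
        // 'which' is the index of the collection; use it to look up the base path.
    }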

On Tue, Aug 30, 2011 at 9:59 PM, Joe MA mrj...@comcast.net wrote:


 Thanks for the replies.  Here is why I need the subreader (or subsearcher
 in earlier Lucene versions):

 I have multiple collections of documents, say broken out by years (it's
 more complex than this, but this illustrates the use case):

 Collection1  D:/some folder/2009/*.pdf
 (lots of PDF files)
 Collection2  D:/another folder/2010/*.pdf
  (lots of different PDF files)

 And so forth.  So in the example above, I would have two indicies, one for
 each year.When I index, I store the *relative* path of each document as
 a field.  For example, 'link:2009/file1.pdf' or 'link2010/file1.pdf' etc .
  I do not store the full path to the files in the index.  This has a huge
 advantage because we can move the documents to another file system or server
 or path without rebuilding the index.  I stored the required base path to
 the documents in each collection in a database, external to the collection.
   For example, in the above example, Collection1 would have a base path of
 D:/some folder/. Therefore, to actually access a document referenced
 in a collection, you would concat base_path retrieved from the database to
 the link field retrieved from the collection.   I would think this is a
 very common approach.

 When searching a single collection, no problem.  But if I want to search
 the two collections at the same time, I need to know which collection the
 hit came from so I can retrieve the base_path from the database.  These
 base_paths can be different.  As mentioned, this was trivial in Lucene 1.x
 and 2.x as I just grabbed the subsearcher from the result, which would for
 example return a 1 or 2 indicating which of the two collections the result
 came from.  Then I can build the path to the file.  In other words,
 subsearcher gave me the foreign key I needed to map to additional external
 information associated with each index during a multisearch.  That is now
 gone in Lucene 3.3.

 I guess a real simple solution is just to store a new field with each
 document uniquely identifying which collection.  So in the example above, I
 could create a new field foreign_key_index  for each document which would
 be Collection1 or Collection2 respectively.  This would surely work, but
 it would break backwards compatibility of my system and would require me to
 rebuild every collection.  Also seems pretty extensive for something so
 simple.

 If there is another way to do this, please advise.  Thanks in advance and
 much appreciated.

 - JMA



 -Original Message-
 From: Uwe Schindler [mailto:u...@thetaphi.de]
 Sent: Monday, August 29, 2011 8:05 PM
 To: java-user@lucene.apache.org
 Subject: RE: No subsearcher in Lucene 3.3?

 Why do you need to know the subreader? If you want to get the document's
 stored fields, use the MultiReader.

 If you really want to know the subreader, use this:

 http://lucene.apache.org/java/3_3_0/api/core/org/apache/lucene/util/ReaderUtil.html#subReader(int,
 org.apache.lucene.index.IndexReader)

 But this is somewhat slow, so don’t use in inner loops.

 Devon suggested:
  If I'm understanding your question correctly, in the Collector, you are
 told which IndexReader you are working with when the setNextReader method is
 called. Hopefully that helps.

 This does not work as expected, because the Collector gets the lowest-level
 readers, which are in fact sub-sub-readers (each single IndexReader is itself
 composed of multiple SegmentReaders, unless you have optimized
 sub-indexes).

 Uwe

 -
 Uwe Schindler
 H.-H.-Meier-Allee 63, D-28213 Bremen
 http://www.thetaphi.de
 eMail: u...@thetaphi.de


  -Original Message-
  From: Joseph MarkAnthony [mailto:mrj...@comcast.net]
  Sent: Monday, August 29, 2011 8:54 PM
  To: java-user@lucene.apache.org
  Subject: No subsearcher in Lucene 3.3?
 
  Greetings,
  In the past (Lucene version 2.x) I successfully used
  MultiSearcher.subsearcher() to identify the searchable within a
  MultiSearcher to which a hit belonged.
 
  In moving to Lucene 3.3, MultiSearcher is now deprecated, and I am
  trying to create a standard IndexSearcher over a MultiReader.  I
  haven't gotten this to work yet but it appears to be the correct
  approach.  However, I cannot find any corresponding subsearcher
  method that could identify which subreader is the one that finds the hit.
 
  For example, it used to be straightforward:
 
  Create a MultiSearcher over several Searchables, and call
 

How can i index a Java Bean into Lucene application ?

2011-08-06 Thread KARTHIK SHIVAKUMAR
Hi

How can I index a Java Bean into a Lucene application, instead of a
file?

API: IndexWriter writer = new IndexWriter(FSDirectory.open(INDEX_DIR),
new StandardAnalyzer(Version.LUCENE_CURRENT), true,
IndexWriter.MaxFieldLength.LIMITED);

Is there any alternative for the same?

ex:

package com.web.beans.searchdata;

public class SearchIndexHtmlData {

    public String CONTENT = "NA";
    public String DATEOFCREATION = "NA";
    public String DATEOFINDEXCREATION = "NA";


    public String getCONTENT() {
        return CONTENT;
    }
    public void setCONTENT(String cONTENT) {
        CONTENT = cONTENT;
    }
    public String getDATEOFCREATION() {
        return DATEOFCREATION;
    }
    public void setDATEOFCREATION(String dATEOFCREATION) {
        DATEOFCREATION = dATEOFCREATION;
    }
    public String getDATEOFINDEXCREATION() {
        return DATEOFINDEXCREATION;
    }
    public void setDATEOFINDEXCREATION(String dATEOFINDEXCREATION) {
        DATEOFINDEXCREATION = dATEOFINDEXCREATION;
    }
}
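
One possible answer, offered as a sketch rather than as anything from the thread: the
IndexWriter API takes Documents, not files, so a bean is indexed by copying its
properties into Fields. 3.x-era field flags assumed, field names illustrative:

    SearchIndexHtmlData bean = new SearchIndexHtmlData();
    // ... populate the bean ...

    Document doc = new Document();
    doc.add(new Field("content", bean.getCONTENT(),
            Field.Store.YES, Field.Index.ANALYZED));
    doc.add(new Field("dateOfCreation", bean.getDATEOFCREATION(),
            Field.Store.YES, Field.Index.NOT_ANALYZED));
    doc.add(new Field("dateOfIndexCreation", bean.getDATEOFINDEXCREATION(),
            Field.Store.YES, Field.Index.NOT_ANALYZED));
    writer.addDocument(doc);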


-- 
*N.S.KARTHIK
R.M.S.COLONY
BEHIND BANK OF INDIA
R.M.V 2ND STAGE
BANGALORE
560094*