Re: Indexing a Date/DateTime/Time field in Lucene 4
I converted the Date to milliseconds and stored the long value in the index. That let me convert the value from a search hit into any date format later, on output.

On Wed, Apr 5, 2017 at 6:08 PM, Frederik Van Hoyweghen <frederik.vanhoyweg...@chapoo.com> wrote:
> Hey everyone,
>
> I'm seeing some conflicting suggestions concerning the type of field to use
> for indexing a Date/DateTime/Time value.
>
> Some suggest conversion using DateTools.timeToString() and using a
> StringField,
> while others suggest using the long value of getTime() and using a
> LongField (this is supposed to perform better using NumericRangeQuery).
>
> What are your opinions on this?
>
> Kind regards,
> Frederik

--
*N.S.KARTHIK R.M.S.COLONY BEHIND BANK OF INDIA R.M.V 2ND STAGE BANGALORE 560094*
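The approach in the reply above (store the long from getTime(), pick the display format only on output) can be sketched with the JDK alone. This is an illustration, not Lucene code: the class and method names (DateMillisSketch, toMillis, render) are made up for the example, and the long it produces is simply the value one would hand to a numeric field such as the LongField mentioned in the question.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Date;

// Hypothetical helper for illustration; not part of Lucene.
final class DateMillisSketch {

    // Index side: reduce a Date to epoch milliseconds (the value to store).
    static long toMillis(Date date) {
        return date.getTime();
    }

    // Output side: render a stored long in whatever pattern the caller wants (UTC).
    static String render(long millis, String pattern) {
        return DateTimeFormatter.ofPattern(pattern)
                .withZone(ZoneOffset.UTC)
                .format(Instant.ofEpochMilli(millis));
    }

    public static void main(String[] args) {
        long stored = toMillis(Date.from(Instant.parse("2017-04-05T18:08:00Z")));
        System.out.println(stored);
        System.out.println(render(stored, "yyyy-MM-dd"));
        System.out.println(render(stored, "dd/MM/yyyy HH:mm"));
    }
}
```

Because the stored value is a plain number, range comparisons stay numeric and the display format can change without re-indexing.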
Re: Indexing and searching a DateTime range
Hi,

A long time ago I used to store datetimes as milliseconds, and TermRangeQuery worked perfectly with that. Convert every datetime to milliseconds and index that value. At search time, convert the datetime bounds to milliseconds again and use TermRangeQuery.

With regards,
Karthik

On Feb 9, 2015 1:24 PM, Gergely Nagy foge...@gmail.com wrote:
> Hi Lucene users,
>
> I am at the beginning of implementing a Lucene application which is supposed to search through some log files. One of the requirements is to return results within a time range. Let's say these are two lines in a series of log files:
>
>     2015-02-08 00:02:06.852Z INFO...
>     ...
>     2015-02-08 18:02:04.012Z INFO...
>
> Now I need to search for these lines and return all the text in between. I was using this demo application to build an index:
> http://lucene.apache.org/core/4_10_3/demo/src-html/org/apache/lucene/demo/IndexFiles.html
>
> After that my first thought was to use a term range query like this:
>
>     TermRangeQuery query = TermRangeQuery.newStringRange("contents", "2015-02-08 00:02:06.852Z", "2015-02-08 18:02:04.012Z", true, true);
>
> But for some reason this didn't return any results.
>
> Then I was Googling for a while how to solve this problem, but all the datetime examples I found search on a much simpler field. Those examples usually use a field like this:
>
>     doc.add(new LongField("modified", file.lastModified(), Field.Store.NO));
>
> So I was wondering, how can I index these log files to make a range query work on them? Any ideas? Maybe my approach is completely wrong. I am still new to Lucene, so any help is appreciated.
>
> Thank you.
>
> Gergely Nagy
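One way to read the advice above for the log-file case: parse the timestamp at the start of each line into epoch milliseconds, and compare those numbers against the two bounds. The following is a JDK-only sketch under stated assumptions, not Lucene code. LogTimeRange and its methods are hypothetical names, and the timestamp pattern is inferred from the two sample lines in the question. In an index, the parsed value would presumably go into its own numeric field next to the text, rather than being searched for inside the contents field.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of Lucene.
final class LogTimeRange {

    // Pattern inferred from the sample lines, e.g. "2015-02-08 00:02:06.852Z".
    private static final DateTimeFormatter STAMP =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSSX");

    // Number of characters the timestamp occupies at the start of a line.
    private static final int STAMP_LENGTH = 24;

    // Convert one timestamp string to epoch milliseconds.
    static long stampToMillis(String stamp) {
        return OffsetDateTime.parse(stamp, STAMP).toInstant().toEpochMilli();
    }

    // Extract and convert the leading timestamp of a log line.
    static long lineToMillis(String line) {
        return stampToMillis(line.substring(0, STAMP_LENGTH));
    }

    // Return the lines whose timestamp lies in [from, to], both inclusive.
    static List<String> between(List<String> lines, String from, String to) {
        long lo = stampToMillis(from);
        long hi = stampToMillis(to);
        List<String> hits = new ArrayList<>();
        for (String line : lines) {
            long t = lineToMillis(line);
            if (t >= lo && t <= hi) {
                hits.add(line);
            }
        }
        return hits;
    }
}
```

The same two conversions (once when indexing, once when building the query bounds) are what the reply describes; only the storage and lookup would be delegated to the index.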
Re: Can some terms from analysis be silently dropped when indexing? Because I'm pretty sure I'm seeing that happen.
> some terms from analysis be silently dropped when indexing

Then I presume the same terms also need to be dropped at search time; otherwise the results will not be as expected.

With regards,
Karthik

On Mon, Aug 25, 2014 at 12:52 PM, Trejkaz trej...@trypticon.org wrote:
> It seems like nobody knows the answer, so I'm just going to file a bug.
>
> TX
>
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
Re: Re-indexing a particular field only without re-indexing the entire enclosing document in the index
Hi,

> Update Index for the dynamic data

I did this in the past and it worked for me. All you need is a piece of code that searches the index for the specific document (probably by a unique name for the document), deletes it, and inserts the fresh document in its place. For a large set of documents, do this in a loop.

With regards,
Karthik

On Wed, Apr 25, 2012 at 12:37 PM, Torsten Krah tk...@fachschaft.imn.htwk-leipzig.de wrote:
> On Tuesday, 24.04.2012, 21:57 +0530, KARTHIK SHIVAKUMAR wrote:
> > A simple technique is to use Update Index for the dynamic data column
> > rather than re-indexing the whole document.
>
> Just for interest, how do you do that?
Re: Re-indexing a particular field only without re-indexing the entire enclosing document in the index
Hi,

A simple technique is to use Update Index for the dynamic data column rather than re-indexing the whole document.

With regards,
Karthik

On Mon, Apr 23, 2012 at 9:01 PM, Jong Kim jong.luc...@gmail.com wrote:
> Hi,
>
> I'm sure this is a very common use case and that probably hundreds of people have asked the same question in the past, but I haven't been able to find an exact answer to my question.
>
> I have a system where each document in the Lucene index comprises at least one field containing a very large number of terms (for example, the entire text of potentially very large text files) and another metadata field that is much smaller. The first field is rarely modified and hence remains mostly static, while the second field is modified very frequently.
>
> Currently, I'm re-indexing the entire Lucene document whenever the value of the second field changes on the source side. Needless to say, this yields a very inefficient system, because a significant amount of system resources is wasted re-indexing what has not changed.
>
> Is there any good way to solve this design problem?
>
> Obviously, an alternative design would be to split the index into two, and maintain the static (and large) data in one index and the dynamic part in the other. However, this approach is not acceptable due to our data pattern: a match on the first index yields a very large result set, and filtering it against the second index is very inefficient due to the high ratio of disjoint data. In other words, while the alternative approach significantly reduces the indexing-time overhead, the resulting search is unacceptably expensive.
>
> Any design help would be highly appreciated.
>
> Thanks
> /Jong
Re: lucene-3.0.3
Hi,

> lucene-3.0.3 can be used for searching a text from

Lucene's primary job is text search. Whether the source is PDF, HTML, XML, MS Word, PPT or XLS, you need plug-in code that does two things:

1) Strip the text from the document (PDF/HTML/XML/MS Word/PPT/XLS)
2) Index this extracted text using Lucene

The resulting index can later be used to search through the required content. ;)

With regards,
Karthik

On Wed, Feb 1, 2012 at 6:37 PM, Prasad KVSH prasad.kokep...@ness.com wrote:
> Hi,
>
> Can lucene-3.0.3 be used for searching text in PDF, xlsx, docx, doc, xls, msg and TXT files? Is there a common function to accomplish this? Please help me with this.
>
> Thanks
> Prasad
Re: Can't get a hit
Hi,

My suggestion: you should have a common column that stores a unique identity for the data being indexed, for example Name + Date of the record.

This helps in replacing duplicates with the latest record through a TermQuery search/replace process. It also helps in maintaining a unique record list without duplicates.

With regards,
Karthik

On Thu, Dec 29, 2011 at 9:56 PM, Cheng zhoucheng2...@gmail.com wrote:
> Hi,
>
> I need to save a list of records into an index on the hard drive. I keep a writer and a reader open until the end of the operation.
>
> My issue is that I need to compare each of the new records with each of the records that have already been saved into the index. There are plenty of duplicate records in the original list.
>
> To my surprise, I can't find a hit for a duplicate record on the fly, although I call writer.commit() for every record being saved.
>
> However, if I intentionally stop the operation (with some of the records saved) and re-run the list of records, lots of hits occur.
>
> Please help!
>
> Thanks!
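The unique-key idea above can be modelled without an index at all. The sketch below is JDK-only and purely illustrative: UniqueKeySketch, Rec and keepLatest are hypothetical names, and the "|" separator in the key is an assumption. It builds a Name + Date key per record and lets a later record with the same key replace the earlier one. In Lucene itself, replace-by-key is usually done with IndexWriter.updateDocument(Term, Document), which deletes the documents containing the term and then adds the new one.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model for illustration; not part of Lucene.
final class UniqueKeySketch {

    // Assumed record shape: the key is derived from name and date.
    static final class Rec {
        final String name;
        final String date;
        final String payload;

        Rec(String name, String date, String payload) {
            this.name = name;
            this.date = date;
            this.payload = payload;
        }

        // The "common column": one value that identifies the record.
        String key() {
            return name + "|" + date;
        }
    }

    // Keep one record per key; a later duplicate replaces the earlier one.
    static List<Rec> keepLatest(List<Rec> incoming) {
        Map<String, Rec> byKey = new LinkedHashMap<>();
        for (Rec rec : incoming) {
            byKey.put(rec.key(), rec);
        }
        return new ArrayList<>(byKey.values());
    }
}
```

Storing the key as a single untokenized value is what makes an exact lookup (or replace) by term possible.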
Re: Lucene bangalore chapter
Hi,

I definitely think there is NONE.. ;)

With regards,
Karthik

On Tue, Dec 6, 2011 at 11:41 AM, Vinaya Kumar Thimmappa vthimma...@ariba.com wrote:
> Is there a Lucene Bangalore chapter?
>
> -Vinaya
Re: tokenizing text using language analyzer but preserving stopwords if possible
Hi,

> tokenize the original foreign text into words

You need to identify the appropriate analyzer for the foreign language before indexing.

With regards,
Karthik

On Wed, Dec 7, 2011 at 4:57 PM, Avi Rosenschein arosensch...@gmail.com wrote:
> On Wed, Dec 7, 2011 at 00:41, Ilya Zavorin izavo...@caci.com wrote:
> > I need to implement a "quick and dirty" or "poor man's" translation of a foreign language document by looking up each word in a dictionary and replacing it with the English translation. So what I need is to tokenize the original foreign text into words and then access each word, look it up and get its translation. However, if possible, I also need to preserve non-words, i.e. stopwords, so that I can replicate them in the output stream without translating. If the latter is not possible then I just need to preserve the order of the original words so that their translations have the same order in the output.
> >
> > Can I accomplish this using Lucene components? I presume I'd have to start by creating an analyzer for the foreign language, but then what? How do I (i) tokenize, (ii) access words in the correct order, (iii) also access non-words if possible?
>
> You can always use something like StandardAnalyzer for the specific language, with an empty stopword list (so that no words are treated as stopwords). A bit trickier might be dealing with punctuation: depending on the analyzer, you might be able to get these to parse as separate tokens.
>
> -- Avi
>
> > Thanks much
> >
> > Ilya Zavorin
Re: Lucene index inside of a web app?
Hi,

Check http://tomcat.apache.org (web.xml is well explained there). About 80% of web containers follow the same strategy.

By the way, which web container do you use?

With regards,
Karthik

On Fri, Dec 2, 2011 at 7:54 PM, okayndc bodymo...@gmail.com wrote:
> What would the web.xml look like? I'm lost.
>
> On Thu, Dec 1, 2011 at 11:04 PM, KARTHIK SHIVAKUMAR nskarthi...@gmail.com wrote:
> > Hi,
> >
> > > generated Lucene index
> >
> > What if you need to update it with more docs? The best approach is to inject the real path of the index (c:/temp/Indexes) into the web application via web.xml. With this approach you can even achieve:
> >
> > 1) Load balancing of multiple web servers pointing to the same index files
> > 2) Update/delete/re-index without the web application being interrupted
> >
> > With regards,
> > Karthik
Re: lucene-core-3.3.0 not optimizing
Hi,

LUCENE-3454: http://issues.apache.org/jira/browse/LUCENE-3454

So you mean the code has changed with this API? Does anybody have a sample code snippet, or is there a sample to play around with?

With regards,
Karthik

On Fri, Dec 2, 2011 at 3:44 PM, Ian Lea ian@gmail.com wrote:
> Well, calling optimize(maxNumSegments) will (from the javadocs on recent releases) "Optimize the index down to <= maxNumSegments". So optimize(100) won't get you down to 1 big file, unless you are using compound files perhaps. Maybe it did something different 7 years ago but that seems very unlikely.
>
> In 3.5.0 all optimize() calls are deprecated anyway. I suggest you read the release notes and the javadocs, upgrade to 3.5.0 and remove all optimize() calls altogether.
>
> --
> Ian.
>
> On Fri, Dec 2, 2011 at 9:58 AM, KARTHIK SHIVAKUMAR nskarthi...@gmail.com wrote:
> > Hi,
> >
> > I indexed and optimized 5+ million XML docs with Lucene 1.x 7 years ago, and IndexWriter.optimize used to merge all the bits and pieces of the created index into 1 big file.
> >
> > I have not tracked the API changes in those 7 years, and with lucene-core-3.3.0 I have not been able to find, on Google, why this is happening.
> >
> > With regards,
> > Karthik
Re: [JOB] Lucid Imagination is hiring
Hi,

Too bad. There's a recession, and I'm in INDIA ;(

With regards,
Karthik

On Mon, Dec 5, 2011 at 9:10 PM, Grant Ingersoll gsing...@apache.org wrote:
> Hi All,
>
> If you've wanted a full time job working on Lucene or Solr, we have two positions open that just might be of interest. The job descriptions are below. Interested candidates should submit their resumes off list to care...@lucidimagination.com.
>
> You can learn more on our website: http://www.lucidimagination.com/about/careers.
>
> Thanks,
> Grant
>
> Open Source Software Engineer
>
> DESCRIPTION
> Lucid Imagination is looking for a software engineer to work on the open source Apache Solr and Lucene projects. As part of Lucid's open source team, you will help implement features and provide fixes for issues in the world's premier open source search server and library. You will also work closely with Lucid's research team and technical support team to enable both community and customer consumption of Solr and Lucene.
>
> REQUIREMENTS
> • Strong interest in working on high performance and large scale problems.
> • Understanding of debugging and performance testing in highly concurrent systems.
> • Core Java expertise.
> • Experience writing unit tests and working with continuous integration tools.
> • Willingness to participate in and contribute to a vibrant, fast-paced open source community.
> • Strong interpersonal, written and verbal communication skills.
> • Desire to learn and be a part of a startup.
> • Degree in computer science or related field.
> • Experience with Lucene, Solr, Hadoop and related NoSQL technologies is not required, but is considered a bonus.
>
> EXPERIENCE
> 0-5 years programming experience in Java.
>
> SALARY
> Based on experience
>
> LOCATION
> Raleigh/Durham/Chapel Hill area (preferred)
>
> TRAVEL
> Minimal (occasional trips to California)
>
> Senior Consultant
>
> DESCRIPTION
> Lucid Imagination is currently looking to hire a Senior Consultant to be part of our Professional Services team.
>
> REQUIREMENTS
> • Experience working with Lucene and/or Solr required.
> • Establish yourself as a credible, reliable, likable, genuine, and trustworthy advisor to your customers.
> • Provide expert-level advisory services to a wide range of customers with varying degrees of technical knowledge.
> • Clearly identify customer pain points, priorities, and success criteria at the onset of each engagement.
> • Resolve complex search issues in and around the Lucene/Solr ecosystem.
> • Document recommendations in the form of Best Practice Assessments.
> • Identify opportunities to provide customers with additional value through follow-on products and/or services.
> • Communicate high-value use cases and customer feedback to our Product Development and Engineering teams.
> • Contribute to the open source community by donating needed bug fixes and improvements; answering message boards; documenting existing code; and blogging.
> • Support Business Development through product demos and customer QA.
> • Collaborate on internal Lucid projects.
> • Develop training materials and deliver classroom training on occasion.
>
> EXPERIENCE
> • BS or higher in Engineering or Computer Science preferred.
> • 3 or more years of IT Consulting and/or Professional Services experience required.
> • Some Java development experience.
> • Some experience with common scripting languages (Perl/Python/Ruby).
> • Exposure to other related open source projects (Mahout, Hadoop, Tika, etc.) a plus.
> • Experience with other commercial and open source search technologies a plus.
> • Enterprise Search, eCommerce, and/or Business Intelligence experience a plus.
> • Experience working in a startup a plus.
>
> SALARY
> Based on experience
>
> LOCATION
> San Francisco/Bay Area (preferred)
>
> TRAVEL
> 10-20%
Re: Use multiple lucene indices
Hi,

> would the memory usage go through the roof?

Yup. In my past experience, that got me into a pickle.

With regards,
Karthik

On Mon, Dec 5, 2011 at 11:28 PM, Rui Wang rw...@ebi.ac.uk wrote:
> Hi All,
>
> We are planning to use Lucene in our project, but are not entirely sure about some of the design decisions we made. The details are below; any comments/suggestions are more than welcome.
>
> The requirements of the project are:
>
> 1. We have tens of thousands of files, ranging in size from 500M to a few terabytes, and the majority of the contents of these files will not be accessed frequently.
> 2. We are planning to keep less accessed contents outside of our database and store them on the file system.
> 3. We also have code to get the binary position of these contents in the files. Using these binary positions, we can quickly retrieve the contents and convert them into our domain objects.
>
> We think Lucene provides a scalable solution for storing and indexing these binary positions, so the idea is that each piece of content in the files will be a document, and each document will have at least an ID field to identify the content and a binary position field containing the start and stop position of the content. Having done some performance testing, it seems to us that Lucene is well capable of doing this.
>
> At the moment, we are planning to create one Lucene index per file, so if we have new files to be added to the system, we can simply generate a new index. The problem is to do with searching: this approach means that we need to create a new IndexSearcher every time a file is accessed through our web service. We know that it is rather expensive to open a new IndexSearcher, and are thinking of using some kind of pooling mechanism. Our questions are:
>
> 1. Is this one-index-per-file approach a viable solution? What do you think about pooling IndexSearchers?
> 2. If we have many IndexSearchers open at the same time, would the memory usage go through the roof? I couldn't find any documentation on how Lucene allocates memory.
>
> Thank you very much for your help.
>
> Many thanks,
> Rui Wang
Re: lucene-core-3.3.0 not optimizing
Hi,

I indexed and optimized 5+ million XML docs with Lucene 1.x 7 years ago, and IndexWriter.optimize used to merge all the bits and pieces of the created index into 1 big file.

I have not tracked the API changes in those 7 years, and with lucene-core-3.3.0 I have not been able to find, on Google, why this is happening.

With regards,
Karthik

On Fri, Dec 2, 2011 at 12:37 PM, Simon Willnauer simon.willna...@googlemail.com wrote:
> What do you understand when you say optimize? Unless you tell us what this code does in your case and what you'd expect it to do, it's impossible to give you any reasonable answer.
>
> simon
>
> On Fri, Dec 2, 2011 at 4:54 AM, KARTHIK SHIVAKUMAR nskarthi...@gmail.com wrote:
> > Hi,
> >
> > Spec:
> > O/S: Windows 7
> > JDK: 1.6.0_29
> > Lucene: lucene-core-3.3.0
> >
> > After indexing finishes successfully, why does this code not optimize the index? (sample code)
> >
> >     INDEX_WRITER.optimize(100);
> >     INDEX_WRITER.commit();
> >     INDEX_WRITER.close();
lucene-core-3.3.0 not optimizing
Hi,

Spec:
O/S: Windows 7
JDK: 1.6.0_29
Lucene: lucene-core-3.3.0

After indexing finishes successfully, why does this code not optimize the index? (sample code)

    INDEX_WRITER.optimize(100);
    INDEX_WRITER.commit();
    INDEX_WRITER.close();
Re: Lucene index inside of a web app?
Hi,

> generated Lucene index

What if you need to update it with more docs? The best approach is to inject the real path of the index (c:/temp/Indexes) into the web application via web.xml. With this approach you can even achieve:

1) Load balancing of multiple web servers pointing to the same index files
2) Update/delete/re-index without the web application being interrupted

With regards,
Karthik

On Tue, Nov 29, 2011 at 12:25 AM, okayndc bodymo...@gmail.com wrote:
> Awesome. Thanks guys!
>
> On Mon, Nov 28, 2011 at 12:19 PM, Uwe Schindler u...@thetaphi.de wrote:
> > You can store the index in the WEB-INF directory, just use something like:
> >
> >     ServletContext.getRealPath("/WEB-INF/data/myIndexName");
> >
> > -----
> > Uwe Schindler
> > H.-H.-Meier-Allee 63, D-28213 Bremen
> > http://www.thetaphi.de
> > eMail: u...@thetaphi.de
> >
> > > -----Original Message-----
> > > From: Ian Lea [mailto:ian@gmail.com]
> > > Sent: Monday, November 28, 2011 6:11 PM
> > > To: java-user@lucene.apache.org
> > > Subject: Re: Lucene index inside of a web app?
> > >
> > > Using a static string is fine - it just wasn't clear from your original post what it was.
> > >
> > > I usually use a full path read from a properties file so that I can change it without a recompile, have different settings on test/live/whatever systems, etc. Works for me, but isn't the only way to do it. If you know where your app lives, you could use a full path pointing to somewhere within that tree, or you could use a partial path that the app server will interpret relative to something. Which is fine too - take your pick of whatever works for you.
> > >
> > > --
> > > Ian.
> > >
> > > On Mon, Nov 28, 2011 at 4:40 PM, okayndc bodymo...@gmail.com wrote:
> > > > Hi,
> > > >
> > > > Thanks for your response. Yes, LUCENE_INDEX_DIRECTORY is a static string which contains the file system path of the index (for example, c:\\index). Is this good practice? If not, what should the full path to an index look like?
> > > >
> > > > Thanks
> > > >
> > > > On Mon, Nov 28, 2011 at 4:54 AM, Ian Lea ian@gmail.com wrote:
> > > > > What is LUCENE_INDEX_DIRECTORY? Some static string in your app?
> > > > >
> > > > > Lucene knows nothing about your app, JSP, or what app server you are using. It requires a file system path and it is up to you to provide that. I always use a full path since I prefer to store indexes outside the app and it avoids complications with what the app server considers the default directory. But if you want to store it inside, without specifying a full path, look at the docs for your app server.
> > > > >
> > > > > --
> > > > > Ian.
> > > > >
> > > > > On Sun, Nov 27, 2011 at 2:10 AM, okayndc bodymo...@gmail.com wrote:
> > > > > > Hello,
> > > > > >
> > > > > > I want to store the generated Lucene index inside of my Java application, preferably within a folder where my JSP files are located. I also want to be able to search the index from within the web app. I've been using LUCENE_INDEX_DIRECTORY, but this is on a file system (currently my hard drive). Should I continue to use LUCENE_INDEX_DIRECTORY if I want the Lucene index inside the app, or use something else? I was a bit confused about this. Btw, the Lucene index content comes from a database.
> > > > > >
> > > > > > Any help is appreciated
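Ian's habit in the thread above, reading the full index path from a properties file so it can change without a recompile, can be sketched with java.util.Properties. The names here are assumptions made for the example: IndexPathConfig, the key lucene.index.dir and the fallback argument are not from the thread. The same idea carries over to a web.xml parameter; only the place the string is read from differs.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper for illustration; not part of Lucene or any app server.
final class IndexPathConfig {

    // Assumed property name; pick whatever fits the application.
    static final String KEY = "lucene.index.dir";

    // Read the index path from properties text, or return the fallback if the key is absent.
    static String indexPath(Reader source, String fallback) {
        Properties props = new Properties();
        try {
            props.load(source);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty(KEY, fallback);
    }

    public static void main(String[] args) {
        // Forward slashes are used because a backslash is an escape character in .properties files.
        Reader config = new StringReader("lucene.index.dir=c:/temp/Indexes\n");
        System.out.println(indexPath(config, "/tmp/index"));
    }
}
```

In a real deployment the Reader would come from a file outside the web app, so test and live systems can point at different index directories.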
Re: Improving indexing speed
Hi,

How long a file takes to index depends on the type of document and on the data extractor. My documents are usually XML; each run indexes 2+ million XML files, and the time taken is less than 5 minutes.

With regards,
Karthik

On Fri, Nov 11, 2011 at 1:17 AM, Ian Lea ian@gmail.com wrote:
> And how long does it take just to read and parse the files, without indexing them? Often that is the problem - nothing to do with Lucene.
>
> There is plenty of good advice in http://wiki.apache.org/lucene-java/ImproveIndexingSpeed. A good match on the subject of your message!
>
> --
> Ian.
>
> On Thu, Nov 10, 2011 at 7:22 PM, Simon Willnauer simon.willna...@googlemail.com wrote:
> > Can you provide more information about your setup? Things like how much time it takes to index your documents, how many docs you index, what your index writer settings are, how many cores you have, where you read from and write to (disks). Oh, and what version of Lucene are you using?
> >
> > thanks,
> > simon
> >
> > On Thu, Nov 10, 2011 at 10:40 AM, antony jospeh antony.joseph.webm...@gmail.com wrote:
> > > Hi all,
> > >
> > > I have a large number of files in a directory that need to be indexed. All the files are in a specific format and need to be parsed to extract information before I can index them.
> > >
> > > A single thread processed one file at a time, so I decided to use multiple threads: the main thread loops over the directory and passes each file, via a queue, to a pool of worker threads, all of which share the same index writer. However, there is no significant change in indexing speed.
> > >
> > > Any hints on what I am doing wrong, or any suggestions?
> > >
> > > Thanks
> > > Antony
Re: Reopening an index reader still giving me deleted records ?
Hi,

> BUT still find records that have been deleted and no longer exist in the index

Lucene API 3.3.0 has something like org.apache.lucene.index.IndexCommit. Please check whether it has been used after deleting the records. The old index entries may still be there and may pop up in a new search.

With regards,
Karthik

On Mon, Nov 7, 2011 at 7:06 PM, Paul Taylor paul_t...@fastmail.fm wrote:
> I build indexes from scratch every three hours in a separate process. When they are built, I replace the old indexes with the new ones in my search server. Then I tell the search to reload the indexes as follows:
>
>     public void reloadIndex() throws CorruptIndexException, IOException {
>         if (indexSearcher != null) {
>             IndexReader oldReader = indexSearcher.getIndexReader();
>             IndexReader newReader = oldReader.reopen();
>             if (oldReader != newReader) {
>                 Similarity similarity = indexSearcher.getSimilarity();
>                 indexSearcher = new IndexSearcher(newReader);
>                 indexSearcher.setSimilarity(similarity);
>                 this.setLastServerUpdatedDate();
>                 oldReader.close();
>             }
>         }
>     }
>
> What I'm finding is that new searches find the newly added records BUT still find records that have been deleted and no longer exist in the index. This happens within one query, so it must mean I'm getting NEW AND OLD results from my new index reader rather than still using the old one. So am I misunderstanding what reopen() does, or is it not working correctly?
>
> thanks Paul
>
> (This code is the only other place where the reader is accessed:
>
>     public Results searchLucene(String query, int offset, int limit) throws IOException, ParseException {
>         IndexSearcher searcher = null;
>         try {
>             searcher = getIndexSearcher();
>             searcher.getIndexReader().incRef();
>             TopDocs topdocs = searcher.search(parseQuery(query), offset + limit);
>             searchCount.incrementAndGet();
>             return processResults(searcher, topdocs, offset);
>         } finally {
>             searcher.getIndexReader().decRef();
>         }
>     }
> )
Re: No subsearcher in Lucene 3.3?
HI Long time ago I used to do the same ... I used to name the merger Index unique names ..so at run time If the Query returned from 1STMERGER then the path relevant to 1STMEGEr will be used. similarly U NEED NOT STORE A NEW COLUMN FOR THIS SAKE.. 1MERGER = /temp/MERGER1 Finally the PATH OF INDEX SEARCH IS C:/TEMP/MERGER1 1MERGER = /temp/MERGER2 Finally the PATH OF INDEX SEARCH IS D:/TEMP/MERGER2 HOPE THIS HELPS WITH REGARDS KARTHIK On Tue, Aug 30, 2011 at 9:59 PM, Joe MA mrj...@comcast.net wrote: Thanks for the replies. Here is why I need the subreader (or subsearcher in earlier Lucene versions): I have multiple collections of documents, say broken out by years (it's more complex than this, but this illustrates the use case): Collection1 D:/some folder/2009/*.pdf (lots of PDF files) Collection2 D:/another folder/2010/*.pdf (lots of different PDF files) And so forth. So in the example above, I would have two indicies, one for each year.When I index, I store the *relative* path of each document as a field. For example, 'link:2009/file1.pdf' or 'link2010/file1.pdf' etc . I do not store the full path to the files in the index. This has a huge advantage because we can move the documents to another file system or server or path without rebuilding the index. I stored the required base path to the documents in each collection in a database, external to the collection. For example, in the above example, Collection1 would have a base path of D:/some folder/. Therefore, to actually access a document referenced in a collection, you would concat base_path retrieved from the database to the link field retrieved from the collection. I would think this is a very common approach. When searching a single collection, no problem. But if I want to search the two collections at the same time, I need to know which collection the hit came from so I can retrieve the base_path from the database. These base_paths can be different. 
As mentioned, this was trivial in Lucene 1.x and 2.x, as I just grabbed the subsearcher from the result, which would, for example, return a 1 or 2 indicating which of the two collections the result came from. Then I could build the path to the file. In other words, subsearcher gave me the foreign key I needed to map to additional external information associated with each index during a multisearch. That is now gone in Lucene 3.3.

I guess a really simple solution is just to store a new field with each document uniquely identifying its collection. So in the example above, I could create a new field foreign_key_index for each document, which would be Collection1 or Collection2 respectively. This would surely work, but it would break backwards compatibility of my system and would require me to rebuild every collection. It also seems pretty extensive for something so simple.

If there is another way to do this, please advise. Thanks in advance and much appreciated.

- JMA

-----Original Message-----
From: Uwe Schindler [mailto:u...@thetaphi.de]
Sent: Monday, August 29, 2011 8:05 PM
To: java-user@lucene.apache.org
Subject: RE: No subsearcher in Lucene 3.3?

Why do you need to know the subreader? If you want to get the document's stored fields, use the MultiReader. If you really want to know the subreader, use this:

http://lucene.apache.org/java/3_3_0/api/core/org/apache/lucene/util/ReaderUtil.html#subReader(int, org.apache.lucene.index.IndexReader)

But this is somewhat slow, so don't use it in inner loops.

Devon suggested:

If I'm understanding your question correctly, in the Collector, you are told which IndexReader you are working with when the setNextReader method is called. Hopefully that helps.

This does not work as expected, because the Collector gets the lowest-level readers, which are in fact sub-sub-readers (each single IndexReader itself consists of several SegmentReaders, unless you have optimized the sub-indexes).
Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

-----Original Message-----
From: Joseph MarkAnthony [mailto:mrj...@comcast.net]
Sent: Monday, August 29, 2011 8:54 PM
To: java-user@lucene.apache.org
Subject: No subsearcher in Lucene 3.3?

Greetings,

In the past (Lucene version 2.x) I successfully used MultiSearcher.subsearcher() to identify the searchable within a MultiSearcher to which a hit belonged. In moving to Lucene 3.3, MultiSearcher is now deprecated, and I am trying to create a standard IndexSearcher over a MultiReader. I haven't gotten this to work yet, but it appears to be the correct approach. However, I cannot find any corresponding subsearcher method that could identify which subreader is the one that finds the hit.

For example, it used to be straightforward: Create a MultiSearcher over several Searchables, and call
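
For the use case in this thread, one way to recover the collection without adding a new field is to work from the combined document number. The sketch below assumes the combined reader numbers documents contiguously, in the order the sub-indexes were passed in, so that each sub-index owns one range of document numbers. The class CollectionLookup, its method names, and the sample sizes are made up for illustration, and the base paths stand in for what would be read from the external database. It is plain Java and does not call Lucene.

```java
import java.util.Arrays;
import java.util.List;

class CollectionLookup {
    // First combined document number of each sub-index, given the
    // number of documents in each one (in sub-index order).
    static int[] docStarts(int[] maxDocs) {
        int[] starts = new int[maxDocs.length];
        int next = 0;
        for (int i = 0; i < maxDocs.length; i++) {
            starts[i] = next;
            next += maxDocs[i];
        }
        return starts;
    }

    // Position of the sub-index whose range contains docId: the last
    // sub-index whose start is not greater than docId.
    static int subIndex(int docId, int[] starts) {
        int idx = 0;
        for (int i = 0; i < starts.length; i++) {
            if (starts[i] <= docId) {
                idx = i;
            }
        }
        return idx;
    }

    // Full path = base path of the owning collection + stored relative link.
    static String resolve(int docId, int[] starts, List<String> basePaths, String link) {
        return basePaths.get(subIndex(docId, starts)) + link;
    }

    public static void main(String[] args) {
        // Hypothetical sizes: Collection1 holds 3 documents, Collection2 holds 2.
        int[] starts = docStarts(new int[] {3, 2});
        // Base paths as they might be stored in the external database.
        List<String> basePaths = Arrays.asList("D:/some folder/", "D:/another folder/");
        System.out.println(resolve(1, starts, basePaths, "2009/file1.pdf"));
        System.out.println(resolve(4, starts, basePaths, "2010/file1.pdf"));
    }
}
```

In a real application the per-index sizes would presumably come from each sub-reader's maxDoc(); that wiring is not shown here, and the linear scan is kept only for readability.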
How can I index a Java Bean into a Lucene application?
Hi,

How can I index a Java Bean in a Lucene application, instead of a file?

API:

    IndexWriter writer = new IndexWriter(FSDirectory.open(INDEX_DIR),
            new StandardAnalyzer(Version.LUCENE_CURRENT), true,
            IndexWriter.MaxFieldLength.LIMITED);

Is there any alternative for the same?

Example:

    package com.web.beans.searchdata;

    public class SearchIndexHtmlData {
        public String CONTENT = "NA";
        public String DATEOFCREATION = "NA";
        public String DATEOFINDEXCREATION = "NA";

        public String getCONTENT() { return CONTENT; }
        public void setCONTENT(String cONTENT) { CONTENT = cONTENT; }

        public String getDATEOFCREATION() { return DATEOFCREATION; }
        public void setDATEOFCREATION(String dATEOFCREATION) { DATEOFCREATION = dATEOFCREATION; }

        public String getDATEOFINDEXCREATION() { return DATEOFINDEXCREATION; }
        public void setDATEOFINDEXCREATION(String dATEOFINDEXCREATION) { DATEOFINDEXCREATION = dATEOFINDEXCREATION; }
    }

-- *N.S.KARTHIK R.M.S.COLONY BEHIND BANK OF INDIA R.M.V 2ND STAGE BANGALORE 560094*
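
Lucene indexes Document objects made of named fields rather than files, so one approach is to copy each bean property into a field of its own. A minimal sketch of that mapping step in plain Java follows. It uses a trimmed copy of the bean from the question, and BeanToFields and toFields are hypothetical names. The Lucene calls are only indicated in comments (assuming the 3.x Field constructor that takes a name, a value, Field.Store and Field.Index), so the block runs without Lucene on the classpath.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class BeanToFields {
    // Copy each bean property into a name -> value map, one entry per
    // field that would be added to the Lucene document.
    static Map<String, String> toFields(SearchIndexHtmlData bean) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("CONTENT", bean.getCONTENT());
        fields.put("DATEOFCREATION", bean.getDATEOFCREATION());
        fields.put("DATEOFINDEXCREATION", bean.getDATEOFINDEXCREATION());
        return fields;
    }

    public static void main(String[] args) {
        SearchIndexHtmlData bean = new SearchIndexHtmlData();
        bean.setCONTENT("some page text");
        for (Map.Entry<String, String> e : toFields(bean).entrySet()) {
            // With Lucene 3.x available, each entry could become a field, e.g.
            //   doc.add(new Field(e.getKey(), e.getValue(),
            //           Field.Store.YES, Field.Index.ANALYZED));
            // followed by writer.addDocument(doc). Not executed here.
            System.out.println(e.getKey() + " = " + e.getValue());
        }
    }
}

// Trimmed copy of the bean from the question, repeated so the sketch
// compiles on its own.
class SearchIndexHtmlData {
    private String content = "NA";
    private String dateOfCreation = "NA";
    private String dateOfIndexCreation = "NA";

    String getCONTENT() { return content; }
    void setCONTENT(String value) { content = value; }

    String getDATEOFCREATION() { return dateOfCreation; }
    void setDATEOFCREATION(String value) { dateOfCreation = value; }

    String getDATEOFINDEXCREATION() { return dateOfIndexCreation; }
    void setDATEOFINDEXCREATION(String value) { dateOfIndexCreation = value; }
}
```

Whether a field should be analyzed, stored, or both depends on how it will be searched and displayed; the choices in the comment are placeholders.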