[JOB] Lucid Imagination is hiring

2011-12-05 Thread Grant Ingersoll
Hi All,

If you've wanted a full time job working on Lucene or Solr, we have two 
positions open that just might be of interest.  The job descriptions are below. 
  Interested candidates should submit their resumes off list to 
care...@lucidimagination.com.  

You can learn more on our website: 
http://www.lucidimagination.com/about/careers.

Thanks,
Grant

- Open Source Software Engineer
DESCRIPTION
Lucid Imagination is looking for a software engineer to work on the open source 
Apache Solr and Lucene projects. As part of Lucid's open source team, you will 
help implement features and provide fixes for issues in the world's premier 
open source search server and library. You will also work closely with Lucid's 
research team and technical support team to enable both community and customer 
consumption of Solr and Lucene.

REQUIREMENTS
• Strong interest in working on high performance and large scale 
problems.
• Understanding of debugging and performance testing in highly 
concurrent systems.
• Core Java expertise.
• Experience writing unit tests and working with continuous integration 
tools.
• Willingness to participate in and contribute to a vibrant, fast-paced 
open source community.
• Strong interpersonal, written and verbal communication skills.
• Desire to learn and be a part of a startup.
• Degree in computer science or related field.
• Experience with Lucene, Solr, Hadoop and related NoSQL technologies 
is not required, but is considered a bonus.

EXPERIENCE
0-5 years programming experience in Java.

SALARY
Based on experience

LOCATION
Raleigh/Durham/Chapel Hill area (preferred)

TRAVEL
Minimal (occasional trips to California)

- Senior Consultant
DESCRIPTION
Lucid Imagination is currently looking to hire a Senior Consultant to be part 
of our Professional Services team. 

REQUIREMENTS
• Experience working with Lucene and/or Solr required.
• Establish yourself as a credible, reliable, likable, genuine, and 
trustworthy advisor to your customers.
• Provide expert-level advisory services to a wide range of customers 
with varying degrees of technical knowledge.
• Clearly identify customer pain points, priorities, and success 
criteria at the onset of each engagement.
• Resolve complex search issues in and around the Lucene/Solr ecosystem.
• Document recommendations in the form of Best Practice Assessments.
• Identify opportunities to provide customers with additional value 
through follow-on products and/or services.
• Communicate high-value use cases and customer feedback to our Product 
Development and Engineering teams.
• Contribute to the open source community by donating needed bug fixes 
and improvements; answering message boards; documenting existing code; and 
blogging.
• Support Business Development through product demos and customer QA. 
• Collaborate on internal Lucid projects.
• Develop training materials and deliver classroom training on occasion.

EXPERIENCE
• BS or higher in Engineering or Computer Science preferred. 
• 3 or more years of IT Consulting and/or Professional Services 
experience required.
• Some Java development experience.
• Some experience with common scripting languages (Perl/Python/Ruby).
• Exposure to other related open source projects (Mahout, Hadoop, Tika, 
etc.) a plus.
• Experience with other commercial and open source search technologies 
a plus.
• Enterprise Search, eCommerce, and/or Business Intelligence experience 
a plus.
• Experience working in a startup a plus.

SALARY
Based on experience

LOCATION
San Francisco/Bay Area (preferred)

TRAVEL
10-20%


Mixing norms and no norms in the same document

2011-12-05 Thread Rob Hasselbaum
Hi. I'm indexing about 20,000 documents that could potentially have a few
thousand fields with the same field name. I've read in the mailing list
archives that there is no hard limit to the number of fields in a document,
but that storing norms can be a problem because of the RAM overhead.

I don't plan to boost documents or this particular set of like-named
fields, so I think I can index them with ANALYZED_NO_NORMS. But will this
cause a problem with scoring if I want to boost other fields in the same
document?

Thanks!


Re: Mixing norms and no norms in the same document

2011-12-05 Thread Simon Willnauer
On Mon, Dec 5, 2011 at 5:44 PM, Rob Hasselbaum <r...@hasselbaum.net> wrote:
> Hi. I'm indexing about 20,000 documents that could potentially have a few
> thousand fields with the same field name. I've read in the mailing list
> archives that there is no hard limit to the number of fields in a document,
> but that storing norms can be a problem because of the RAM overhead.
>
> I don't plan to boost documents or this particular set of like-named
> fields, so I think I can index them with ANALYZED_NO_NORMS. But will this
> cause a problem with scoring if I want to boost other fields in the same
> document?
>
> Thanks!

As long as you are consistent, you should not have any problems. You
can use different norm settings on different fields. However, if you
index the same field both with and without norms, that field will
eventually end up with no norms.

simon
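
A toy model of the rule described above may help when planning field
settings. It is plain Java, not Lucene code: NormSettingsModel, index and
hasNorms are made-up names, and the "once omitted for a field name, norms
are gone for that field" merge rule is an assumption taken from the reply,
not from Lucene's source.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model (not Lucene code) of per-field norm settings across an index. */
class NormSettingsModel {
    // field name -> true while every instance seen so far kept its norms
    private final Map<String, Boolean> keepsNorms = new HashMap<String, Boolean>();

    /** Records one indexed field instance; omitNorms=true mirrors ANALYZED_NO_NORMS. */
    void index(String fieldName, boolean omitNorms) {
        Boolean current = keepsNorms.get(fieldName);
        // Assumed rule: a single no-norms instance disables norms for that field name.
        boolean keep = !omitNorms && (current == null || current);
        keepsNorms.put(fieldName, keep);
    }

    /** True only if the field was indexed and never indexed without norms. */
    boolean hasNorms(String fieldName) {
        Boolean value = keepsNorms.get(fieldName);
        return value != null && value;
    }
}
```

In Lucene 3.x terms, omitNorms=true corresponds to Field.Index.ANALYZED_NO_NORMS
and omitNorms=false to Field.Index.ANALYZED. In this model, a boosted field
under its own name is unaffected by thousands of no-norms fields under a
different name.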

-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Use multiple lucene indices

2011-12-05 Thread Rui Wang
Hi All, 

We are planning to use Lucene in our project, but are not entirely sure 
about some of the design decisions we have made. Below are the details; any 
comments/suggestions are more than welcome. 

The requirements of the project are below:

1. We have tens of thousands of files, their size ranging from 500M to a few 
terabytes, and the majority of the contents in these files will not be 
accessed frequently. 

2. We are planning to keep less accessed contents outside of our database 
and store them on the file system.

3. We also have code to get the binary position of these contents in the files. 
Using these binary positions, we can quickly retrieve the contents and convert 
them into our domain objects. 

We think Lucene provides a scalable solution for storing and indexing these 
binary positions, so the idea is that each piece of content in the files 
will be a document. Each document will have at least an ID field to identify 
the content and a binary position field containing the start and stop 
positions of the content. Having done some performance testing, it seems to 
us that Lucene is well capable of doing this. 
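
The ID-plus-byte-range idea above can be sketched in plain Java as follows.
This is only an illustration under assumptions: ContentLocation and its
field names are hypothetical, the stop offset is assumed to be exclusive,
and the Lucene side (storing the ID and the two offsets as fields of a
document) is left out.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

/** Hypothetical value type: where one piece of content lives in a big file. */
final class ContentLocation {
    final String id;
    final long start; // inclusive byte offset
    final long stop;  // exclusive byte offset (an assumption of this sketch)

    ContentLocation(String id, long start, long stop) {
        if (start < 0 || stop < start) {
            throw new IllegalArgumentException("bad range: " + start + ".." + stop);
        }
        this.id = id;
        this.start = start;
        this.stop = stop;
    }

    /** Reads exactly the bytes in [start, stop) without scanning the rest of the file. */
    byte[] read(String path) throws IOException {
        long length = stop - start;
        if (length > Integer.MAX_VALUE) {
            throw new IOException("range too large for a single array");
        }
        byte[] buffer = new byte[(int) length];
        try (RandomAccessFile file = new RandomAccessFile(path, "r")) {
            file.seek(start);       // jump straight to the content
            file.readFully(buffer); // fails with EOFException if the file is too short
        }
        return buffer;
    }
}
```

A lookup would then be: search the index by ID, rebuild a ContentLocation
from the stored offsets, and call read on the data file.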

At the moment, we are planning to create one Lucene index per file, so if we 
have new files to be added to the system, we can simply generate a new index. 
The problem is to do with searching: this approach means that we need to 
create a new IndexSearcher every time a file is accessed through our web 
service. We know that it is rather expensive to open a new IndexSearcher, 
and are thinking of using some kind of pooling mechanism. Our questions are:

1. Is this one index per file approach a viable solution? What do you think 
about pooling IndexSearcher?

2. If we have many IndexSearchers open at the same time, would the memory 
usage go through the roof? I couldn't find any documentation on how Lucene 
allocates memory. 

Thank you very much for your help. 

Many thanks,
Rui Wang



Re: Use multiple lucene indices

2011-12-05 Thread liugangc
Hi, below are some hints from my experience:
1. If you use one index per file and have many IndexSearchers open at the 
same time, you may hit a 'too many open files' error. You will have to 
increase the file_max value of the OS. 
2. If these indexes see little concurrent access, I think it is reasonable 
to open a new searcher for every access. Meanwhile, if you use Lucene's sort 
feature, the field cache may consume a lot of memory, so too many 
IndexSearchers open at the same time could exhaust all the memory of your 
machine.
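
One way to reconcile pooling with the open-file and memory concerns above
is a bounded pool that closes the least recently used entry when it is
full. The sketch below is generic Java, not Lucene code: LruPool and
Opener are hypothetical names, and the resource type is any Closeable
standing in for a searcher or reader.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

/** Bounded LRU pool; closes the least recently used resource when full. */
class LruPool<K, V extends Closeable> {
    /** Hypothetical callback that opens the resource (e.g. a searcher) for a key. */
    interface Opener<K, V> {
        V open(K key) throws IOException;
    }

    private final Opener<K, V> opener;
    private final LinkedHashMap<K, V> open;

    LruPool(final int maxOpen, Opener<K, V> opener) {
        this.opener = opener;
        // accessOrder=true keeps entries ordered least recently used first
        this.open = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                if (size() <= maxOpen) {
                    return false;
                }
                try {
                    eldest.getValue().close(); // release file handles and caches
                } catch (IOException e) {
                    // best effort: still evict so the pool stays bounded
                }
                return true;
            }
        };
    }

    /** Returns the pooled resource for key, opening it on first use. */
    synchronized V get(K key) throws IOException {
        V value = open.get(key);
        if (value == null) {
            value = opener.open(key);
            open.put(key, value);
        }
        return value;
    }

    synchronized int size() {
        return open.size();
    }
}
```

This sketch closes an evicted resource even if another thread is still
using it. A real pool would need some form of reference counting (Lucene's
IndexReader offers incRef/decRef for that purpose) before closing.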


--
gang liu
email: liuga...@gmail.com






SpanNearQuery and matching spans inside the first span

2011-12-05 Thread Trejkaz
Supposing I have a document with just "hi there" as the text.

If I do a span query like this:

near(near(term('hi'), term('there'), slop=0, forwards),
term('hi'), slop=1, any-direction)

that returns no hits.  However, if I do a span query like this:

near(near(term('hi'), term('there'), slop=0, forwards),
term('there'), slop=1, any-direction)

that returns the document.

It seems that the rule is that if the two spans *start* at the same
position, then they are not considered near each other.  But from
the POV of a user (and from this developer) this is lop-sided because
in both situations, the second span was inside the first span.  It
seems like they should either both be considered hits, or both be
considered non-hits.

I am wondering what others think about this and whether there is any
way to manipulate/rewrite the query to get a more balanced-looking
result.

(I'm sure it gets particularly hairy, though, when your two spans
overlap only partially... is that near or not?)
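
To make the asymmetry concrete, here is a toy model in plain Java. It is
not how Lucene implements span matching: SpanModel, gap, nearAsObserved
and nearSymmetric are made-up names, nearAsObserved only encodes the rule
guessed at above (same start position means "not near"), and positions
are assumed to be [start, end) token offsets.

```java
/** Toy span with [start, end) token positions; not a Lucene class. */
final class SpanModel {
    final int start;
    final int end;

    SpanModel(int start, int end) {
        this.start = start;
        this.end = end;
    }

    /** Positions between the two spans; 0 when they touch or overlap. */
    static int gap(SpanModel a, SpanModel b) {
        return Math.max(0, Math.max(a.start, b.start) - Math.min(a.end, b.end));
    }

    /** The rule hypothesised in the post: spans with the same start are never near. */
    static boolean nearAsObserved(SpanModel a, SpanModel b, int slop) {
        return a.start != b.start && gap(a, b) <= slop;
    }

    /** A symmetric alternative: containment and overlap always count as near. */
    static boolean nearSymmetric(SpanModel a, SpanModel b, int slop) {
        return gap(a, b) <= slop;
    }
}
```

With "hi there" as SpanModel(0, 2), "hi" as SpanModel(0, 1) and "there" as
SpanModel(1, 2), nearAsObserved rejects the first pair and accepts the
second, matching the results reported above, while nearSymmetric accepts
both.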

TX




Re: Lucene index inside of a web app?

2011-12-05 Thread KARTHIK SHIVAKUMAR
Hi,

Check http://tomcat.apache.org

About 80% of web containers follow the same strategy, and web.xml is
well explained at that URL.

By the way, which web container do you use?

with regards
karthik

On Fri, Dec 2, 2011 at 7:54 PM, okayndc <bodymo...@gmail.com> wrote:

> What would the web.xml look like?  I'm lost.
>
> On Thu, Dec 1, 2011 at 11:04 PM, KARTHIK SHIVAKUMAR
> <nskarthi...@gmail.com> wrote:
>
>> Hi
>>
>>> generated Lucene index
>>
>> What if u need to upgrade this with More docs
>>
>> Best approach is Inject the Real path of the Index ( c:/temp/Indexes )
>> to the Web server Application via web.xml
>>
>> By this approach u can even achieve
>>
>> 1) Load balancing of multiple Web servers pointing to same Index files
>> 2) Update /Delete /Re-index with out the Web application being
>>    interrupted
>>
>> with regards
>> Karthik
>>
>> On Tue, Nov 29, 2011 at 12:25 AM, okayndc <bodymo...@gmail.com> wrote:
>>
>>> Awesome.  Thanks guys!
>>>
>>> On Mon, Nov 28, 2011 at 12:19 PM, Uwe Schindler <u...@thetaphi.de>
>>> wrote:
>>>
>>>> You can store the index in WEB_INF directory, just use something:
>>>> ServletContext.getRealPath("/WEB-INF/data/myIndexName");
>>>>
>>>> -----
>>>> Uwe Schindler
>>>> H.-H.-Meier-Allee 63, D-28213 Bremen
>>>> http://www.thetaphi.de
>>>> eMail: u...@thetaphi.de
>>>>
>>>>> -----Original Message-----
>>>>> From: Ian Lea [mailto:ian@gmail.com]
>>>>> Sent: Monday, November 28, 2011 6:11 PM
>>>>> To: java-user@lucene.apache.org
>>>>> Subject: Re: Lucene index inside of a web app?
>>>>>
>>>>> Using a static string is fine - it just wasn't clear from your
>>>>> original post what it was.
>>>>>
>>>>> I usually use a full path read from a properties file so that I
>>>>> can change it without a recompile, have different settings on
>>>>> test/live/whatever systems, etc.  Works for me, but isn't the
>>>>> only way to do it.
>>>>>
>>>>> If you know where your app lives, you could use a full path
>>>>> pointing to somewhere within that tree, or you could use a
>>>>> partial path that the app server will interpret relative to
>>>>> something.  Which is fine too - take your pick of whatever works
>>>>> for you.
>>>>>
>>>>> --
>>>>> Ian.
>>>>>
>>>>> On Mon, Nov 28, 2011 at 4:40 PM, okayndc <bodymo...@gmail.com>
>>>>> wrote:
>>>>>> Hi,
>>>>>>
>>>>>> Thanks for your response.  Yes, LUCENE_INDEX_DIRECTORY is a
>>>>>> static string which contains the file system path of the index
>>>>>> (for example, c:\\index).  Is this good practice?  If not, what
>>>>>> should the full path to an index look like?
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> On Mon, Nov 28, 2011 at 4:54 AM, Ian Lea <ian@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> What is LUCENE_INDEX_DIRECTORY?  Some static string in your
>>>>>>> app?
>>>>>>>
>>>>>>> Lucene knows nothing about your app, JSP, or what app server
>>>>>>> you are using.  It requires a file system path and it is up to
>>>>>>> you to provide that.  I always use a full path since I prefer
>>>>>>> to store indexes outside the app and it avoids complications
>>>>>>> with what the app server considers the default directory.  But
>>>>>>> if you want to store it inside, without specifying full path,
>>>>>>> look at the docs for your app server.
>>>>>>>
>>>>>>> --
>>>>>>> Ian.
>>>>>>>
>>>>>>> On Sun, Nov 27, 2011 at 2:10 AM, okayndc
>>>>>>> <bodymo...@gmail.com> wrote:
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> I want to store the generated Lucene index inside of my Java
>>>>>>>> application, preferably within a folder where my JSP files
>>>>>>>> are located.  I also want to be able to search from the index
>>>>>>>> within the web app.  I've been using the
>>>>>>>> LUCENE_INDEX_DIRECTORY but, this is on a file system
>>>>>>>> (currently my hard drive).  Should I continue to use
>>>>>>>> LUCENE_INDEX_DIRECTORY if I want the Lucene index inside the
>>>>>>>> app or use something else.  I was a bit confused about this.
>>>>>>>> Btw, the Lucene index content comes from a database.
>>>>>>>>
>>>>>>>> Any help is appreciated




-- 
*N.S.KARTHIK
R.M.S.COLONY
BEHIND BANK OF INDIA
R.M.V 2ND STAGE
BANGALORE
560094*
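
The "full path read from configuration" approach discussed in this thread
could look roughly like the sketch below. It uses only the JDK:
IndexPathConfig and the property key lucene.index.dir are hypothetical
names chosen for illustration, and how the Properties object gets loaded
(a properties file, a web.xml parameter, and so on) is left to the
application.

```java
import java.io.File;
import java.util.Properties;

/** Resolves the index directory from configuration instead of a hard-coded string. */
final class IndexPathConfig {
    /** Hypothetical property key; any name works as long as all deployments agree. */
    static final String KEY = "lucene.index.dir";

    /** Uses the configured path when present, otherwise the supplied fallback. */
    static File resolve(Properties props, String fallback) {
        String configured = props.getProperty(KEY);
        String chosen = (configured == null || configured.trim().isEmpty())
                ? fallback
                : configured.trim();
        // An absolute path avoids surprises about the app server's working directory.
        return new File(chosen).getAbsoluteFile();
    }
}
```

In a servlet container, the fallback could be the value of
ServletContext.getRealPath("/WEB-INF/data/myIndexName") as suggested in the
thread, and the configured value could come from a web.xml context
parameter read with ServletContext.getInitParameter.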


Re: lucene-core-3.3.0 not optimizing

2011-12-05 Thread KARTHIK SHIVAKUMAR
Hi,

Regarding LUCENE-3454 (http://issues.apache.org/jira/browse/LUCENE-3454):

So you mean the code has changed with this API?

Does anybody have a sample code snippet, or is there a sample to play
around with?

with regards
karthik


On Fri, Dec 2, 2011 at 3:44 PM, Ian Lea <ian@gmail.com> wrote:

> Well, calling optimize(maxNumSegments) will (from the javadocs on
> recent releases) "Optimize the index down to <= maxNumSegments".  So
> optimize(100) won't get you down to 1 big file, unless you are using
> compound files perhaps.  Maybe it did something different 7 years ago
> but that seems very unlikely.
>
> In 3.5.0 all optimize() calls are deprecated anyway.  I suggest you
> read the release notes and the javadocs, upgrade to 3.5.0 and remove
> all optimize() calls altogether.
>
> --
> Ian.
>
> On Fri, Dec 2, 2011 at 9:58 AM, KARTHIK SHIVAKUMAR
> <nskarthi...@gmail.com> wrote:
>> Hi
>>
>> I have used Index and Optimize 5+ Million XML docs in Lucene 1.x
>> 7 years ago,
>>
>> And this piece of IndexWriter.optimize used to Merger all the bits and
>> pieces of the created into 1 big file.
>>
>> I have not tracked the API changes since 7 years and with
>> lucene-core-3.3.0 ...on google not able to find the solutions Why
>> this is happening.
>>
>> with regards
>> karthik
>>
>> On Fri, Dec 2, 2011 at 12:37 PM, Simon Willnauer
>> <simon.willna...@googlemail.com> wrote:
>>
>>> what do you understand when you say optimize? Unless you tell us
>>> what this code does in your case and what you'd expect it doing its
>>> impossible to give you any reasonable answer.
>>>
>>> simon
>>>
>>> On Fri, Dec 2, 2011 at 4:54 AM, KARTHIK SHIVAKUMAR
>>> <nskarthi...@gmail.com> wrote:
>>>> Hi
>>>>
>>>> Spec
>>>> O/s win os 7
>>>> Jdk : 1.6.0_29
>>>> Lucene  lucene-core-3.3.0
>>>>
>>>> Finally after Indexing successfully ,Why this Code does not
>>>> optimize ( sample code )
>>>>
>>>> INDEX_WRITER.optimize(100);
>>>> INDEX_WRITER.commit();
>>>> INDEX_WRITER.close();






Re: [JOB] Lucid Imagination is hiring

2011-12-05 Thread KARTHIK SHIVAKUMAR
Hi

Too bad during Recession

Am from INDIA ;(



with regards
karthik



Re: Use multiple lucene indices

2011-12-05 Thread KARTHIK SHIVAKUMAR
Hi,

> would the memory usage go through the roof?

Yup.

In my past experience, this got me into a pickle...



with regards
karthik



Lucene bangalore chapter

2011-12-05 Thread Vinaya Kumar Thimmappa
Is there a Lucene Bangalore chapter?


-Vinaya

