RE: lucene Input and Output format

2017-08-02 Thread Ranganath B N
Hi,

  It's not about the file formats. Rather, it is about the LuceneInputFormat and
LuceneOutputFormat interfaces, which deal with the getSplits(), getRecordReader()
and getRecordWriter() methods. Are there any
implementations of these interfaces?


Thanks,
Ranganath B. N. 
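
[Editor's note: the getSplits()/getRecordReader()/getRecordWriter() shape described above appears to match Hadoop's InputFormat/OutputFormat contract rather than any interface shipped in Lucene core (Ian Lea makes this point below). A toy, self-contained sketch of what a "split" means in that contract — all names here are illustrative, not a real Lucene or Hadoop API:]

```java
import java.util.ArrayList;
import java.util.List;

// Toy sketch of the InputFormat contract described above. The
// getSplits()/getRecordReader() shape matches Hadoop's API, not any
// interface in Lucene core; all names here are illustrative.
public class ToyInputFormat {

    // A "split" is simply a slice of the input that one worker can
    // process independently; getSplits() partitions the whole input.
    public static List<List<String>> getSplits(List<String> docs, int splitSize) {
        List<List<String>> splits = new ArrayList<>();
        for (int i = 0; i < docs.size(); i += splitSize) {
            splits.add(new ArrayList<>(docs.subList(i, Math.min(i + splitSize, docs.size()))));
        }
        return splits;
    }

    public static void main(String[] args) {
        List<String> docs = List.of("d1", "d2", "d3", "d4", "d5");
        // 5 docs with splitSize 2 -> 3 splits: [d1,d2], [d3,d4], [d5]
        System.out.println(getSplits(docs, 2).size());
    }
}
```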

-Original Message-
From: Adrien Grand [mailto:jpou...@gmail.com] 
Sent: Tuesday, August 01, 2017 7:23 PM
To: java-user@lucene.apache.org
Cc: Vadiraj Muradi
Subject: Re: lucene Input and Output format

Which part of the index do you want to learn about? Here are some descriptions 
of the file formats:
 - terms dict:
http://lucene.apache.org/core/6_6_0/core/org/apache/lucene/codecs/blocktree/BlockTreeTermsWriter.html
 - postings:
http://lucene.apache.org/core/6_6_0/core/index.html?org/apache/lucene/index/IndexableField.html
 - doc values:
http://lucene.apache.org/core/6_6_0/core/index.html?org/apache/lucene/index/IndexableField.html
 - stored fields:
http://lucene.apache.org/core/6_6_0/core/index.html?org/apache/lucene/index/IndexableField.html

On Mon, Jul 31, 2017 at 15:02, Ranganath B N  wrote:

>
>
>
> Hi All,
>
>  Can you point me to some of the implementations  of lucene Input 
> and Output format? I wanted to know them to  understand the 
> distributed implementation approach.
>
>
> Thanks,
> Ranganath B. N.
>


Re: lucene Input and Output format

2017-08-02 Thread Ian Lea
What are the full package names for these interfaces?  I don't think they
are in org.apache.lucene.


--
Ian.


On Wed, Aug 2, 2017 at 9:00 AM, Ranganath B N 
wrote:

> Hi,
>
>   It's not about the file formats. Rather It is about LuceneInputFormat
> and LuceneOutputFormat interfaces which deals with getsplit(),
> getRecordReader() and getRecordWriter() methods. Are there any
> Implementations for these interfaces?
>
>
> Thanks,
> Ranganath B. N.
>


Re: Lucene 6.6: "Too many open files"

2017-08-02 Thread Nawab Zada Asad Iqbal
Thanks Uwe,

That worked. After running for 3 hours, I observed about 88% of the
indexing rate compared to 4.5.0, without any file descriptor issues. It
seems that I can probably do some tweaking to get the same throughput as
before. I looked at the code, and the default values for
ConcurrentMergeScheduler are -1 (the Solr process then decides the value
intelligently). Is there a way to know which defaults are being employed? Can
I start with maxThreadCount and maxMergeCount = 10?


Regards
Nawab

On Tue, Aug 1, 2017 at 9:35 AM, Uwe Schindler  wrote:

> Hi,
>
> You should reset those settings back to defaults (remove the inner
> settings in the factory). 30 merge threads will eat up all your file
> handles. In earlier versions of Lucene, internal limitations in IndexWriter
> made it unlikely that you would spawn too many threads, so 30 had no effect.
>
> In Lucene 6, the number of merges and threads is automatically chosen based on
> your disk type (SSD detection) and CPU count. So you should definitely use the
> defaults first and only ever change them for good reasons (if told to by
> specialists).
>
> Uwe
>
> On 1 August 2017 at 17:25:43 CEST, Nawab Zada Asad Iqbal <
> khi...@gmail.com> wrote:
> >Thanks Jigar
> >I haven't tweaked ConcurrentMergeScheduler between 4.5.0 and 6.6. Here
> >is what I have:
> >
> ><mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
> >  <int name="maxThreadCount">30</int>
> >  <int name="maxMergeCount">30</int>
> ></mergeScheduler>
> >
> >
> >On Mon, Jul 31, 2017 at 8:56 PM, Jigar Shah 
> >wrote:
> >
> >> I faced this problem when I used NoMergePolicy and wrote some code to
> >> manually merge segments; that code had a bug, and I had the same issue.
> >> Make sure you have the default (AFAIR ConcurrentMergeScheduler) enabled
> >> and that it is configured with appropriate settings.
> >>
> >> On Jul 31, 2017 11:21 PM, "Erick Erickson" 
> >> wrote:
> >>
> >> > No, nothing's changed fundamentally. But you say:
> >> >
> >> > "We have some batch indexing scripts, which
> >> > flood the solr servers with indexing requests (while keeping
> >> open-searcher
> >> > false)"
> >> >
> >> > What is your commit interval? Regardless of whether openSearcher is
> >false
> >> > or not, background merging continues apace with every commit. By
> >any
> >> chance
> >> > did you change your merge policy (or not copy the one from 4x to
> >6x)?
> >> Shot
> >> > in the dark...
> >> >
> >> > Best,
> >> > Erick
> >> >
> >> > On Mon, Jul 31, 2017 at 7:15 PM, Nawab Zada Asad Iqbal
> > >> >
> >> > wrote:
> >> > > Hi,
> >> > >
> >> > > I am upgrading from solr4.5 to solr6.6 and hitting this issue
> >during
> >> > > complete reindexing scenario.  We have some batch indexing
> >scripts,
> >> which
> >> > > flood the solr servers with indexing requests (while keeping
> >> > open-searcher
> >> > > false) for many hours and then perform one commit. This used to
> >work
> >> fine
> >> > > with 4.5, but with 6.6, I get 'Too many open files' within a
> >couple of
> >> > > minutes. I have checked that "ulimit" is same between old and new
> >> > servers.
> >> > >
> >> > > Has something fundamentally changed in recent lucene versions,
> >which
> >> > keeps
> >> > > file descriptors around for a longer time?
> >> > >
> >> > >
> >> > > Here is a sample error message:
> >> > > at org.apache.lucene.index.IndexWriter.ensureOpen(
> >> > IndexWriter.java:749)
> >> > > at org.apache.lucene.index.IndexWriter.ensureOpen(
> >> > IndexWriter.java:763)
> >> > > at org.apache.lucene.index.IndexWriter.commit(
> >> IndexWriter.java:3206)
> >> > > at
> >> > > org.apache.solr.update.DirectUpdateHandler2.commit(
> >> > DirectUpdateHandler2.java:644)
> >> > > at
> >> > >
> >org.apache.solr.update.processor.RunUpdateProcessor.processCommit(
> >> > RunUpdateProcessorFactory.java:93)
> >> > > at
> >> > >
> >org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(
> >> > UpdateRequestProcessor.java:68)
> >> > > at
> >> > > org.apache.solr.update.processor.DistributedUpdateProcessor.
> >> > doLocalCommit(DistributedUpdateProcessor.java:1894)
> >> > > at
> >> > > org.apache.solr.update.processor.DistributedUpdateProcessor.
> >> > processCommit(DistributedUpdateProcessor.java:1871)
> >> > > at
> >> > > org.apache.solr.update.processor.LogUpdateProcessorFactory$
> >> >
> >LogUpdateProcessor.processCommit(LogUpdateProcessorFactory.java:160)
> >> > > at
> >> > >
> >org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(
> >> > UpdateRequestProcessor.java:68)
> >> > > at
> >> > > org.apache.solr.handler.RequestHandlerUtils.handleCommit(
> >> > RequestHandlerUtils.java:68)
> >> > > at
> >> > >
> >org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(
> >> > ContentStreamHandlerBase.java:62)
> >> > > at
> >> > > org.apache.solr.handler.RequestHandlerBase.handleRequest(
> >> > RequestHandlerBase.java:173)
> >> > > at org.apache.solr.core.SolrCore.execute(SolrCore.java:2477)
> >> > > at org.apache.solr.servlet.HttpSolrCall.execute(
> >> > HttpSolrC

Re: Lucene 6.6: "Too many open files"

2017-08-02 Thread Uwe Schindler
Hi,

It's documented in the Javadocs of ConcurrentMergeScheduler. It depends on the
number of CPUs (with some upper bound) and on whether the index is on an SSD.
Without an SSD it uses only one thread for merging.

Uwe
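
[Editor's note: the heuristic Uwe describes can be sketched as follows. This paraphrases ConcurrentMergeScheduler.setDefaultMaxMergesAndThreads in Lucene 6.x; the exact constants are an assumption, so consult the Javadocs of your Lucene version for the authoritative logic.]

```java
// Approximate sketch of how Lucene 6.x picks merge-scheduler defaults,
// paraphrased from ConcurrentMergeScheduler.setDefaultMaxMergesAndThreads.
// The exact constants are an assumption; check your version's Javadocs.
public class MergeDefaultsSketch {

    static int maxThreadCount(int coreCount, boolean spinningDisk) {
        if (spinningDisk) {
            return 1; // spinning disk: a single merge thread
        }
        // SSD: scale with core count, capped at a small upper bound
        return Math.max(1, Math.min(4, coreCount / 2));
    }

    static int maxMergeCount(int coreCount, boolean spinningDisk) {
        // allow a few more queued merges than running merge threads
        return maxThreadCount(coreCount, spinningDisk) + 5;
    }

    public static void main(String[] args) {
        // 8 cores on SSD -> 4 merge threads, 9 concurrent merges allowed
        System.out.println(maxThreadCount(8, false) + " " + maxMergeCount(8, false));
        // spinning disk -> 1 thread regardless of core count
        System.out.println(maxThreadCount(16, true) + " " + maxMergeCount(16, true));
    }
}
```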


Re: Migration to Lucene 6.x

2017-08-02 Thread krish mohan
Hi
In Lucene 3.x, a phrase query is formed for a search word containing special
characters.

For example:
 For the input *google-chrome-stable*, the query formed is "google chrome
stable". But in Lucene 6.x I can't achieve this. Is there any way to
achieve it?
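
[Editor's note: one approach that may reproduce the 3.x behavior is QueryParser.setAutoGeneratePhraseQueries(true), which asks the classic query parser to emit a phrase query when a single input term analyzes into several tokens. An untested sketch, assuming lucene-core, lucene-queryparser, and lucene-analyzers-common are on the classpath; the resulting tokens depend on the analyzer chosen.]

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.Query;

// Sketch: with autoGeneratePhraseQueries enabled, a term such as
// "google-chrome-stable" that the analyzer splits into several tokens
// is parsed as a phrase query over those tokens. Field name "content"
// and analyzer choice are illustrative assumptions.
public class PhraseFromHyphens {
    public static void main(String[] args) throws Exception {
        QueryParser parser = new QueryParser("content", new StandardAnalyzer());
        parser.setAutoGeneratePhraseQueries(true);
        Query q = parser.parse("google-chrome-stable");
        // with StandardAnalyzer this should be a phrase query over the three tokens
        System.out.println(q);
    }
}
```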

On Mon, Jul 31, 2017 at 2:53 PM, krish mohan 
wrote:

> Correction to the link: QueryParser(Version matchVersion, String f, Analyzer a)
> a)
> 
>
> On Mon, Jul 31, 2017 at 2:50 PM, krish mohan 
> wrote:
>
>> Hi
>>I'm using Lucene 4.10.4. QueryParser in LUCENE_30 forms phrase query
>> for input with special characers ($,/,-,...)
>>
>>For eg:
>>  For input *google-chrome-stable*, query formed as "google
>> chrome stable".
>>
>>  Using QueryParser(Version matchVersion, String f, Analyzer a)
>> 
>>  ,
> >> I passed the version as LUCENE_30 and achieved this behavior.
>>
> >> But in Lucene 6.x this constructor has been removed. Is there any way
> >> to achieve this in Lucene 6.x?
>>
>
>