Re: w/10 ? [was: Partial Counts in SOLR]

2014-03-19 Thread Salman Akram
Yup!


On Thu, Mar 20, 2014 at 5:13 AM, Otis Gospodnetic <
otis.gospodne...@gmail.com> wrote:

> Hi,
>
> Guessing it's the surround query parser's support for "within" backed by span
> queries.
>
> Otis
> Solr & ElasticSearch Support
> http://sematext.com/
> On Mar 19, 2014 4:44 PM, "T. Kuro Kurosaka"  wrote:
>
> > In the thread "Partial Counts in SOLR", Salman gave us this sample query:
> >
> >  ((stock or share*) w/10 (sale or sell* or sold or bought or buy* or
> >> purchase* or repurchase*)) w/10 (executive or director)
> >>
> >
> > I'm not familiar with this w/10 notation. What does this mean,
> > and what parser(s) supports this syntax?
> >
> > Kuro
> >
> >
>



-- 
Regards,

Salman Akram


Re: Excessive Heap Usage from docValues?

2014-03-19 Thread Toke Eskildsen
On Wed, 2014-03-19 at 22:01 +0100, tradergene wrote:
> I have a Solr index with about 32 million docs.  Each doc is relatively
> small but has multiple dynamic fields that are storing INTs.  The initial
> problem that I had to resolve is that we were running into OOMs (on a 48GB
> heap, 130GB on-disk index).  I narrowed that issue down to Lucene FieldCache
> filling up the heap due to all the dynamic fields.

48GB heap for a 130GB, 32M docs index sounds excessive.  Could you tell
us how many unique fields your searcher uses in total for faceting and
maybe the overall layout of your index? Is this perhaps a case of many
distinct groups of data put in the same index, where the searches are
always within a single group and each group has its own fields for
faceting? Are the fields single- or multi-valued?

- Toke Eskildsen, State and University Library, Denmark




Solr4.7 No live SolrServers available to handle this request

2014-03-19 Thread Sathya
Hi Friends,

I am new to Solr. I have 5 Solr nodes on 5 different machines. When I index
data, sometimes a "*No live SolrServers available to handle this request*"
exception occurs on 1 or 2 machines.

I don't know why this happens or how to solve it. Kindly help me to solve
this issue.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr4-7-No-live-SolrServers-available-to-handle-this-request-tp4125679.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: search for single char number when ngram min is 3

2014-03-19 Thread Alexandre Rafalovitch
Does NGram factory support keyword token-type protection? If so, it
could be just a matter of marking a number as keyword.

Regards,
   Alex.
Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)


On Wed, Mar 19, 2014 at 11:02 PM, Jack Krupansky
 wrote:
> Interesting point. I think it would be nice to have an option to treat
> numeric sequences (or maybe with commas and decimal point as well) as
> integral tokens that won't be split by ngramming. It's worth a Jira.
>
> OTOH, you have to make a value judgment whether a query for "3.14" should
> only exact match "3.14" or also ngram match "3.14159", etc.
>
>
> -- Jack Krupansky
>
> -Original Message- From: Andreas Owen
> Sent: Wednesday, March 19, 2014 11:44 AM
> To: solr-user@lucene.apache.org
> Subject: search for single char number when ngram min is 3
>
>
> Is there a way to tell NGramFilterFactory while indexing that numbers shall
> never be tokenized? Then the query should be able to find numbers.
> Or do I have to change the ngram min for numbers to 1, if that is possible?
> So to speak, put the whole number in as one token and not all possible tokens.
> Or can I tell the query to search numbers differently with WT, LCF or
> whatever?
>
> I attached a doc with screenshots from solr analyzer
>
>
> -Original Message-
> From: Andreas Owen [mailto:a...@conx.ch]
> Sent: Thursday, March 13, 2014 13:44
> To: solr-user@lucene.apache.org
> Subject: RE: Re[2]: NOT SOLVED searches for single char tokens instead of
> from 3 upwards
>
> I have gotten nearly everything to work. There are two queries where I don't
> get back what I want.
>
> "avaloq frage 1" -> only returns if I set minGramSize=1 while indexing
> "yh_cug" -> the query parser doesn't remove "_" but the indexer does (WDF), so
> there is no match
>
> Is there a way to also query the whole term "avaloq frage 1" without
> tokenizing it?
>
> Fieldtype:
>
> 
>  
>
>
> <filter class="solr.StopFilterFactory" words="lang/stopwords_de.txt" format="snowball"
> enablePositionIncrements="true"/>
>   
>  
> 
> <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
> generateNumberParts="1" catenateWords="1" catenateNumbers="1"
> catenateAll="0" splitOnCaseChange="1"/>
>   
>   
> 
> 
> 
>  words="lang/stopwords_de.txt" format="snowball"
> enablePositionIncrements="true"/> 
> 
> 
>  
> 
>
>
> -Original Message-
> From: Andreas Owen [mailto:a...@conx.ch]
> Sent: Wednesday, March 12, 2014 18:39
> To: solr-user@lucene.apache.org
> Subject: RE: Re[2]: NOT SOLVED searches for single char tokens instead of
> from 3 upwards
>
> Hi Jack,
>
> do you know how I can use local parameters in my solrconfig? The params are
> visible in the debugQuery output but Solr doesn't parse them.
>
> 
> {!q.op=OR} (*:* -organisations:["" TO *] -roles:["" TO *])
> (+organisations:($org) +roles:($r)) (-organisations:["" TO *] +roles:($r))
> (+organisations:($org) -roles:["" TO *]) 
>
>
> -Original Message-
> From: Andreas Owen [mailto:a...@conx.ch]
> Sent: Wednesday, March 12, 2014 14:44
> To: solr-user@lucene.apache.org
> Subject: Re[2]: NOT SOLVED searches for single char tokens instead of from 3
> upwards
>
> Yes, that is exactly what happened in the analyzer. The term I searched for
> was listed on both sides (index & query).
>
> here's the rest:
>
> 
>
>
>
>    <filter class="solr.StopFilterFactory"
>            ignoreCase="true"
>            words="stopwords.txt"
>            enablePositionIncrements="true"
>            />
> <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
> generateNumberParts="1" catenateWords="1" catenateNumbers="1"
> catenateAll="0" splitOnCaseChange="1"/>
>
> protected="protwords.txt"/>
>
>  
>
> -Original Message-
>>
>> From: "Jack Krupansky" 
>> To: solr-user@lucene.apache.org
>> Date: 12/03/2014 13:25
>> Subject: Re: NOT SOLVED searches for single char tokens instead of
>> from 3 upwards
>>
>> You didn't show the new index analyzer - it's tricky to assure that
>> index and query are compatible, but the Admin UI Analysis page can help.
>>
>> Generally, using pure defaults for WDF is not what you want,
>> especially for query time. Usually there needs to be a slight
>> asymmetry between index and query for WDF - index generates more terms
>> than query.
>>
>> -- Jack Krupansky
>>
>> -Original Message-
>> From: Andreas Owen
>> Sent: Wednesday, March 12, 2014 6:20 AM
>> To: solr-user@lucene.apache.org
>> Subject: RE: NOT SOLVED searches for single char tokens instead of
>> from 3 upwards
>>
>> I now have the following:
>>
>> 
>> 
>> <filter class="solr.WordDelimiterFilterFactory" types="at-under-alpha.txt"/>
>> <filter class="solr.LowerCaseFilterFactory"/>
>> <filter class="solr.StopFilterFactory" words="lang/stopwords_de.txt" format="snowball"
>> enablePositionIncrements="true"/>
>> <filter class="solr.GermanNormalizationFilterFactory"/>
>> 
>>   
>>
>> The gui analysis shows me that wdf doesn't cut the underscore anymore
>> but it still returns 0 result

Re: Excessive Heap Usage from docValues?

2014-03-19 Thread Otis Gospodnetic
Hi,

Which type of doc values? See Wiki or reference guide for a list of types.

Otis
Solr & ElasticSearch Support
http://sematext.com/
On Mar 19, 2014 5:02 PM, "tradergene"  wrote:

> Hello All,
>
> I'm hoping to get your assistance in debugging what seems like a memory
> issue.
>
> I have a Solr index with about 32 million docs.  Each doc is relatively
> small but has multiple dynamic fields that are storing INTs.  The initial
> problem that I had to resolve is that we were running into OOMs (on a 48GB
> heap, 130GB on-disk index).  I narrowed that issue down to Lucene
> FieldCache
> filling up the heap due to all the dynamic fields.  To mitigate this, I
> enabled docValues on the schema for many of the dynamicField culprits.
>  This
> dropped the FieldCache down to almost nothing.
>
> Now, when re-indexing for docValues functionality, I ran into OOMs as soon
> as I reached 12 million of the 32 million documents.  Before enabling
> docValues, I was able to load up Solr on a 48GB heap but ran into problems
> after enough unique searches occurred (normal FieldCache issue).  Now, with
> docValues, a 48GB heap is giving me OOM after 12 million docs indexed.  I
> split the collection into 10 shards and with 2 nodes (48GB heap each) was
> able to get up to 21 million docs indexed.  Now, I've had to move the
> shards
> to more nodes and am up to 10 shards across 4 nodes and am hoping to be
> able
> to get all 32 million docs indexed.  This will be 48GB x 4 heap which seems
> really excessive for an index that was only 132GB pre-docValues.
>
> I would love some thoughts as to whether I'm expecting too much efficiency
> with docValues enabled.  I was under the impression that docValues would
> increase storage requirements on disk (which it has), but I thought that
> RAM
> usage would go down during searching (which I haven't tested) as well as
> indexing.
>
> Thanks for any assistance anyone can provide.
>
> Gene
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Excessive-Heap-Usage-from-docValues-tp4125577.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
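For background on the "which type" question: docValues are declared per field
(or dynamicField) in schema.xml, and in Solr 4.x the field type can
additionally pick a docValuesFormat. A minimal sketch with illustrative names
(not this poster's actual schema):

    <fieldType name="int" class="solr.TrieIntField" precisionStep="0"
               docValuesFormat="Disk"/>
    <dynamicField name="*_i" type="int" indexed="true" stored="true"
                  docValues="true"/>

The "Disk" format keeps most docValues data off-heap, which is the knob most
relevant to the heap questions in this thread.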


Re: w/10 ? [was: Partial Counts in SOLR]

2014-03-19 Thread Otis Gospodnetic
Hi,

Guessing it's the surround query parser's support for "within" backed by span
queries.

Otis
Solr & ElasticSearch Support
http://sematext.com/
On Mar 19, 2014 4:44 PM, "T. Kuro Kurosaka"  wrote:

> In the thread "Partial Counts in SOLR", Salman gave us this sample query:
>
>  ((stock or share*) w/10 (sale or sell* or sold or bought or buy* or
>> purchase* or repurchase*)) w/10 (executive or director)
>>
>
> I'm not familiar with this w/10 notation. What does this mean,
> and what parser(s) supports this syntax?
>
> Kuro
>
>
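For reference, the surround parser expresses "within N" with its N (unordered)
and W (ordered) operators; a rough, untested equivalent of the sample query,
with an illustrative field name, would be:

    q={!surround}text:10N(10N(OR(stock, share*), OR(sale, sell*, sold,
        bought, buy*, purchase*, repurchase*)), OR(executive, director))

Note the surround parser performs no analysis on its terms, so case and
stemming have to match the indexed tokens.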


Re: Zookeeper exceptions - SEVERE

2014-03-19 Thread Chris W
Thanks. Temporarily got over the problem by specifying custom limits
through jute.maxbuffer=
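jute.maxbuffer is a JVM system property and has to be set to the same value on
every ZooKeeper server and on the client side; an illustrative invocation,
with a made-up 4MB value:

    java -Djute.maxbuffer=4194304 ...

The ZooKeeper admin guide lists it under the unsafe options, so it is a
workaround rather than a fix.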




On Tue, Mar 18, 2014 at 9:45 PM, Shalin Shekhar Mangar <
shalinman...@gmail.com> wrote:

> Sorry guys I spoke too fast. I looked at the code again. No it doesn't
> correlate with commits at all. I was mistaken.
>
> On Wed, Mar 19, 2014 at 10:06 AM, Chris W  wrote:
> > Thanks, Shawn and Shalin
> >
> > How does the frequency of commit affect zookeeper?
> >
> >
> > Thanks
> >
> >
> > On Tue, Mar 18, 2014 at 9:12 PM, Shalin Shekhar Mangar <
> > shalinman...@gmail.com> wrote:
> >
> >> SolrCloud will update Zookeeper on state changes (node goes to
> >> recovery, comes back up etc) or for leader election and during
> >> collection API commands. It doesn't correlate directly with indexing
> >> but is correlated with how frequently you call commit.
> >>
> >> On Wed, Mar 19, 2014 at 5:46 AM, Shawn Heisey 
> wrote:
> >> > On 3/18/2014 5:46 PM, Chris W wrote:
> >> >>
> >> >> I am running a 3 node zookeeper 3.4.5  Quorum. I am running into
> issues
> >> >> with Zookeeper transaction logs
> >> >>
> >> >>   [myid:2] - ERROR [main:QuorumPeer@453] - Unable to load database
> on
> >> disk
> >> >> java.io.IOException: Unreasonable length = 1048587
> >> >> at
> >> >>
> >>
> org.apache.jute.BinaryInputArchive.readBuffer(BinaryInputArchive.java:100)
> >> >> at
> >> >>
> org.apache.zookeeper.server.persistence.Util.readTxnBytes(Util.java:233)
> >> >> at
> >> >>
> >> >>
> >>
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:602)
> >> >> at
> >> >>
> >> >>
> >>
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:157)
> >> >> at
> >> >>
> org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
> >> >>
> >> >> To unblock temporarily, I deleted the most recent txn log. How do I
> tell
> >> >> zookeeper to not grow the transaction log beyond x MegaBytes?
> >> >>
> >> >> How often does the transaction log get updated? Does the zk transaction
> log
> >> >> grow every time we index data into a new collection?
> >> >
> >> >
> >> > Zookeeper is a separate project at Apache.  ZK file management is
> >> discussed
> >> > in the ZK documentation.
> >> >
> >> >
> >>
> http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
> >> >
> >> > There is a bug filed on Zookeeper for the issue you are seeing, with a
> >> > fairly simple workaround.  It is fixed in the 3.4.6 version, which was
> >> > released last week.  I will see whether we can get ZK upgraded to
> 3.4.6
> >> in
> >> > the Solr 4.8.0 release.  I don't think we want to risk doing that
> >> upgrade in
> >> > 4.7.1, but I could be wrong.
> >> >
> >> > https://issues.apache.org/jira/browse/ZOOKEEPER-1513
> >> > http://zookeeper.apache.org/releases.html
> >> >
> >> > I am actually not sure how often SolrCloud updates Zookeeper.  It
> happens
> >> > whenever the collections API is called for sure, and it may happen
> >> anytime
> >> > you index data as well.
> >> >
> >> > Thanks,
> >> > Shawn
> >> >
> >>
> >>
> >>
> >> --
> >> Regards,
> >> Shalin Shekhar Mangar.
> >>
> >
> >
> >
> > --
> > Best
> > --
> > C
>
>
>
> --
> Regards,
> Shalin Shekhar Mangar.
>



-- 
Best
-- 
C


Re: Indexing large documents

2014-03-19 Thread Tom Burton-West
Hi Stephen,

We regularly index documents in the range of 500KB-8GB on machines that
have about 10GB devoted to Solr.  In order to avoid OOM's on Solr versions
prior to Solr 4.0, we use a separate indexing machine(s) from the search
server machine(s) and also set the termIndexInterval to 8 times the
default of 128, i.e. 1024 (see
http://www.hathitrust.org/blogs/large-scale-search/too-many-words-again for
a description of the problem, although the solution we are using is
different: termIndexInterval rather than termInfosDivisor).
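In solrconfig.xml that setting looks roughly like this:

    <indexConfig>
      <termIndexInterval>1024</termIndexInterval>
    </indexConfig>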

I would like to second Otis' suggestion that you consider breaking large
documents into smaller sub-documents.   We are currently not doing that and
we believe that relevance ranking is not working well at all.

 If you consider that most relevance ranking algorithms were designed,
tested, and tuned on TREC newswire-size documents (average 300 words) or
truncated web documents (average 1,000-3,000 words), it seems likely that
they may not work well with book size documents (average 100,000 words).
 Ranking algorithms that use IDF will be particularly affected.


We are currently investigating grouping and block-join options.
Unfortunately, our data does not have good mark-up or metadata to allow
splitting books by chapter.  We have investigated indexing pages of books,
but due to many issues including performance and scalability (we index
the full text of 11 million books, and indexing at the page level would
result in 3.3 billion Solr documents), we haven't arrived at a workable
solution for our use case.  At the moment the main bottleneck is memory
use for faceting, but we intend to experiment with docValues to see if the
increase in index size is worth the reduction in memory use.

Presently block-join indexing does not implement scoring, although we hope
that will change in the near future.  The relevance ranking for grouping
ranks the group by its highest-ranking member, so if you split a book
into chapters, it would rank the book by its highest-ranking chapter.
This may be appropriate for your use case, as Otis suggested.  In our use
case sometimes this is appropriate, but we are investigating the
possibility of other methods of scoring the group based on a more flexible
function of the scores of the members (i.e. scoring a book based on a
function of the scores of its chapters).
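As a concrete illustration of the grouping variant (the field name book_id is
invented): grouping chapter documents per book,

    q=some query&group=true&group.field=book_id&group.limit=3

returns one group per book, ranked by the book's highest-scoring chapter,
with up to three chapters listed per group.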

Tom Burton-West
http://www.hathitrust.org/blogs/large-scale-search



On Tue, Mar 18, 2014 at 11:17 PM, Otis Gospodnetic <
otis.gospodne...@gmail.com> wrote:

> Hi,
>
> I think you probably want to split giant documents because you / your users
> probably want to be able to find smaller sections of those big docs that
> are best matches to their queries.  Imagine querying War and Peace.  Almost
> any regular word you query for will produce a match.  Yes, you may want to
> enable field collapsing aka grouping.  I've seen facet counts get messed up
> when grouping is turned on, but have not confirmed if this is a (known) bug
> or not.
>
> Otis
> --
> Performance Monitoring * Log Analytics * Search Analytics
> Solr & Elasticsearch Support * http://sematext.com/
>
>
> On Tue, Mar 18, 2014 at 10:52 PM, Stephen Kottmann <
> stephen_kottm...@h3biomedicine.com> wrote:
>
> > Hi Solr Users,
> >
> > I'm looking for advice on best practices when indexing large documents
> > (100's of MB or even 1 to 2 GB text files). I've been hunting around on
> > google and the mailing list, and have found some suggestions of splitting
> > the logical document up into multiple solr documents. However, I haven't
> > been able to find anything that seems like conclusive advice.
> >
> > Some background...
> >
> > We've been using solr with great success for some time on a project that
> is
> > mostly indexing very structured data - ie. mainly based on ingesting
> > through DIH.
> >
> > I've now started a new project and we're trying to make use of solr
> again -
> > however, in this project we are indexing mostly unstructured data - pdfs,
> > powerpoint, word, etc. I've not done much configuration - my solr
> instance
> > is very close to the example provided in the distribution aside from some
> > minor schema changes. Our index is relatively small at this point ( ~3k
> > documents ), and for initial indexing I am pulling documents from a http
> > data source, running them through Tika, and then pushing to solr using
> > solrj. For the most part this is working great... until I hit one of
> these
> > huge text files and then OOM on indexing.
> >
> > I've got a modest JVM - 4GB allocated. Obviously I can throw more memory
> at
> > it, but it seems like maybe there's a more robust solution that would
> scale
> > better.
> >
> > Is splitting the logical document into multiple solr documents best
> > practice here? If so, what are the considerations or pitfalls of doing
> this
> > that I should be paying attention to. I guess when querying I always need
> > to use a group by field to prevent multiple hits for the same document.
> 

Re: How to return more fields on Solr 4.5.1 Suggester?

2014-03-19 Thread Ahmet Arslan


Hey Omer,

Create a copy of movie_title and use the edgy_text approach described here:
http://searchhub.org/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/

With this approach you can request whatever fields you want with the fl parameter.

Ahmet
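For reference, the field type from that post is along these lines (a sketch
from memory, not copied verbatim):

    <fieldType name="edgy_text" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>

Suggestions then become ordinary queries against the copy field, so
fl=movie_title,movie_id returns both values per hit.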


On Monday, March 17, 2014 3:48 PM, Erick Erickson  
wrote:
Perhaps index the concatenation of the
two fields, something like this:

hard rain (1998)!14

Then have the app layer peel off the !14 for
displaying the title to the user. Then use the
14 however you need to.

Best,
Erick


On Mon, Mar 17, 2014 at 6:28 AM, Lajos  wrote:
> Hi Omer,
>
> That's not how it's meant to work; the suggester is giving you potentially
> matching terms by looking at the set of terms for the given field across the
> index.
>
> Possibly you want to look at the MoreLikeThis component or handler? It will
> return matching documents, from which you have access to the fields you
> want.
>
> Regards,
>
> Lajos
>
>
>
> On 17/03/2014 14:05, omer sonmez wrote:
>>
>>
>> I am using Solr 4.5.1 to suggest movies for my system. What I need Solr to
>> return is not only the movie_title but also the movie_id that belongs to the
>> movie. As an example, this is kind of what I need:
>>      
>>          0
>>          1
>>      
>>      
>>          
>>              
>>                  6
>>                  0
>>                  3
>>                  
>>                      
>>                          hard eight
>> (1996)
>>                          144
>>                      
>>                      
>>                          hard rain
>> (1998)
>>                          14
>>                      
>>                      
>>                          harlem (1993)
>>                          1044
>>                      
>>                  
>>              
>>          
>>      
>> 
>> My search component config is like :> class="solr.SpellCheckComponent">
>>      
>>          suggest
>>          > name="classname">org.apache.solr.spelling.suggest.Suggester
>>          > name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup
>>          name_autocomplete
>>          true
>>      
>> 
>> My request hadler config is like:> class="org.apache.solr.handler.component.SearchHandler">
>>      
>>          true
>>          suggest
>>          10
>>      
>>      
>>          suggest
>>      
>> 
>> and my shema config is like below:> indexed="true" stored="true" multiValued="false" required="true"/>
>>     > multiValued="false" />
>>
>>     
>>     > stored="true" multiValued="false" />
>>
>> 
>> how can I manage to get other fields using the suggester in Solr 4.5.1?
>> Thanks,
>>
>



Excessive Heap Usage from docValues?

2014-03-19 Thread tradergene
Hello All,

I'm hoping to get your assistance in debugging what seems like a memory
issue.

I have a Solr index with about 32 million docs.  Each doc is relatively
small but has multiple dynamic fields that are storing INTs.  The initial
problem that I had to resolve is that we were running into OOMs (on a 48GB
heap, 130GB on-disk index).  I narrowed that issue down to Lucene FieldCache
filling up the heap due to all the dynamic fields.  To mitigate this, I
enabled docValues on the schema for many of the dynamicField culprits.  This
dropped the FieldCache down to almost nothing.

Now, when re-indexing for docValues functionality, I ran into OOMs as soon
as I reached 12 million of the 32 million documents.  Before enabling
docValues, I was able to load up Solr on a 48GB heap but ran into problems
after enough unique searches occurred (normal FieldCache issue).  Now, with
docValues, a 48GB heap is giving me OOM after 12 million docs indexed.  I
split the collection into 10 shards and with 2 nodes (48GB heap each) was
able to get up to 21 million docs indexed.  Now, I've had to move the shards
to more nodes and am up to 10 shards across 4 nodes and am hoping to be able
to get all 32 million docs indexed.  This will be 48GB x 4 heap which seems
really excessive for an index that was only 132GB pre-docValues.

I would love some thoughts as to whether I'm expecting too much efficiency
with docValues enabled.  I was under the impression that docValues would
increase storage requirements on disk (which it has), but I thought that RAM
usage would go down during searching (which I haven't tested) as well as
indexing.

Thanks for any assistance anyone can provide.

Gene



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Excessive-Heap-Usage-from-docValues-tp4125577.html
Sent from the Solr - User mailing list archive at Nabble.com.


w/10 ? [was: Partial Counts in SOLR]

2014-03-19 Thread T. Kuro Kurosaka

In the thread "Partial Counts in SOLR", Salman gave us this sample query:


((stock or share*) w/10 (sale or sell* or sold or bought or buy* or
purchase* or repurchase*)) w/10 (executive or director)


I'm not familiar with this w/10 notation. What does this mean,
and what parser(s) supports this syntax?

Kuro



Re: Filter in terms component

2014-03-19 Thread Ahmet Arslan
Hi,

If you just need counts, maybe you can make use of
http://wiki.apache.org/solr/FunctionQuery#Relevance_Functions

Ahmet
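For example (field and term names are illustrative), docfreq and totaltermfreq
can be requested as pseudo-fields:

    q=*:*&fl=docfreq(text,'solr'),totaltermfreq(text,'solr')&rows=1

Note these are whole-index statistics, so they are not narrowed by fq; that is
the trade-off to weigh against faceting here.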



On Wednesday, March 19, 2014 9:49 PM, Jilani Shaik  
wrote:
Hi Ahmet,

I have gone through the facet component; as our application has 300+
million docs, it is very time-consuming with this component, and it also
uses the cache. So I have gone through the terms component, where Solr
reads the index for field terms. Is there any approach where I can get the
terms using a filter, so that I can restrict some of the documents' terms
in the counts?

Basically we have a set of documents where we want to show the term counts
based on those filters with a set name, instead of reading the entire index.

Please let me know if you need any details in order to throw some more pointers.

Thanks,
Jilani



On Thu, Mar 20, 2014 at 1:04 AM, Ahmet Arslan  wrote:

> Hi Jilani,
>
> What features of the terms component are you after? If it is just
> terms.prefix, it could be simulated with the facet component via the
> facet.prefix parameter. The faceting component respects filter queries.
>
>
>
> On Wednesday, March 19, 2014 8:58 PM, Jilani Shaik 
> wrote:
> Hi,
>
> I have a huge index and am using Solr. I need the terms component with a
> filter by a field. Please let me know if there is any way I can get this.
>
> Please provide me some pointers, even if it means developing this by going
> through the Lucene code.
>
> Please suggest.
>
> Thanks,
> Jilani
>
>



Re: Filter in terms component

2014-03-19 Thread Jilani Shaik
Hi Ahmet,

I have gone through the facet component; as our application has 300+
million docs, it is very time-consuming with this component, and it also
uses the cache. So I have gone through the terms component, where Solr
reads the index for field terms. Is there any approach where I can get the
terms using a filter, so that I can restrict some of the documents' terms
in the counts?

Basically we have a set of documents where we want to show the term counts
based on those filters with a set name, instead of reading the entire index.

Please let me know if you need any details in order to throw some more pointers.

Thanks,
Jilani


On Thu, Mar 20, 2014 at 1:04 AM, Ahmet Arslan  wrote:

> Hi Jilani,
>
> What features of the terms component are you after? If it is just
> terms.prefix, it could be simulated with the facet component via the
> facet.prefix parameter. The faceting component respects filter queries.
>
>
>
> On Wednesday, March 19, 2014 8:58 PM, Jilani Shaik 
> wrote:
> Hi,
>
> I have a huge index and am using Solr. I need the terms component with a
> filter by a field. Please let me know if there is any way I can get this.
>
> Please provide me some pointers, even if it means developing this by going
> through the Lucene code.
>
> Please suggest.
>
> Thanks,
> Jilani
>
>


Re: Filter in terms component

2014-03-19 Thread Ahmet Arslan
Hi Jilani,

What features of the terms component are you after? If it is just terms.prefix,
it could be simulated with the facet component via the facet.prefix parameter.
The faceting component respects filter queries.



On Wednesday, March 19, 2014 8:58 PM, Jilani Shaik  
wrote:
Hi,

I have a huge index and am using Solr. I need the terms component with a
filter by a field. Please let me know if there is any way I can get this.

Please provide me some pointers, even if it means developing this by going
through the Lucene code.

Please suggest.

Thanks,
Jilani



Filter in terms component

2014-03-19 Thread Jilani Shaik
Hi,

I have a huge index and am using Solr. I need the terms component with a
filter by a field. Please let me know if there is any way I can get this.

Please provide me some pointers, even if it means developing this by going
through the Lucene code.

Please suggest.

Thanks,
Jilani


Re: Partial Counts in SOLR

2014-03-19 Thread Salman Akram
This was one example. Users can even add phrase searches with
wildcards/proximity etc., so we can't really use stemming.

Sharding is definitely something we are already looking into.


On Wed, Mar 19, 2014 at 6:59 PM, Erick Erickson wrote:

> Yes, that'll be slow. Wildcards are, at best, interesting and at worst
> resource consumptive. Especially when you're doing this kind of
> positioning information as well.
>
> Consider looking at the problem sideways. That is, what is your
> purpose in searching for, say, "buy*"? You want to find buy, buying,
> buyers, etc? Would you get better results if you just stemmed and
> omitted the wildcards?
>
> Do you have a restricted vocabulary that would allow you to define
> synonyms for the "important" words and all their variants at index
> time and use that?
>
> Finally, of course, you could shard your index (or add more shards if
> you're already sharding) if you really _must_ support these kinds of
> queries and can't work around the problem.
>
> Best,
> Erick
>
> On Tue, Mar 18, 2014 at 11:21 PM, Salman Akram
>  wrote:
> > Anyone?
> >
> >
> > On Mon, Mar 17, 2014 at 12:03 PM, Salman Akram <
> > salman.ak...@northbaysolutions.net> wrote:
> >
> >> Below is one of the sample slow query that takes mins!
> >>
> >> ((stock or share*) w/10 (sale or sell* or sold or bought or buy* or
> >> purchase* or repurchase*)) w/10 (executive or director)
> >>
> >> If a filter is used it comes in fq but what can be done about plain
> >> keyword search?
> >>
> >>
> >> On Sun, Mar 16, 2014 at 4:37 AM, Erick Erickson <
> erickerick...@gmail.com>wrote:
> >>
> >>> What are your complex queries? You
> >>> say that your app will very rarely see the
> >>> same query thus you aren't using caches...
> >>> But, if you can move some of your
> >>> clauses to fq clauses, then the filterCache
> >>> might well be used to good effect.
> >>>
> >>>
> >>>
> >>> On Thu, Mar 13, 2014 at 7:22 AM, Salman Akram
> >>>  wrote:
> >>> > 1- SOLR 4.6
> >>> > 2- We do but right now I am talking about plain keyword queries just
> >>> sorted
> >>> > by date. Once this is better will start looking into caches which we
> >>> > already changed a little.
> >>> > 3- As I said the contents are not stored in this index. Some other
> >>> metadata
> >>> > fields are but with normal queries its super fast so I guess even if
> I
> >>> > change there it will be a minor difference. We have SSD and quite
> fast
> >>> too.
> >>> > 4- That's something we need to do but even in low workload those
> queries
> >>> > take a lot of time
> >>> > 5- Every 10 mins and currently no auto warming as user queries are
> >>> rarely
> >>> > same and also once its fully warmed those queries are still slow.
> >>> > 6- Nops.
> >>> >
> >>> > On Thu, Mar 13, 2014 at 5:38 PM, Dmitry Kan 
> >>> wrote:
> >>> >
> >>> >> 1. What is your solr version? In 4.x family the proximity searches
> have
> >>> >> been optimized among other query types.
> >>> >> 2. Do you use the filter queries? What is the situation with the
> cache
> >>> >> utilization ratios? Optimize (= i.e. bump up the respective cache
> >>> sizes) if
> >>> >> you have low hitratios and many evictions.
> >>> >> 3. Can you avoid storing some fields and only index them? When the
> >>> field is
> >>> >> stored and it is retrieved in the result, there are couple of disk
> >>> seeks
> >>> >> per field=> search slows down. Consider SSD disks.
> >>> >> 4. Do you monitor your system in terms of RAM / cache stats / GC? Do
> >>> you
> >>> >> observe STW GC pauses?
> >>> >> 5. How often do you commit & do you have the autowarming / external
> >>> warming
> >>> >> configured?
> >>> >> 6. If you use faceting, consider storing DocValues for facet fields.
> >>> >>
> >>> >> some solr wiki docs:
> >>> >>
> >>> >>
> >>>
> https://wiki.apache.org/solr/SolrPerformanceProblems?highlight=%28%28SolrPerformanceFactors%29%29
> >>> >>
> >>> >>
> >>> >>
> >>> >>
> >>> >>
> >>> >> On Thu, Mar 13, 2014 at 8:52 AM, Salman Akram <
> >>> >> salman.ak...@northbaysolutions.net> wrote:
> >>> >>
> >>> >> > Well some of the searches take minutes.
> >>> >> >
> >>> >> > Below are some stats about this particular index that I am talking
> >>> about:
> >>> >> >
> >>> >> > Index size = 400GB (Using CommonGrams so without that the index is
> >>> around
> >>> >> > 180GB)
> >>> >> > Position File = 280GB
> >>> >> > Total Docs = 170 million (just indexed for searching - for
> >>> highlighting
> >>> >> > contents are stored in another index)
> >>> >> > Avg Doc Size = Few hundred KBs
> >>> >> > RAM = 384GB (it has other indexes too but still OS cache can have
> >>> 60-80%
> >>> >> of
> >>> >> > the total index cached)
> >>> >> >
> >>> >> > Phrase queries run pretty fast with CG but complex versions of
> >>> wildcard
> >>> >> and
> >>> >> > proximity queries can be really slow. I know using CG will make
> them
> >>> slow
> >>> >> > but they just take too long. By default sorting is on date but
> users
> >>> have
> >>> >> > few other parameters too on w

underscore in query error

2014-03-19 Thread Andreas Owen
If I use the underscore in the query I don't get any results. If I remove
the underscore it finds the docs with the underscore.

Can I tell Solr to search through the NGTF instead of the WDF, or is there
any better solution?

 

Query: yh_cug

 

I attached a doc with the analyzer output



Re: Newbie Question: Master Index or 100s Small Index

2014-03-19 Thread Shawn Heisey

On 3/19/2014 4:55 AM, Colin R wrote:

My question is an architecture one.

These photos are currently indexed and searched in three ways.

1: The 14M pictures from above are split into a few hundred indexes that
feed a single website. This means index sizes of between 100 and 500,000
entries each.

2: 95% of these same photos are also wanted for searching on a global site.
Index size of 12M plus.

3: 80% of these same photos are also required for smaller group sites. Index
sizes of between 400K and 4M.

We currently make changes to the single indexes and then merge into groups and
global. Given the size of the numbers, is it worth changing or not?

Is it quicker/better to just have one big 14M index and filter the
complexities for each website, or is it better to still maintain hundreds of
indexes so we are searching smaller ones? Bear in mind, we get thousands of
changes a day PLUS very busy search servers.


My primary use for Solr is an archive of 92 million documents, most of 
which are photos.  We have thousands of new photos every day.  I haven't 
been cleared to mention what company it's for.


This screenshot of my status servlet page answers tons of questions 
about my index, but if you have additional questions, ask:


https://www.dropbox.com/s/6p1puq1gq3j8nln/solr-status-servlet.png

Here are some details about each host that you cannot see in the 
screenshot: 6 SATA disks in RAID10 with 3TB of usable space.  64GB of 
RAM.  Dual quad-core Intel E54xx series CPUs.  Chain A is running Solr
4.2.1 on Java 6, chain B is running Solr 4.6.1 on Java 7, with some 
additional plugin software that increases the index size.  There is one 
Solr process per host, with a 6GB heap.


As long as you index fields that can be used to filter searches 
according to what a user is allowed to see, I don't see any problem with 
putting all of your data into one index.  The main thing you'll want to be
sure of is that you have enough RAM to effectively cache your index.  
Because you have SSD, you probably don't need to have enough RAM to 
cache ALL of the index data, but it wouldn't hurt.  With 36GB of RAM per 
machine, you will probably have enough.
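To make the filtering concrete (the field name here is invented): if each
photo document carries a site identifier, each website just adds its own
filter query, e.g.

    fq=site_id:42

and filter queries of that form are cached in the filterCache and reused
across searches.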


Thanks,
Shawn



Re: underscore in query error

2014-03-19 Thread Erick Erickson
Attachments don't come through the user
list very well; you might have to put
it up on pastebin or some such and provide
a link.

But my guess is that your analysis chain is
doing something interesting you don't expect,
the analyzer output you tried to paste would
help here.

Also, if you could provide the fieldType
definition you're using, and the results of adding
&debug=query to your URL that would
help too.

Best,
Erick

On Wed, Mar 19, 2014 at 9:18 AM, Andreas Owen  wrote:
> If I use the underscore in the query I don't get any results. If I remove
> the underscore it finds the docs with underscore.
>
> Can I tell solr  to search through the ngtf instead of the wdf or is there
> any better solution?
>
>
>
> Query: yh_cug
>
>
>
> I attached a doc with the analyzer output


Re: join and filter query with AND

2014-03-19 Thread Erick Erickson
It looks to me like you're feeding this from some
kind of text file and you really _do_ have a
line break after "Stara".

Or have a line break in the string you paste into the URL
or something similar.

Kind of shooting in the dark though.

Erick
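If a stray line break is indeed the culprit, one way to sidestep whitespace
and quoting issues is to dereference the join's sub-query via a separate
parameter and keep each condition in its own fq (a sketch, not tested against
this setup):

    fq={!join from=inner_id to=outer_id fromIndex=othercore v=$jq}
    fq=prod:214
    jq=city:"Stara Zagora"

Multiple fq parameters are implicitly ANDed, which stands in for the explicit
AND in the original request.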

On Wed, Mar 19, 2014 at 8:48 AM, Marcin Rzewucki  wrote:
> Hi,
>
> I have the following issue with join query parser and filter query. For
> such query:
>
> *:*
> 
> (({!join from=inner_id to=outer_id fromIndex=othercore}city:"Stara
> Zagora")) AND (prod:214)
> 
>
> I got error:
> 
> 
> org.apache.solr.search.SyntaxError: Cannot parse 'city:"Stara': Lexical
> error at line 1, column 12. Encountered:  after : "\"Stara"
> 
> 400
> 
>
> Stack:
> DEBUG - 2014-03-19 13:35:20.825; org.eclipse.jetty.servlet.ServletHandler;
> chain=SolrRequestFilter->default
> DEBUG - 2014-03-19 13:35:20.826;
> org.eclipse.jetty.servlet.ServletHandler$CachedChain; call filter
> SolrRequestFilter
> ERROR - 2014-03-19 13:35:20.828; org.apache.solr.common.SolrException;
> org.apache.solr.common.SolrException: org.apache.solr.search.SyntaxError:
> Cannot parse 'city:"Stara': Lexical error at line 1, column 12.  E
> ncountered:  after : "\"Stara"
> at
> org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:179)
> at
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:193)
> at
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1916)
> at
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:780)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:427)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:217)
> at
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
> at
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
> at
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
> at
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
> at
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
> at
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
> at
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
> at
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
> at
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
> at
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
> at
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
> at
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
> at
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
> at org.eclipse.jetty.server.Server.handle(Server.java:364)
> at
> org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
> at
> org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
> at
> org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
> at
> org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
> at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)
> at
> org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
> at
> org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
> at
> org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
> at
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
> at
> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
> at java.lang.Thread.run(Thread.java:744)
> Caused by: org.apache.solr.search.SyntaxError: Cannot parse 'city:"Stara':
> Lexical error at line 1, column 12.  Encountered:  after : "\"Stara"
> at
> org.apache.solr.parser.SolrQueryParserBase.parse(SolrQueryParserBase.java:159)
> at org.apache.solr.search.LuceneQParser.parse(LuceneQParser.java:50)
> at org.apache.solr.search.QParser.getQuery(QParser.java:141)
> at
> org.apache.solr.search.JoinQParserPlugin$1.parse(JoinQParserPlugin.java:93)
> at org.apache.solr.search.QParser.getQuery(QParser.java:141)
> at
> org.apache.solr.parser.SolrQueryParserBase.getLocalParams(SolrQueryParserBase.java:832)
> at org.apache.solr.parser.QueryParser.Clause(QueryParser.java:212)
> at org.apache.solr

Re: Best SSD block size for large SOLR indexes

2014-03-19 Thread Shawn Heisey

On 3/19/2014 12:09 AM, Salman Akram wrote:

Thanks for the info. The articles were really useful, but it still seems I have
to do my own testing to find the right page size. I thought for large
indexes there would already be some tests done in the SOLR community.

Side note: We are heavily using Microsoft technology (.NET etc) for
development so by looking at all the pros/cons decided to stick with
Windows. Wasn't rude ;)


Assuming you are only going to be putting Solr data on it, or anything 
else you put on it will also consist of large files, I would probably go 
with a cluster size at least 64KB for an NTFS volume, and I might 
consider 128KB or 256KB.  There *ARE* a few small files in a Solr index, 
but not enough of them for the wasted space to become a problem.


The easiest way to configure Solr to use a different location than the 
program directory is to change the solr home.
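For example (drive letter and paths invented), formatting a dedicated volume
with a 64KB cluster size and pointing Solr at it might look like:

    format E: /FS:NTFS /A:64K
    java -Dsolr.solr.home=E:\solr -jar start.jar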


Thanks,
Shawn



Re: search for single char number when ngram min is 3

2014-03-19 Thread Jack Krupansky
Interesting point. I think it would be nice to have an option to treat 
numeric sequences (or maybe with commas and decimal point as well) as 
integral tokens that won't be split by ngramming. It's worth a Jira.


OTOH, you have to make a value judgment whether a query for "3.14" should 
only exact match "3.14" or also ngram match "3.14159", etc.


-- Jack Krupansky

-Original Message- 
From: Andreas Owen

Sent: Wednesday, March 19, 2014 11:44 AM
To: solr-user@lucene.apache.org
Subject: search for single char number when ngram min is 3

Is there a way to tell NGramFilterFactory while indexing that numbers shall
never be tokenized? Then the query should be able to find numbers.
Or do I have to change the ngram min for numbers to 1, if that is possible?
So to speak, put the whole number in as one token and not all possible tokens.
Or can I tell the query to search numbers differently with WT, LCF or
whatever?


I attached a doc with screenshots from solr analyzer


-Original Message-
From: Andreas Owen [mailto:a...@conx.ch]
Sent: Thursday, March 13, 2014 13:44
To: solr-user@lucene.apache.org
Subject: RE: Re[2]: NOT SOLVED searches for single char tokens instead of 
from 3 upwards


I have gotten nearly everything to work. There are two queries where I don't
get back what I want.


"avaloq frage 1" -> only returns if I set minGramSize=1 while indexing
"yh_cug" -> the query parser doesn't remove "_" but the indexer does (WDF), so
there is no match


Is there a way to also query the whole term "avaloq frage 1" without
tokenizing it?


Fieldtype:


 
   
   

words="lang/stopwords_de.txt" format="snowball" 
enablePositionIncrements="true"/> 

  
 


<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
generateNumberParts="1" catenateWords="1" catenateNumbers="1"
catenateAll="0" splitOnCaseChange="1"/>

  
  



words="lang/stopwords_de.txt" format="snowball" 
enablePositionIncrements="true"/> 



 



-Original Message-
From: Andreas Owen [mailto:a...@conx.ch]
Sent: Wednesday, March 12, 2014 18:39
To: solr-user@lucene.apache.org
Subject: RE: Re[2]: NOT SOLVED searches for single char tokens instead of 
from 3 upwards


Hi Jack,

do you know how I can use local parameters in my solrconfig? The params are
visible in the debugQuery output but Solr doesn't parse them.



{!q.op=OR} (*:* -organisations:["" TO *] -roles:["" TO *]) 
(+organisations:($org) +roles:($r)) (-organisations:["" TO *] +roles:($r)) 
(+organisations:($org) -roles:["" TO *]) 



-Original Message-
From: Andreas Owen [mailto:a...@conx.ch]
Sent: Wednesday, March 12, 2014 14:44
To: solr-user@lucene.apache.org
Subject: Re[2]: NOT SOLVED searches for single char tokens instead of from 3 
upwards


Yes, that is exactly what happened in the analyzer. The term I searched for
was listed on both sides (index & query).


here's the rest:


   
   
   
   
   <filter class="solr.WordDelimiterFilterFactory"
generateWordParts="1" generateNumberParts="1" catenateWords="1"
catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>

   
   protected="protwords.txt"/>

   
 

-Original Message-

From: "Jack Krupansky" 
To: solr-user@lucene.apache.org
Date: 12/03/2014 13:25
Subject: Re: NOT SOLVED searches for single char tokens instead of
from 3 upwards

You didn't show the new index analyzer - it's tricky to assure that
index and query are compatible, but the Admin UI Analysis page can help.

Generally, using pure defaults for WDF is not what you want,
especially for query time. Usually there needs to be a slight
asymmetry between index and query for WDF - index generates more terms 
than query.


-- Jack Krupansky

-Original Message-
From: Andreas Owen
Sent: Wednesday, March 12, 2014 6:20 AM
To: solr-user@lucene.apache.org
Subject: RE: NOT SOLVED searches for single char tokens instead of
from 3 upwards

I now have the following:



 
  

  

The GUI analysis shows me that WDF doesn't cut the underscore anymore
but it still returns 0 results?

Output:


  yh_cug
  yh_cug
  (+DisjunctionMaxQuery((tags:yh_cug^10.0 |
links:yh_cug^5.0 | thema:yh_cug^15.0 | plain_text:yh_cug^10.0 |
url:yh_cug^5.0 | h_*:yh_cug^14.0 | inhaltstyp:yh_cug^6.0 |
breadcrumb:yh_cug^6.0 | contentmanager:yh_cug^5.0 | title:yh_cug^20.0
|
editorschoice:yh_cug^200.0 | doctype:yh_cug^10.0))
((expiration:[1394619501862 TO *]
(+MatchAllDocsQuery(*:*) -expiration:*))^6.0)
FunctionQuery((div(int(clicks),max(int(displays),const(1^8.0))/no_
coord
  +(tags:yh_cug^10.0 |
links:yh_cug^5.0 |
thema:yh_cug^15.0 | plain_text:yh_cug^10.0 | url:yh_cug^5.0 |
h_*:yh_cug^14.0 | inhaltstyp:yh_cug^6.0 | breadcrumb:yh_cug^6.0 |
contentmanager:yh_cug^5.0 | title:yh_cug^20.0 |
editorschoice:yh_cug^200.0 |
doctype:yh_cug^10.0) ((expiration:[1394619501862 TO *]
(+*:* -expiration:*))^6.0)
(div(int(clicks),max(int(displays),const(1^8.0
  
  
yh_cug
  
  
DidntFindAnySynonyms
No synonyms found for this query.  Check
your synonyms file.
  
  
ExtendedDismaxQParser


  (expiration:[NOW T

Re: More heap usage in Solr during indexing

2014-03-19 Thread solr2020
We are doing an autocommit every five minutes.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/More-heap-usage-in-Solr-during-indexing-tp4124898p4125497.html
Sent from the Solr - User mailing list archive at Nabble.com.
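For reference, a five-minute autocommit corresponds to a solrconfig.xml
section like the following (openSearcher=false shown here is a common choice
to keep commits cheap, but it is an assumption about this setup):

    <autoCommit>
      <maxTime>300000</maxTime>
      <openSearcher>false</openSearcher>
    </autoCommit>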


search for single char number when ngram min is 3

2014-03-19 Thread Andreas Owen
Is there a way to tell NGramFilterFactory while indexing that numbers shall
never be tokenized? Then the query should be able to find numbers.
Or do I have to change the ngram min for numbers to 1, if that is possible? So
to speak, put the whole number in as one token and not all possible tokens.
Or can I tell the query to search numbers differently with WT, LCF or whatever?

I attached a doc with screenshots from solr analyzer
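One workaround, sketched here with invented field and type names rather than
as a confirmed fix: keep the ngrammed field as it is, copy the raw text into a
second field whose type does not ngram, and let (e)dismax search both via qf,
so a bare number can still match as a whole token:

    <field name="plain_text" type="text_ngram" indexed="true" stored="true"/>
    <field name="plain_text_exact" type="text_general" indexed="true" stored="false"/>
    <copyField source="plain_text" dest="plain_text_exact"/>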


-Original Message-
From: Andreas Owen [mailto:a...@conx.ch] 
Sent: Thursday, March 13, 2014 13:44
To: solr-user@lucene.apache.org
Subject: RE: Re[2]: NOT SOLVED searches for single char tokens instead of from 
3 uppwards

I have gotten nearly everything to work. There are two queries where I don't get
back what I want.

"avaloq frage 1" -> only returns if I set minGramSize=1 while
indexing
"yh_cug" -> the query parser doesn't remove "_" but the
indexer does (WDF), so there is no match

Is there a way to also query the whole term "avaloq frage 1" without tokenizing
it?

Fieldtype:


   


 
 
 
  


   
   


 
 


  
 


-Original Message-
From: Andreas Owen [mailto:a...@conx.ch]
Sent: Wednesday, March 12, 2014 18:39
To: solr-user@lucene.apache.org
Subject: RE: Re[2]: NOT SOLVED searches for single char tokens instead of from 
3 upwards

Hi Jack,

do you know how I can use local parameters in my solrconfig? The params are
visible in the debugQuery output but Solr doesn't parse them.


{!q.op=OR} (*:* -organisations:["" TO *] -roles:["" TO 
*]) (+organisations:($org) +roles:($r)) (-organisations:["" TO *] +roles:($r)) 
(+organisations:($org) -roles:["" TO *]) 


-Original Message-
From: Andreas Owen [mailto:a...@conx.ch]
Sent: Wednesday, March 12, 2014 14:44
To: solr-user@lucene.apache.org
Subject: Re[2]: NOT SOLVED searches for single char tokens instead of from 3 
upwards

Yes, that is exactly what happened in the analyzer. The term I searched for was
listed on both sides (index & query).

here's the rest:










  

-Original Message-
> From: "Jack Krupansky" 
> To: solr-user@lucene.apache.org
> Date: 12/03/2014 13:25
> Subject: Re: NOT SOLVED searches for single char tokens instead of
> from 3 upwards
> 
> You didn't show the new index analyzer - it's tricky to assure that 
> index and query are compatible, but the Admin UI Analysis page can help.
> 
> Generally, using pure defaults for WDF is not what you want, 
> especially for query time. Usually there needs to be a slight 
> asymmetry between index and query for WDF - index generates more terms than 
> query.
> 
> -- Jack Krupansky
> 
> -Original Message-
> From: Andreas Owen
> Sent: Wednesday, March 12, 2014 6:20 AM
> To: solr-user@lucene.apache.org
> Subject: RE: NOT SOLVED searches for single char tokens instead of 
> from 3 upwards
> 
> I now have the following:
> 
> 
> 
> <filter class="solr.WordDelimiterFilterFactory" types="at-under-alpha.txt"/>
> <filter class="solr.LowerCaseFilterFactory"/>
> <filter class="solr.StopFilterFactory" words="lang/stopwords_de.txt" format="snowball"
> enablePositionIncrements="true"/>
> <filter class="solr.GermanNormalizationFilterFactory"/>
> 
>   
> 
> The GUI analysis shows me that WDF doesn't cut the underscore anymore
> but it still returns 0 results?
> 
> Output:
> 
> 
>   yh_cug
>   yh_cug
>   (+DisjunctionMaxQuery((tags:yh_cug^10.0 |
> links:yh_cug^5.0 | thema:yh_cug^15.0 | plain_text:yh_cug^10.0 |
> url:yh_cug^5.0 | h_*:yh_cug^14.0 | inhaltstyp:yh_cug^6.0 |
> breadcrumb:yh_cug^6.0 | contentmanager:yh_cug^5.0 | title:yh_cug^20.0
> |
> editorschoice:yh_cug^200.0 | doctype:yh_cug^10.0))
> ((expiration:[1394619501862 TO *]
> (+MatchAllDocsQuery(*:*) -expiration:*))^6.0) 
> FunctionQuery((div(int(clicks),max(int(displays),const(1^8.0))/no_
> coord
>   +(tags:yh_cug^10.0 |
> links:yh_cug^5.0 |
> thema:yh_cug^15.0 | plain_text:yh_cug^10.0 | url:yh_cug^5.0 |
> h_*:yh_cug^14.0 | inhaltstyp:yh_cug^6.0 | breadcrumb:yh_cug^6.0 |
> contentmanager:yh_cug^5.0 | title:yh_cug^20.0 |
> editorschoice:yh_cug^200.0 |
> doctype:yh_cug^10.0) ((expiration:[1394619501862 TO *]
> (+*:* -expiration:*))^6.0)
> (div(int(clicks),max(int(displays),const(1^8.0
>   
>   
> yh_cug
>   
>   
> DidntFindAnySynonyms
> No synonyms found for this query.  Check 
> your synonyms file.
>   
>   
> ExtendedDismaxQParser
> 
> 
>   (expiration:[NOW TO *] OR (*:* -expiration:*))^6
> 
> 
>   (expiration:[1394619501862 TO *]
> (+MatchAllDocsQuery(*:*) -expiration:*))^6.0
> 
> 
>   div(clicks,max(displays,1))^8
> 
>   
>   
> ExtendedDismaxQParser
> 
> 
>   div(clicks,max(disp

join and filter query with AND

2014-03-19 Thread Marcin Rzewucki
Hi,

I have the following issue with join query parser and filter query. For
such query:

*:*

(({!join from=inner_id to=outer_id fromIndex=othercore}city:"Stara
Zagora")) AND (prod:214)


I got error:


org.apache.solr.search.SyntaxError: Cannot parse 'city:"Stara': Lexical
error at line 1, column 12. Encountered:  after : "\"Stara"

400


Stack:
DEBUG - 2014-03-19 13:35:20.825; org.eclipse.jetty.servlet.ServletHandler;
chain=SolrRequestFilter->default
DEBUG - 2014-03-19 13:35:20.826;
org.eclipse.jetty.servlet.ServletHandler$CachedChain; call filter
SolrRequestFilter
ERROR - 2014-03-19 13:35:20.828; org.apache.solr.common.SolrException;
org.apache.solr.common.SolrException: org.apache.solr.search.SyntaxError:
Cannot parse 'city:"Stara': Lexical error at line 1, column 12.  E
ncountered:  after : "\"Stara"
at
org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:179)
at
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:193)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1916)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:780)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:427)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:217)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
at
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
at org.eclipse.jetty.server.Server.handle(Server.java:364)
at
org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
at
org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
at
org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
at
org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)
at
org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
at
org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
at
org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
at
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
at
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
at java.lang.Thread.run(Thread.java:744)
Caused by: org.apache.solr.search.SyntaxError: Cannot parse 'city:"Stara':
Lexical error at line 1, column 12.  Encountered:  after : "\"Stara"
at
org.apache.solr.parser.SolrQueryParserBase.parse(SolrQueryParserBase.java:159)
at org.apache.solr.search.LuceneQParser.parse(LuceneQParser.java:50)
at org.apache.solr.search.QParser.getQuery(QParser.java:141)
at
org.apache.solr.search.JoinQParserPlugin$1.parse(JoinQParserPlugin.java:93)
at org.apache.solr.search.QParser.getQuery(QParser.java:141)
at
org.apache.solr.parser.SolrQueryParserBase.getLocalParams(SolrQueryParserBase.java:832)
at org.apache.solr.parser.QueryParser.Clause(QueryParser.java:212)
at org.apache.solr.parser.QueryParser.Query(QueryParser.java:107)
at org.apache.solr.parser.QueryParser.Clause(QueryParser.java:189)
at org.apache.solr.parser.QueryParser.Query(QueryParser.java:139)
at org.apache.solr.parser.QueryParser.Clause(QueryParser.java:189)
at org.apache.solr.parser.QueryParser.Query(QueryParser.java:107)
at
org.apache.solr.parser.QueryParser.TopLevelQuery(QueryParser.java:96)
at
org.apache.solr.parser.SolrQueryParserBase.parse(SolrQueryParserBas

Re: Sort by exact match

2014-03-19 Thread Erick Erickson
Sorting applies to the entire result set,
there's no notion of "sort some docs one
way and sort others another way". So
I don't know any OOB way to do what
you want.

I don't know what your response time
requirements are, but you could do this
by firing off two queries and collating
the results. If the performance is
acceptable, it would be faster to code
than writing a custom plugin, which would
probably do much the same thing anyway.

The second query wouldn't have to fire
if the first one returned a page full of results...

FWIW,
Erick

On Wed, Mar 19, 2014 at 6:05 AM, Rok Rejc  wrote:
> Hi all,
>
> I have a field in the index - lets call it Name. Name can have one or more
> words. I want to query all documents which match by name (full or partial
> match), and order the results:
> - first display result(s) with exact matches
> - after that display results with partial matched and order them
> alphabeticaly
>
> To achieve this I have created two Name fields in the index:
> - NameUnTokenized which uses KeywordTokenizer and
> - NameTokenized which uses StandardTokenizer
>
> But now I have no clue how to write a query. Is this possible with standard
> query and sort functions?  How?
>
> Another option is to write a custom plugin which will perform two queries and
> merge results (that shouldn't be a problem).
>
> Many thanks in advance.


Re: frange and field with hyphen

2014-03-19 Thread Erick Erickson
Jack's solution works, but I really, truly,
strongly recommend that you follow the
usual Java variable-naming conventions
for your fields. In fact, I tend to use
only lower case and underscores.

The reason is that you'll run into this again
and again and again. Your front-end will
forget to put the function in. You'll spend
a lot of hours chasing this down that you
could spend doing _useful_ work. The
next person to inherit this project will
fall over this as well. And on and on and
on.

There, rant ended ..

Best,
Erick

On Wed, Mar 19, 2014 at 5:36 AM, Marcin Rzewucki  wrote:
> Wow, that was fast reply :)
> It works. Thank you!
>
>
> On 19 March 2014 13:24, Jack Krupansky  wrote:
>
>> For any "improperly" named field (that don't use the java identifier
>> conventions), you simply need to use the field function with the field name
>> in apostrophes:
>>
>> div(acc_curr_834_2-1900_tl,1)
>>
>> becomes:
>>
>> div(field('acc_curr_834_2-1900_tl'),1)
>>
>> -- Jack Krupansky
>>
>> -Original Message- From: Marcin Rzewucki
>> Sent: Wednesday, March 19, 2014 8:13 AM
>> To: solr-user@lucene.apache.org
>> Subject: frange and field with hyphen
>>
>>
>> Hi everyone,
>>
>> I got the following issue recently. I'm trying to use frange on a field
>> which has hyphen in name:
>>
>> 
>> true
>> on
>> *:*
>> xml
>> 
>> 
>> {!frange l=1 u=99}sub(if(1,
>> div(acc_curr_834_2-1900_tl,
>> 1), 0), 1)
>> 
>> 
>> 2.2
>> 
>> 
>>
>> I got the following error:
>>
>> DEBUG - 2014-03-19 12:11:53.805; org.eclipse.jetty.servlet.ServletHandler;
>> chain=SolrRequestFilter->default
>> DEBUG - 2014-03-19 12:11:53.805;
>> org.eclipse.jetty.servlet.ServletHandler$CachedChain; call filter
>> SolrRequestFilter
>> ERROR - 2014-03-19 12:11:53.806; org.apache.solr.common.SolrException;
>> org.apache.solr.common.SolrException: undefined field: "acc_curr_834_2"
>>at
>> org.apache.solr.schema.IndexSchema.getField(IndexSchema.java:1172)
>>at
>> org.apache.solr.search.FunctionQParser.parseValueSource(
>> FunctionQParser.java:361)
>>at
>> org.apache.solr.search.FunctionQParser.parseValueSource(
>> FunctionQParser.java:223)
>>at
>> org.apache.solr.search.ValueSourceParser$11.parse(
>> ValueSourceParser.java:174)
>>at
>> org.apache.solr.search.FunctionQParser.parseValueSource(
>> FunctionQParser.java:352)
>>at
>> org.apache.solr.search.FunctionQParser.parseValueSource(
>> FunctionQParser.java:223)
>>at
>> org.apache.solr.search.ValueSourceParser$73.parse(
>> ValueSourceParser.java:775)
>>at
>> org.apache.solr.search.FunctionQParser.parseValueSource(
>> FunctionQParser.java:352)
>>at
>> org.apache.solr.search.FunctionQParser.parseValueSource(
>> FunctionQParser.java:223)
>>at
>> org.apache.solr.search.ValueSourceParser$18.parse(
>> ValueSourceParser.java:252)
>>at
>> org.apache.solr.search.FunctionQParser.parseValueSource(
>> FunctionQParser.java:352)
>>at
>> org.apache.solr.search.FunctionQParser.parse(FunctionQParser.java:68)
>>at org.apache.solr.search.QParser.getQuery(QParser.java:141)
>>at
>> org.apache.solr.search.SolrReturnFields.add(SolrReturnFields.java:285)
>>at
>> org.apache.solr.search.SolrReturnFields.parseFieldList(
>> SolrReturnFields.java:112)
>>at
>> org.apache.solr.search.SolrReturnFields.(SolrReturnFields.java:98)
>>at
>> org.apache.solr.search.SolrReturnFields.(SolrReturnFields.java:74)
>>at
>> org.apache.solr.handler.component.QueryComponent.
>> prepare(QueryComponent.java:122)
>>at
>> org.apache.solr.handler.component.SearchHandler.handleRequestBody(
>> SearchHandler.java:200)
>>at
>> org.apache.solr.handler.RequestHandlerBase.handleRequest(
>> RequestHandlerBase.java:135)
>>at org.apache.solr.core.SolrCore.execute(SolrCore.java:1916)
>>at
>> org.apache.solr.servlet.SolrDispatchFilter.execute(
>> SolrDispatchFilter.java:780)
>>at
>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(
>> SolrDispatchFilter.java:427)
>>at
>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(
>> SolrDispatchFilter.java:217)
>>at
>> org.eclipse.jetty.servlet.ServletHandler$CachedChain.
>> doFilter(ServletHandler.java:1419)
>>at
>> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
>>at
>> org.eclipse.jetty.server.handler.ScopedHandler.handle(
>> ScopedHandler.java:137)
>>at
>> org.eclipse.jetty.security.SecurityHandler.handle(
>> SecurityHandler.java:557)
>>at
>> org.eclipse.jetty.server.session.SessionHandler.
>> doHandle(SessionHandler.java:231)
>>at
>> org.eclipse.jetty.server.handler.ContextHandler.
>> doHandle(ContextHandler.java:1075)
>>at
>> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
>>at
>> org.eclipse.jetty.server.session.SessionHandler.
>> doScope(SessionHandler.java:193)
>>at
>

Re: Newbie Question: Master Index or 100s Small Index

2014-03-19 Thread Erick Erickson
Oh my. 2.(something) is ancient; I second your move
to scrap the current situation and start over. I'm
really curious what the _reasons_ for such a complex
setup were.

I second Toke's comments. This is actually
quite small by modern Solr/Lucene standards.

Personally I would index them all to a single index,
include something like a 'source' field that allowed
one to restrict the returned documents by a filter
query (fq) clause.

Toke makes the point that you will get subtly different
search results because the tf/idf calculations come out
slightly differently across your entire corpus than
within the various sub-sections, but I suspect that you
won't notice it. Test and see; you can change later.

One thing to look at is the new hard/soft commit
distinction, see:
http://searchhub.org/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

The short form is you want to define your hard
autocommit to be fairly short (maybe 1 minute?)
with openSearcher=false for durability and your
soft commit whatever latency you need for being
able to search the newly-added docs.
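
As an illustration only, that looks something like this in the
updateHandler section of solrconfig.xml (the intervals below are just
the example values above, not recommendations):

<autoCommit>
  <maxTime>60000</maxTime>            <!-- hard commit every minute -->
  <openSearcher>false</openSearcher>  <!-- durability only, no new searcher -->
</autoCommit>
<autoSoftCommit>
  <maxTime>5000</maxTime>             <!-- new docs searchable within ~5s -->
</autoSoftCommit>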

I don't know how you're feeding docs to Solr, but
if you're using the ExtractingRequestHandler,
you are
1> transmitting the entire document over the wire,
only to throw most of it away. I'm guessing your 1.5K
of data is just a few percent of the total file size.
2> you're putting the extraction work on the same
box running Solr.

If that machine is overloaded, consider moving the Tika
processing over to one or more clients and only
sending the data you actually want to index over to Solr,
See:
http://searchhub.org/2012/02/14/indexing-with-solrj/
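
The gist of it, as a bare-bones sketch (the Solr URL and field names
are placeholders, and error handling is omitted):

import java.io.FileInputStream;
import java.io.InputStream;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.sax.BodyContentHandler;

public class ClientSideIndexer {
  public static void main(String[] args) throws Exception {
    HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr");
    AutoDetectParser parser = new AutoDetectParser();
    for (String path : args) {
      BodyContentHandler handler = new BodyContentHandler(-1); // no write limit
      Metadata metadata = new Metadata();
      try (InputStream in = new FileInputStream(path)) {
        parser.parse(in, handler, metadata); // Tika runs on the client box
      }
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", path);
      doc.addField("text", handler.toString()); // only extracted text goes over the wire
      server.add(doc);
    }
    server.commit();
  }
}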

Best,
Erick

On Wed, Mar 19, 2014 at 7:02 AM, Colin R  wrote:
> Hi Toke
>
> Our current configuration is Lucene 2.(something) with a RAILO/CFML app server.
>
> 10K drives, quad core, 16GB, two servers. But the indexing and searching are
> starting to fail, and our developer is no longer with us, so it is quicker to
> rebuild than fix all the code.
>
> Our existing config is lots of indexes with merges into the larger ones.
>
> They are still running very fast but indexing is causing us issues.
>
> Thanks
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Newbie-Question-Master-Index-or-100s-Small-Index-tp4125407p4125447.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Newbie Question: Master Index or 100s Small Index

2014-03-19 Thread Colin R
Hi Toke

Our current configuration is Lucene 2.(something) with a RAILO/CFML app server.

10K drives, quad core, 16GB, two servers. But the indexing and searching are
starting to fail, and our developer is no longer with us, so it is quicker to
rebuild than fix all the code.

Our existing config is lots of indexes with merges into the larger ones.

They are still running very fast but indexing is causing us issues.

Thanks



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Newbie-Question-Master-Index-or-100s-Small-Index-tp4125407p4125447.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Partial Counts in SOLR

2014-03-19 Thread Erick Erickson
Yes, that'll be slow. Wildcards are, at best, interesting and at worst
resource-intensive, especially when you're using this kind of
positional information as well.

Consider looking at the problem sideways. That is, what is your
purpose in searching for, say, "buy*"? You want to find buy, buying,
buyers, etc.? Would you get better results if you just stemmed and
omitted the wildcards?

Do you have a restricted vocabulary that would allow you to define
synonyms for the "important" words and all their variants at index
time and use that?
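
For illustration, a schema.xml field type along these lines (the type
name and synonyms file are assumptions) applies synonyms and stemming
at index time:

<fieldType name="text_stemmed" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"/>
  </analyzer>
</fieldType>

With that, "buying" stems to the same term as "buy", so the trailing
wildcard becomes unnecessary for those variants.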

Finally, of course, you could shard your index (or add more shards if
you're already sharding) if you really _must_ support these kinds of
queries and can't work around the problem.

Best,
Erick

On Tue, Mar 18, 2014 at 11:21 PM, Salman Akram
 wrote:
> Anyone?
>
>
> On Mon, Mar 17, 2014 at 12:03 PM, Salman Akram <
> salman.ak...@northbaysolutions.net> wrote:
>
>> Below is one of the sample slow query that takes mins!
>>
>> ((stock or share*) w/10 (sale or sell* or sold or bought or buy* or
>> purchase* or repurchase*)) w/10 (executive or director)
>>
>> If a filter is used it comes in fq but what can be done about plain
>> keyword search?
>>
>>
>> On Sun, Mar 16, 2014 at 4:37 AM, Erick Erickson 
>> wrote:
>>
>>> What are your complex queries? You
>>> say that your app will very rarely see the
>>> same query, thus you aren't using caches...
>>> But, if you can move some of your
>>> clauses to fq clauses, then the filterCache
>>> might well be used to good effect.
>>>
>>>
>>>
>>> On Thu, Mar 13, 2014 at 7:22 AM, Salman Akram
>>>  wrote:
>>> > 1- SOLR 4.6
>>> > 2- We do but right now I am talking about plain keyword queries just
>>> sorted
>>> > by date. Once this is better will start looking into caches which we
>>> > already changed a little.
>>> > 3- As I said the contents are not stored in this index. Some other
>>> metadata
>>> > fields are but with normal queries its super fast so I guess even if I
>>> > change there it will be a minor difference. We have SSD and quite fast
>>> too.
>>> > 4- That's something we need to do but even in low workload those queries
>>> > take a lot of time
>>> > 5- Every 10 mins and currently no auto warming as user queries are
>>> rarely
>>> > same and also once its fully warmed those queries are still slow.
>>> > 6- Nops.
>>> >
>>> > On Thu, Mar 13, 2014 at 5:38 PM, Dmitry Kan 
>>> wrote:
>>> >
>>> >> 1. What is your solr version? In 4.x family the proximity searches have
>>> >> been optimized among other query types.
>>> >> 2. Do you use the filter queries? What is the situation with the cache
>>> >> utilization ratios? Optimize (= i.e. bump up the respective cache
>>> sizes) if
>>> >> you have low hitratios and many evictions.
>>> >> 3. Can you avoid storing some fields and only index them? When the
>>> field is
>>> >> stored and it is retrieved in the result, there are couple of disk
>>> seeks
>>> >> per field=> search slows down. Consider SSD disks.
>>> >> 4. Do you monitor your system in terms of RAM / cache stats / GC? Do
>>> you
>>> >> observe STW GC pauses?
>>> >> 5. How often do you commit & do you have the autowarming / external
>>> warming
>>> >> configured?
>>> >> 6. If you use faceting, consider storing DocValues for facet fields.
>>> >>
>>> >> some solr wiki docs:
>>> >>
>>> >>
>>> https://wiki.apache.org/solr/SolrPerformanceProblems?highlight=%28%28SolrPerformanceFactors%29%29
>>> >>
>>> >>
>>> >>
>>> >>
>>> >>
>>> >> On Thu, Mar 13, 2014 at 8:52 AM, Salman Akram <
>>> >> salman.ak...@northbaysolutions.net> wrote:
>>> >>
>>> >> > Well some of the searches take minutes.
>>> >> >
>>> >> > Below are some stats about this particular index that I am talking
>>> about:
>>> >> >
>>> >> > Index size = 400GB (Using CommonGrams so without that the index is
>>> around
>>> >> > 180GB)
>>> >> > Position File = 280GB
>>> >> > Total Docs = 170 million (just indexed for searching - for
>>> highlighting
>>> >> > contents are stored in another index)
>>> >> > Avg Doc Size = Few hundred KBs
>>> >> > RAM = 384GB (it has other indexes too but still OS cache can have
>>> 60-80%
>>> >> of
>>> >> > the total index cached)
>>> >> >
>>> >> > Phrase queries run pretty fast with CG but complex versions of
>>> wildcard
>>> >> and
>>> >> > proximity queries can be really slow. I know using CG will make them
>>> slow
>>> >> > but they just take too long. By default sorting is on date but users
>>> have
>>> >> > few other parameters too on which they can sort.
>>> >> >
>>> >> > I wanted to avoid creating multiple indexes (maybe based on years)
>>> but
>>> >> > seems that to search on partial data that's the only feasible way.
>>> >> >
>>> >> >
>>> >> >
>>> >> >
>>> >> > On Wed, Mar 12, 2014 at 2:47 PM, Dmitry Kan 
>>> >> wrote:
>>> >> >
>>> >> > > As Hoss pointed out above, different projects have different
>>> >> > requirements.
>>> >> > > Some want to sort by date of ingestion reverse, which means that
>>> having
>>> >> > > posting lists organized in a re

Re: Newbie Question: Master Index or 100s Small Index

2014-03-19 Thread Toke Eskildsen
On Wed, 2014-03-19 at 13:28 +0100, Colin R wrote:
> My question is really regarding index architecture. One big or many small
> (with merged big ones)?

One difference is that having a single index/collection gives you better
ranked searches within each collection. If you only use date/filename
sorting, that is of course irrelevant.

> In terms of bytes, each photo has up to 1.5KB of data.

So about 20GB for the full index?

> Special requirements are search by date range, text, date range and text.
> Plus some boolean filtering. All results can be sorted by date or filename.

With no faceting, grouping or similar aggregating processing,
(re)opening of an index searcher should be very fast. The only thing
that takes a moment is the initial date or filename sorting. Asking for
minute-level data updates is thus very modest. With the information you
have given, you could aim for a few seconds.

None of the things you have said gives any cause for concern about
performance, and even though you have an existing system running and are
upgrading to a presumably faster one, you sound concerned. Do you
currently have performance problems, and if so, what is your current
hardware?

- Toke Eskildsen, State and University Library, Denmark




Sort by exact match

2014-03-19 Thread Rok Rejc
Hi all,

I have a field in the index - lets call it Name. Name can have one or more
words. I want to query all documents which match by name (full or partial
match), and order the results:
- first display result(s) with exact matches
- after that, display results with partial matches and order them
alphabetically

To achieve this I have created two Name fields in the index:
- NameUnTokenized which uses KeywordTokenizer and
- NameTokenized which uses StandardTokenizer

But now I have no clue how to write a query. Is this possible with standard
query and sort functions?  How?

Another option is to write a custom plugin which will perform two queries and
merge results (that shouldn't be a problem).

Many thanks in advance.


Re: frange and field with hyphen

2014-03-19 Thread Marcin Rzewucki
Wow, that was fast reply :)
It works. Thank you!


On 19 March 2014 13:24, Jack Krupansky  wrote:

> For any "improperly" named field (that don't use the java identifier
> conventions), you simply need to use the field function with the field name
> in apostrophes:
>
> div(acc_curr_834_2-1900_tl,1)
>
> becomes:
>
> div(field('acc_curr_834_2-1900_tl'),1)
>
> -- Jack Krupansky
>
> -Original Message- From: Marcin Rzewucki
> Sent: Wednesday, March 19, 2014 8:13 AM
> To: solr-user@lucene.apache.org
> Subject: frange and field with hyphen
>
>
> Hi everyone,
>
> I got the following issue recently. I'm trying to use frange on a field
> which has hyphen in name:
>
> 
> true
> on
> *:*
> xml
> 
> 
> {!frange l=1 u=99}sub(if(1,
> div(acc_curr_834_2-1900_tl,
> 1), 0), 1)
> 
> 
> 2.2
> 
> 
>
> I got the following error:
>
> DEBUG - 2014-03-19 12:11:53.805; org.eclipse.jetty.servlet.ServletHandler;
> chain=SolrRequestFilter->default
> DEBUG - 2014-03-19 12:11:53.805;
> org.eclipse.jetty.servlet.ServletHandler$CachedChain; call filter
> SolrRequestFilter
> ERROR - 2014-03-19 12:11:53.806; org.apache.solr.common.SolrException;
> org.apache.solr.common.SolrException: undefined field: "acc_curr_834_2"
>at
> org.apache.solr.schema.IndexSchema.getField(IndexSchema.java:1172)
>at
> org.apache.solr.search.FunctionQParser.parseValueSource(
> FunctionQParser.java:361)
>at
> org.apache.solr.search.FunctionQParser.parseValueSource(
> FunctionQParser.java:223)
>at
> org.apache.solr.search.ValueSourceParser$11.parse(
> ValueSourceParser.java:174)
>at
> org.apache.solr.search.FunctionQParser.parseValueSource(
> FunctionQParser.java:352)
>at
> org.apache.solr.search.FunctionQParser.parseValueSource(
> FunctionQParser.java:223)
>at
> org.apache.solr.search.ValueSourceParser$73.parse(
> ValueSourceParser.java:775)
>at
> org.apache.solr.search.FunctionQParser.parseValueSource(
> FunctionQParser.java:352)
>at
> org.apache.solr.search.FunctionQParser.parseValueSource(
> FunctionQParser.java:223)
>at
> org.apache.solr.search.ValueSourceParser$18.parse(
> ValueSourceParser.java:252)
>at
> org.apache.solr.search.FunctionQParser.parseValueSource(
> FunctionQParser.java:352)
>at
> org.apache.solr.search.FunctionQParser.parse(FunctionQParser.java:68)
>at org.apache.solr.search.QParser.getQuery(QParser.java:141)
>at
> org.apache.solr.search.SolrReturnFields.add(SolrReturnFields.java:285)
>at
> org.apache.solr.search.SolrReturnFields.parseFieldList(
> SolrReturnFields.java:112)
>at
> org.apache.solr.search.SolrReturnFields.(SolrReturnFields.java:98)
>at
> org.apache.solr.search.SolrReturnFields.(SolrReturnFields.java:74)
>at
> org.apache.solr.handler.component.QueryComponent.
> prepare(QueryComponent.java:122)
>at
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(
> SearchHandler.java:200)
>at
> org.apache.solr.handler.RequestHandlerBase.handleRequest(
> RequestHandlerBase.java:135)
>at org.apache.solr.core.SolrCore.execute(SolrCore.java:1916)
>at
> org.apache.solr.servlet.SolrDispatchFilter.execute(
> SolrDispatchFilter.java:780)
>at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(
> SolrDispatchFilter.java:427)
>at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(
> SolrDispatchFilter.java:217)
>at
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.
> doFilter(ServletHandler.java:1419)
>at
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
>at
> org.eclipse.jetty.server.handler.ScopedHandler.handle(
> ScopedHandler.java:137)
>at
> org.eclipse.jetty.security.SecurityHandler.handle(
> SecurityHandler.java:557)
>at
> org.eclipse.jetty.server.session.SessionHandler.
> doHandle(SessionHandler.java:231)
>at
> org.eclipse.jetty.server.handler.ContextHandler.
> doHandle(ContextHandler.java:1075)
>at
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
>at
> org.eclipse.jetty.server.session.SessionHandler.
> doScope(SessionHandler.java:193)
>at
> org.eclipse.jetty.server.handler.ContextHandler.
> doScope(ContextHandler.java:1009)
>at
> org.eclipse.jetty.server.handler.ScopedHandler.handle(
> ScopedHandler.java:135)
>at
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(
> ContextHandlerCollection.java:255)
>at
> org.eclipse.jetty.server.handler.HandlerCollection.
> handle(HandlerCollection.java:154)
>at
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(
> HandlerWrapper.java:116)
>at org.eclipse.jetty.server.Server.handle(Server.java:364)
>at
> org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(
> AbstractHttpConnection.java:489)
>at
> org.eclipse.jetty.server.BlockingHttpConn

Re: Newbie Question: Master Index or 100s Small Index

2014-03-19 Thread Colin R
Hi Toke

Thanks for replying.

My question is really regarding index architecture. One big or many small
(with merged big ones)?

We probably get 5-10K photos added each day. Others are updated, some are
deleted.

Updates need to happen quite fast (e.g. within minutes of our Databases
receiving them).

In terms of bytes, each photo has up to 1.5KB of data.

Special requirements are search by date range, text, date range and text.
Plus some boolean filtering. All results can be sorted by date or filename.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Newbie-Question-Master-Index-or-100s-Small-Index-tp4125407p4125429.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Indexing large documents

2014-03-19 Thread Alexei Martchenko
Even the most unstructured data has to have some natural breakpoint. I've seen
projects running Solr that indexed whole books one document per
chapter, plus a boosted synopsis doc. The question here is how you need to
search and match those docs.
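
A sketch of that chunking approach with SolrJ (field names are
assumptions, and the caller is responsible for splitting the text into
chunks):

import java.util.List;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class ChunkedIndexer {
  public static void index(String parentId, List<String> chunks) throws Exception {
    SolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
    for (int i = 0; i < chunks.size(); i++) {
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", parentId + "_" + i); // unique key per chunk
      doc.addField("parent_id", parentId);    // shared key for grouping
      doc.addField("content", chunks.get(i));
      server.add(doc);
    }
    server.commit();
  }
}

At query time, group=true&group.field=parent_id collapses the chunks
back to one hit per logical document.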


alexei martchenko
Skype: alexeiramone | (11) 9 7613.0966


2014-03-18 23:52 GMT-03:00 Stephen Kottmann <
stephen_kottm...@h3biomedicine.com>:

> Hi Solr Users,
>
> I'm looking for advice on best practices when indexing large documents
> (100's of MB or even 1 to 2 GB text files). I've been hunting around on
> google and the mailing list, and have found some suggestions of splitting
> the logical document up into multiple solr documents. However, I haven't
> been able to find anything that seems like conclusive advice.
>
> Some background...
>
> We've been using solr with great success for some time on a project that is
> mostly indexing very structured data - ie. mainly based on ingesting
> through DIH.
>
> I've now started a new project and we're trying to make use of solr again -
> however, in this project we are indexing mostly unstructured data - pdfs,
> powerpoint, word, etc. I've not done much configuration - my solr instance
> is very close to the example provided in the distribution aside from some
> minor schema changes. Our index is relatively small at this point ( ~3k
> documents ), and for initial indexing I am pulling documents from a http
> data source, running them through Tika, and then pushing to solr using
> solrj. For the most part this is working great... until I hit one of these
> huge text files and then OOM on indexing.
>
> I've got a modest JVM - 4GB allocated. Obviously I can throw more memory at
> it, but it seems like maybe there's a more robust solution that would scale
> better.
>
> Is splitting the logical document into multiple solr documents best
> practice here? If so, what are the considerations or pitfalls of doing this
> that I should be paying attention to. I guess when querying I always need
> to use a group by field to prevent multiple hits for the same document. Are
> there issues with term frequency, etc that you need to work around?
>
> Really interested to hear how others are dealing with this.
>
> Thanks everyone!
> Stephen
>


Re: frange and field with hyphen

2014-03-19 Thread Jack Krupansky
For any "improperly" named field (that don't use the java identifier 
conventions), you simply need to use the field function with the field name 
in apostrophes:


div(acc_curr_834_2-1900_tl,1)

becomes:

div(field('acc_curr_834_2-1900_tl'),1)
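
Substituted into the original request, the whole fl entry would then read:

{!frange l=1 u=99}sub(if(1, div(field('acc_curr_834_2-1900_tl'), 1), 0), 1)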

-- Jack Krupansky

-Original Message- 
From: Marcin Rzewucki

Sent: Wednesday, March 19, 2014 8:13 AM
To: solr-user@lucene.apache.org
Subject: frange and field with hyphen

Hi everyone,

I got the following issue recently. I'm trying to use frange on a field
which has hyphen in name:


true
on
*:*
xml


{!frange l=1 u=99}sub(if(1, div(acc_curr_834_2-1900_tl,
1), 0), 1)


2.2



I got the following error:

DEBUG - 2014-03-19 12:11:53.805; org.eclipse.jetty.servlet.ServletHandler;
chain=SolrRequestFilter->default
DEBUG - 2014-03-19 12:11:53.805;
org.eclipse.jetty.servlet.ServletHandler$CachedChain; call filter
SolrRequestFilter
ERROR - 2014-03-19 12:11:53.806; org.apache.solr.common.SolrException;
org.apache.solr.common.SolrException: undefined field: "acc_curr_834_2"
   at
org.apache.solr.schema.IndexSchema.getField(IndexSchema.java:1172)
   at
org.apache.solr.search.FunctionQParser.parseValueSource(FunctionQParser.java:361)
   at
org.apache.solr.search.FunctionQParser.parseValueSource(FunctionQParser.java:223)
   at
org.apache.solr.search.ValueSourceParser$11.parse(ValueSourceParser.java:174)
   at
org.apache.solr.search.FunctionQParser.parseValueSource(FunctionQParser.java:352)
   at
org.apache.solr.search.FunctionQParser.parseValueSource(FunctionQParser.java:223)
   at
org.apache.solr.search.ValueSourceParser$73.parse(ValueSourceParser.java:775)
   at
org.apache.solr.search.FunctionQParser.parseValueSource(FunctionQParser.java:352)
   at
org.apache.solr.search.FunctionQParser.parseValueSource(FunctionQParser.java:223)
   at
org.apache.solr.search.ValueSourceParser$18.parse(ValueSourceParser.java:252)
   at
org.apache.solr.search.FunctionQParser.parseValueSource(FunctionQParser.java:352)
   at
org.apache.solr.search.FunctionQParser.parse(FunctionQParser.java:68)
   at org.apache.solr.search.QParser.getQuery(QParser.java:141)
   at
org.apache.solr.search.SolrReturnFields.add(SolrReturnFields.java:285)
   at
org.apache.solr.search.SolrReturnFields.parseFieldList(SolrReturnFields.java:112)
   at
org.apache.solr.search.SolrReturnFields.(SolrReturnFields.java:98)
   at
org.apache.solr.search.SolrReturnFields.(SolrReturnFields.java:74)
   at
org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:122)
   at
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:200)
   at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1916)
   at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:780)
   at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:427)
   at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:217)
   at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
   at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
   at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
   at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
   at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
   at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
   at
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
   at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
   at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
   at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
   at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
   at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
   at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
   at org.eclipse.jetty.server.Server.handle(Server.java:364)
   at
org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
   at
org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
   at
org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
   at
org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
   at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)
   at
org.eclipse.jetty.http.HttpParser.parseAvailable(HttpPa

Re: Newbie Question: Master Index or 100s Small Index

2014-03-19 Thread Toke Eskildsen
On Wed, 2014-03-19 at 11:55 +0100, Colin R wrote:
> We run a central database of 14M (and growing) photos with dates, captions,
> keywords, etc. 
> 
> We are currently upgrading from old Lucene servers to the latest Solr running
> with a couple of dedicated servers (6-core, 36GB, 500GB SSD). Planning on using
> Solr Cloud.

What hardware are your past experiences based on? If they have fewer
cores, less memory and spinning drives, I foresee that your question
reduces to which architecture you prefer from a logistics point of
view, rather than to performance.

> We take in thousands of changes each day (big and small) so indexing may be
> a bigger problem than searching.

Thousands of updates in a day is a very low number. Do you have hard
requirements for update time, perform heavy faceting or do anything
special for this to be a cause of concern?

> Is it quicker/better to just have one big 14M index and filter the
> complexities for each website, or is it better to still maintain hundreds of
> indexes so we are searching smaller ones?

All else being equal, a search in a specific small index will be faster
than filtering on the large one. But as we know, all else is never
equal. A 14M document index in itself is not really a challenge for
Lucene/Solr, but this depends a lot on your specific setup. How large is
the 14M index in terms of bytes?

> Bear in mind, we get thousands of changes a day PLUS very busy search servers.

How many queries/second are we talking about here? What is a typical
query (faceting, grouping, special processing...)?

Regards,
Toke Eskildsen, State and University Library, Denmark




frange and field with hyphen

2014-03-19 Thread Marcin Rzewucki
Hi everyone,

I got the following issue recently. I'm trying to use frange on a field
which has hyphen in name:


true
on
*:*
xml


{!frange l=1 u=99}sub(if(1, div(acc_curr_834_2-1900_tl,
1), 0), 1)


2.2



I got the following error:

DEBUG - 2014-03-19 12:11:53.805; org.eclipse.jetty.servlet.ServletHandler;
chain=SolrRequestFilter->default
DEBUG - 2014-03-19 12:11:53.805;
org.eclipse.jetty.servlet.ServletHandler$CachedChain; call filter
SolrRequestFilter
ERROR - 2014-03-19 12:11:53.806; org.apache.solr.common.SolrException;
org.apache.solr.common.SolrException: undefined field: "acc_curr_834_2"
at
org.apache.solr.schema.IndexSchema.getField(IndexSchema.java:1172)
at
org.apache.solr.search.FunctionQParser.parseValueSource(FunctionQParser.java:361)
at
org.apache.solr.search.FunctionQParser.parseValueSource(FunctionQParser.java:223)
at
org.apache.solr.search.ValueSourceParser$11.parse(ValueSourceParser.java:174)
at
org.apache.solr.search.FunctionQParser.parseValueSource(FunctionQParser.java:352)
at
org.apache.solr.search.FunctionQParser.parseValueSource(FunctionQParser.java:223)
at
org.apache.solr.search.ValueSourceParser$73.parse(ValueSourceParser.java:775)
at
org.apache.solr.search.FunctionQParser.parseValueSource(FunctionQParser.java:352)
at
org.apache.solr.search.FunctionQParser.parseValueSource(FunctionQParser.java:223)
at
org.apache.solr.search.ValueSourceParser$18.parse(ValueSourceParser.java:252)
at
org.apache.solr.search.FunctionQParser.parseValueSource(FunctionQParser.java:352)
at
org.apache.solr.search.FunctionQParser.parse(FunctionQParser.java:68)
at org.apache.solr.search.QParser.getQuery(QParser.java:141)
at
org.apache.solr.search.SolrReturnFields.add(SolrReturnFields.java:285)
at
org.apache.solr.search.SolrReturnFields.parseFieldList(SolrReturnFields.java:112)
at
org.apache.solr.search.SolrReturnFields.(SolrReturnFields.java:98)
at
org.apache.solr.search.SolrReturnFields.(SolrReturnFields.java:74)
at
org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:122)
at
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:200)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1916)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:780)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:427)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:217)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
at
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
at org.eclipse.jetty.server.Server.handle(Server.java:364)
at
org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
at
org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
at
org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
at
org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)
at
org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
at
org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
at
org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
at
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
at
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)

Re: SolrCloud constantly crashes after upgrading to Solr 4.7

2014-03-19 Thread Steve Rowe
I’m glad it’s working for you now, thanks for reporting back. - Steve

On Mar 19, 2014, at 5:32 AM, Martin de Vries  wrote:

> We are running stable now for a full day, so the bug has been fixed.
> 
> Many thanks!
> 
> Martin



Newbie Question: Master Index or 100s Small Index

2014-03-19 Thread Colin R
We run a central database of 14M (and growing) photos with dates, captions,
keywords, etc. 

We are currently upgrading from old Lucene servers to the latest Solr running
with a couple of dedicated servers (6-core, 36GB, 500GB SSD). Planning on using
Solr Cloud.

We take in thousands of changes each day (big and small) so indexing may be
a bigger problem than searching.

My question is an architecture one.

These photos are currently indexed and searched in three ways.

1: The 14M pictures from above are split into a few hundred indexes that
feed a single website. This means index sizes of between 100 and 500,000
entries each.

2: 95% of these same photos are also wanted for searching on a global site.
Index size of 12M plus.

3: 80% of these same photos are also required for smaller group sites. Index
sizes of between 400K and 4M.

We currently make changes to the single indexes and then merge into the group and
global ones. Given the size of the numbers, is it worth changing or not?

Is it quicker/better to just have one big 14M index and filter the
complexities for each website, or is it better to still maintain hundreds of
indexes so we are searching smaller ones? Bear in mind, we get thousands of
changes a day PLUS very busy search servers.

Thanks

Col



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Newbie-Question-Master-Index-or-100s-Small-Index-tp4125407.html
Sent from the Solr - User mailing list archive at Nabble.com.


Support for wildcard queries in elevate.xml

2014-03-19 Thread Bratislav Stojanovic
Hi,

I have searched the mailing list archives but couldn't find the right
answer so far.

I want to elevate some results using the instructions from the
QueryElevationComponent page, but
I'm not sure how to set queries in the *elevate.xml* file. My query looks like
this:

(content:"foobar" OR text:"foobar") AND last_modified:[ TO <
date.to>}

The trick is that the *last_modified* param can be pretty much anything, as it
is set from the search
page to filter results. How do I set this in the text attribute in <query>?

My attempt was this, but it doesn't work:

[the elevate.xml snippet here was stripped by the mail archiver]
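
For reference, the stock elevate.xml format is (the ids here are
placeholders):

<elevate>
  <query text="foobar">
    <doc id="1" />
    <doc id="2" />
  </query>
</elevate>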

Does this file support wildcard queries? Is there any way to
achieve this, or do I have to manipulate the result
XML from Solr to change the result order?

P.S. I'm using Solr 4.6 on Windows 7 x64 and Java 1.7.0_51 x64. My
elevate.xml file is in the collection1\data folder.

Thank you all in advance.

-- 
Bratislav Stojanovic, M.Sc.


Re: SolrCloud constantly crashes after upgrading to Solr 4.7

2014-03-19 Thread Martin de Vries

We are running stable now for a full day, so the bug has been fixed.

Many thanks!

Martin


How to secure Solr admin page?

2014-03-19 Thread Tony Xue
Hi all,

I was following the instructions in the official wiki:
https://wiki.apache.org/solr/SolrSecurity

But I don't have any idea what I should put between the <url-pattern>
tags to secure only the admin page.

I tried to put /admin/* but it didn't work.


Tony
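
For reference, the kind of web.xml fragment that wiki page is about
looks roughly like this (role and realm names are placeholders, and
this is a sketch, not a tested configuration):

<security-constraint>
  <web-resource-collection>
    <web-resource-name>Solr Admin</web-resource-name>
    <url-pattern>/admin/*</url-pattern>
  </web-resource-collection>
  <auth-constraint>
    <role-name>solr-admin</role-name>
  </auth-constraint>
</security-constraint>
<login-config>
  <auth-method>BASIC</auth-method>
  <realm-name>SolrRealm</realm-name>
</login-config>

Note that the 4.x admin UI is served from the webapp root (/#/), which
may be why a bare /admin/* pattern doesn't appear to protect it.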


Re: About enableLazyFieldLoading and memory

2014-03-19 Thread david . davila
That could be an interesting test. Unfortunately I don't have time to
do that now, but maybe in the future.

To avoid this memory consumption we have reduced the DocumentCache,
and we don't have any problems. Besides, big queries that can cause
problems are never made twice, so the DocumentCache is not needed.

If I have time to check that out I'll post it.

Best regards,

David Dávila Atienza
AEAT - Departamento de Informática Tributaria
Subdirección de Tecnologías de Análisis de la Información e Investigación 
del Fraude
Phone: 917681160
Extension: 30160



From: Miguel
To: solr-user@lucene.apache.org
Date: 19/03/2014 08:35
Subject: Re: About enableLazyFieldLoading and memory



An interesting check would be to disable compression on stored fields and
see if your searcher works better. Disabling compression should increase the
stored size, but the searcher should be quicker.

I have read that to disable compression, all you need to do is write a new
codec that uses a stored fields format which does not compress stored
fields, such as Lucene40StoredFieldsFormat.

Best regards

On 18/03/2014 14:47, Shawn Heisey wrote:
On 3/18/2014 7:18 AM, david.dav...@correo.aeat.es wrote:

yes, but if I use enableLazyFieldLoading=true and my queries only request
very small fields like ID, the DocumentCache shouldn't grow, although my
stored fields are very big. Am I wrong?


Since Solr 4.1, stored fields are compressed.  This probably means that
in order to get a tiny field out, it must still retrieve an entire
block of compressed data and uncompress it.

The information in the issue that added the compression feature says
that only one compressed block is ever retrieved for a complete document.

https://issues.apache.org/jira/browse/LUCENE-4226

I wonder if perhaps either Solr or Lucene is dropping all the data into
one or more caches even though you only requested the ID, simply because
it is already available after decompression.  This is only a guess, and
I hope I'm wrong.  If this is indeed happening, it would defeat lazy
field loading.  Can someone with a better understanding comment?

Thanks,
Shawn







Re: About enableLazyFieldLoading and memory

2014-03-19 Thread Miguel
An interesting check would be to disable compression on stored fields and
see if your searcher works better. Disabling compression should increase
the stored size, but the searcher should be quicker.


I have read that to disable compression, all you need to do is write a
new codec that uses a stored fields format which does not compress
stored fields, such as Lucene40StoredFieldsFormat.


Best regards

On 18/03/2014 14:47, Shawn Heisey wrote:

On 3/18/2014 7:18 AM, david.dav...@correo.aeat.es wrote:

yes, but if I use enableLazyFieldLoading=true and my queries only request
very small fields like ID, the DocumentCache shouldn't grow, although my
stored fields are very big. Am I wrong?

Since Solr 4.1, stored fields are compressed.  This probably means that
in order to get a tiny field out, it must still retrieve an entire
block of compressed data and uncompress it.

The information in the issue that added the compression feature says
that only one compressed block is ever retrieved for a complete document.

https://issues.apache.org/jira/browse/LUCENE-4226

I wonder if perhaps either Solr or Lucene is dropping all the data into
one or more caches even though you only requested the ID, simply because
it is already available after decompression.  This is only a guess, and
I hope I'm wrong.  If this is indeed happening, it would defeat lazy
field loading.  Can someone with a better understanding comment?

Thanks,
Shawn