Issues deploying multiple web applications to Solr Jetty

2016-05-08 Thread tjlp
 Hi,
At present I am trying to deploy multiple web applications to the Jetty bundled 
with Solr, so I have added context files for each web application under the folder 
$SOLR.HOME/contexts. Now I get a library conflict issue: one of the web 
applications uses an old version of slf4j, which conflicts with the slf4j 
libraries used by Solr itself, so the error messages below are printed on the 
console. How can I configure the web applications to avoid the conflict and keep 
a different version of slf4j for each application?

log4j:ERROR A "org.apache.log4j.EnhancedPatternLayout" object is not assignable
to a "org.apache.log4j.Layout" variable.
log4j:ERROR The class "org.apache.log4j.Layout" was loaded by
log4j:ERROR [WebAppClassLoader=QuartzWeb@77fbd92c] whereas object of type
log4j:ERROR "org.apache.log4j.EnhancedPatternLayout" was loaded by
log4j:ERROR [startJarLoader@681a9515].
log4j:ERROR A "org.apache.log4j.EnhancedPatternLayout" object is not assignable
to a "org.apache.log4j.Layout" variable.
log4j:ERROR The class "org.apache.log4j.Layout" was loaded by
log4j:ERROR [WebAppClassLoader=QuartzWeb@77fbd92c] whereas object of type
log4j:ERROR "org.apache.log4j.EnhancedPatternLayout" was loaded by
log4j:ERROR [startJarLoader@681a9515].
log4j:ERROR No layout set for the appender named [file].
log4j:ERROR No layout set for the appender named [CONSOLE].
Thanks,
Liu Peng
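
One common way to keep each webapp on its own slf4j/log4j is Jetty's "server 
classes" mechanism, which hides the container's copies of those classes from a 
webapp so it loads its own. Below is a minimal, untested sketch of such a context 
file for the Jetty 9.x bundled with Solr 5.x; the context path, war location and 
class prefixes are assumptions to adapt and verify against your Jetty version:

  <?xml version="1.0"?>
  <!DOCTYPE Configure PUBLIC "-//Jetty//Configure//EN"
      "http://www.eclipse.org/jetty/configure_9_0.dtd">
  <Configure class="org.eclipse.jetty.webapp.WebAppContext">
    <Set name="contextPath">/quartzweb</Set>
    <Set name="war">/path/to/quartzweb.war</Set>
    <!-- prefer the webapp's own jars over the server classpath -->
    <Set name="parentLoaderPriority">false</Set>
    <!-- hide the server's logging classes from this webapp -->
    <Call name="addServerClass"><Arg>org.slf4j.</Arg></Call>
    <Call name="addServerClass"><Arg>org.apache.log4j.</Arg></Call>
  </Configure>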


Re: Advice to add additional non-related fields to a collection or create a subset of it?

2016-05-08 Thread Derek Poh

Hi Erick

In my case, by denormalizing, do you mean putting the product and supplier 
information into one collection?

The supplier information is stored but not indexed in the collection.

We have identified it was a combination of a loop and bad source data that 
caused an endless loop under a certain scenario.


Is it advisable to have as few queries to Solr as possible in a page?


On 5/6/2016 11:17 PM, Erick Erickson wrote:

Denormalizing the data is usually the first thing to try. That's
certainly the preferred option if it doesn't bloat the index
unacceptably.

But my real question is what have you done to try to figure out _why_
it's slow? Do you have some loop
like
for (each found document)
    extract all the supplier IDs and query Solr for them

? That's a fundamental design decision that will be expensive.

Have you examined the time each query takes to see if Solr is really
the bottleneck or whether it's "something else"? Mind you, I have no
clue what "something else" might be here.

Do you ever return lots of rows (i.e. thousands)?

Solr serves queries very quickly, so I'd concentrate on identifying what
is slow before jumping to a solution.

Best,
Erick
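
As an illustration of the batched alternative Erick describes, here is a rough 
SolrJ sketch; the core names, field name and query string are all invented, and 
error handling is omitted:

  import java.util.stream.Collectors;
  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.impl.HttpSolrClient;
  import org.apache.solr.client.solrj.response.QueryResponse;

  public class SupplierLookup {
    public static void main(String[] args) throws Exception {
      HttpSolrClient client = new HttpSolrClient("http://localhost:8983/solr");
      // One query for the matching products...
      SolrQuery productQ = new SolrQuery("wireless mouse");
      productQ.setRows(250);
      QueryResponse products = client.query("product", productQ);
      // ...then ONE terms-filter query for all their suppliers,
      // instead of querying Solr once per product.
      String ids = products.getResults().stream()
          .map(doc -> String.valueOf(doc.getFieldValue("supplier_id")))
          .collect(Collectors.joining(","));
      SolrQuery supplierQ = new SolrQuery("*:*");
      supplierQ.addFilterQuery("{!terms f=supplier_id}" + ids);
      QueryResponse suppliers = client.query("supplier", supplierQ);
      System.out.println(suppliers.getResults().getNumFound() + " suppliers found");
      client.close();
    }
  }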

On Wed, May 4, 2016 at 10:28 PM, Derek Poh  wrote:

Hi

We have a "product" collection and a "supplier" collection.
The "product" collection contains products information and "supplier"
collection contains the product's suppliers information.
We have a subsidiary page that queries the "product" collection for its search.
The displayed results include product and supplier information.
This page will query the "product" collection to get the matching product
records.
From this query, a list of the matching products' supplier IDs is extracted
and used in a filter query against the "supplier" collection to get the
necessary supplier information.

The loading of this page is very slow, and it leads to timeouts at times as well.
Besides looking at tweaking the code of the page, we are also looking at what
tuning can be done on the Solr side. Reducing the number of queries generated
by this page was one of the options to try.

The main "product" collection is also use by our site main search page and
other subsidiary pages as well. So the query load on it is substantial.
It has about 6.5 million documents and index size of 38-39 GB.
It is set up as 1 shard with 5 replicas. Each replica is on its own server.
Total of 5 servers.
There are other smaller collections with a similar 1-shard, 5-replica setup
residing on these servers as well.

I am thinking of either
1. Index supplier information into the "product" collection.
2. Create another similar "product" collection for this page to use. This
collection will have fewer product fields and will include the required
supplier fields. The number of documents in it will be the same as in the
main "product" collection, but the index size will be smaller.

With either of the 2 options we do not need to query the "supplier" collection,
so there is one less query and hopefully it will improve the performance of
this page.

What is your advice between the 2 options?
Any other advice or options?

Derek
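
As a sketch of what option 1's denormalized documents could look like in Solr's 
XML update format; every field name below is invented for illustration:

  <add>
    <doc>
      <field name="id">P12345</field>
      <field name="product_name">Wireless Mouse</field>
      <field name="supplier_id">S678</field>
      <field name="supplier_name">Acme Trading Co</field>
      <field name="supplier_country">SG</field>
    </doc>
  </add>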







Re: Solr 6 / Solrj RuntimeException: First tuple is not a metadata tuple

2016-05-08 Thread deniz
Joel Bernstein wrote
> It appears that the /sql handler is not sending the metadata Tuple.
> According to the log the parameter includeMetadata=true is being sent.
> This
> should trigger the sending of the metadata Tuple.
> 
> Is it possible that you are using a pre 6.0 release version of Solr from
> the master branch? The JDBC client appears to be from the 6.0 release but
> the server could be an older version.
> 
> The reason I ask this, is that older versions of the /sql handler don't
> have the metadata Tuple logic. So the query would be processed correctly
> but the metadata Tuple wouldn't be there.
> 
> Joel Bernstein
> http://joelsolr.blogspot.com/


I have checked the version of Solr once more, cleaned up all of the
zookeeper data as well, and restarted, but the problem is still going on...

Do pre-6 versions actually support the /sql handler?
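
One quick way to rule out a server/client version mismatch is to ask the node 
for its build info (host and port assumed here):

  curl "http://localhost:8983/solr/admin/info/system?wt=json"

and compare the reported solr-spec-version against the 6.0 SolrJ client.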



-
Smart but doesn't study... Would do it if he did...
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-6-Solrj-RuntimeException-First-tuple-is-not-a-metadata-tuple-tp4274451p4275437.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: fq behavior...

2016-05-08 Thread Bastien Latard - MDPI AG

Thank you guys!
I got it.

kr,
Bast


On 06/05/2016 17:27, Erick Erickson wrote:

From Yonik's blog:
"By default, Solr resolves all of the filters before the main query"

By definition, the non-cached fq clause _must_ be
executed over the entire data set in order to be
cached. Otherwise, how could the next query
that uses an identical fq clause make use of the
cached value?

If cache=false, it's  a different story as per Yonik's
blog.
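
For example, the post-filter case from that blog looks roughly like this (the 
field and function here are illustrative):

  fq={!frange l=10 u=100 cache=false cost=150}mul(popularity,price)

cache=false skips the filterCache entirely, and a cost of 100 or more lets a
post-filter-capable parser such as frange run only over documents that already
matched the main query.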

On Fri, May 6, 2016 at 7:25 AM, Shawn Heisey  wrote:

On 5/6/2016 12:07 AM, Bastien Latard - MDPI AG wrote:

Thank you Susmit, so the answer is:
fq queries are by default run before the main query.

Queries in fq parameters are normally executed in parallel with the main
query, unless they are a postfilter.  I am not sure that the standard
parser supports being run as a postfilter.  Some parsers (like geofilt)
do support that.

Susmit already gave you this link where some of that is explained:

http://yonik.com/advanced-filter-caching-in-solr/

Thanks,
Shawn



Kind regards,
Bastien Latard
Web engineer
--
MDPI AG
Postfach, CH-4005 Basel, Switzerland
Office: Klybeckstrasse 64, CH-4057
Tel. +41 61 683 77 35
Fax: +41 61 302 89 18
E-mail:
lat...@mdpi.com
http://www.mdpi.com/



Re: Passing Ids in query takes more time

2016-05-08 Thread Bhaumik Joshi
Thanks Jeff. TermsQueryParser worked for me. 

Thanks & Regards,
Bhaumik Joshi


From: Jeff Wartes 
Sent: Thursday, May 5, 2016 8:19 AM
To: solr-user@lucene.apache.org
Subject: Re: Passing Ids in query takes more time

An ID lookup is a very simple and fast query, for one ID. Or’ing a lookup for 
80k ids though is basically 80k searches as far as Solr is concerned, so it’s 
not altogether surprising that it takes a while. Your complaint seems to be 
that the query planner doesn’t know in advance that <the rest of your query> should be 
run first, and then the id selection applied to the reduced set.

So, I can think of a few things for you to look at, in no particular order:

1. TermsQueryParser is designed for lists of terms, you might get better 
results from that: 
https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-TermsQueryParser

2. If <the rest of your query> is the real discriminating factor in your search, 
you could just search for that first and then apply your ID list as a 
PostFilter: http://yonik.com/advanced-filter-caching-in-solr/
I guess that’d look something like &fq={!terms f=<id field> cache=false cost=101 
v="<your ids>"}, since a cost >= 100 should qualify it as a post filter, which only 
operates on an already-found result set instead of the full index. (Note: I haven’t 
confirmed that the Terms query parser supports post filtering; see the sketch after 
this list.)

3. I’m not really aware of any storage engine that’ll love doing a filter on 
80k ids at once, but a key-value store like Cassandra might work out better for 
that.

4. There is a thing called a JoinQParserPlugin 
(https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-JoinQueryParser)
 that can join to another collection 
(https://issues.apache.org/jira/browse/SOLR-4905). But I’ve never used it, and 
there are some significant restrictions.
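
For illustration, points 1 and 2 combined might look like the single line below; 
doc_id and the literal ids are placeholders, and the caveat above about terms 
post filtering still applies:

  q=<your discriminating query>&fq={!terms f=doc_id cache=false cost=101}111,222,333

Setting cache=false plus a cost of 100 or more is what marks an fq as a candidate 
post filter.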




On 5/5/16, 2:46 AM, "Bhaumik Joshi"  wrote:

>Hi,
>
>
>I am retrieving ids from collection1 based on some query and passing those ids 
>as a query to collection2, so the query to collection2 which contains the ids 
>in it takes much more time compared to a normal query.
>
>
>Que. 1 - Why does passing ids in the query take more time compared to a normal 
>query, even though we are narrowing the criteria by passing ids?
>
>e.g. query-1: doc_id:(111 222 333 444 ...) AND <other criteria> is slower 
>(passing 80k ids takes 7-9 sec) than query-2: only <other criteria> (700-800 
>ms). Both return 250 records with the same set of fields.
>
>
>Que. 2 - Any idea on how I can achieve the above (get ids from one collection 
>and pass those ids to the other one) in an efficient manner, or any other way 
>to get data from one collection based on the response of the other collection?
>
>
>Thanks & Regards,
>
>Bhaumik Joshi

Re: Passing IDs in query takes more time

2016-05-08 Thread Bhaumik Joshi
Thanks Erick. TermsQueryParser worked for me. 

Thanks & Regards,
Bhaumik Joshi


From: Erick Erickson 
Sent: Friday, May 6, 2016 10:00 AM
To: solr-user
Subject: Re: Passing IDs in query takes more time

Well, you're parsing 80K IDs and forming them into a query. Consider
what has to happen. Even in the very best case of the <other clause>
being evaluated first, for every doc that satisfies that clause the inverted
index must be examined 80,000 times to see if that doc matches
one of the IDs in your huge clause for scoring purposes.

You might be better off by moving the 80K list to an fq clause like
fq={!cache=false}docid:(111 222 333).

Additionally, you probably want to use the TermsQueryParser, something like:
fq={!terms f=id cache=false}111,222,333
see:
https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-TermsQueryParser

In any case, though, an 80K clause will slow things down considerably.

Best,
Erick

On Thu, May 5, 2016 at 2:42 AM, Bhaumik Joshi  wrote:
> Hi,
>
>
> I am retrieving ids from collection1 based on some query and passing those 
> ids as a query to collection2, so the query to collection2 which contains the 
> ids in it takes much more time compared to a normal query.
>
>
> Que. 1 - Why does passing ids in the query take more time compared to a normal 
> query, even though we are narrowing the criteria by passing ids?
>
> e.g. query-1: doc_id:(111 222 333 444 ...) AND <other criteria> is slower 
> (takes 7-9 sec) than
>
> only <other criteria> (700-800 ms). Please note that in this case I am 
> passing 80k ids in the query and retrieving 250 rows.
>
>
> Que. 2 - Any idea on how I can achieve the above (get ids from one collection 
> and pass those ids to the other one) in an efficient manner, or any other way 
> to get data from one collection based on the response of the other collection?
>
>
> Thanks & Regards,
>
> Bhaumik Joshi

Reply: Re: JDK requirements for Solr 5.5

2016-05-08 Thread tjlp
Thanks. I ask this question because we possibly have some "old" customers who 
still use a 32-bit OS.
- Original Message -
From: Shawn Heisey 
To: solr-user@lucene.apache.org
Subject: Re: JDK requirements for Solr 5.5
Date: 2016-05-05 23:21


On 5/5/2016 1:48 AM, t...@sina.com wrote:
> Can Solr run on a 32-bit JDK 7? Or must it be 64-bit?
You can use a 32-bit JVM ... but it will be limited to 2GB of heap. 
This is a Java limitation, not a Solr limitation.  Depending on how
large your index is, 2GB may not be enough.
Thanks,
Shawn
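
For example, with the bin/solr start script you can pin the heap below the 
32-bit ceiling:

  bin/solr start -m 1g

where -m sets both the minimum and maximum heap size.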


Re: Filter queries & caching

2016-05-08 Thread Ahmet Arslan
Hi,

As I understand it, it is useful in case you use an OR operator between two 
restricting clauses.
Recall that multiple fq parameters mean an implicit AND.

ahmet
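
A hedged example of that OR case, with made-up field names:

  q=inStock:true AND (filter(color:red) OR filter(color:blue))

Each filter() clause is cached and reusable on its own, whereas fq=color:red OR 
color:blue would be cached only as one combined entry.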



On Monday, May 9, 2016 4:02 AM, Jay Potharaju  wrote:
As mentioned above, adding filter() will add the filter query to the cache.
This would mean that results are fetched from the cache instead of running n
filter queries in parallel.
Is it necessary to use the filter() option? I was under the impression that
all filter queries get added to the "filterCache". What is the
advantage of using filter()?

From the doc:
https://cwiki.apache.org/confluence/display/solr/Query+Settings+in+SolrConfig
This cache is used by SolrIndexSearcher for filters (DocSets) for unordered
sets of all documents that match a query. The numeric attributes control
the number of entries in the cache.
Solr uses the filterCache to cache results of queries that use the fq
search parameter. Subsequent queries using the same parameter setting
result in cache hits and rapid returns of results. See Searching for a
detailed discussion of the fq parameter.

From Yonik's site: http://yonik.com/solr/query-syntax/#FilterQuery

(Since Solr 5.4)

A filter query retrieves a set of documents matching a query from the
filter cache. Since scores are not cached, all documents that match the
filter produce the same score (0 by default). Cached filters will be
extremely fast when they are used again in another query.


Thanks


On Fri, May 6, 2016 at 9:46 AM, Jay Potharaju  wrote:

> We have a high query load and considering that, I think the suggestions made
> above will help with performance.
> Thanks
> Jay
>
> On Fri, May 6, 2016 at 7:26 AM, Shawn Heisey  wrote:
>
>> On 5/6/2016 7:19 AM, Shawn Heisey wrote:
>> > With three separate
>> > fq parameters, you'll get three cache entries in filterCache from the
>> > one query.
>>
>> One more tidbit of information related to this:
>>
>> When you have multiple filters and they aren't cached, I am reasonably
>> certain that they run in parallel.  Instead of one complex filter, you
>> would have three simple filters running simultaneously.  For low to
>> medium query loads on a server with a whole bunch of CPUs, where there
>> is plenty of spare CPU power, this can be a real gain in performance ...
>> but if the query load is really high, it might be a bad thing.
>>
>> Thanks,
>> Shawn
>>
>>
>
>
> --
> Thanks
> Jay Potharaju

>
>



-- 
Thanks
Jay Potharaju


Re: Filter queries & caching

2016-05-08 Thread Jay Potharaju
As mentioned above, adding filter() will add the filter query to the cache.
This would mean that results are fetched from the cache instead of running n
filter queries in parallel.
Is it necessary to use the filter() option? I was under the impression that
all filter queries get added to the "filterCache". What is the
advantage of using filter()?

From the doc:
https://cwiki.apache.org/confluence/display/solr/Query+Settings+in+SolrConfig
This cache is used by SolrIndexSearcher for filters (DocSets) for unordered
sets of all documents that match a query. The numeric attributes control
the number of entries in the cache.
Solr uses the filterCache to cache results of queries that use the fq
search parameter. Subsequent queries using the same parameter setting
result in cache hits and rapid returns of results. See Searching for a
detailed discussion of the fq parameter.

From Yonik's site: http://yonik.com/solr/query-syntax/#FilterQuery

(Since Solr 5.4)

A filter query retrieves a set of documents matching a query from the
filter cache. Since scores are not cached, all documents that match the
filter produce the same score (0 by default). Cached filters will be
extremely fast when they are used again in another query.


Thanks


On Fri, May 6, 2016 at 9:46 AM, Jay Potharaju  wrote:

> We have a high query load and considering that, I think the suggestions made
> above will help with performance.
> Thanks
> Jay
>
> On Fri, May 6, 2016 at 7:26 AM, Shawn Heisey  wrote:
>
>> On 5/6/2016 7:19 AM, Shawn Heisey wrote:
>> > With three separate
>> > fq parameters, you'll get three cache entries in filterCache from the
>> > one query.
>>
>> One more tidbit of information related to this:
>>
>> When you have multiple filters and they aren't cached, I am reasonably
>> certain that they run in parallel.  Instead of one complex filter, you
>> would have three simple filters running simultaneously.  For low to
>> medium query loads on a server with a whole bunch of CPUs, where there
>> is plenty of spare CPU power, this can be a real gain in performance ...
>> but if the query load is really high, it might be a bad thing.
>>
>> Thanks,
>> Shawn
>>
>>
>
>
> --
> Thanks
> Jay Potharaju
>
>



-- 
Thanks
Jay Potharaju


Re: understanding phonetic matching

2016-05-08 Thread Erick Erickson
Jay:

Here's what's currently available:

https://cwiki.apache.org/confluence/display/solr/Phonetic+Matching

Not sure what version of Solr some of them were added in, though.

Best,
Erick
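
For reference, the BeiderMorse setup discussed below is configured on that page 
roughly as follows; the attribute values are the documented examples, not a 
tested schema:

  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.BeiderMorseFilterFactory" nameType="GENERIC"
            ruleType="APPROX" concat="true" languageSet="auto"/>
  </analyzer>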

On Sat, May 7, 2016 at 9:30 PM, Jay Potharaju  wrote:
> Thanks will check it out.
>
>
> On Sat, May 7, 2016 at 7:05 PM, Susheel Kumar  wrote:
>
>> Jay,
>>
>> There are mainly three phonetic algorithms available in Solr, i.e.
>> RefinedSoundex, DoubleMetaphone & BeiderMorse.  We did an extensive comparison
>> considering various test cases and found BeiderMorse to be the best among
>> those for finding sounds-like matches, and it also supports multiple
>> languages.  We also customized BeiderMorse extensively for our use case.
>>
>> So please take a closer look at BeiderMorse and I am sure it will help you
>> out.
>>
>> Thanks,
>> Susheel
>>
>> On Sat, May 7, 2016 at 2:13 PM, Jay Potharaju 
>> wrote:
>>
>> > Thanks for the feedback, I was getting correct results when searching for
>> > jon & john. But when I tried other names like 'khloe' it matched on
>> > 'collier' because the phonetic filter generated KL as the token.
>> > Is a phonetic filter the best way to find similar-sounding names?
>> >
>> >
>> > On Wed, Mar 23, 2016 at 12:01 AM, davidphilip cherian <
>> > davidphilipcher...@gmail.com> wrote:
>> >
>> > > The "phonetic_en" analyzer definition available in solr-schema does
>> > return
>> > > documents having "Jon", "JN", "John" when search term is "John".
>> Checkout
>> > > screen shot here : http://imgur.com/0R6SvX2
>> > >
>> > > This wiki page explains how phonetic matching works:
>> > > https://cwiki.apache.org/confluence/display/solr/Phonetic+Matching#PhoneticMatching-DoubleMetaphone
>> > >
>> > >
>> > > Hope that helps.
>> > >
>> > >
>> > >
>> > > On Wed, Mar 23, 2016 at 11:18 AM, Alexandre Rafalovitch <
>> > > arafa...@gmail.com>
>> > > wrote:
>> > >
>> > > > I'd start by putting LowerCaseFF before the PhoneticFilter.
>> > > >
>> > > > But then, you say you were using the Analysis screen, and what happened?
>> > > > Do you get the matches when you put your sample text and the query text
>> > > > in the two boxes in the UI? I am not sure what "look at my solr data"
>> > > > means in this particular context.
>> > > >
>> > > > Regards,
>> > > >Alex.
>> > > > 
>> > > > Newsletter and resources for Solr beginners and intermediates:
>> > > > http://www.solr-start.com/
>> > > >
>> > > >
>> > > > On 23 March 2016 at 16:27, Jay Potharaju 
>> > wrote:
>> > > > > Hi,
>> > > > > I am trying to do name matching using the phonetic filter factory. As
>> > > > > part of that I was analyzing the data using the analysis screen in the
>> > > > > Solr UI. If I search for john, any documents containing john or jon
>> > > > > should be found.
>> > > > >
>> > > > > Following is my definition of the custom field that I use for indexing
>> > > > > the data. When I look at my Solr data I don't see any similar-sounding
>> > > > > names, even though I have set inject="true". Is that not how it is
>> > > > > supposed to work?
>> > > > > Can someone explain how phonetic matching works?
>> > > > >
>> > > > > <fieldType name="..." class="solr.TextField" positionIncrementGap="100">
>> > > > >   <analyzer>
>> > > > >     <tokenizer class="..."/>
>> > > > >     <filter class="solr.PhoneticFilterFactory" encoder="DoubleMetaphone"
>> > > > >             inject="true" maxCodeLength="5"/>
>> > > > >   </analyzer>
>> > > > > </fieldType>
>> > > > >
>> > > > > --
>> > > > > Thanks
>> > > > > Jay
>> > > >
>> > >
>> >
>> >
>> >
>> > --
>> > Thanks
>> > Jay Potharaju
>> >
>>
>
>
>
> --
> Thanks
> Jay Potharaju


Re: Solr re-indexing in case of store=false

2016-05-08 Thread Erick Erickson
bq: I would be grateful if somebody could suggest another way of re-indexing
the whole data without using another datastore

Not possible currently. Consider what's _in_ the index when stored="false".
The actual terms are the output of the entire analysis chain, including
stemming, stopword removal, synonym substitution etc. Since the
indexing process is lossy, you simply cannot reconstruct the original
stream from the indexed terms.

I suppose one _could_ do this in the case of docValues only index with
the new return-values-from-docvalues functionality, but even that's lossy
because the order of returned values may not be the original insertion
order. And if that suits your needs, a pretty simple driver program would
suffice.
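
A rough sketch of such a driver for the docValues-only case; the URLs and field
names are placeholders, it assumes the fields are docValues returned via
useDocValuesAsStored, and it inherits the insertion-order caveat above:

  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.impl.HttpSolrClient;
  import org.apache.solr.client.solrj.response.QueryResponse;
  import org.apache.solr.common.SolrDocument;
  import org.apache.solr.common.SolrInputDocument;
  import org.apache.solr.common.params.CursorMarkParams;

  public class Reindexer {
    public static void main(String[] args) throws Exception {
      HttpSolrClient src = new HttpSolrClient("http://localhost:8983/solr/oldcollection");
      HttpSolrClient dst = new HttpSolrClient("http://localhost:8983/solr/newcollection");
      SolrQuery q = new SolrQuery("*:*");
      q.setRows(1000);
      q.setSort("id", SolrQuery.ORDER.asc);     // cursorMark requires a sort on the unique key
      q.setFields("id", "field_a", "field_b"); // docValues fields, assuming useDocValuesAsStored
      String cursor = CursorMarkParams.CURSOR_MARK_START;
      while (true) {
        q.set(CursorMarkParams.CURSOR_MARK_PARAM, cursor);
        QueryResponse rsp = src.query(q);
        for (SolrDocument d : rsp.getResults()) {
          SolrInputDocument in = new SolrInputDocument();
          d.forEach((k, v) -> in.addField(k, v)); // values re-indexed under the new schema
          dst.add(in);
        }
        String next = rsp.getNextCursorMark();
        if (next.equals(cursor)) break;           // no more pages
        cursor = next;
      }
      dst.commit();
      src.close();
      dst.close();
    }
  }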

To do this from indexed-only terms you'd have to somehow store the
original version of each term or store some codes indicating exactly
how to reconstruct the original stream, which very possibly would take
up as much space as if you'd just stored the values anyway. _And_ it
would burden everyone else who didn't want to do this with a bloated
index.

Best,
Erick

On Sun, May 8, 2016 at 4:25 AM, Ali Nazemian  wrote:
> Dear all,
> Hi,
> I was wondering, is it possible to re-index Solr 6.0 data in case of
> store=false? I am using Solr as a secondary datastore, and for the sake of
> space efficiency all the fields (except id) are considered as store=false.
> Currently, due to some changes in the application's business logic, the Solr
> schema should change, and in order to see the effect of the schema change on
> old data, I have to do the re-index process.  I know that one way of
> re-indexing in Solr is reading data from one collection (core) and inserting
> it into another one, but this solution is not possible for store=false fields,
> and re-indexing the whole data through the primary datastore is kind of
> costly, so I would be grateful if somebody could suggest another way of
> re-indexing the whole data without using another datastore.
>
> Sincerely,
>
> --
> A.Nazemian


Solr re-indexing in case of store=false

2016-05-08 Thread Ali Nazemian
Dear all,
Hi,
I was wondering, is it possible to re-index Solr 6.0 data in case of
store=false? I am using Solr as a secondary datastore, and for the sake of
space efficiency all the fields (except id) are considered as store=false.
Currently, due to some changes in the application's business logic, the Solr
schema should change, and in order to see the effect of the schema change on
old data, I have to do the re-index process.  I know that one way of
re-indexing in Solr is reading data from one collection (core) and inserting
it into another one, but this solution is not possible for store=false fields,
and re-indexing the whole data through the primary datastore is kind of costly,
so I would be grateful if somebody could suggest another way of re-indexing
the whole data without using another datastore.

Sincerely,

-- 
A.Nazemian