Fwd: help on edismax dynamic fields

2014-02-21 Thread rashi gandhi
Hello,



I am using the edismax parser in my project.

I just wanted to confirm whether we can use dynamic fields with edismax or
not.

When I use a specific dynamic field in the qf or pf parameter, it works.

But when I use dynamic fields with *, like this:







   <requestHandler name="/select" class="solr.SearchHandler">
     <lst name="defaults">
       <str name="echoParams">explicit</str>
       <int name="rows">10</int>
       <str name="df">text</str>
       <str name="defType">edismax</str>
       <str name="qf">*_nlp_new_sv^0.8 *_nlp_copy_sv^0.2</str>
     </lst>
   </requestHandler>





It is not working.



Is it possible to use dynamic fields with *, as shown above, with
edismax?

Please provide me some pointers on this.
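For reference, the case that reportedly works (naming a concrete instance of a dynamic field in qf) can be sketched as a request like the following. This is only an illustration; the field names, host, and handler path here are made up:

```python
from urllib.parse import urlencode

# Hypothetical concrete instances of the *_nlp_new_sv / *_nlp_copy_sv dynamic fields
params = {
    "defType": "edismax",
    "q": "smartphone",
    "qf": "title_nlp_new_sv^0.8 title_nlp_copy_sv^0.2",
}
query_string = urlencode(params)
url = "http://localhost:8983/solr/select?" + query_string
```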



Thanks in advance.


Re: How long do commits take?

2014-02-21 Thread Shawn Heisey

On 2/21/2014 5:15 PM, Shawn Heisey wrote:

Here's a log entry from Solr 4.6.1:

INFO  - 2014-02-21 17:09:04.837; 
org.apache.solr.update.processor.LogUpdateProcessor; [s1live] 
webapp=/solr path=/update 
params={waitSearcher=true&commit=true&wt=javabin&version=2&softCommit=true} 
{commit=} 0 4698


The QTime value here is 4698 milliseconds.

I no longer have a 3.x server I can look at.


It was bugging me, not knowing what 3.x says.

I pulled down the lucene_solr_3_6 branch, built the example, fired it 
up, and then sent a commit request to the update handler on collection1.


http://server:8983/solr/collection1/update?commit=true

I got the following in the logs:

Feb 21, 2014 5:25:17 PM 
org.apache.solr.update.processor.LogUpdateProcessor finish

INFO: {commit=} 0 12
Feb 21, 2014 5:25:17 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/update params={commit=true} status=0 QTime=12
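Both log layouts shown above can be scraped for the commit time with a small regex. A sketch, assuming exactly these line formats (the 4.x LogUpdateProcessor line ends with "{commit=} 0 <QTime>", and the 3.x SolrCore line carries an explicit QTime= field):

```python
import re

def commit_qtime(line):
    """Extract the commit time in ms from a Solr log line (3.x or 4.x format)."""
    m = re.search(r"QTime=(\d+)", line)                 # 3.x SolrCore line
    if m:
        return int(m.group(1))
    m = re.search(r"\{commit=\}\s+\d+\s+(\d+)", line)   # LogUpdateProcessor line
    return int(m.group(1)) if m else None
```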

Thanks,
Shawn



Re: How long do commits take?

2014-02-21 Thread Shawn Heisey

On 2/21/2014 4:26 PM, William Tantzen wrote:

In solr 3.6, strictly using log files (catalina.out), how can I determine how 
long a commit operation takes?  I don’t see a QTime to help me out as in 
optimize…  No doubt, it’s staring me in the face but I can’t figure it out.


Here's a log entry from Solr 4.6.1:

INFO  - 2014-02-21 17:09:04.837; 
org.apache.solr.update.processor.LogUpdateProcessor; [s1live] 
webapp=/solr path=/update 
params={waitSearcher=true&commit=true&wt=javabin&version=2&softCommit=true} 
{commit=} 0 4698


The QTime value here is 4698 milliseconds.

I no longer have a 3.x server I can look at.

Thanks,
Shawn



How long do commits take?

2014-02-21 Thread William Tantzen
In solr 3.6, strictly using log files (catalina.out), how can I determine how 
long a commit operation takes?  I don’t see a QTime to help me out as in 
optimize…  No doubt, it’s staring me in the face but I can’t figure it out.

Thanks in advance,
Bill



Re: Solr4 performance

2014-02-21 Thread Michael Della Bitta
It could be that your query is churning the page cache on that node
sometimes, so Solr pauses while the OS drags those pages off of disk. Have
you tried profiling your iowait in top or iostat during these pauses
(assuming you're using Linux)?

Michael Della Bitta

Applications Developer

o: +1 646 532 3062

appinions inc.

"The Science of Influence Marketing"

18 East 41st Street

New York, NY 10017

t: @appinions  | g+:
plus.google.com/appinions
w: appinions.com 


On Fri, Feb 21, 2014 at 5:20 PM, Joshi, Shital  wrote:

> Thanks for your answer.
>
> We confirmed that it is not GC issue.
>
> The auto warming query looks good too, and queries before and after the
> long-running query come back really quickly. The only thing that stands out
> is that the shard on which the query takes a long time has a couple million
> more documents than the other shards.
>
> -Original Message-
> From: Michael Della Bitta [mailto:michael.della.bi...@appinions.com]
> Sent: Thursday, February 20, 2014 5:26 PM
> To: solr-user@lucene.apache.org
> Subject: RE: Solr4 performance
>
> Hi,
>
> As for your first question, setting openSearcher to true means you will see
> the new docs after every hard commit. Soft and hard commits only become
> isolated from one another with that set to false.
>
> Your second problem might be explained by your large heap and garbage
> collection. Walking a heap that large can take an appreciable amount of
> time. You might consider turning on the JVM options for logging GC and
> seeing if you can correlate your slow responses to times when your JVM is
> garbage collecting.
>
> Hope that helps,
> On Feb 20, 2014 4:52 PM, "Joshi, Shital"  wrote:
>
> > Hi!
> >
> > I have few other questions regarding Solr4 performance issue we're
> facing.
> >
> > We're committing data to Solr4 every ~30 seconds (up to 20K rows). We use
> > commit=false in update URL. We have only hard commit setting in Solr4
> > config.
> >
> > <autoCommit>
> >   <maxTime>${solr.autoCommit.maxTime:600000}</maxTime>
> >   <maxDocs>100000</maxDocs>
> >   <openSearcher>true</openSearcher>
> > </autoCommit>
> >
> >
> > Since we're not using Soft commit at all (commit=false), the caches will
> > not get reloaded for every commit and recently added documents will not
> be
> > visible, correct?
> >
> > What we see is that queries which usually take a few milliseconds take ~40
> > seconds once in a while. Can high IO during a hard commit cause queries to
> > slow down?
> >
> > For some shards we see 98% full physical memory. We have a 60GB machine (30
> > GB JVM, 28 GB free RAM, ~35 GB of index). We're ruling out high physical
> > memory as a cause of the slow queries. We're in the process of reducing the
> > JVM size anyway.
> >
> > We have never run optimization till now. QA optimization didn't yield a
> > performance gain.
> >
> > Thanks much for all help.
> >
> > -Original Message-
> > From: Shawn Heisey [mailto:s...@elyograg.org]
> > Sent: Tuesday, February 18, 2014 4:55 PM
> > To: solr-user@lucene.apache.org
> > Subject: Re: Solr4 performance
> >
> > On 2/18/2014 2:14 PM, Joshi, Shital wrote:
> > > Thanks much for all suggestions. We're looking into reducing allocated
> > heap size of Solr4 JVM.
> > >
> > > We're using NRTCachingDirectoryFactory. Does it use MMapDirectory
> > internally? Can someone please confirm?
> >
> > In Solr, NRTCachingDirectory does indeed use MMapDirectory as its
> > default delegate.  That's probably also the case with Lucene -- these
> > are Lucene classes, after all.
> >
> > MMapDirectory is almost always the most efficient way to handle on-disk
> > indexes.
> >
> > Thanks,
> > Shawn
> >
> >
>


RE: Solr4 performance

2014-02-21 Thread Joshi, Shital
Thanks for your answer. 

We confirmed that it is not GC issue. 

The auto warming query looks good too, and queries before and after the 
long-running query come back really quickly. The only thing that stands out is 
that the shard on which the query takes a long time has a couple million more 
documents than the other shards. 

-Original Message-
From: Michael Della Bitta [mailto:michael.della.bi...@appinions.com] 
Sent: Thursday, February 20, 2014 5:26 PM
To: solr-user@lucene.apache.org
Subject: RE: Solr4 performance

Hi,

As for your first question, setting openSearcher to true means you will see
the new docs after every hard commit. Soft and hard commits only become
isolated from one another with that set to false.

Your second problem might be explained by your large heap and garbage
collection. Walking a heap that large can take an appreciable amount of
time. You might consider turning on the JVM options for logging GC and
seeing if you can correlate your slow responses to times when your JVM is
garbage collecting.
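The GC logging Michael mentions can be switched on with JVM flags along these lines (standard Java 6/7-era HotSpot options; the log path is a placeholder):

```
-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/path/to/gc.log
```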

Hope that helps,
On Feb 20, 2014 4:52 PM, "Joshi, Shital"  wrote:

> Hi!
>
> I have few other questions regarding Solr4 performance issue we're facing.
>
> We're committing data to Solr4 every ~30 seconds (up to 20K rows). We use
> commit=false in update URL. We have only hard commit setting in Solr4
> config.
>
> <autoCommit>
>   <maxTime>${solr.autoCommit.maxTime:600000}</maxTime>
>   <maxDocs>100000</maxDocs>
>   <openSearcher>true</openSearcher>
> </autoCommit>
>
>
> Since we're not using Soft commit at all (commit=false), the caches will
> not get reloaded for every commit and recently added documents will not be
> visible, correct?
>
> What we see is that queries which usually take a few milliseconds take ~40
> seconds once in a while. Can high IO during a hard commit cause queries to
> slow down?
>
> For some shards we see 98% full physical memory. We have a 60GB machine (30
> GB JVM, 28 GB free RAM, ~35 GB of index). We're ruling out high physical
> memory as a cause of the slow queries. We're in the process of reducing the
> JVM size anyway.
>
> We have never run optimization till now. QA optimization didn't yield a
> performance gain.
>
> Thanks much for all help.
>
> -Original Message-
> From: Shawn Heisey [mailto:s...@elyograg.org]
> Sent: Tuesday, February 18, 2014 4:55 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr4 performance
>
> On 2/18/2014 2:14 PM, Joshi, Shital wrote:
> > Thanks much for all suggestions. We're looking into reducing allocated
> heap size of Solr4 JVM.
> >
> > We're using NRTCachingDirectoryFactory. Does it use MMapDirectory
> internally? Can someone please confirm?
>
> In Solr, NRTCachingDirectory does indeed use MMapDirectory as its
> default delegate.  That's probably also the case with Lucene -- these
> are Lucene classes, after all.
>
> MMapDirectory is almost always the most efficient way to handle on-disk
> indexes.
>
> Thanks,
> Shawn
>
>


Re: hardcommit setting in solrconfig

2014-02-21 Thread Shawn Heisey

On 2/21/2014 2:34 PM, Joshi, Shital wrote:


 
<updateLog>
  <str name="dir">${solr.ulog.dir:}</str>
</updateLog>

<autoCommit>
  <maxTime>${solr.autoCommit.maxTime:600000}</maxTime>
  <maxDocs>100000</maxDocs>
  <openSearcher>true</openSearcher>
</autoCommit>

Shouldn't we see a DirectUpdateHandler2 "start commit" and a DirectUpdateHandler2 
"end_commit_flush" message in our log at least every ten minutes? I understand 
that if we have more than 100K documents to commit, the hard commit could happen 
earlier than 10 minutes. But we see hard commits spaced out by more than 20 to 
30 minutes, and sometimes a couple of hours. Can you please explain this behavior?


The autoCommit will not happen if you haven't indexed anything since the 
last commit.  As I understand it, the timer and document counter don't 
actually start until the moment you send an update request (add, update, 
or delete).  If no updates have come in, they are turned off once the 
commit completes.


Are you seeing this happen when you do not have delays between updates?

Thanks,
Shawn



Re: search across cores

2014-02-21 Thread Shawn Heisey

On 2/21/2014 2:15 PM, T. Kuro Kurosaka wrote:
If I want to search across cores, can I use (abuse?) the distributed
search? My simple experiment seems to confirm this, but I'd like to know if
there are any drawbacks other than those of distributed search listed here:
https://wiki.apache.org/solr/DistributedSearch#Distributed_Searching_Limitations

If all cores are served by the same machine, does a distributed
search actually make sub-search requests over HTTP? Or is it
clever enough to skip the HTTP connection?


As long as the cores use the same schema, or at least have enough fields 
in common, searching across multiple cores with the shards parameter 
will work just fine.  You would need the uniqueKey field to have the 
same name and underlying type on all cores, and any fields that you are 
searching would also have to be in all the cores.


It does make subrequests with HTTP.  If the address that is being 
contacted is local, the connection is very fast and does not actually go 
out on the network, so it has very low overhead.
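A cross-core request with the shards parameter might look like the following sketch (hypothetical host and core names; both cores are assumed to share a compatible schema):

```python
from urllib.parse import urlencode

# Hypothetical host and core names
params = {
    "q": "title:solr",
    "shards": "localhost:8983/solr/core1,localhost:8983/solr/core2",
}
query_string = urlencode(params)
url = "http://localhost:8983/solr/core1/select?" + query_string
```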


Thanks,
Shawn



Re: hardcommit setting in solrconfig

2014-02-21 Thread Chris Hostetter

: Shouldn't we see a DirectUpdateHandler2 "start commit" and 
: DirectUpdateHandler2 "end_commit_flush" message in our log at least every 
: ten minutes? I understand that if we have more than 100K documents to 
: commit, the hard commit could happen earlier than 10 minutes. But we see 
: hard commits spaced out by more than 20 to 30 minutes, and sometimes a 
: couple of hours. Can you please explain this behavior?

autoCommits only happen if needed -- if you start up your server and 20 
minutes go by w/o any updates that need committing, there won't be a 
commit.  if after 20 minutes of uptime you send a single document, then 
with your autoCommit setting of 10 minutes, a max of 10 more minutes will 
elapse before the commit happens automatically - if you explicitly commit 
before the 10 minutes are up, no auto committing will happen.


-Hoss
http://www.lucidworks.com/


hardcommit setting in solrconfig

2014-02-21 Thread Joshi, Shital
Hello,

We have the following hard commit setting in solrconfig.xml:

<updateLog>
  <str name="dir">${solr.ulog.dir:}</str>
</updateLog>

<autoCommit>
  <maxTime>${solr.autoCommit.maxTime:600000}</maxTime>
  <maxDocs>100000</maxDocs>
  <openSearcher>true</openSearcher>
</autoCommit>

Shouldn't we see a DirectUpdateHandler2 "start commit" and a DirectUpdateHandler2 
"end_commit_flush" message in our log at least every ten minutes? I understand 
that if we have more than 100K documents to commit, the hard commit could happen 
earlier than 10 minutes. But we see hard commits spaced out by more than 20 to 
30 minutes, and sometimes a couple of hours. Can you please explain this behavior?

Thanks!



search across cores

2014-02-21 Thread T. Kuro Kurosaka
If I want to search across cores, can I use (abuse?) the distributed 
search?

My simple experiment seems to confirm this, but I'd like to know if there are
any drawbacks other than those of distributed search listed here:
https://wiki.apache.org/solr/DistributedSearch#Distributed_Searching_Limitations

If all cores are served by the same machine, does a distributed
search actually make sub-search requests over HTTP? Or is it
clever enough to skip the HTTP connection?

Kuro



RE: Best way to get results ordered

2014-02-21 Thread OSMAN Metin
Thank you Michael,

this applies to anywhere from 5 to about 60 content items.

We have already tried with boosts, but the results were not sorted correctly 
every time.
Maybe our boost coefficients were not set properly, but I thought there 
would be a "correct" way to do this.

Metin OSMAN
Canal+ || DTD - VOD
01 71 35 02 70

-Message d'origine-
De : Michael Della Bitta [mailto:michael.della.bi...@appinions.com] 
Envoyé : vendredi 21 février 2014 19:28
À : solr-user@lucene.apache.org
Objet : Re: Best way to get results ordered

Hi Metin,

How many IDs are you supplying in a single query? You could probably accomplish 
this easily with boosts if it were few.



On Fri, Feb 21, 2014 at 1:25 PM, OSMAN Metin wrote:

> Hi all,
>
> we are using SolR 4.4.0 and planning to migrate to 4.6.1 very soon.
>
> We are looking for a way to get results ordered in a certain way.
>
> For example, we are doing query by ids this way: q=id:A OR id:C OR 
> id:B, and we want the results to be sorted as A, C, B.
>
> Is there a good way to do this with SolR or should we sort the items 
> on the client application side ?
>
> Regards,
>
> Metin
>
>


Re: Best way to get results ordered

2014-02-21 Thread Michael Della Bitta
Hi Metin,

How many IDs are you supplying in a single query? You could probably
accomplish this easily with boosts if it were few.



On Fri, Feb 21, 2014 at 1:25 PM, OSMAN Metin wrote:

> Hi all,
>
> we are using SolR 4.4.0 and planning to migrate to 4.6.1 very soon.
>
> We are looking for a way to get results ordered in a certain way.
>
> For example, we are doing query by ids this way: q=id:A OR id:C OR id:B,
> and we want the results to be sorted as A, C, B.
>
> Is there a good way to do this with SolR or should we sort the items on
> the client application side ?
>
> Regards,
>
> Metin
>
>


Best way to get results ordered

2014-02-21 Thread OSMAN Metin
Hi all,

we are using SolR 4.4.0 and planning to migrate to 4.6.1 very soon.

We are looking for a way to get results ordered in a certain way.

For example, we are doing a query by IDs this way: q=id:A OR id:C OR id:B, and 
we want the results to be sorted as A, C, B.

Is there a good way to do this with SolR or should we sort the items on the 
client application side ?

Regards,

Metin
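If the sort does end up on the client application side, re-ordering the response to match the requested ID list is a small helper. A sketch (field name "id" assumed, and docs shown as plain dicts rather than a real Solr client response):

```python
def order_by_id_list(docs, id_order):
    """Re-sort Solr result docs to match the order the IDs were requested in."""
    position = {doc_id: i for i, doc_id in enumerate(id_order)}
    return sorted(docs, key=lambda d: position[d["id"]])

# Solr returns docs in score order; restore the requested A, C, B order
docs = [{"id": "B"}, {"id": "A"}, {"id": "C"}]
ordered = order_by_id_list(docs, ["A", "C", "B"])
```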



Re: & in XML Node Getting Error

2014-02-21 Thread Shawn Heisey
On 2/21/2014 10:31 AM, EXTERNAL Taminidi Ravi (ETI, 
Automotive-Service-Solutions) wrote:

I am getting something like

ERROR org.apache.solr.core.SolrCore [com.ctc.wstx.exc.WstxLazyException] 
com.ctc.wstx.exc.WstxParsingException: Undeclared general entity "nbsp"

The field content is "&nbsp;" or "&amp;"


If you have "nbsp" entities, then it's not actually XML, it's a hybrid 
of HTML and XML.  There are exactly five legal entities in XML, and nbsp 
isn't one of them:


http://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references#Predefined_entities_in_XML

You'll need to clean up the XML.  As far as I know, there is no way to 
declare a permissive mode.  Solr uses standard and common XML libraries.


Thanks,
Shawn



RE: & in XML Node Getting Error

2014-02-21 Thread Chris Hostetter

: ERROR org.apache.solr.core.SolrCore [com.ctc.wstx.exc.WstxLazyException] 
: com.ctc.wstx.exc.WstxParsingException: Undeclared general entity "nbsp"

"nbsp" is not a legal XML entity unless you have an entity declaration that 
defines it.

it sounds like you don't have valid xml -- it sounds like maybe you have 
some HTML that someone cut/pasted into a file that they called XML 
but isn't really.

you said "the field in the xml file", suggesting that someone/something 
attempted to build up a "file" containing the xml messages for adding 
documents to solr -- what software created this file?  if it's just doing 
string manipulations to try and hack together some XML, you're going to 
keep running into pain.  You really want to be using a true XML library to 
generate correct XML.

Alternatively: don't generate files, or even XML at all -- make that 
software use a client API to talk directly to Solr via Java objects, or 
json, or csv, etc.
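As one illustration of the "true XML library" point: a library such as Python's ElementTree escapes reserved characters on serialization, so you never hand-escape field values. This is only a sketch, not the poster's actual pipeline:

```python
import xml.etree.ElementTree as ET

# Build a Solr <add><doc> update message; the library handles escaping
add = ET.Element("add")
doc = ET.SubElement(add, "doc")
field = ET.SubElement(doc, "field", name="title")
field.text = "AT&T"  # raw text, never hand-escaped
xml_payload = ET.tostring(add, encoding="unicode")
```

On output, the ampersand is emitted as &amp;, which any conforming XML parser (including Solr's) accepts.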



-Hoss
http://www.lucidworks.com/


RE: & in XML Node Getting Error

2014-02-21 Thread EXTERNAL Taminidi Ravi (ETI, Automotive-Service-Solutions)
I am getting something like

ERROR org.apache.solr.core.SolrCore [com.ctc.wstx.exc.WstxLazyException] 
com.ctc.wstx.exc.WstxParsingException: Undeclared general entity "nbsp"

The field content is "&nbsp;" or "&amp;"


-Original Message-
From: Greg Walters [mailto:greg.walt...@answers.com] 
Sent: Friday, February 21, 2014 12:16 PM
To: solr-user@lucene.apache.org
Subject: Re: & in XML Node Getting Error

Ravi,

What's the error you're getting?

Thanks,
Greg

On Feb 21, 2014, at 11:08 AM, "EXTERNAL Taminidi Ravi (ETI, 
Automotive-Service-Solutions)"  wrote:

> Hi, I am getting an error if any of the fields in the xml file has & as a value.
> 
> How can I fix this issue?
> 
> FYI, I changed & to &amp; in the field but it still has issues,
> 
> e.g. AT&T or AT&amp;T
> 
> Both of the above give errors. Do I need to change something in the configuration? 
> 
> Thanks
> 
> Ravi



Re: & in XML Node Getting Error

2014-02-21 Thread Greg Walters
Ravi,

What's the error you're getting?

Thanks,
Greg

On Feb 21, 2014, at 11:08 AM, "EXTERNAL Taminidi Ravi (ETI, 
Automotive-Service-Solutions)"  wrote:

> Hi, I am getting an error if any of the fields in the xml file has & as a value.
> 
> How can I fix this issue?
> 
> FYI, I changed & to &amp; in the field but it still has issues,
> 
> e.g. AT&T or AT&amp;T
> 
> Both of the above give errors. Do I need to change something in the configuration? 
> 
> Thanks
> 
> Ravi



ZK connection problems

2014-02-21 Thread Jeff Wartes

I’ve been experimenting with SolrCloud configurations in AWS. One issue I’ve 
been plagued with is that during indexing, occasionally a node decides it can’t 
talk to ZK, and this disables updates in the pool. The node usually recovers 
within a second or two. It’s possible this happens when I’m not indexing too, 
but I’m much less likely to notice.

I’ve seen this with multiple sharding configurations and multiple cluster 
sizes. I’ve searched around, and I think I’ve addressed the usual resolutions 
when someone complains about ZK and Solr. I’m using:

  *   60-sec ZK connection timeout (although this seems like a pretty terrible 
requirement)
  *   Independent 3-node ZK cluster, also in AWS.
  *   Solr 4.6.1
  *   Optimized GC settings (and I’ve confirmed no GC pauses are occurring)
  *   5-min auto-hard-commit with openSearcher=false

I’m indexing some 10K docs/sec using CloudSolrServer, but the CPU usage on the 
nodes doesn’t exceed 20%, typically it’s around 5%.

Here is the relevant section of logs from one of the nodes when this happened:
http://pastebin.com/K0ZdKmL4

It looks like it had a connection timeout, and tried to re-establish the same 
session on a connection to a new ZK node, except the session had also expired. 
It then closes *that* connection, changes to read-only mode, and eventually 
creates a new connection and new session which allows writes again.

Can anyone familiar with the ZK connection/session stuff comment on whether 
this is a bug? I really know nothing about proper ZK client behaviour.

Thanks.



& in XML Node Getting Error

2014-02-21 Thread EXTERNAL Taminidi Ravi (ETI, Automotive-Service-Solutions)
Hi, I am getting an error if any of the fields in the xml file has & as a value.

How can I fix this issue?

FYI, I changed & to &amp; in the field but it still has issues,

e.g. AT&T or AT&amp;T

Both of the above give errors. Do I need to change something in the configuration? 

Thanks

Ravi


RE: Grouping performance improvement

2014-02-21 Thread soodyogesh
Thanks, Alexey, for giving some really good points.

Just to make sure I get it right, are you suggesting:

1. do facets on category first; let's say I get 10 distinct categories
2. do another query where q=<search query> and fq=<a facet category value>

Maybe I'm missing something, but I'm not sure how to get facets along
with, let's say, 5 documents under each facet value.
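A second-pass sketch of what the suggestion seems to be: take the distinct values from the first, facet-only query, then issue one filtered query per value to pull a handful of documents each. All field names and values here are hypothetical:

```python
from urllib.parse import urlencode

base = {"q": "search query", "rows": 5}
facet_values = ["books", "music", "video"]  # distinct values from the facet-only query
followups = [urlencode({**base, "fq": f"category:{v}"}) for v in facet_values]
```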





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Grouping-performance-improvement-tp4118549p4118844.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: how many shards required to search data

2014-02-21 Thread Shawn Heisey
On 2/21/2014 1:39 AM, search engn dev wrote:
> As you suggested, I have indexed 12 million sample records in Solr on hardware
> with 8GB of RAM. The size of the index is 3GB.
> Can I extrapolate this to predict the actual size of the index?

If the sizes of those records are about the same size as the records in
the system as a whole, you can probably use that to extrapolate.

Based on that, I would guess that the index is probably going to be
about 85GB.  That's a lot less than I would have guessed, so perhaps
there's a lot of extra stuff in that 250GB that doesn't actually get
sent to Solr.

Even though they are small, the number of documents will probably
require a larger Java heap than the relatively small index size would
normally require.

Do you have any kind of notion as to what kind of query volume you're
going to have?  If it's low, you can put multiple shards on your
multi-cpu machines and take advantage of parallel processing.  If the
query volume is high, you'll need all those cpus to handle the load of
one shard, and you might need more than two machines for each shard.

You'll want to shard your index even though it's relatively small in
terms of disk space, because a billion documents is a LOT.

If you're just starting out, SolrCloud is probably a good way to go.  It
handles document routing across shards for you.  You didn't say whether
that was your plan or not.

Thanks,
Shawn



Re: how many shards required to search data

2014-02-21 Thread search engn dev
As you suggested, I have indexed 12 million sample records in Solr on hardware
with 8GB of RAM. The size of the index is 3GB.
Can I extrapolate this to predict the actual size of the index?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/how-many-shards-required-to-search-data-tp4118715p4118753.html
Sent from the Solr - User mailing list archive at Nabble.com.