Re: how many shards required to search data

2014-02-21 Thread search engn dev
As you suggested, I have indexed 12 million sample records in Solr on hardware
with 8 GB of RAM. The size of the index is 3 GB.
Can I extrapolate this to predict the actual size of the index?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/how-many-shards-required-to-search-data-tp4118715p4118753.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: how many shards required to search data

2014-02-21 Thread Shawn Heisey
On 2/21/2014 1:39 AM, search engn dev wrote:
 As you suggested, I have indexed 12 million sample records in Solr on hardware
 with 8 GB of RAM. The size of the index is 3 GB.
 Can I extrapolate this to predict the actual size of the index?

If those records are about the same size as the records in the system as a
whole, you can probably use that to extrapolate.

Based on that, I would guess that the index is probably going to be
about 85GB.  That's a lot less than I would have guessed, so perhaps
there's a lot of extra stuff in that 250GB that doesn't actually get
sent to Solr.

Even though the documents are small, the sheer number of them will probably
require a larger Java heap than the relatively small index size would
normally suggest.

Do you have any kind of notion as to what kind of query volume you're
going to have?  If it's low, you can put multiple shards on your
multi-cpu machines and take advantage of parallel processing.  If the
query volume is high, you'll need all those cpus to handle the load of
one shard, and you might need more than two machines for each shard.

You'll want to shard your index even though it's relatively small in
terms of disk space, because a billion documents is a LOT.

If you're just starting out, SolrCloud is probably a good way to go.  It
handles document routing across shards for you.  You didn't say whether
that was your plan or not.

Thanks,
Shawn



RE: Grouping performance improvement

2014-02-21 Thread soodyogesh
Thanks, Alexey, for some really good points.

Just to make sure I get it right:

Are you suggesting

1. Do facets on category first; let's say I get 10 distinct categories.
2. Do another query where q = the search query and fq = the facet category values.

Maybe I'm missing something; however, I'm not sure how to get facets along
with, let's say, 5 documents under each facet value.





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Grouping-performance-improvement-tp4118549p4118844.html
Sent from the Solr - User mailing list archive at Nabble.com.


& in XML Node Getting Error

2014-02-21 Thread EXTERNAL Taminidi Ravi (ETI, Automotive-Service-Solutions)
Hi, I am getting an error if any of the fields in the XML file has & as its value.

How can I fix this issue?

FYI, I changed & to &amp; in the field, but it still has issues.

e.g. <field name="Name">AT&T</field> or <field name="Name">AT&amp;T</field>

Both of the above give errors. Do I need to change something in the configuration?

Thanks

Ravi


ZK connection problems

2014-02-21 Thread Jeff Wartes

I’ve been experimenting with SolrCloud configurations in AWS. One issue I’ve 
been plagued with is that during indexing, occasionally a node decides it can’t 
talk to ZK, and this disables updates in the pool. The node usually recovers 
within a second or two. It’s possible this happens when I’m not indexing too, 
but I’m much less likely to notice.

I’ve seen this with multiple sharding configurations and multiple cluster 
sizes. I’ve searched around, and I think I’ve addressed the usual resolutions 
when someone complains about ZK and Solr. I’m using:

  *   60-sec ZK connection timeout (although this seems like a pretty terrible 
requirement)
  *   Independent 3-node ZK cluster, also in AWS.
  *   Solr 4.6.1
  *   Optimized GC settings (and I’ve confirmed no GC pauses are occurring)
  *   5-min auto-hard-commit with openSearcher=false

I’m indexing some 10K docs/sec using CloudSolrServer, but the CPU usage on the 
nodes doesn’t exceed 20%, typically it’s around 5%.

Here is the relevant section of logs from one of the nodes when this happened:
http://pastebin.com/K0ZdKmL4

It looks like it had a connection timeout, and tried to re-establish the same 
session on a connection to a new ZK node, except the session had also expired. 
It then closes *that* connection, changes to read-only mode, and eventually 
creates a new connection and new session which allows writes again.

Can anyone familiar with the ZK connection/session stuff comment on whether 
this is a bug? I really know nothing about proper ZK client behaviour.

Thanks.



Re: & in XML Node Getting Error

2014-02-21 Thread Greg Walters
Ravi,

What's the error you're getting?

Thanks,
Greg

On Feb 21, 2014, at 11:08 AM, EXTERNAL Taminidi Ravi (ETI, 
Automotive-Service-Solutions) external.ravi.tamin...@us.bosch.com wrote:

 Hi, I am getting an error if any of the fields in the XML file has & as its value.
 
 How can I fix this issue?
 
 FYI, I changed & to &amp; in the field, but it still has issues.
 
 e.g. <field name="Name">AT&T</field> or <field name="Name">AT&amp;T</field>
 
 Both of the above give errors. Do I need to change something in the configuration?
 
 Thanks
 
 Ravi



RE: & in XML Node Getting Error

2014-02-21 Thread EXTERNAL Taminidi Ravi (ETI, Automotive-Service-Solutions)
I am getting something like:

ERROR org.apache.solr.core.SolrCore [com.ctc.wstx.exc.WstxLazyException]
com.ctc.wstx.exc.WstxParsingException: Undeclared general entity "nbsp"

The field content is &nbsp; or &amp;.


-Original Message-
From: Greg Walters [mailto:greg.walt...@answers.com] 
Sent: Friday, February 21, 2014 12:16 PM
To: solr-user@lucene.apache.org
Subject: Re: & in XML Node Getting Error

Ravi,

What's the error you're getting?

Thanks,
Greg

On Feb 21, 2014, at 11:08 AM, EXTERNAL Taminidi Ravi (ETI, 
Automotive-Service-Solutions) external.ravi.tamin...@us.bosch.com wrote:

 Hi, I am getting an error if any of the fields in the XML file has & as its value.
 
 How can I fix this issue?
 
 FYI, I changed & to &amp; in the field, but it still has issues.
 
 e.g. <field name="Name">AT&T</field> or <field name="Name">AT&amp;T</field>
 
 Both of the above give errors. Do I need to change something in the configuration?
 
 Thanks
 
 Ravi



RE: & in XML Node Getting Error

2014-02-21 Thread Chris Hostetter

: ERROR org.apache.solr.core.SolrCore [com.ctc.wstx.exc.WstxLazyException]
: com.ctc.wstx.exc.WstxParsingException: Undeclared general entity "nbsp"

&nbsp; is not a legal XML entity unless you have an entity declaration that 
defines it.

It sounds like you don't have valid XML -- maybe you have some HTML that 
someone cut/pasted into a file that they called XML, but it isn't really XML.

You said "the field in the XML file", suggesting that someone or something 
attempted to build up a file containing the XML messages for adding 
documents to Solr -- what software created this file?  If it's just doing 
string manipulation to try and hack together some XML, you're going to 
keep running into pain.  You really want to be using a true XML library to 
generate correct XML.

Alternatively: don't generate files, or even XML at all -- make that 
software use a client API to talk directly to Solr via Java objects, or 
JSON, or CSV, etc.
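
For instance, here is a minimal SolrJ sketch (the core URL and the field 
names are hypothetical) in which the client library handles all of the 
escaping for you:

import java.io.IOException;

import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class AddDocExample {
    public static void main(String[] args) throws IOException, SolrServerException {
        // Hypothetical core URL -- point this at your own core.
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "doc1");
        // Pass the raw value; SolrJ takes care of escaping it on the wire.
        doc.addField("Name", "AT&T");

        server.add(doc);
        server.commit();
        server.shutdown();
    }
}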



-Hoss
http://www.lucidworks.com/


Re: & in XML Node Getting Error

2014-02-21 Thread Shawn Heisey
On 2/21/2014 10:31 AM, EXTERNAL Taminidi Ravi (ETI, 
Automotive-Service-Solutions) wrote:

I am getting something like:

ERROR org.apache.solr.core.SolrCore [com.ctc.wstx.exc.WstxLazyException]
com.ctc.wstx.exc.WstxParsingException: Undeclared general entity "nbsp"

The field content is &nbsp; or &amp;.


If you have &nbsp; entities, then it's not actually XML, it's a hybrid 
of HTML and XML.  There are exactly five legal entities in XML, and &nbsp; 
isn't one of them:


http://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references#Predefined_entities_in_XML

You'll need to clean up the XML.  As far as I know, there is no way to 
declare a permissive mode.  Solr uses standard and common XML libraries.
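
If &nbsp; is the only problematic entity, one rough pre-parse cleanup sketch 
(the input string is hypothetical, and this assumes you can pre-process the 
text before sending it to Solr) is to swap it for the numeric character 
reference &#160;, which is legal XML:

public class XmlEntityCleanup {
    public static void main(String[] args) {
        // Hypothetical input -- in practice this would be the file content.
        String rawXml = "<field name=\"Name\">AT&amp;T&nbsp;Wireless</field>";

        // &#160; is the numeric reference for a non-breaking space and is
        // valid XML, unlike the HTML-only &nbsp; entity.
        String cleaned = rawXml.replace("&nbsp;", "&#160;");

        System.out.println(cleaned);
    }
}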


Thanks,
Shawn



Best way to get results ordered

2014-02-21 Thread OSMAN Metin
Hi all,

we are using SolR 4.4.0 and planning to migrate to 4.6.1 very soon.

We are looking for a way to get results ordered in a certain way.

For example, we are doing a query by ids this way: q=id=A OR id=C OR id=B, and 
we want the results to be sorted as A, C, B.

Is there a good way to do this with SolR, or should we sort the items on the 
client application side?

Regards,

Metin



Re: Best way to get results ordered

2014-02-21 Thread Michael Della Bitta
Hi Metin,

How many IDs are you supplying in a single query? You could probably
accomplish this easily with boosts if it were few.

Michael Della Bitta

Applications Developer

o: +1 646 532 3062

appinions inc.

The Science of Influence Marketing

18 East 41st Street

New York, NY 10017

t: @appinions https://twitter.com/Appinions | g+:
plus.google.com/appinions https://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts
w: appinions.com http://www.appinions.com/


On Fri, Feb 21, 2014 at 1:25 PM, OSMAN Metin metin.os...@canal-plus.com wrote:

 Hi all,

 we are using SolR 4.4.0 and planning to migrate to 4.6.1 very soon.

 We are looking for a way to get results ordered in a certain way.

 For example, we are doing a query by ids this way: q=id=A OR id=C OR id=B,
 and we want the results to be sorted as A, C, B.

 Is there a good way to do this with SolR, or should we sort the items on
 the client application side?

 Regards,

 Metin




RE: Best way to get results ordered

2014-02-21 Thread OSMAN Metin
Thank you, Michael.

This applies to anywhere from 5 to about 60 content items.

We have already tried with boosts, but the results were not sorted correctly every 
time.
Maybe our boost coefficients were not set properly, but I thought that there 
would be a proper way to do this.

Metin OSMAN
Canal+ || DTD - VOD
01 71 35 02 70

-Original Message-
From: Michael Della Bitta [mailto:michael.della.bi...@appinions.com] 
Sent: Friday, February 21, 2014 19:28
To: solr-user@lucene.apache.org
Subject: Re: Best way to get results ordered

Hi Metin,

How many IDs are you supplying in a single query? You could probably accomplish 
this easily with boosts if it were few.

Michael Della Bitta

Applications Developer

o: +1 646 532 3062

appinions inc.

The Science of Influence Marketing

18 East 41st Street

New York, NY 10017

t: @appinions https://twitter.com/Appinions | g+:
plus.google.com/appinions https://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts
w: appinions.com http://www.appinions.com/


On Fri, Feb 21, 2014 at 1:25 PM, OSMAN Metin metin.os...@canal-plus.com wrote:

 Hi all,

 we are using SolR 4.4.0 and planning to migrate to 4.6.1 very soon.

 We are looking for a way to get results ordered in a certain way.

 For example, we are doing a query by ids this way: q=id=A OR id=C OR 
 id=B, and we want the results to be sorted as A, C, B.

 Is there a good way to do this with SolR, or should we sort the items 
 on the client application side?

 Regards,

 Metin




search across cores

2014-02-21 Thread T. Kuro Kurosaka
If I want to search across cores, can I use (abuse?) the distributed 
search?

My simple experiment seems to confirm this, but I'd like to know whether there are
any drawbacks other than the distributed search limitations listed here:
https://wiki.apache.org/solr/DistributedSearch#Distributed_Searching_Limitations

If all cores are served by the same machine, does a distributed
search actually make sub-search requests over HTTP? Or is it
clever enough to skip the HTTP connection?

Kuro



hardcommit setting in solrconfig

2014-02-21 Thread Joshi, Shital
Hello,

We have the following hard commit setting in solrconfig.xml:

<updateHandler class="solr.DirectUpdateHandler2">
  <updateLog>
    <str name="dir">${solr.ulog.dir:}</str>
  </updateLog>

  <autoCommit>
    <maxTime>${solr.autoCommit.maxTime:60}</maxTime>
    <maxDocs>10</maxDocs>
    <openSearcher>true</openSearcher>
  </autoCommit>
</updateHandler>

Shouldn't we see the "DirectUpdateHandler2; start commit" and "DirectUpdateHandler2; 
end_commit_flush" messages in our log at least every ten minutes? I understand 
that if we have more than 100K documents to commit, a hard commit could happen 
earlier than 10 minutes. But we see hard commits spaced out by more than 20 to 
30 minutes and sometimes a couple of hours. Can you please explain this behavior?

Thanks!



Re: hardcommit setting in solrconfig

2014-02-21 Thread Chris Hostetter

: Shouldn't we see the "DirectUpdateHandler2; start commit" and 
: "DirectUpdateHandler2; end_commit_flush" messages in our log at least every 
: ten minutes? I understand that if we have more than 100K documents to 
: commit, a hard commit could happen earlier than 10 minutes. But we see 
: hard commits spaced out by more than 20 to 30 minutes and sometimes 
: a couple of hours. Can you please explain this behavior?

autoCommits only happen if needed -- if you start up your server and 20 
minutes go by without any updates that need to be committed, there won't be a 
commit.  If, after 20 minutes of uptime, you send a single document, then 
with your autoCommit setting of 10 minutes, a maximum of 10 more minutes will 
elapse before a commit happens automatically -- if you explicitly commit 
before the 10 minutes are up, no auto committing will happen.


-Hoss
http://www.lucidworks.com/


Re: search across cores

2014-02-21 Thread Shawn Heisey

On 2/21/2014 2:15 PM, T. Kuro Kurosaka wrote:
If I want to search across cores, can I use (abuse?) the distributed 
search?
My simple experiment seems to confirm this but I'd like to know if 
there is

any drawbacks other than those of distributed search listed here?
https://wiki.apache.org/solr/DistributedSearch#Distributed_Searching_Limitations 



If all cores are served by the same machine, does a distributed
search actually make sub-search requests over HTTP? Or is it
clever enough to skip the HTTP connection?


As long as the cores use the same schema, or at least have enough fields 
in common, searching across multiple cores with the shards parameter 
will work just fine.  You would need the uniqueKey field to have the 
same name and underlying type on all cores, and any fields that you are 
searching would also have to be in all the cores.


It does make subrequests with HTTP.  If the address that is being 
contacted is local, the connection is very fast and does not actually go 
out on the network, so it has very low overhead.
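
As a sketch of what the shards parameter looks like from SolrJ (the host, 
core, and field names here are hypothetical):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class CrossCoreSearch {
    public static void main(String[] args) throws Exception {
        // Send the request to either core; it fans out to everything listed in "shards".
        HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/core1");

        SolrQuery query = new SolrQuery("text:solr");
        // Hypothetical core names -- the cores must have compatible schemas.
        query.set("shards", "localhost:8983/solr/core1,localhost:8983/solr/core2");

        QueryResponse response = solr.query(query);
        System.out.println("Total hits: " + response.getResults().getNumFound());
        solr.shutdown();
    }
}

The same parameter works on a plain /select request as well, e.g. 
shards=localhost:8983/solr/core1,localhost:8983/solr/core2.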


Thanks,
Shawn



Re: hardcommit setting in solrconfig

2014-02-21 Thread Shawn Heisey

On 2/21/2014 2:34 PM, Joshi, Shital wrote:

<updateHandler class="solr.DirectUpdateHandler2">
  <updateLog>
    <str name="dir">${solr.ulog.dir:}</str>
  </updateLog>

  <autoCommit>
    <maxTime>${solr.autoCommit.maxTime:60}</maxTime>
    <maxDocs>10</maxDocs>
    <openSearcher>true</openSearcher>
  </autoCommit>
</updateHandler>

Shouldn't we see the "DirectUpdateHandler2; start commit" and "DirectUpdateHandler2; 
end_commit_flush" messages in our log at least every ten minutes? I understand 
that if we have more than 100K documents to commit, a hard commit could happen 
earlier than 10 minutes. But we see hard commits spaced out by more than 20 to 
30 minutes and sometimes a couple of hours. Can you please explain this behavior?


The autoCommit will not happen if you haven't indexed anything since the 
last commit.  As I understand it, the timer and document counter don't 
actually start until the moment you send an update request (add, update, 
or delete).  If no updates have come in, they are turned off once the 
commit completes.


Are you seeing this happen when you do not have delays between updates?

Thanks,
Shawn



How long do commits take?

2014-02-21 Thread William Tantzen
In solr 3.6, strictly using log files (catalina.out), how can I determine how 
long a commit operation takes?  I don’t see a QTime to help me out as in 
optimize…  No doubt, it’s staring me in the face but I can’t figure it out.

Thanks in advance,
Bill



Re: How long do commits take?

2014-02-21 Thread Shawn Heisey

On 2/21/2014 4:26 PM, William Tantzen wrote:

In solr 3.6, strictly using log files (catalina.out), how can I determine how 
long a commit operation takes?  I don’t see a QTime to help me out as in 
optimize…  No doubt, it’s staring me in the face but I can’t figure it out.


Here's a log entry from Solr 4.6.1:

INFO  - 2014-02-21 17:09:04.837; 
org.apache.solr.update.processor.LogUpdateProcessor; [s1live] 
webapp=/solr path=/update 
params={waitSearcher=true&commit=true&wt=javabin&version=2&softCommit=true} 
{commit=} 0 4698


The QTime value here is 4698 milliseconds.

I no longer have a 3.x server I can look at.

Thanks,
Shawn



Re: How long do commits take?

2014-02-21 Thread Shawn Heisey

On 2/21/2014 5:15 PM, Shawn Heisey wrote:

Here's a log entry from Solr 4.6.1:

INFO  - 2014-02-21 17:09:04.837; 
org.apache.solr.update.processor.LogUpdateProcessor; [s1live] 
webapp=/solr path=/update 
params={waitSearcher=true&commit=true&wt=javabin&version=2&softCommit=true} 
{commit=} 0 4698


The QTime value here is 4698 milliseconds.

I no longer have a 3.x server I can look at.


It was bugging me, not knowing what 3.x says.

I pulled down the lucene_solr_3_6 branch, built the example, fired it 
up, and then sent a commit request to the update handler on collection1.


http://server:8983/solr/collection1/update?commit=true

I got the following in the logs:

Feb 21, 2014 5:25:17 PM 
org.apache.solr.update.processor.LogUpdateProcessor finish

INFO: {commit=} 0 12
Feb 21, 2014 5:25:17 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/update params={commit=true} status=0 QTime=12

Thanks,
Shawn



Fwd: help on edismax_dynamic fields

2014-02-21 Thread rashi gandhi
Hello,



I am using the edismax parser in my project.

I just wanted to confirm whether we can use dynamic fields with edismax or
not.

When I am using a specific dynamic field in the qf or pf parameter, it
works.

But when I am using dynamic fields with *, like this:



<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">10</int>
    <str name="df">text</str>
    <str name="defType">edismax</str>
    <str name="qf">
      *_nlp_new_sv^0.8
      *_nlp_copy_sv^0.2
    </str>
  </lst>
</requestHandler>

It is not working.



Is it possible to use dynamic fields with *, as shown above, with edismax?

Please provide me with some pointers on this.



Thanks in advance.