Re: Data Import failed in Solr 4.3.0

2013-08-21 Thread Montu v Boda
Thanks for the suggestion.

But in our view this is not the right way; we should not have to re-index all
the data each and every time we migrate Solr from an older to the latest
version. Solr has to provide a solution for this, because re-indexing 5
million (50 lakh) documents is not an easy job.

We want to know whether there is any way to do this easily in Solr.

Thanks & Regards
Montu v Boda





Re: How to avoid underscore sign indexing problem?

2013-08-21 Thread Floyd Wu
After trying several search cases and different parameter combinations for
WordDelimiter, I wonder what the best strategy is to index the string
"2DA012_ISO MARK 2" so that it can be found with the term "2DA012".

What if I just want _ to be removed at both query and index time? What should
I configure, and how?

Floyd



2013/8/22 Floyd Wu 

> Thank you all.
> By the way, Jack, I'm gonna buy your book. Where can I buy it?
> Floyd
>
>
> 2013/8/22 Jack Krupansky 
>
>> "I thought that the StandardTokenizer always split on punctuation, "
>>
>> Proving that you haven't read my book! The section on the standard
>> tokenizer details the rules that the tokenizer uses (in addition to
>> extensive examples.) That's what I mean by "deep dive."
>>
>> -- Jack Krupansky
>>
>> -Original Message- From: Shawn Heisey
>> Sent: Wednesday, August 21, 2013 10:41 PM
>> To: solr-user@lucene.apache.org
>> Subject: Re: How to avoid underscore sign indexing problem?
>>
>>
>> On 8/21/2013 7:54 PM, Floyd Wu wrote:
>>
>>> When using StandardAnalyzer to tokenize string "Pacific_Rim" will get
>>>
>>> ST
>>> text: pacific_rim | raw_bytes: [70 61 63 69 66 69 63 5f 72 69 6d] |
>>> start: 0 | end: 11 | type: <ALPHANUM> | position: 1
>>>
>>> How to make this string to be tokenized to these two tokens "Pacific",
>>> "Rim"?
>>> Set _ as stopword?
>>> Please kindly help on this.
>>> Many thanks.
>>>
>>
>> Interesting.  I thought that the StandardTokenizer always split on
>> punctuation, but apparently that's not the case for the underscore
>> character.
>>
>> You can always use the WordDelimiterFilter after the StandardTokenizer.
>>
>> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory
>>
>> Thanks,
>> Shawn
>>
>
>
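For the "remove _ at both query and index time" case, a minimal sketch of a
field type that does it (the type name is made up, and this is untested here):

<fieldType name="text_underscore" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- map "_" to a space before tokenizing, so "2DA012_ISO" becomes "2DA012 ISO";
         a single <analyzer> section applies at both index and query time -->
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="_" replacement=" "/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

With this, "2DA012_ISO MARK 2" is indexed as 2da012 / iso / mark / 2, so the
term "2DA012" matches. Shawn's alternative below, WordDelimiterFilter after
StandardTokenizer, achieves a similar split while also handling other
punctuation.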


Re: How to avoid underscore sign indexing problem?

2013-08-21 Thread Floyd Wu
Thank you all.
By the way, Jack, I'm gonna buy your book. Where can I buy it?
Floyd


2013/8/22 Jack Krupansky 

> "I thought that the StandardTokenizer always split on punctuation, "
>
> Proving that you haven't read my book! The section on the standard
> tokenizer details the rules that the tokenizer uses (in addition to
> extensive examples.) That's what I mean by "deep dive."
>
> -- Jack Krupansky
>
> -Original Message- From: Shawn Heisey
> Sent: Wednesday, August 21, 2013 10:41 PM
> To: solr-user@lucene.apache.org
> Subject: Re: How to avoid underscore sign indexing problem?
>
>
> On 8/21/2013 7:54 PM, Floyd Wu wrote:
>
>> When using StandardAnalyzer to tokenize string "Pacific_Rim" will get
>>
>> ST
>> text: pacific_rim | raw_bytes: [70 61 63 69 66 69 63 5f 72 69 6d] |
>> start: 0 | end: 11 | type: <ALPHANUM> | position: 1
>>
>> How to make this string to be tokenized to these two tokens "Pacific",
>> "Rim"?
>> Set _ as stopword?
>> Please kindly help on this.
>> Many thanks.
>>
>
> Interesting.  I thought that the StandardTokenizer always split on
> punctuation, but apparently that's not the case for the underscore
> character.
>
> You can always use the WordDelimiterFilter after the StandardTokenizer.
>
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory
>
> Thanks,
> Shawn
>


Re: How to avoid underscore sign indexing problem?

2013-08-21 Thread Jack Krupansky

"I thought that the StandardTokenizer always split on punctuation, "

Proving that you haven't read my book! The section on the standard tokenizer 
details the rules that the tokenizer uses (in addition to extensive 
examples.) That's what I mean by "deep dive."


-- Jack Krupansky

-Original Message- 
From: Shawn Heisey

Sent: Wednesday, August 21, 2013 10:41 PM
To: solr-user@lucene.apache.org
Subject: Re: How to avoid underscore sign indexing problem?

On 8/21/2013 7:54 PM, Floyd Wu wrote:

When using StandardAnalyzer to tokenize string "Pacific_Rim" will get

ST
text: pacific_rim | raw_bytes: [70 61 63 69 66 69 63 5f 72 69 6d] |
start: 0 | end: 11 | type: <ALPHANUM> | position: 1

How to make this string to be tokenized to these two tokens "Pacific",
"Rim"?
Set _ as stopword?
Please kindly help on this.
Many thanks.


Interesting.  I thought that the StandardTokenizer always split on
punctuation, but apparently that's not the case for the underscore
character.

You can always use the WordDelimiterFilter after the StandardTokenizer.

http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory

Thanks,
Shawn 



Re: How to avoid underscore sign indexing problem?

2013-08-21 Thread Shawn Heisey
On 8/21/2013 7:54 PM, Floyd Wu wrote:
> When using StandardAnalyzer to tokenize string "Pacific_Rim" will get
> 
> ST
> text: pacific_rim | raw_bytes: [70 61 63 69 66 69 63 5f 72 69 6d] |
> start: 0 | end: 11 | type: <ALPHANUM> | position: 1
> 
> How to make this string to be tokenized to these two tokens "Pacific",
> "Rim"?
> Set _ as stopword?
> Please kindly help on this.
> Many thanks.

Interesting.  I thought that the StandardTokenizer always split on
punctuation, but apparently that's not the case for the underscore
character.

You can always use the WordDelimiterFilter after the StandardTokenizer.

http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory

Thanks,
Shawn



Re: removing duplicates

2013-08-21 Thread Liu
This picture is extracted from apache-solr-ref-guide-4.4.pdf. Maybe it will
help you.
You could download the document from
https://www.apache.org/dyn/closer.cgi/lucene/solr/ref-guide/

-Original Message-
From: Ali, Saqib [mailto:docbook@gmail.com]
Sent: August 22, 2013 5:15
To: solr-user@lucene.apache.org
Subject: removing duplicates

hello,

We have documents that are duplicates, i.e. the ID is different but the rest
of the fields are the same. Is there a query that can remove the duplicates
and leave just one copy of each document in Solr? There is one numeric field
that we can key off of to find duplicates.

Please advise.

Thanks


How to avoid underscore sign indexing problem?

2013-08-21 Thread Floyd Wu
When using StandardAnalyzer to tokenize string "Pacific_Rim" will get

ST
text: pacific_rim | raw_bytes: [70 61 63 69 66 69 63 5f 72 69 6d] |
start: 0 | end: 11 | type: <ALPHANUM> | position: 1

How to make this string to be tokenized to these two tokens "Pacific",
"Rim"?
Set _ as stopword?
Please kindly help on this.
Many thanks.

Floyd


Re: 4.3 Cloud looks good on the outside, but lots of errors in the logs

2013-08-21 Thread Shawn Heisey

On 8/21/2013 6:23 PM, dmarini wrote:

Shawn, thanks for your reply. All of these suggestions look like good ideas
and I will follow up. We are running Solr via the Jetty process on windows
as well as all of our zookeepers on the same boxes as the clouds. The reason
for this is that we're on EC2 servers so it gets ultra expensive to have a 6
box setup just to have zookeepers on separate boxes from the solr instances.


You can have zookeeper on the same host as Solr, that's no problem.  You 
should drop to just three total zookeepers, one per node, and use the 
chroot method to keep things separate.  You can probably run zookeeper 
with a max heap of 256MB, but it likely would never need more than 
512MB.  It doesn't use much memory at all.



Each of our Windows boxes has 8GB of RAM, with roughly 35 - 40% of it still
seemingly free. Is there a tool or some way we can identify for certain if
we're running into memory issues? I like your zookeeper idea and I didn't
know that this was feasible. I will get a test bed set up that way soon. As
for indexes, each cloud has multiple collections but we're looking at the
largest entire cloud (multiple indexes) being about 200MB, each collection
is between 50 and 100MB and I don't see them getting much bigger than that
per index (but I do see more indexes being added to the clouds).


With indexes that small, I would run each Jetty/Solr with a max heap of 
1GB.  With three of them per server, that will mean that Solr is using 
3GB of RAM, leaving 5GB for the OS disk cache.  You could probably bump 
that to 1.5 or 2GB and still be OK.



Is there a definitive advantage to running Solr on a linux box
over windows? I need to be able to justify the time and effort it will take
to get up to speed on a non-familiar OS if we're going to go that route but
if there's a good enough reason I don't see why not.


Linux manages memory better than Windows, and ext4 is a much better 
filesystem than NTFS.  If you are familiar with Windows, there's nothing 
wrong with continuing to use it, except for the fact that you have to 
give Microsoft a few hundred bucks per machine for a server OS when you 
take it into production.  You can run Linux for free.



--Would it be helpful to have the zookeeper ensemble on a different disk
drive than the clouds?
--Can the chattiness of all of the replication and zookeeper communication
for multiple clouds/collections cause any of these issues? (We do have some
collections that are in constant flux with 1 - 5 requests each second, which
we gather up and send to Solr in batches of 250 documents or a 10 second
flush.)


It never hurts to have things separated so they are on different disks, 
but SolrCloud will put hardly any load on zookeeper, so I don't think it 
matters much.  It is Solr itself that will take that load.


Thanks,
Shawn



Re: 4.3 Cloud looks good on the outside, but lots of errors in the logs

2013-08-21 Thread dmarini
Shawn, thanks for your reply. All of these suggestions look like good ideas
and I will follow up. We are running Solr via the Jetty process on Windows,
as well as all of our zookeepers on the same boxes as the clouds. The reason
for this is that we're on EC2 servers, so it gets ultra expensive to have a 6
box setup just to have zookeepers on separate boxes from the Solr instances.

Each of our Windows boxes has 8GB of RAM, with roughly 35 - 40% of it still
seemingly free. Is there a tool or some way we can identify for certain if
we're running into memory issues? I like your zookeeper idea and I didn't
know that this was feasible. I will get a test bed set up that way soon. As
for indexes, each cloud has multiple collections, but we're looking at the
largest entire cloud (multiple indexes) being about 200MB; each collection
is between 50 and 100MB and I don't see them getting much bigger than that
per index (but I do see more indexes being added to the clouds).

A few more questions:

--Is there a definitive advantage to running Solr on a Linux box over
Windows? I need to be able to justify the time and effort it will take to
get up to speed on a non-familiar OS if we're going to go that route, but if
there's a good enough reason I don't see why not.
--Would it be helpful to have the zookeeper ensemble on a different disk
drive than the clouds?
--Can the chattiness of all of the replication and zookeeper communication
for multiple clouds/collections cause any of these issues? (We do have some
collections that are in constant flux with 1 - 5 requests each second, which
we gather up and send to Solr in batches of 250 documents or a 10 second
flush.)

Thanks again for your reply and suggestions, they are much appreciated.

--Dave




Re: Geo spatial clustering of points

2013-08-21 Thread Chris Atkinson
Did you get any resolution for this? I'm about to implement something
identical.
On 3 Jul 2013 23:03, "Jeroen Steggink"  wrote:

> Hi,
>
> I'm looking for a way to cluster (or should I call it group?) geo
> spatial points on a map based on the current zoom level and get the median
> coordinate for each cluster.
> Let's say I'm on the world level, and I want to cluster spatial points
> within a 1000 km radius. When I zoom in I only want to get the clustered
> points for that boundary. Let's say all the points within the US and
> cluster them within a 500 km radius.
>
> I'm using Solr 4.3.0 and looked into SpatialRecursivePrefixTreeFieldType
> with faceting. However, I'm not sure if the geohashes are of any use for
> clustering points.
>
> Does anyone have any experience with geo spatial clustering with Solr?
>
> Regards,
>
> jeroen
>
>
>


RE: removing duplicates

2013-08-21 Thread Petersen, Robert
This would describe the facet parameters we're talking about:

http://wiki.apache.org/solr/SimpleFacetParameters

Query something like this:
http://localhost:8983/solr/select?q=*:*&fl=id&rows=0&facet=true&facet.limit=-1&facet.field=<dup_field>&facet.mincount=2

Then filter on each facet returned with a filter query described here: 
http://wiki.apache.org/solr/CommonQueryParameters
Example: q=*:*&fq=<dup_field>:<facet_value>

Then you would have to get all ids returned and delete all but the first one 
using some app...

Thanks 
Robi


-Original Message-
From: Ali, Saqib [mailto:docbook@gmail.com] 
Sent: Wednesday, August 21, 2013 2:34 PM
To: solr-user@lucene.apache.org
Subject: Re: removing duplicates

Thanks Aloke and Robert. Can you please give me code/query snippets?
(newbie here)


On Wed, Aug 21, 2013 at 2:31 PM, Aloke Ghoshal  wrote:

> Hi,
>
> Facet by one of the duplicate fields (probably by the numeric field 
> that you mentioned) and set facet.mincount=2.
>
> Regards,
> Aloke
>
>
> On Thu, Aug 22, 2013 at 2:44 AM, Ali, Saqib  wrote:
>
> > hello,
> >
> > We have documents that are duplicates, i.e. the ID is different but the
> > rest of the fields are the same. Is there a query that can remove the
> > duplicates and leave just one copy of each document in Solr? There is one
> > numeric field that we can key off of to find duplicates.
> >
> > Please advise.
> >
> > Thanks
> >
>
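A rough SolrJ sketch of the small app this thread describes (the field name
"dupKey" and the URL are hypothetical; try it against a test index first):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.FacetField;
import org.apache.solr.client.solrj.util.ClientUtils;
import org.apache.solr.common.SolrDocument;

public class DedupeByFacet {
    public static void main(String[] args) throws Exception {
        HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr");

        // Facet on the duplicate key; mincount=2 returns only duplicated values.
        SolrQuery facets = new SolrQuery("*:*");
        facets.setRows(0).setFacet(true).addFacetField("dupKey");
        facets.setFacetMinCount(2).setFacetLimit(-1);

        for (FacetField.Count dup : solr.query(facets).getFacetField("dupKey").getValues()) {
            // Fetch the ids sharing this value, keep the first doc, delete the rest.
            SolrQuery ids = new SolrQuery("*:*");
            ids.addFilterQuery("dupKey:" + ClientUtils.escapeQueryChars(dup.getName()));
            ids.setFields("id").setRows((int) dup.getCount());
            boolean keptOne = false;
            for (SolrDocument doc : solr.query(ids).getResults()) {
                if (!keptOne) { keptOne = true; continue; }
                solr.deleteById(doc.getFieldValue("id").toString());
            }
        }
        solr.commit(); // make the deletes visible
    }
}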



Re: removing duplicates

2013-08-21 Thread Aloke Ghoshal
Hi,

This will help you identify the duplicates:
q=*:*&fl=id&facet=true&facet.mincount=2&rows=0&facet.field=<dup_field>

To actually remove them from Solr, you will have to do something like
Robert suggested. Write an application that uses the results to build a
delete by id query (
http://wiki.apache.org/solr/UpdateXmlMessages#A.22delete.22_documents_by_ID_and_by_Query
).

Regards,
Aloke


On Thu, Aug 22, 2013 at 3:04 AM, Ali, Saqib  wrote:

> Thanks Aloke and Robert. Can you please give me code/query snippets?
> (newbie here)
>
>
> On Wed, Aug 21, 2013 at 2:31 PM, Aloke Ghoshal 
> wrote:
>
> > Hi,
> >
> > Facet by one of the duplicate fields (probably by the numeric field that
> > you mentioned) and set facet.mincount=2.
> >
> > Regards,
> > Aloke
> >
> >
> > On Thu, Aug 22, 2013 at 2:44 AM, Ali, Saqib 
> wrote:
> >
> > > hello,
> > >
> > > We have documents that are duplicates, i.e. the ID is different but the
> > > rest of the fields are the same. Is there a query that can remove the
> > > duplicates and leave just one copy of each document in Solr? There is one
> > > numeric field that we can key off of to find duplicates.
> > >
> > > Please advise.
> > >
> > > Thanks
> > >
> >
>


Re: removing duplicates

2013-08-21 Thread Ali, Saqib
Thanks Aloke and Robert. Can you please give me code/query snippets?
(newbie here)


On Wed, Aug 21, 2013 at 2:31 PM, Aloke Ghoshal  wrote:

> Hi,
>
> Facet by one of the duplicate fields (probably by the numeric field that
> you mentioned) and set facet.mincount=2.
>
> Regards,
> Aloke
>
>
> On Thu, Aug 22, 2013 at 2:44 AM, Ali, Saqib  wrote:
>
> > hello,
> >
> > We have documents that are duplicates, i.e. the ID is different but the
> > rest of the fields are the same. Is there a query that can remove the
> > duplicates and leave just one copy of each document in Solr? There is one
> > numeric field that we can key off of to find duplicates.
> >
> > Please advise.
> >
> > Thanks
> >
>


RE: removing duplicates

2013-08-21 Thread Petersen, Robert
Hi

Perhaps you could query for all documents asking for the id field to be 
returned and then facet on the field you say you can key off of for duplicates. 
 Set the facet mincount to 2, then you would have to filter on each facet value 
and page through all doc IDs (except skip the first document) for each returned 
facet and delete by ID using a small app or something like that.  Spin all the 
deletes into the index and then do a commit at the end.  I think that would do 
it.

Thanks
Robi

-Original Message-
From: Ali, Saqib [mailto:docbook@gmail.com] 
Sent: Wednesday, August 21, 2013 2:15 PM
To: solr-user@lucene.apache.org
Subject: removing duplicates

hello,

We have documents that are duplicates, i.e. the ID is different but the rest
of the fields are the same. Is there a query that can remove the duplicates
and leave just one copy of each document in Solr? There is one numeric field
that we can key off of to find duplicates.

Please advise.

Thanks



Re: removing duplicates

2013-08-21 Thread Aloke Ghoshal
Hi,

Facet by one of the duplicate fields (probably by the numeric field that
you mentioned) and set facet.mincount=2.

Regards,
Aloke


On Thu, Aug 22, 2013 at 2:44 AM, Ali, Saqib  wrote:

> hello,
>
> We have documents that are duplicates, i.e. the ID is different but the rest
> of the fields are the same. Is there a query that can remove the duplicates
> and leave just one copy of each document in Solr? There is one numeric field
> that we can key off of to find duplicates.
>
> Please advise.
>
> Thanks
>


removing duplicates

2013-08-21 Thread Ali, Saqib
hello,

We have documents that are duplicates, i.e. the ID is different but the rest
of the fields are the same. Is there a query that can remove the duplicates
and leave just one copy of each document in Solr? There is one numeric field
that we can key off of to find duplicates.

Please advise.

Thanks


Solr 4.4, enablePositionIncrements=true and PhraseQueries

2013-08-21 Thread Ronald K. Braun
Hello,

I'm working on an upgrade from solr 1.4.1 to 4.4.  One of my field
analyzers uses StopWordFilter, which as of 4.4 is forbidden to set
enablePositionIncrements to false.  As a consequence, some hand-constructed
phrase queries (basically generated via calls to
SolrPluginUtils.parseQueryStrings on field:value text snippets) seem to now
be failing relative to 1.4.1 because (I think) of the created "gaps" in
phrase query content.

By way of example, I have indexed text of the form "Old Ones" and query
text of the form "The Old Ones".  Debug output shows my phrase query being
generated as field:"? Old Ones" and that seems to not match indexed source
text of "Old Ones", presumably since there is no initial token to "fill the
gap".

With positionIncrements set to false (tested by setting LUCENE_43
temporarily in solrconfig) to bypass the forced 4.4 restriction, it does
what I expect (and what 1.4.1 does) in just outright ignoring the stop
words with a generated query of field:"Old Ones" that matches my source
text.

Is there a way to configure phrase queries to ignore gaps, or otherwise
ignore positioning information for missing/removed tokens?  Fiddling with
slops is not a viable option -- I need exact sequential matching on my
token sequences apart from stopword presence.  A workaround that occurred
was perhaps adding a position normalizer filter that resets the term
positions to sequential, but I'm hoping there may be some other
configuration option to restore backwards-compatible phrase matching given
the neutering of enablePositionIncrements.

Thanks!

Ron
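For the record, the "position normalizer filter" idea at the end could look
roughly like this custom Lucene filter (an untested sketch; the class name is
invented), which forces every token's position increment back to 1 so the
stopword gaps disappear:

import java.io.IOException;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;

public final class FlattenPositionsFilter extends TokenFilter {
    private final PositionIncrementAttribute posIncAtt =
            addAttribute(PositionIncrementAttribute.class);

    public FlattenPositionsFilter(TokenStream input) {
        super(input);
    }

    @Override
    public boolean incrementToken() throws IOException {
        if (!input.incrementToken()) {
            return false;
        }
        // Collapse any gap left by a removed stopword into a plain increment of 1.
        posIncAtt.setPositionIncrement(1);
        return true;
    }
}

It would still need a small TokenFilterFactory wrapper to be usable from
schema.xml, and it deliberately trades accurate positions for the 1.4.1-style
exact-adjacency phrase matching described above.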


Re: edismax type query in different sets of fields?

2013-08-21 Thread Rafael Calsaverini
Hum! It seems to be exactly what I need. Thanks! I'll look for it in the
docs.


Rafael Calsaverini
Data Scientist @ Catho 
cell: +55 11 7525.6222



On Wed, Aug 21, 2013 at 12:08 PM, Erick Erickson wrote:

> Have you tried "nested queries"? I think you can specify the full edismax
> syntax on the URL in a nested query for your second search field...
>
> Best,
> Erick
>
>
> On Tue, Aug 20, 2013 at 5:17 PM, Rafael Calsaverini <
> rafael.calsaver...@gmail.com> wrote:
>
> > Hi there,
> >
> >
> > suppose I have documents with fields that can be loosely grouped by what
> > kind of information they contain. As an example, imagine the documents
> > representing people and some fields are related to how they're called
> > (name, surname, nickname, etc...) and some fields are related to where
> they
> > live (street, city, state, ...).
> >
> > Imagine I have an application that provide two search fields to look for
> > people: who do you want to find, and where does he live. So I just get
> two
> > strings, and each of them might contain information related to multiple
> > fields.
> >
> > Is there a way to do something like edismax for each group of field?
> > Something like:
> >
> > (name^2 surname^2 nickname):(rafael calsaverini) AND (street city
> > state):(rua dos bobos sao paulo SP)
> >
> > or whatever is the adequate syntax.
> >
> > The alternative would be to build a parser for one of the fields, but if
> I
> > want to allow for a completely free field for the "where" part, it might
> > not be easy to predict what the user would put there. Is there anybody
> else
> > with a similar problem?
> >
> > Thanks.
> >
> >
> >
> > Rafael Calsaverini
> > Data Scientist @ Catho 
> > cell: +55 11 7525.6222
> >
> >
>


Re: loading solr from Pig?

2013-08-21 Thread Utkarsh Sengar
That's a good point; we load data from Pig to Solr every day.

1. What we do:
A Pig job creates a CSV dump, scp's it over to a Solr node, and the UpdateCSV
request handler loads the data into Solr. A complete rebuild of the index for
about 50M documents (20GB) takes 20 minutes (the Pig job pulls and processes
data in Cassandra, and UpdateCSV loads it).

2. Alternate way:
Another way I explored was writing a Pig UDF which POSTs to Solr. But batched
HTTP POSTs were slower than a CSV load for a full index rebuild (and that was
an important use case for us).

These might not be the best practices; I would like to know how others are
handling this problem.

Thanks,
-Utkarsh



On Wed, Aug 21, 2013 at 11:29 AM, geeky2  wrote:

> Hello All,
>
> Is anyone loading Solr from a Pig script / process?
>
> I was talking to another group in our company and they have standardized on
> MongoDB instead of Solr - apparently there is very good support between
> MongoDB and Pig - allowing users to "stream" data directly from a Pig
> process in to MongoDB.
>
> Does solr have anything like this as well?
>
> thx
> mark
>
>
>
>
>
>
>
>



-- 
Thanks,
-Utkarsh
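For reference, the UpdateCSV load mentioned in point 1 boils down to a call
like this (URL and file name are illustrative):

curl 'http://localhost:8983/solr/update/csv?commit=true' \
     -H 'Content-type: text/csv; charset=utf-8' \
     --data-binary @dump.csv

For multi-GB dumps it can be preferable to pass stream.file=/path/to/dump.csv
instead of --data-binary, so Solr reads the file locally; that requires remote
streaming to be enabled in solrconfig.xml.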


Re: 4.3 Cloud looks good on the outside, but lots of errors in the logs

2013-08-21 Thread Shawn Heisey

On 8/20/2013 10:52 PM, dmarini wrote:

I'm running a solr 4.3 cloud in a 3 machine setup that has the following
configuration:
each machine is running 3 zookeepers on different ports
each machine is running a jetty instance PER zookeeper..

Essentially, this gives us the ability to host 3 isolated clouds across the
3 machines. 3 shards per collection with each machine hosting a shard and
replicas of the other 2 shards. default timeout for the zookeeper
communication is 60 seconds. At any time I can go to any machine/port combo
and go to the "Cloud" view and everything looks peachy. All nodes are green
and each shard of each collection has an active leader (albeit they all
eventually have the SAME leader, which does stump me as to how it gets that
way but one thing at a time).

Despite everything looking good, looking at the logs on any of the nodes is
enough to make me wonder how the cloud is functioning at all, with errors
like the following:

*Error while trying to recover.
core=MyCollection.shard2.replica:org.apache.solr.client.solrj.SolrServerException:
Timeout occured while waiting response from server at:
http://MYNODE2.MYDOMAIN.LOCAL:8983/solr
* (what's funny about this one is that MYNODE2:8983/solr responds with no
issue and appears healthy (all green), but these errors are coming in 5 to
10 at a time for MYNODE1 and MYNODE3.)

*Org.apache.solr.common.SolrException: I was asked to wait on state
recovering for MYNODE3.MYDOMAIN.LOCAL:8983_solr but I still do not see the
requested state. I see state: active live:true* (this is from the leader
node: MYNODE2:8983/solr logs from the admin site.. Again, all appears ok and
read/writes to the cloud are working.)

To top it all off, we have monitors that call out to the solr/admin/ping
handler for each node of each cloud and normally these pings are very quick
(under 100ms).. but at various points throughout the day, the 60 second
timeout is surpassed for the monitor and it raises an alarm only to have the
next ping go right back to quick.

I've done checks against resource usage on the machines when I see these
ping slowdowns but I'm not seeing any memory pressure (in terms of free
memory) or cpu thrashing. I'm at a loss for what can cause the system to be
so unstable and would appreciate any thoughts on any of the messages from
the log or proposed ideas for the cause of the ping issue.

Also, to confirm, there is currently no way to force a leader election
correct? with all of our collections inevitably rolling themselves to the
same leader over time, I feel that the performance will suffer since all
writes will be trying to happen on the same machine when there are other
healthy machines that can be the leader for the other shards to allow a
better distribution of requests


I am guessing that you are running into resource starvation, mostly
memory.  You've probably got a lot of slow garbage collections, and you
might even be going to swap (UNIX) or the pagefile (Windows) from
allocating too much memory to Solr instances.  You may find that you
need to add memory to the machines.  I wouldn't try what you are doing
without at least 16GB per server, and depending on how big those indexes
are, I might want 32 or 64GB.

The first thing I recommend is getting rid of all those extra
zookeepers.  You can run many clouds on one three-node zookeeper
ensemble.  You just need to have zkHost parameters like the following,
where "/test1" gets replaced by a different chroot value for each cloud.
 You do not need the chroot on every server in the list, just once at
the end:

-DzkHost=server1:2181,server2:2181,server3:2181/test1

The next thing is to size the max heap appropriately for each of your
Solr instances.  The total amount of RAM allocated to all the JVMs -
zookeeper and Solr - must not exceed the total memory in the server, and
you should have RAM left over for OS disk caching as well.  Unless your
max heap is below 1GB, you'll also want to tune your garbage collection.

Included in the following wiki page are some good tips on memory and
garbage collection tuning:

http://wiki.apache.org/solr/SolrPerformanceProblems

Thanks,
Shawn


Re: Solr Indexing Status

2013-08-21 Thread Shalin Shekhar Mangar
Yes, you can invoke
http://<host>:<port>/solr/dataimport?command=status, which will return
how many Solr docs have been added, etc.

On Wed, Aug 21, 2013 at 4:56 PM, Prasi S  wrote:
> Hi,
> I am using Solr 4.4 to index CSV files, via SolrJ. At frequent intervals my
> user may request the status, and I have to send back something like DIH's
> "Indexing in progress. Added xxx documents".
>
> Is there anything like DIH, where we can fire a command=status to get the
> status of indexing for the files?
>
>
> Thanks,
> Prasi



-- 
Regards,
Shalin Shekhar Mangar.


Re: Data Import failed in Solr 4.3.0

2013-08-21 Thread Shalin Shekhar Mangar
I guess you are trying to index another Solr index via DIH's
SolrEntityProcessor. That processor wasn't really designed for
migrating huge indexes. You're better off re-indexing content directly
to another Solr.

As far as this error is concerned, my guess is that it is due to an
error thrown by your 3.5 server while deep paging through the
response.

On Wed, Aug 21, 2013 at 6:31 PM, Montu v Boda
 wrote:
> When we import the whole index from Solr 3.5 into 4.3, the import fails
> every time with the error below:
>
> Caused by: org.apache.solr.common.SolrException: parsing error
> Caused by: org.apache.http.MalformedChunkCodingException: Unexpected content
> at the end of chunk
>
>
> We have 5 million (50 lakh) documents indexed in Solr 3.5; that index is
> approximately 400 GB.
>
>
> Thanks & Regards
> Montu v Boda
>
>
>



-- 
Regards,
Shalin Shekhar Mangar.
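For context, the SolrEntityProcessor setup being discussed typically looks
something like this in data-config.xml (host and rows are illustrative):

<dataConfig>
  <document>
    <entity name="sourceIndex"
            processor="SolrEntityProcessor"
            url="http://old-solr-host:8983/solr"
            query="*:*"
            rows="500"/>
  </document>
</dataConfig>

It walks the source index with start/rows paging, which is exactly the deep
paging that Shalin suspects is tripping up the 3.5 server.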


loading solr from Pig?

2013-08-21 Thread geeky2
Hello All,

Is anyone loading Solr from a Pig script / process?

I was talking to another group in our company and they have standardized on
MongoDB instead of Solr - apparently there is very good support between
MongoDB and Pig - allowing users to "stream" data directly from a Pig
process in to MongoDB.

Does solr have anything like this as well?

thx
mark









How to SOLR file in svn repository

2013-08-21 Thread jiunarayan
I have an SVN repository and SVN file paths. How can I get Solr to search the
content of the SVN files?





Re: Sharing SolrCloud collection configs w/overrides

2013-08-21 Thread Tim Vaillancourt
Well, the mention of DIH is a bit off-topic. I'll simplify and say all I
need is the ability to set ANY variables in solrconfig.xml without having
to make N number of copies of the same configuration to achieve that.
Essentially I need 10+ collections to use the exact same config dir in
Zookeeper with minor/trivial differences set in variables.

Your proposal of taking in values at core creation-time is a neat one and
would be a very flexible solution for a lot of use cases. My only concern
for my really-specific use case is that I'd be setting DB users/passwords via
plain-text HTTP calls, but having this feature is better than not.

In a perfect world I'd like to be able to include files in Zookeeper (like
XInclude) that are outside the common config dir (eg:
'/configs/sharedconfig') all the collections would be sharing. On the other
hand, that sort of solution would open up the Zookeeper layout to arbitrary
files and could end up in a nightmare if not done carefully, however.

Would it be possible for Solr to support specifying multiple configs at
collection creation that are merged or concatenated? This idea sounds
terrible to me even at this moment, but I wonder if there is something in
there...

Tim
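One pattern that may help here: Solr substitutes ${property} references when
it loads solrconfig.xml, so a single shared config can pull per-collection
values out of each core's properties. A sketch, with invented property names:

In the shared solrconfig.xml:

<requestHandler name="/dataimport"
                class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">data-config.xml</str>
    <str name="db.url">${db.url:jdbc:mysql://localhost/placeholder}</str>
    <str name="db.user">${db.user:solr}</str>
  </lst>
</requestHandler>

In each core's core.properties (Solr 4.4):

db.url=jdbc:mysql://dbhost/collection1
db.user=collection1user

The DIH config can then read the values back as
${dataimporter.request.db.url} and so on; whether handler defaults flow into
dataimporter.request should be verified against your Solr version.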


Re: Solr Indexing Status

2013-08-21 Thread Furkan KAMACI
You know the size of CSV files and you can calculate it if you want.


2013/8/21 Prasi S 

> Hi,
> I am using Solr 4.4 to index CSV files, via SolrJ. At frequent intervals my
> user may request the status, and I have to send back something like DIH's
> "Indexing in progress. Added xxx documents".
>
> Is there anything like DIH, where we can fire a command=status to get the
> status of indexing for the files?
>
>
> Thanks,
> Prasi
>


Filter results based on their number of terms, relative to the search query

2013-08-21 Thread Spyros Kapnissis
Hi,

We have an index of several small expressions, let's say 4-20 words on average. 
I have a requirement to search for "approximate" results only, relevant to the 
search query. 

For example, when someone searches for (+a +b +c), we would like to return only
those expressions that contain all terms, plus at most one extra term (e.g.
a,b,c,d), and filter out any results that are longer.

Any ideas on this? One thought I had is that maybe we could use a filter 
function query using Similarity's queryNorm along with the coord factor. Is 
this even possible?

Thanks,
Spyros


Re: Facing Solr performance during query search

2013-08-21 Thread Jack Krupansky
I'd like to see a screen shot of a search results web page that has 2,000 
facets.


-- Jack Krupansky

-Original Message- 
From: Erick Erickson

Sent: Wednesday, August 21, 2013 11:24 AM
To: solr-user@lucene.apache.org
Subject: Re: Facing Solr performance during query search

~2,000 facets kind of worries me, but let's skip that for now.

Your original problem statement was that replication was the
thing that changed. So the first thing I'd do is not replicate. If you
turn it off, do your slaves still perform poorly?

Allocating that much RAM to the JVM is probably not a great idea,
see: http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

So far nothing is jumping out at me.


On Wed, Aug 21, 2013 at 4:09 AM, sivaprasad 
wrote:



Here I am providing the slave solrconfig information.

   1
   
  35
  35
   
   
  6
  1
   


1024
   20

  

  


  

  static firstSearcher warming in
solrconfig.xml

  

false


The slave will poll for every 1hr.

The field list is given below.










stored="true"/>


















We have configured ~2000 facets and the machine configuration is given
below.

6-core processor, 22528 MB (22 GB) of RAM allotted to the JVM. The Solr version is 4.1.0.

Please let me know, if you require any more information.








Re: What filter to use to search with spaces omitted/included between words?

2013-08-21 Thread Erick Erickson
Jack:

That's a consequence of the keyword tokenizer I hadn't thought of before.

Erick


On Wed, Aug 21, 2013 at 11:17 AM, Jack Krupansky wrote:

> The reason that a query of "bestbuy" matches indexing of "best buy" in
> this case is that the keyword tokenizer treats the entire input text as one
> token, including the space between "best" and "buy" and then the WDF treats
> any embedded white space as if it were punctuation and then the catenateAll
> attribute causes "best" and "buy" to be concatenated to form "bestbuy".
>
>
> -- Jack Krupansky
>
> -Original Message- From: Erick Erickson
> Sent: Wednesday, August 21, 2013 11:12 AM
>
> To: solr-user@lucene.apache.org
> Subject: Re: What filter to use to search with spaces omitted/included
> between words?
>
> Keyword tokenizer will probably cause you problems, since you'll never
> match "best".
> and searching name:best AND name:buy would fail as well.
>
> And I'm surprised this is working at all, I'd really scrutinize why bestbuy
> matches an
> index with Best Buy, that makes no sense on the surface.
>
> If you have a relatively small vocabulary, synonyms might work for you.
>
> Best,
> Erick
>
>
> On Tue, Aug 20, 2013 at 8:04 PM, Utkarsh Sengar wrote:
>
>  Let me take that back, this actually works. q=bestbuy matches "Best Buy"
>> and documents are returned.
>>
>> > positionIncrementGap="100">
>>  
>>> generateWordParts="1" generateNumberParts="1"
>>
>> catenateWords="1"
>>
>> catenateNumbers="1"
>>
>> catenateAll="0"
>>
>> preserveOriginal="1"/>
>> 
>> 
>> 
>> 
>> > generateWordParts="1" generateNumberParts="1"
>>
>> catenateWords="1"
>>
>> catenateNumbers="1"
>>
>> catenateAll="0"
>>
>> preserveOriginal="1"/>
>> 
>> 
>> 
>> 
>>
>> I was using a different tokenizer; replacing it with the keyword tokenizer
>> did the trick.
>> Not sure how it worked. The field value I am searching is "Best Buy", but
>> when I search for "bestbuy", it returns a result.
>>
>> Thanks,
>> -Utkarsh
>>
>>
>>
>> On Tue, Aug 20, 2013 at 4:48 PM, Utkarsh Sengar wrote:
>>
>> > Thanks Tamanjit and Erick.
>> > I tried out the filters, most of the usecases work except "q=bestbuy".
>> > As
>> > mentioned by Erick, that is a hard one to crack.
>> >
>> > I am looking into DictionaryCompoundWordTokenFilterFactory, but compound
>> > words like these:
>> >
>> > http://www.manythings.org/vocabulary/lists/a/words.php?f=compound_words and
>> > generic English words; it won't cover my need for custom compound words
>>
>> > of store names like BestBuy, WalMart or CircuitCity.
>> >
>> > Thanks,
>> > -Utkarsh
>> >
>> >
>> > On Tue, Aug 20, 2013 at 4:43 AM, Jack Krupansky <
>> j...@basetechnology.com
>> >wrote:
>> >
>> >> You could either have a synonym filter to replace "bestbuy" with "best
>> >> buy" or use DictionaryCompoundWordTokenFilterFactory to do the
>> same.
>> >>
>> >> See:
>> >> http://lucene.apache.org/core/4_4_0/analyzers-common/org/apache/lucene/analysis/compound/DictionaryCompoundWordTokenFilterFactory.html
>> >>
>> >> There are some examples in my book, but they are for German compound
>> >> words since that was the original primary intent for this filter. But
>> >> it
>> >> should work for any words since it is a simple dictionary.
>> >>
>> >> -- Jack Krupansky
>> >>
>> >> -Original Message- From: Erick Erickson
>> >> Sent: Tuesday, August 20, 2013 7:21 AM
>> >> To: solr-user@lucene.apache.org
>> >> Subject: Re: What filter to use to search with spaces omitted/included
>> >> between words?
>> >>
>> >>
>> >> Also consider WordDelimiterFilterFactory, which will break up the
>> >> tokens on upper/lower case transitions.
>> >>
>> >> to get relevance, consider edismax-style query parsers and adding
>> >> automatic phrase generation (with boosts usually).
>> >>
>> >> This one will be a problem:
>> >> q=bestbuy
>> >>
>> >> There's no good generic way to get this to split up. One
>> >> possibility is to use synonyms if the list is known, but
>> >> otherwise there's no information here to distinguish it
>> >> from "legitimate" words.
>> >>
>> >> edgeNgrams work on _tokens_, not words so I doubt
>> >> they would help in this case either since there is only
>> >> one token.
>> >>
>> >> Best
>> >> Erick
>> >>
>> >>
>> >> On Tue, Aug 20, 2013 at 3:16 AM, tamanjit.bin...@yahoo.co.in <
>> >> tamanjit.bin...@yahoo.co.in> wrote:
>> >>
>> >>  Additionally, if you dont want re

Re: Facing Solr performance during query search

2013-08-21 Thread Erick Erickson
~2,000 facets kind of worries me, but let's skip that for now.

Your original problem statement was that replication was the
thing that changed. So the first thing I'd do is not replicate. If you
turn it off, do your slaves still perform poorly?

Allocating that much RAM to the JVM is probably not a great idea,
see: http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

So far nothing is jumping out at me.


On Wed, Aug 21, 2013 at 4:09 AM, sivaprasad wrote:

> Here I am providing the slave solrconfig information.
> 
>1
>
>   35
>   35
>
> class="org.apache.lucene.index.ConcurrentMergeScheduler">
>   6
>   1
>
> 
> 
> 1024
>20
> 
>   
>
>   
> 
> 
>   
> 
>   static firstSearcher warming in
> solrconfig.xml
> 
>   
> 
> false
> 
>
> The slave will poll for every 1hr.
>
> The field list is given below.
>
>  stored="true" />
>  stored="true"  omitNorms="false" termVectors="true"/>
>  stored="false"/>
>  stored="true" omitNorms="false"/>
>  multiValued="true" />
>  multiValued="true" />
>  multiValued="true" />
>  omitNorms="false" termVectors="true"/>
>  indexed="true"
> omitNorms="false"/>
> 
> 
>  stored="true"/>
>  stored="true"/>
>  stored="false"
> termVectors="true"/>
>  stored="true"/>
>  stored="true"/>
>  stored="true"/>
>  stored="true"/>
>  stored="true"/>
>  stored="true"/>
> 
>  multiValued="true"/>
> 
>  stored="false"/>
> 
> 
>
> We have configured ~2000 facets and the machine configuration is given
> below.
>
> 6-core processor, 22528 MB (22 GB) of RAM allotted to the JVM. The Solr version is 4.1.0.
>
> Please let me know, if you require any more information.
>
>
>
>


Re: What filter to use to search with spaces omitted/included between words?

2013-08-21 Thread Jack Krupansky
The reason that a query of "bestbuy" matches indexing of "best buy" in this 
case is that the keyword tokenizer treats the entire input text as one 
token, including the space between "best" and "buy" and then the WDF treats 
any embedded white space as if it were punctuation and then the catenateAll 
attribute causes "best" and "buy" to be concatenated to form "bestbuy".


-- Jack Krupansky

-Original Message- 
From: Erick Erickson

Sent: Wednesday, August 21, 2013 11:12 AM
To: solr-user@lucene.apache.org
Subject: Re: What filter to use to search with spaces omitted/included 
between words?


Keyword tokenizer will probably cause you problems, since you'll never
match "best".
and searching name:best AND name:buy would fail as well.

And I'm surprised this is working at all, I'd really scrutinize why bestbuy
matches an
index with Best Buy, that makes no sense on the surface.

If you have a relatively small vocabulary, synonyms might work for you.

Best,
Erick


On Tue, Aug 20, 2013 at 8:04 PM, Utkarsh Sengar 
wrote:



Let me take that back, this actually works. q=bestbuy matches "Best Buy"
and documents are returned.


 
   










I was using a different tokenizer; replacing
it with the keyword tokenizer did the trick.
Not sure how it worked. The field value I am searching is "Best Buy", but
when I search for "bestbuy", it returns a result.

Thanks,
-Utkarsh



On Tue, Aug 20, 2013 at 4:48 PM, Utkarsh Sengar wrote:

> Thanks Tamanjit and Erick.
> I tried out the filters, most of the usecases work except "q=bestbuy". 
> As

> mentioned by Erick, that is a hard one to crack.
>
> I am looking into DictionaryCompoundWordTokenFilterFactory but compound
> words like these:
>
> http://www.manythings.org/vocabulary/lists/a/words.php?f=compound_words and
> generic English words; it won't cover my need for custom compound words

> of store names like BestBuy, WalMart or CircuitCity.
>
> Thanks,
> -Utkarsh
>
>
> On Tue, Aug 20, 2013 at 4:43 AM, Jack Krupansky wrote:
>
>> You could either have a synonym filter to replace "bestbuy" with "best
>> buy" or use DictionaryCompoundWordTokenFil**terFactory to do the same.
>>
>> See:
>> http://lucene.apache.org/core/4_4_0/analyzers-common/org/apache/lucene/analysis/compound/DictionaryCompoundWordTokenFilterFactory.html
>>
>> There are some examples in my book, but they are for German compound
>> words since that was the original primary intent for this filter. But 
>> it

>> should work for any words since it is a simple dictionary.
>>
>> -- Jack Krupansky
>>
>> -Original Message- From: Erick Erickson
>> Sent: Tuesday, August 20, 2013 7:21 AM
>> To: solr-user@lucene.apache.org
>> Subject: Re: What filter to use to search with spaces omitted/included
>> between words?
>>
>>
>> Also consider WordDelimiterFilterFactory, which will break up the
>> tokens on upper/lower case transitions.
>>
>> to get relevance, consider edismax-style query parsers and adding
>> automatic phrase generation (with boosts usually).
>>
>> This one will be a problem:
>> q=bestbuy
>>
>> There's no good generic way to get this to split up. One
>> possibility is to use synonyms if the list is known, but
>> otherwise there's no information here to distinguish it
>> from "legitimate" words.
>>
>> edgeNgrams work on _tokens_, not words so I doubt
>> they would help in this case either since there is only
>> one token.
>>
>> Best
>> Erick
>>
>>
>> On Tue, Aug 20, 2013 at 3:16 AM, tamanjit.bin...@yahoo.co.in <
>> tamanjit.bin...@yahoo.co.in> wrote:
>>
>>> Additionally, if you dont want results like q=best and result=bestbuy,
>>> you can use <charFilter class="solr.PatternReplaceCharFilterFactory"
>>> pattern="\W+" replacement=""/> to actually replace whitespaces with
>>> nothing.
>>>
>>>
>>> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#CharFilterFactories
>>>
>>>
>>>
>>>
>>>
>>
>
>
> --
> Thanks,
> -Utkarsh
>



--
Thanks,
-Utkarsh





Re: Prevent Some Keywords at Analyzer Step

2013-08-21 Thread Furkan KAMACI
How can I remove unnecessary tokens after the shingle filter?


2013/8/20 Jeff Porter 

> Why not use ShingleFilterFactory and then match on that token if you find
> it?
>
>
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ShingleFilterFactory
>
>
> Jeff Porter
> co-founder
> email: jpor...@o2ointeractive.com
> mobile: +1-303-332-4006
>
> On Aug 19, 2013, at 11:23 AM, Dan Davis wrote:
>
> > This is an interesting topic - my employer is a medical library and there
> > are many keywords that may need to be aliased in various ways, and 2 or 3
> > word phrases that perhaps should be treated specially.   Jack, can you
> give
> > me an example of how to do that sort of thing?Perhaps I need to buy
> > your almost released Deep Dive book...
> > Sorry to be too tangential - it is my strange way.
> >
> >
> > On Mon, Aug 19, 2013 at 12:32 PM, Jack Krupansky <
> j...@basetechnology.com>wrote:
> >
> >> Okay, but what is it that you are trying to "prevent"??
> >>
> >> And, "diet follower" is a phrase, not a keyword or term.
> >>
> >> So, I'm still baffled as to what you are really trying to do. Trying
> >> explaining it in plain English.
> >>
> >> And given this same input, how would it be queried?
> >>
> >>
> >> -- Jack Krupansky
> >>
> >> -Original Message- From: Furkan KAMACI
> >> Sent: Monday, August 19, 2013 11:22 AM
> >> To: solr-user@lucene.apache.org
> >> Subject: Re: Prevent Some Keywords at Analyzer Step
> >>
> >>
> >> Let's assume that my sentence is that:
> >>
> >> *Alice is a diet follower*
> >>
> >> My special keyword => *diet follower*
> >>
> >> Tokens will be:
> >>
> >> Token 1) Alice
> >> Token 2) is
> >> Token 3) a
> >> Token 4) diet
> >> Token 5) follower
> >> Token 6) *diet follower*
> >>
> >>
> >> 2013/8/19 Jack Krupansky 
> >>
> >> Your example doesn't "prevent" any keywords.
> >>>
> >>> You need to elaborate the specific requirements with more detail.
> >>>
> >>> Given a long stream of text, what tokenization do you expect in the
> index?
> >>>
> >>> -- Jack Krupansky
> >>>
> >>> -Original Message- From: Furkan KAMACI Sent: Monday, August 19,
> >>> 2013 8:07 AM To: solr-user@lucene.apache.org Subject: Prevent Some
> >>> Keywords at Analyzer Step
> >>> Hi;
> >>>
> >>> I want to write an analyzer that will prevent some special words. For
> >>> example sentence to be indexed is:
> >>>
> >>> diet follower
> >>>
> >>> it will tokenize it as like that
> >>>
> >>> token 1) diet
> >>> token 2) follower
> >>> token 3) diet follower
> >>>
> >>> How can I do that with Solr?
> >>>
> >>>
> >>
>
>
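If the special phrases can be enumerated, one possible shape for this (file
name invented, untested) is to follow the shingle filter with a keep-word
filter listing only the wanted shingles:

<analyzer>
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.ShingleFilterFactory" maxShingleSize="2" outputUnigrams="true"/>
  <!-- keeps only entries listed in keyphrases.txt, e.g. "diet follower" -->
  <filter class="solr.KeepWordFilterFactory" words="keyphrases.txt" ignoreCase="true"/>
</analyzer>

Note that KeepWordFilter also drops the single-word tokens, so if the plain
words must stay searchable this analyzer belongs on a separate copyField
rather than on the main text field.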


Re: What filter to use to search with spaces omitted/included between words?

2013-08-21 Thread Erick Erickson
Keyword tokenizer will probably cause you problems, since you'll never
match "best".
and searching name:best AND name:buy would fail as well.

And I'm surprised this is working at all, I'd really scrutinize why bestbuy
matches an
index with Best Buy, that makes no sense on the surface.

If you have a relatively small vocabulary, synonyms might work for you.

Best,
Erick


On Tue, Aug 20, 2013 at 8:04 PM, Utkarsh Sengar wrote:

> Let me take that back, this actually works. q=bestbuy matches "Best Buy"
> and documents are returned.
>
>  positionIncrementGap="100">
>  
> generateWordParts="1" generateNumberParts="1"
>
> catenateWords="1"
>
> catenateNumbers="1"
>
> catenateAll="0"
>
> preserveOriginal="1"/>
> 
> 
> 
> 
>  generateWordParts="1" generateNumberParts="1"
>
> catenateWords="1"
>
> catenateNumbers="1"
>
> catenateAll="0"
>
> preserveOriginal="1"/>
> 
> 
> 
> 
>
> I was using a different tokenizer; replacing
> it with the keyword tokenizer did the trick.
> Not sure how it worked. The field value I am searching is "Best Buy", but
> when I search for "bestbuy", it returns a result.
>
> Thanks,
> -Utkarsh
>
>
>
> On Tue, Aug 20, 2013 at 4:48 PM, Utkarsh Sengar wrote:
>
> > Thanks Tamanjit and Erick.
> > I tried out the filters, most of the usecases work except "q=bestbuy". As
> > mentioned by Erick, that is a hard one to crack.
> >
> > I am looking into DictionaryCompoundWordTokenFilterFactory but compound
> > words like these:
> >
> > http://www.manythings.org/vocabulary/lists/a/words.php?f=compound_words and
> > generic English words; it won't cover my need for custom compound words
> > of store names like BestBuy, WalMart or CircuitCity.
> >
> > Thanks,
> > -Utkarsh
> >
> >
> > On Tue, Aug 20, 2013 at 4:43 AM, Jack Krupansky  >wrote:
> >
> >> You could either have a synonym filter to replace "bestbuy" with "best
> >> buy" or use DictionaryCompoundWordTokenFil**terFactory to do the same.
> >>
> >> See:
> >> http://lucene.apache.org/core/4_4_0/analyzers-common/org/apache/lucene/analysis/compound/DictionaryCompoundWordTokenFilterFactory.html
> >>
> >> There are some examples in my book, but they are for German compound
> >> words since that was the original primary intent for this filter. But it
> >> should work for any words since it is a simple dictionary.
> >>
> >> -- Jack Krupansky
> >>
> >> -Original Message- From: Erick Erickson
> >> Sent: Tuesday, August 20, 2013 7:21 AM
> >> To: solr-user@lucene.apache.org
> >> Subject: Re: What filter to use to search with spaces omitted/included
> >> between words?
> >>
> >>
> >> Also consider WordDelimiterFilterFactory, which will break up the
> >> tokens on upper/lower case transitions.
> >>
> >> to get relevance, consider edismax-style query parsers and adding
> >> automatic phrase generation (with boosts usually).
> >>
> >> This one will be a problem:
> >> q=bestbuy
> >>
> >> There's no good generic way to get this to split up. One
> >> possibility is to use synonyms if the list is known, but
> >> otherwise there's no information here to distinguish it
> >> from "legitimate" words.
> >>
> >> edgeNgrams work on _tokens_, not words so I doubt
> >> they would help in this case either since there is only
> >> one token.
> >>
> >> Best
> >> Erick
> >>
> >>
> >> On Tue, Aug 20, 2013 at 3:16 AM, tamanjit.bin...@yahoo.co.in <
> >> tamanjit.bin...@yahoo.co.in> wrote:
> >>
> >>> Additionally, if you dont want results like q=best and result=bestbuy,
> >>> you can use <charFilter class="solr.PatternReplaceCharFilterFactory"
> >>> pattern="\W+" replacement=""/> to actually replace whitespaces with
> >>> nothing.
> >>>
> >>>
> >>> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#CharFilterFactories
> >>>
> >>>
> >>>
> >>>
> >>>
> >>
> >
> >
> > --
> > Thanks,
> > -Utkarsh
> >
>
>
>
> --
> Thanks,
> -Utkarsh
>


Re: edismax type query in different sets of fields?

2013-08-21 Thread Erick Erickson
Have you tried "nested queries"? I think you can specify the full edismax
syntax on the URL in a nested query for your second search field...

Best,
Erick


On Tue, Aug 20, 2013 at 5:17 PM, Rafael Calsaverini <
rafael.calsaver...@gmail.com> wrote:

> Hi there,
>
>
> suppose I have documents with fields that can be loosely grouped by what
> kind of information they contain. As an example, imagine the documents
> representing people and some fields are related to how they're called
> (name, surname, nickname, etc...) and some fields are related to where they
> live (street, city, state, ...).
>
> Imagine I have an application that provide two search fields to look for
> people: who do you want to find, and where does he live. So I just get two
> strings, and each of them might contain information related to multiple
> fields.
>
> Is there a way to do something like edismax for each group of field?
> Something like:
>
> (name^2 surname^2 nickname):(rafael calsaverini) AND (street city
> state):(rua dos bobos sao paulo SP)
>
> or whatever is the adequate syntax.
>
> The alternative would be to build a parser for one of the fields, but if I
> want to allow for a completely free field for the "where" part, it might
> not be easy to predict what the user would put there. Is there anybody else
> with a similar problem?
>
> Thanks.
>
>
>
> Rafael Calsaverini
> Data Scientist @ Catho 
> cell: +55 11 7525.6222
>
>
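The nested-query form Erick mentions would look roughly like this, using the
_query_ magic field with the default lucene parser (field names taken from the
example above):

q=_query_:"{!edismax qf='name^2 surname^2 nickname'}rafael calsaverini"
  AND _query_:"{!edismax qf='street city state'}rua dos bobos sao paulo SP"

Each {!edismax} local-params block gets its own qf list, which gives the
per-group behavior asked about.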


Re: Sharing SolrCloud collection configs w/overrides

2013-08-21 Thread Erick Erickson
Hmmm. I'm going to leave the DIH stuff for someone else, but could
you raise a JIRA (and assign it to me) to think about a way to add
a core.properties file to the collection creation step?

I haven't thought it through very well, but currently I think we just assign
some defaults. Some thought that needs to be put into what
kinds of things we'll allow to be set, I think you could shoot yourself in
the foot.

Maybe it's as simple as allowing more params for creation like
collection.coreName where each param of the form collection.blah=blort
gets an entry in the properties file blah=blort? Would that work for your
case?

Best,
Erick


On Tue, Aug 20, 2013 at 2:22 PM, Tim Vaillancourt wrote:

> Hey guys,
>
> I have a situation where I have a lot of collections that share the same
> core config in Zookeeper. For each of my SolrCloud collections, 99.9% of
> the config (schema.xml, solrcloud.xml) are the same, only the
> DataImportHandler parameters are different for different database
> names/credentials, per collection.
>
> To provide the different DIH credentials per collection, I currently upload
> many copies of the exact-same Solr config dir with 1 Xincluded file with
> the 4-5 database parameters that are different alongside the schema.xml and
> solrconfig.xml.
>
> I don't feel this ideal and is wasting space in Zookeeper considering most
> of my configs are duplicated.
>
> At a high level, is there a way for me to share one config in Zookeeper
> while having minor overrides to the variables?
>
> Is there a way for me to XInclude a file outside of my Zookeeper config
> dir, ie: could I XInclude arbitrary locations in Zookeeper so that I can
> have the same config dir for all collections and a file in Zookeeper that
> is external to the common config dir to apply the collection-specific
> overrides?
>
> To extend my question for Solr 4.4 core.properties files: am I stuck in the
> same boat under Solr 4.4 if I have say 10 collections sharing one config,
> but I want each to have a unique core.properties?
>
> Cheers!
>
> Tim
>


Re: convert text file to solr document where delimiter fields are fields of document

2013-08-21 Thread Jack Krupansky

Yes, post.jar supports csv files.

-- Jack Krupansky
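Concretely, from the example/exampledocs directory that would be something
like (file name illustrative):

java -Durl=http://localhost:8983/solr/update -Dtype=text/csv -jar post.jar data.csv

The -Dtype property makes post.jar send the file with a text/csv content
type, which routes it to the CSV loader; this works the same way on Windows.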

-Original Message- 
From: bharat

Sent: Wednesday, August 21, 2013 1:57 AM
To: solr-user@lucene.apache.org
Subject: Re: convert text file to solr document where delimiter fields are 
fields of document


Thanks all of you for the quick replies. Your guidance helps me a lot.
I am new to Solr, so these may be basic questions:
1) Is there any way we can import a csv file using post.jar? (as I am a Windows
user)
2) I declared DataImportHandlers and I use the Solr Admin (default Solr UI) to
import the data for DIH, so how can I do the same with a csv import?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/convert-text-file-to-solr-document-where-delimiter-fields-are-fields-of-document-tp4085611p4085815.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: get term frequency, just only keywords search

2013-08-21 Thread Jack Krupansky
Probably your best bet is to use the "debug.explain.structured" parameter, 
set to true, to get the XML version of the debug explain section; you can then 
traverse it looking for the desired phrase and its "phraseFreq".
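
For example (a sketch; the field and phrase are placeholders):

http://localhost:8983/solr/collection1/select?q="some phrase"&debugQuery=true&debug.explain.structured=true&wt=xml

Each matching document's structured explain then carries the phraseFreq value
you are after.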


But, be aware that the terms in a Lucene query have been "analyzed", so they 
won't necessarily be exactly the same as your source query characters.


My advice would be to abandon "phrase frequency" since it is probably more 
effort than it is worth. But if your management insists on having the 
feature, roll up your sleeves and write the code needed to ferret it out.


-- Jack Krupansky

-Original Message- 
From: danielitos85

Sent: Wednesday, August 21, 2013 4:41 AM
To: solr-user@lucene.apache.org
Subject: Re: get term frequency, just only keywords search

Thanks a lot guys,

@Jack in my search I use dismax (as defType) and I search either a term or a
phrase, but I need to get the number that shows me how many times that term or
phrase occurs in the document.

I could get it from debugQuery but I would like to get it directly from the
results.

What do you suggest?
Thanks a lot for support.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/get-term-frequency-just-only-keywords-search-tp4084510p4085831.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: Solr Filter Query

2013-08-21 Thread Jack Krupansky
As with many features in Solr, there is no hard limit per se, but the "rule" 
is to use the feature in moderation.


If you find yourself using a "big" filter query, it likely means that you 
have chosen a poor design or are misusing Solr in some way. The response 
should be to correct your design, not try to continue using a "big" filter 
query.


As a generality, try to keep your URL length well under 2048 
characters/bytes, otherwise you're risking running into configuration issues 
with various parts of the network infrastructure.
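
If a big filter query is nonetheless unavoidable, sending it via HTTP POST
sidesteps URL-length limits entirely. A minimal sketch with SolrJ (the core
URL is hypothetical):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrRequest;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class PostedFilterQuery {
    public static void main(String[] args) throws Exception {
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
        SolrQuery q = new SolrQuery("*:*");
        q.addFilterQuery("id:(1 OR 2 OR 3)"); // imagine thousands of terms here
        // POST puts the parameters in the request body, not the request line.
        System.out.println(server.query(q, SolrRequest.METHOD.POST).getResults().getNumFound());
    }
}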


Long URLs also make reading and troubleshooting log files rather difficult.

All of that said, please describe your use case. First, let's make sure that 
it is an appropriate use case for Solr.


-- Jack Krupansky

-Original Message- 
From: Prasi S

Sent: Wednesday, August 21, 2013 1:12 AM
To: solr-user@lucene.apache.org
Subject: Solr Filter Query

Hi,
Is there any limit on how big a filter query can be ?
What are the values that should be set properly for handling big filter
queries.


thanks,
Prasi 



Re: Solr 4.4 problem with loading DisMaxRequestHandler

2013-08-21 Thread Jack Krupansky
You must have upgraded from a very old release of Solr. There is no 
DisMaxRequestHandler.


Just use the standard request handler for "/select" in the Solr example 
config and then set the "defType" parameter to dismax to enable the dismax 
query parser.
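
In the stock 4.4 example solrconfig.xml that means leaving the handler as
<requestHandler name="/select" class="solr.SearchHandler"> and adding
<str name="defType">dismax</str> inside its <lst name="defaults"> block; the
old solr.DisMaxRequestHandler class is what the ClassNotFoundException is
complaining about, and it no longer ships with Solr.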


-- Jack Krupansky

-Original Message- 
From: danielitos85

Sent: Wednesday, August 21, 2013 6:30 AM
To: solr-user@lucene.apache.org
Subject: Solr 4.4 problem with loading DisMaxRequestHandler

Hi guys,

I'm using a clean solr 4.4 installation and I have added in my solrconfig.xml
the following lines:



all
0.01
*:*
0
regex
   


but when I start my solr it returns an error:

*Caused by: java.lang.ClassNotFoundException: solr.DisMaxRequestHandler*

In my dist folder I have all the default libraries, and also in the lib folder
of my core.
Please, any suggestions?

Thanks in advance.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-4-4-problem-with-loading-DisMaxRequestHandler-tp4085842.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: Prevent Some Keywords at Analyzer Step

2013-08-21 Thread Jeff Porter
Why not use ShingleFilterFactory and then match on that token if you find it?

http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ShingleFilterFactory
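
For instance (a sketch of the index-time analyzer only), a
<filter class="solr.ShingleFilterFactory" maxShingleSize="2" outputUnigrams="true"/>
placed after the tokenizer turns "Alice is a diet follower" into the five
single-word tokens plus the two-word shingles, including the wanted "diet
follower" (though also unwanted ones such as "a diet").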


Jeff Porter
co-founder
email: jpor...@o2ointeractive.com
mobile: +1-303-332-4006

On Aug 19, 2013, at 11:23 AM, Dan Davis wrote:

> This is an interesting topic - my employer is a medical library and there
> are many keywords that may need to be aliased in various ways, and 2 or 3
> word phrases that perhaps should be treated specially.   Jack, can you give
> me an example of how to do that sort of thing?Perhaps I need to buy
> your almost released Deep Dive book...
> Sorry to be too tangential - it is my strange way.
> 
> 
> On Mon, Aug 19, 2013 at 12:32 PM, Jack Krupansky 
> wrote:
> 
>> Okay, but what is it that you are trying to "prevent"??
>> 
>> And, "diet follower" is a phrase, not a keyword or term.
>> 
>> So, I'm still baffled as to what you are really trying to do. Try
>> explaining it in plain English.
>> 
>> And given this same input, how would it be queried?
>> 
>> 
>> -- Jack Krupansky
>> 
>> -Original Message- From: Furkan KAMACI
>> Sent: Monday, August 19, 2013 11:22 AM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Prevent Some Keywords at Analyzer Step
>> 
>> 
>> Let's assume that my sentence is that:
>> 
>> *Alice is a diet follower*
>> 
>> My special keyword => *diet follower*
>> 
>> Tokens will be:
>> 
>> Token 1) Alice
>> Token 2) is
>> Token 3) a
>> Token 4) diet
>> Token 5) follower
>> Token 6) *diet follower*
>> 
>> 
>> 2013/8/19 Jack Krupansky 
>> 
>> Your example doesn't "prevent" any keywords.
>>> 
>>> You need to elaborate the specific requirements with more detail.
>>> 
>>> Given a long stream of text, what tokenization do you expect in the index?
>>> 
>>> -- Jack Krupansky
>>> 
>>> -Original Message- From: Furkan KAMACI Sent: Monday, August 19,
>>> 2013 8:07 AM To: solr-user@lucene.apache.org Subject: Prevent Some
>>> Keywords at Analyzer Step
>>> Hi;
>>> 
>>> I want to write an analyzer that will prevent some special words. For
>>> example sentence to be indexed is:
>>> 
>>> diet follower
>>> 
>>> it will tokenize it as like that
>>> 
>>> token 1) diet
>>> token 2) follower
>>> token 3) diet follower
>>> 
>>> How can I do that with Solr?
>>> 
>>> 
>> 



Data Import faile in solr 4.3.0

2013-08-21 Thread Montu v Boda
when we import all the indexes from solr 3.5 to 4.3, the import fails each
and every time due to the below error.

Caused by: org.apache.solr.common.SolrException: parsing error
Caused by: org.apache.http.MalformedChunkCodingException: Unexpected content
at the end of chunk


we have 50 lac documents indexed in solr 3.5. the approx size of that index
is 400 GB.


Thanks & Regards
Montu v Boda



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Data-Import-faile-in-solr-4-3-0-tp4085868.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Issue in Swap Space display at Solr Admin

2013-08-21 Thread Stefan Matheis
Thanks Vladimir, i've created SOLR-5178

- Stefan 


On Wednesday, August 21, 2013 at 1:29 PM, Vladimir Vagaitsev wrote:

> Stefan,
> 
> It's done! Here is the "system" key:
> 
> "system":{"name":"Linux","version":"3.2.0-39-virtual","arch":"amd64","systemLoadAverage":3.38,"committedVirtualMemorySize":32454287360,"freePhysicalMemorySize":912945152,"freeSwapSpaceSize":0,"processCpuTime":5627465000,"totalPhysicalMemorySize":71881908224,"totalSwapSpaceSize":0,"openFileDescriptorCount":350,"maxFileDescriptorCount":4096,"uname":"Linux
> ip-xxx-xxx-xxx-xxx 3.2.0-39-virtual #62-Ubuntu SMP Thu Feb 28 00:48:27
> UTC 2013 x86_64 x86_64 x86_64 GNU/Linux\n","uptime":" 11:24:39 up 4
> days, 23:03, 1 user, load average: 3.38, 3.10, 2.95\n"}
> 
> 
> 
> 2013/8/21 Stefan Matheis  (mailto:matheis.ste...@gmail.com)>
> 
> > Vladimir
> > 
> > As Shawn said .. there is/was a change in configuration - my explanation
> > was perhaps not the best.
> > if you try that one, it should work:
> > http://localhost:8983/solr/collection1/admin/system?wt=json
> > otherwise, let us know which is the url you're using to access the Admin UI
> > 
> > - Stefan
> > 
> > 
> > On Wednesday, August 21, 2013 at 11:50 AM, Vladimir Vagaitsev wrote:
> > 
> > > Stefan, the link still doesn't work.
> > > 
> > > I'm using solr-4.3.1 and I have the following solr.xml file:
> > > 
> > > 
> > > 
> > > 
> > > 
> > > 
> > > 
> > > 
> > >  > > hostPort="${jetty.port:8983}" hostContext="${hostContext:solr}">
> > > 
> > > 
> > > 
> > > 
> > > 
> > > 
> > > 2013/8/20 Shawn Heisey mailto:s...@elyograg.org)>
> > > 
> > > > On 8/20/2013 9:49 AM, Stefan Matheis wrote:
> > > > 
> > > > > Vladimir
> > > > > 
> > > > > That shouldn't matter .. perhaps i did not provide enough
> > information?
> > > > > depends on which host & port you have solr running .. and the path
> > > > 
> > > 
> > 
> > you have
> > > > > defined.
> > > > > 
> > > > > based on the tutorial (host + port configuration) you would use
> > something
> > > > > like this:
> > > > > 
> > > > > http://localhost:8983/solr/**admin/system?wt=json<
> > http://localhost:8983/solr/admin/system?wt=json>
> > > > > 
> > > > > and that works in single- as well in multicore mode ..
> > > > > 
> > > > > Let me know if that still doesn't work? if so .. which is the address
> > > > > you're using to access the UI?
> > > > > 
> > > > 
> > > > 
> > > > 
> > > > That URL doesn't have a core name.
> > > > 
> > > > If defaultCoreName is missing from an old-style solr.xml, if it's not a
> > > > valid core name, or if the user is running 4.4 and has a new-style
> > > > solr.xml, that URL will not work.
> > > > 
> > > > The old-style solr.xml will continue to work in all 4.x versions, you
> > > > don't need to use the new style.
> > > > 
> > > > Thanks,
> > > > Shawn
> > > > 
> > > 
> > 
> > 
> 
> 
> 




Re: Measuring SOLR performance

2013-08-21 Thread Dmitry Kan
Hi Roman,

I have noticed a difference with different solr.xml config contents. It is
probably legit, but I thought to let you know (tests were run on a fresh
checkout as of today).

As mentioned before, I have two cores configured in solr.xml. If the file
is:

[code]


  
  


  

[/code]

then the instruction:

python solrjmeter.py -a -x ./jmx/SolrQueryTest.jmx -q
./queries/demo/demo.queries -s localhost -p 8983 -a --durationInSecs 60 -R
cms -t /solr/statements -e statements -U 100

works just fine. If however the solr.xml has adminPath set to "/admin"
solrjmeter produces an error:

[error]
**ERROR**
  File "solrjmeter.py", line 1386, in 
main(sys.argv)
  File "solrjmeter.py", line 1278, in main
check_prerequisities(options)
  File "solrjmeter.py", line 375, in check_prerequisities
error('Cannot find admin pages: %s, please report a bug' % apath)
  File "solrjmeter.py", line 66, in error
traceback.print_stack()
Cannot find admin pages: http://localhost:8983/solr/admin, please report a
bug
[/error]

With both solr.xml configs the following url returns just fine:

http://localhost:8983/solr/statements/admin/system?wt=json

Regards,

Dmitry



On Wed, Aug 14, 2013 at 2:03 PM, Dmitry Kan  wrote:

> Hi Roman,
>
> This looks much better, thanks! The ordinary non-comparison mode works.
> I'll post here, if there are other findings.
>
> Thanks for quick turnarounds,
>
> Dmitry
>
>
> On Wed, Aug 14, 2013 at 1:32 AM, Roman Chyla wrote:
>
>> Hi Dmitry, oh yes, late night fixes... :) The latest commit should make it
>> work for you.
>> Thanks!
>>
>> roman
>>
>>
>> On Tue, Aug 13, 2013 at 3:37 AM, Dmitry Kan  wrote:
>>
>> > Hi Roman,
>> >
>> > Something bad happened in fresh checkout:
>> >
>> > python solrjmeter.py -a -x ./jmx/SolrQueryTest.jmx -q
>> > ./queries/demo/demo.queries -s localhost -p 8983 -a --durationInSecs 60
>> -R
>> > cms -t /solr/statements -e statements -U 100
>> >
>> > Traceback (most recent call last):
>> >   File "solrjmeter.py", line 1392, in 
>> > main(sys.argv)
>> >   File "solrjmeter.py", line 1347, in main
>> > save_into_file('before-test.json', simplejson.dumps(before_test))
>> >   File "/usr/lib/python2.7/dist-packages/simplejson/__init__.py", line
>> 286,
>> > in dumps
>> > return _default_encoder.encode(obj)
>> >   File "/usr/lib/python2.7/dist-packages/simplejson/encoder.py", line
>> 226,
>> > in encode
>> > chunks = self.iterencode(o, _one_shot=True)
>> >   File "/usr/lib/python2.7/dist-packages/simplejson/encoder.py", line
>> 296,
>> > in iterencode
>> > return _iterencode(o, 0)
>> >   File "/usr/lib/python2.7/dist-packages/simplejson/encoder.py", line
>> 202,
>> > in default
>> > raise TypeError(repr(o) + " is not JSON serializable")
>> > TypeError: <__main__.ForgivingValue object at 0x7fc6d4040fd0> is not
>> JSON
>> > serializable
>> >
>> >
>> > Regards,
>> >
>> > D.
>> >
>> >
>> > On Tue, Aug 13, 2013 at 8:10 AM, Roman Chyla 
>> > wrote:
>> >
>> > > Hi Dmitry,
>> > >
>> > >
>> > >
>> > > On Mon, Aug 12, 2013 at 9:36 AM, Dmitry Kan 
>> > wrote:
>> > >
>> > > > Hi Roman,
>> > > >
>> > > > Good point. I managed to run the command with -C and double quotes:
>> > > >
>> > > > python solrjmeter.py -a -C "g1,cms" -c hour -x
>> ./jmx/SolrQueryTest.jmx
>> > > >
>> > > > As a result got several files (html, css, js, csv) in the running
>> > > directory
>> > > > (any way to specify where the output should be stored in this case?)
>> > > >
>> > >
>> > > i know it is confusing, i plan to change it - but later, now it is too
>> > busy
>> > > here...
>> > >
>> > >
>> > > >
>> > > > When I look onto the comparison dashboard, I see this:
>> > > >
>> > > > http://pbrd.co/17IRI0b
>> > > >
>> > >
>> > > two things: the tests probably took more than one hour to finish, so
>> they
>> > > are not aligned - try generating the comparison with '-c  14400'  (ie.
>> > > 4x3600 secs)
>> > >
>> > > the other thing: if you have only two datapoints, the dygraph will not
>> > show
>> > > anything - there must be more datapoints/measurements
>> > >
>> > >
>> > >
>> > > >
>> > > > One more thing: all the previous tests were run with softCommit
>> > disabled.
>> > > > After enabling it, the tests started to fail:
>> > > >
>> > > > $ python solrjmeter.py -a -x ./jmx/SolrQueryTest.jmx -q
>> > > > ./queries/demo/demo.queries -s localhost -p 8983 -a
>> --durationInSecs 60
>> > > -R
>> > > > g1 -t /solr/statements -e statements -U 100
>> > > > $ cd g1
>> > > > Reading results of the previous test
>> > > > $ cd 2013.08.12.16.32.48
>> > > > $ cd /home/dmitry/projects/lab/solrjmeter4/solrjmeter/g1
>> > > > $ mkdir 2013.08.12.16.33.02
>> > > > $ cd 2013.08.12.16.33.02
>> > > > $ cd /home/dmitry/projects/lab/solrjmeter4/solrjmeter/g1
>> > > > $ cd /home/dmitry/projects/lab/solrjmeter4/solrjmeter
>> > > > $ cd /home/dmitry/projects/lab/solrjmeter4/solrjmeter
>> > > > Traceback (most recent call last):
>> > > >   File "solrjmeter.py", line 1427, in 
>> > > > main(sys.argv)
>> > > >   Fil

Re: Issue in Swap Space display at Solr Admin

2013-08-21 Thread Vladimir Vagaitsev
Stefan,

It's done! Here is the "system" key:

"system":{"name":"Linux","version":"3.2.0-39-virtual","arch":"amd64","systemLoadAverage":3.38,"committedVirtualMemorySize":32454287360,"freePhysicalMemorySize":912945152,"freeSwapSpaceSize":0,"processCpuTime":5627465000,"totalPhysicalMemorySize":71881908224,"totalSwapSpaceSize":0,"openFileDescriptorCount":350,"maxFileDescriptorCount":4096,"uname":"Linux
ip-xxx-xxx-xxx-xxx 3.2.0-39-virtual #62-Ubuntu SMP Thu Feb 28 00:48:27
UTC 2013 x86_64 x86_64 x86_64 GNU/Linux\n","uptime":" 11:24:39 up 4
days, 23:03,  1 user,  load average: 3.38, 3.10, 2.95\n"}



2013/8/21 Stefan Matheis 

> Vladimir
>
> As Shawn said .. there is/was a change in configuration - my explanation
> was perhaps not the best.
> if you try that one, it should work:
> http://localhost:8983/solr/collection1/admin/system?wt=json
> otherwise, let us know which is the url you're using to access the Admin UI
>
> - Stefan
>
>
> On Wednesday, August 21, 2013 at 11:50 AM, Vladimir Vagaitsev wrote:
>
> > Stefan, the link still doesn't work.
> >
> > I'm using solr-4.3.1 and I have the following solr.xml file:
> >
> > 
> > 
> >
> > 
> > 
> >
> > 
> >  > hostPort="${jetty.port:8983}" hostContext="${hostContext:solr}">
> > 
> > 
> > 
> > 
> >
> >
> > 2013/8/20 Shawn Heisey mailto:s...@elyograg.org)>
> >
> > > On 8/20/2013 9:49 AM, Stefan Matheis wrote:
> > >
> > > > Vladimir
> > > >
> > > > That shouldn't matter .. perhaps i did not provide enough
> information?
> > > > depends on which host & port you have solr running .. and the path
> you have
> > > > defined.
> > > >
> > > > based on the tutorial (host + port configuration) you would use
> something
> > > > like this:
> > > >
> > > > http://localhost:8983/solr/**admin/system?wt=json<
> http://localhost:8983/solr/admin/system?wt=json>
> > > >
> > > > and that works in single- as well in multicore mode ..
> > > >
> > > > Let me know if that still doesn't work? if so .. which is the address
> > > > you're using to access the UI?
> > > >
> > >
> > >
> > > That URL doesn't have a core name.
> > >
> > > If defaultCoreName is missing from an old-style solr.xml, if it's not a
> > > valid core name, or if the user is running 4.4 and has a new-style
> > > solr.xml, that URL will not work.
> > >
> > > The old-style solr.xml will continue to work in all 4.x versions, you
> > > don't need to use the new style.
> > >
> > > Thanks,
> > > Shawn
> > >
> >
> >
> >
>
>
>


Solr Indexing Status

2013-08-21 Thread Prasi S
Hi,
I am using solr 4.4 to index csv files, with solrj. At frequent intervals my
user may request the status, and I have to send back something like DIH's
"Indexing in progress.. Added xxx documents".

Is there anything like in DIH, where we can fire command=status to get the
status of indexing for the files?


Thanks,
Prasi
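
There is no built-in command=status for client-side SolrJ indexing, but you
can track it yourself. A minimal sketch (class and field names are made up,
assuming SolrJ 4.x):

import java.util.concurrent.atomic.AtomicLong;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class CsvIndexer {
    private final HttpSolrServer server =
        new HttpSolrServer("http://localhost:8983/solr/collection1");
    private final AtomicLong added = new AtomicLong();
    private volatile boolean running;

    public void index(Iterable<SolrInputDocument> docs) throws Exception {
        running = true;
        for (SolrInputDocument doc : docs) {
            server.add(doc);          // one row of the csv file
            added.incrementAndGet();  // counter read by status()
        }
        server.commit();
        running = false;
    }

    // Answer the user's status requests in DIH style.
    public String status() {
        return (running ? "Indexing in progress.. Added " : "Idle. Added ")
               + added.get() + " documents";
    }
}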


Re: "Path must not end with / character" error during performance tests

2013-08-21 Thread Tanya
Erick,

The issue is that the problem does not happen on every call, so I assume it
is not a configuration problem.

BR

Tanya



>It looks like you've specified your zkHost (?) as something like
>machine:port/solr/
>
>rather than
>machine:port/solr
>
>Is that possible?
>
>Best,
>Erick
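
A small sketch of the difference Erick is pointing at, assuming the client is
a SolrJ CloudSolrServer (host names are placeholders):

import org.apache.solr.client.solrj.impl.CloudSolrServer;

public class ZkHostCheck {
    public static void main(String[] args) throws Exception {
        // Fine: chroot path without a trailing slash.
        CloudSolrServer ok = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181/solr");
        // Produces IllegalArgumentException: Path must not end with / character
        // inside the ZooKeeper client as soon as a request forces connect():
        CloudSolrServer bad = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181/solr/");
    }
}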




On Tue, Aug 20, 2013 at 7:10 PM, Tanya  wrote:

> Hi,
>
> I have integrated SolrCloud search in some system with single shard and
> works fine on single tests.
>
> Recently we started to run performance test and we are getting following
> exception after a while.
>
> Help is really appreciated,
>
> Thanks
> Tanya
>
> 2013-08-20 10:45:56,128 [cTaskExecutor-1] ERROR
> LoggingAspect  - Exception
>
> java.lang.RuntimeException: java.lang.IllegalArgumentException: Path must
> not end with / character
>
> at
> org.apache.solr.common.cloud.SolrZkClient.(SolrZkClient.java:123)
>
> at
> org.apache.solr.common.cloud.SolrZkClient.(SolrZkClient.java:88)
>
> at
> org.apache.solr.common.cloud.ZkStateReader.(ZkStateReader.java:148)
>
> at
> org.apache.solr.client.solrj.impl.CloudSolrServer.connect(CloudSolrServer.java:147)
>
> at
> org.apache.solr.client.solrj.impl.CloudSolrServer.request(CloudSolrServer.java:173)
>
> at
> org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
>
> at
> com.mcm.search.solr.SolrDefaultIndexingAdapter.updateEntities(SolrDefaultIndexingAdapter.java:109)
>
> at
> com.mcm.search.AbstractSearchEngineMediator.updateDocuments(AbstractSearchEngineMediator.java:89)
>
> at
> com.alu.dmsp.search.mediator.wrapper.SearchMediatorWrapper.updateDocuments(SearchMediatorWrapper.java:255)
>
> at
> com.alu.dmsp.business.module.beans.DiscoverySolrIndexingRequestHandler.executeLogic(DiscoverySolrIndexingRequestHandler.java:122)
>
> at
> com.alu.dmsp.business.module.beans.DiscoverySolrIndexingRequestHandler.executeLogic(DiscoverySolrIndexingRequestHandler.java:36)
>
> at
> com.alu.dmsp.common.business.BasicBusinessServiceExecutionRequestHandler.executeRequest(BasicBusinessServiceExecutionRequestHandler.java:107)
>
> at
> com.alu.dmsp.common.business.BasicBusinessServiceExecutionRequestHandler.execute(BasicBusinessServiceExecutionRequestHandler.java:84)
>
> at
> com.alu.dmsp.common.business.beans.BasicBusinessService.executeRequest(BasicBusinessService.java:92)
>
> at
> com.alu.dmsp.common.business.beans.BasicBusinessService.execute(BasicBusinessService.java:79)
>
> at sun.reflect.GeneratedMethodAccessor31.invoke(Unknown Source)
>
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>
> at java.lang.reflect.Method.invoke(Method.java:606)
>
> at
> org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:319)
>
> at
> org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:183)
>
> at
> org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:150)
>
> at
> org.springframework.aop.aspectj.MethodInvocationProceedingJoinPoint.proceed(MethodInvocationProceedingJoinPoint.java:80)
>
> at
> com.alu.dmsp.common.log.LoggingAspect.logWebServiceMethodCall(LoggingAspect.java:143)
>
> at sun.reflect.GeneratedMethodAccessor25.invoke(Unknown Source)
>
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>
> at java.lang.reflect.Method.invoke(Method.java:606)
>
> at
> org.springframework.aop.aspectj.AbstractAspectJAdvice.invokeAdviceMethodWithGivenArgs(AbstractAspectJAdvice.java:621)
>
> at
> org.springframework.aop.aspectj.AbstractAspectJAdvice.invokeAdviceMethod(AbstractAspectJAdvice.java:610)
>
> at
> org.springframework.aop.aspectj.AspectJAroundAdvice.invoke(AspectJAroundAdvice.java:65)
>
> at
> org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:161)
>
> at
> org.springframework.aop.aspectj.AspectJAfterThrowingAdvice.invoke(AspectJAfterThrowingAdvice.java:55)
>
> at
> org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:161)
>
> at
> org.springframework.aop.framework.adapter.MethodBeforeAdviceInterceptor.invoke(MethodBeforeAdviceInterceptor.java:50)
>
> at
> org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:161)
>
> at
> org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:90)
>
> at
> org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
>
> at
> org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:202)
>
> 

Re: Issue in Swap Space display at Solr Admin

2013-08-21 Thread Stefan Matheis
Vladimir

As Shawn said .. there is/was a change in configuration - my explanation was 
perhaps not the best.
if you try that one, it should work: 
http://localhost:8983/solr/collection1/admin/system?wt=json
otherwise, let us know which is the url you're using to access the Admin UI

- Stefan 


On Wednesday, August 21, 2013 at 11:50 AM, Vladimir Vagaitsev wrote:

> Stefan, the link still doesn't work.
> 
> I'm using solr-4.3.1 and I have the following solr.xml file:
> 
> 
> 
> 
> 
> 
> 
> 
>  hostPort="${jetty.port:8983}" hostContext="${hostContext:solr}">
> 
> 
> 
> 
> 
> 
> 2013/8/20 Shawn Heisey mailto:s...@elyograg.org)>
> 
> > On 8/20/2013 9:49 AM, Stefan Matheis wrote:
> > 
> > > Vladimir
> > > 
> > > That shouldn't matter .. perhaps i did not provide enough information?
> > > depends on which host & port you have solr running .. and the path you 
> > > have
> > > defined.
> > > 
> > > based on the tutorial (host + port configuration) you would use something
> > > like this:
> > > 
> > > http://localhost:8983/solr/**admin/system?wt=json
> > > 
> > > and that works in single- as well in multicore mode ..
> > > 
> > > Let me know if that still doesn't work? if so .. which is the address
> > > you're using to access the UI?
> > > 
> > 
> > 
> > That URL doesn't have a core name.
> > 
> > If defaultCoreName is missing from an old-style solr.xml, if it's not a
> > valid core name, or if the user is running 4.4 and has a new-style
> > solr.xml, that URL will not work.
> > 
> > The old-style solr.xml will continue to work in all 4.x versions, you
> > don't need to use the new style.
> > 
> > Thanks,
> > Shawn
> > 
> 
> 
> 




Solr 4.4 problem with loading DisMaxRequestHandler

2013-08-21 Thread danielitos85
Hi guys,

I'm using a clean solr 4.4 installation and I have added in my solrconfig.xml
the following lines:


 
 all
 0.01
 *:*
 0
 regex 



but when I start my solr it returns an error:

*Caused by: java.lang.ClassNotFoundException: solr.DisMaxRequestHandler*

In my dist folder I have all the default libraries, and also in the lib folder
of my core.
Please, any suggestions?

Thanks in advance.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-4-4-problem-with-loading-DisMaxRequestHandler-tp4085842.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Issue in Swap Space display at Solr Admin

2013-08-21 Thread Vladimir Vagaitsev
Stefan, the link still doesn't work.

I'm using solr-4.3.1 and I have the following solr.xml file:







  
  


  



2013/8/20 Shawn Heisey 

> On 8/20/2013 9:49 AM, Stefan Matheis wrote:
>
>> Vladimir
>>
>> That shouldn't matter .. perhaps i did not provide enough information?
>> depends on which host & port you have solr running .. and the path you have
>> defined.
>>
>> based on the tutorial (host + port configuration) you would use something
>> like this:
>>
>> http://localhost:8983/solr/**admin/system?wt=json
>>
>> and that works in single- as well in multicore mode ..
>>
>> Let me know if that still doesn't work? if so .. which is the address
>> you're using to access the UI?
>>
>
> That URL doesn't have a core name.
>
> If defaultCoreName is missing from an old-style solr.xml, if it's not a
> valid core name, or if the user is running 4.4 and has a new-style
> solr.xml, that URL will not work.
>
> The old-style solr.xml will continue to work in all 4.x versions, you
> don't need to use the new style.
>
> Thanks,
> Shawn
>
>


Re: Solr Filter Query

2013-08-21 Thread tamanjit.bin...@yahoo.co.in
I am unsure what you mean when you say /how big a filter query can be?/. Do
you mean how long a single filter query can be, or is there a limit on the
number of filter queries that can be put?

For the former you may want to visit maxBooleanClauses in your
solrconfig. Try this link:
http://wiki.apache.org/solr/SolrConfigXml#The_Query_Section

I am not too sure if there is a limit on the number of filters that you can
put in a query. In either case, I think the length is also dependent on your
web container config settings, i.e. the max URL length that it can accept.

Also, you may have to revisit your filterCache settings. There is no
universally optimal number for the filterCache; you may have to arrive at a
number after trying various combinations.

http://wiki.apache.org/solr/SolrCaching#filterCache
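
For reference, the stock example solrconfig.xml ships with a
<maxBooleanClauses>1024</maxBooleanClauses> limit, and every OR'd term inside
a single filter query counts toward it.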
  



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Filter-Query-tp4085807p4085832.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: get term frequency, just only keywords search

2013-08-21 Thread danielitos85
Thanks a lot guys,

@Jack in my search I use dismax (as defType) and I search either a term or a
phrase, but I need to get the number that shows me how many times that term or
phrase occurs in the document.

I could get it from debugQuery but I would like to get it directly from the
results.

What do you suggest? 
Thanks a lot for support.
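
If single terms (rather than phrases) would be enough: in Solr 4.x the
termfreq function can be pulled into the results directly, e.g.
fl=id,score,freq:termfreq(text,'diet') (the field name text is just an
assumption here). There is no ready-made equivalent for phrase frequency.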



--
View this message in context: 
http://lucene.472066.n3.nabble.com/get-term-frequency-just-only-keywords-search-tp4084510p4085831.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Facing Solr performance during query search

2013-08-21 Thread sivaprasad
Here I am providing the slave solrconfig information.

   1
   
  35
  35
   
   
  6
  1



1024
   20

  

  


  

  static firstSearcher warming in solrconfig.xml

  

false


The slave will poll every 1 hr.

The field list is given below.













   



 






  




We have configured ~2000 facets and the machine configuration is given
below.

6-core processor, 22528 MB (~22 GB) of RAM allotted to the JVM. The solr version is 4.1.0

Please let me know, if you require any more information.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Facing-Solr-performance-during-query-search-tp4085426p4085825.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: High memory usage on solr 3.6

2013-08-21 Thread Samuel García Martínez
You were right. I've attached VisualVM to the process and forced a
System.gc(): used memory went down to near 1.8gb.

So I don't understand VisualVM's dump reports: they said that all those char[]
references have a SolrDispatchFilter instance as GC root.

Another example (1M+ references with the exact same size 32792 [default
buffer size]):
https://dl.dropboxusercontent.com/u/4740964/solr_references.png

Can you give any hint on how to read this kind of reference graph?

Thanks!

On Tue, Aug 20, 2013 at 7:20 PM, Samuel García Martínez <
samuelgmarti...@gmail.com> wrote:

> Thanks for the quick answer.
>
> We are experiencing slower indexing speed (x2/x2.5 time consumed) and the
> memory consumed during this process oscillates between 10g (max allowed, so
> the JVM performs full gc's) and 8gb, but never goes under 8gb.
>
> I'll try your suggestions and see how it goes.
>
> Thanks!
>  On 20/08/2013 13:29, "Erick Erickson"  wrote:
>
> Hmmm, first time I've seen this, but are you seeing a problem
>> or is this just a theoretical concern? There are some
>> references that are held around until GC, these could be
>> perfectly legitimate, unused but un-collected references.
>>
>> What I'd try before worrying much:
>> 1> attach jconsole and press the GC button and see if lots of those
>> go away.
>> 2> _reduce_ the memory allocated to Solr and see if you can get by.
>> In this case, reduce it to, say, 4G. If your memory consumption
>> hits 4G and you continue running, you're probably seeing uncollected
>> memory.
>>
>> Best
>> Erick
>>
>>
>> On Tue, Aug 20, 2013 at 4:24 AM, Samuel García Martínez <
>> samuelgmarti...@gmail.com> wrote:
>>
>> > Hi all, we are facing a high memory usage in our Solr 3.6 master (not in
>> > the slaves) even during "idle" (non indexing) periods.
>> >
>> > container: Tomcat 6.0.29
>> > maxthreads: 1.5k (i think this setting is wrong, 300 would be enough)
>> > solr: solr 3.6.0
>> > setup: multitenant environment with 120+ cores and near 5M docs spread
>> over
>> > all indexes.
>> > uptime: 100+ days
>> > qps on this machine: 10 qps
>> > used heap: 8gb
>> > JVM params:
>> >
>> >
>> -Djava.util.logging.config.file=/opt/opensearch/tomcat/conf/logging.properties
>> > -Dmaster=true
>> > -Dmaster.url=""
>> > -Xms5G
>> > -Xmx10G
>> > -XX:MaxPermSize=256m
>> > -XX:SurvivorRatio=5
>> > -XX:NewRatio=2
>> > -XX:+UseParNewGC
>> > -XX:+UseConcMarkSweepGC
>> > -Dcom.sun.management.jmxremote
>> > -Dcom.sun.management.jmxremote.port=2
>> > -Dcom.sun.management.jmxremote.authenticate=false
>> > -Dcom.sun.management.jmxremote.ssl=false
>> > -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager
>> > -Djava.endorsed.dirs=/opt/opensearch/tomcat/endorsed
>> > -Dcatalina.base=/opt/opensearch/tomcat
>> > -Dcatalina.home=/opt/opensearch/tomcat
>> > -Djava.io.tmpdir=/opt/opensearch/tomcat/temp
>> >
>> >
>> > I think we have a high memory usage due to the next report:
>> > char[]  3.1M instances 5.3Gb -> this is what i'm talking about.
>> > LinkedHashMap$Entry 2M instances 121Mb
>> > String 1.48M 53Mb
>> >
>> >
>> > Checking for "GC roots" at all these instances, I found that almost all
>> > these references are contained in ISOLatin1AccentFilter.output ->
>> > TokenStreamImpl - Map$Entry -> Map -> CloseableThreadLocal ->
>> > TokenizerChain
>> >
>> > Do we had to add any CMS param to the JVM params? Or is this a memory
>> leak
>> > due to the ThreadLocal's?
>> >
>> > I verified that those char[] didn't belong to FieldCache.default, so
>> this
>> > high memory usage is not due the faceting and high cadinality values.
>> >
>> > PS: we reduced the number of threads and memory (and char[] instances)
>> > decreased significantly.
>> > --
>> > Regards,
>> > Samuel García.
>> >
>>
>


-- 
Regards,
Samuel García.