Re: multiple indexes?

2012-11-30 Thread Shawn Heisey

On 11/30/2012 10:11 PM, Joe Zhang wrote:

May I ask: how to set up multiple indexes, and specify which index to send
the docs to at indexing time, and later on, how to specify which index to
work with?

A related question: what is the storage location and structure of solr
indexes?
When you index or query data, you'll use a base URL specific to the 
index (core).  Everything goes through that base URL, which includes the 
name of the core:


http://server:port/solr/corename

The file called solr.xml tells Solr about multiple cores.  Each core has 
an instanceDir and a dataDir.


http://wiki.apache.org/solr/CoreAdmin

In the dataDir, Solr will create an index dir, which contains the Lucene 
index.  Here are the file formats for recent versions:


http://lucene.apache.org/core/4_0_0/core/org/apache/lucene/codecs/lucene40/package-summary.html
http://lucene.apache.org/core/3_6_1/fileformats.html
http://lucene.apache.org/core/old_versioned_docs/versions/3_5_0/fileformats.html

Thanks,
Shawn



Re: multiple indexes?

2012-11-30 Thread Dikchant Sahi
Multiple indexes can be set up using the multi-core feature of Solr.

Below are the steps:
1. Add the core name and storage location of the core to
the $SOLR_HOME/solr.xml file, e.g. (the core names here are placeholders):
  <cores adminPath="/admin/cores">
    <core name="core0" instanceDir="core0"/>
    <core name="core1" instanceDir="core1"/>
  </cores>

2. Create the core directories specified above, with the following
sub-directories in each:
- conf: Contains the configs and schema definition
- lib: Contains the required libraries
- data: Will be created automatically on first run. This would contain
the actual index.

While indexing docs, you specify the core name in the URL as follows:
  http://<host>:<port>/solr/<core-name>/update

You do the same when querying.
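
For example, with SolrJ (3.6+) it looks something like this (a rough sketch;
the core name "core0" and the field values are placeholders):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrInputDocument;

// Each core gets its own client, pointed at the core-specific base URL.
HttpSolrServer core0 = new HttpSolrServer("http://localhost:8983/solr/core0");

// Indexing goes to that core only.
SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", "1");
core0.add(doc);
core0.commit();

// Queries go through the same per-core URL.
QueryResponse rsp = core0.query(new SolrQuery("*:*"));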

Please refer to the Solr Wiki; it has the complete details.

Hope this helps!

- Dikchant

On Sat, Dec 1, 2012 at 10:41 AM, Joe Zhang  wrote:

> May I ask: how to set up multiple indexes, and specify which index to send
> the docs to at indexing time, and later on, how to specify which index to
> work with?
>
> A related question: what is the storage location and structure of solr
> indexes?
>
> Thanks in advance, guys!
>
> Joe.
>


Re: Multi word synonyms

2012-11-30 Thread Roman Chyla
Try separating multi-word synonyms with a null byte:

simple\0syrup,sugar\0syrup,stock\0syrup

see https://issues.apache.org/jira/browse/LUCENE-4499 for details

roman

On Sun, Feb 5, 2012 at 10:31 PM, Zac Smith  wrote:

> Thanks for your response. When I don't include the KeywordTokenizerFactory
> in the SynonymFilter definition, I get additional term values that I don't
> want.
>
> e.g. synonyms.txt looks like:
> simple syrup,sugar syrup,stock syrup
>
> A document with a value containing 'simple syrup' can now be found when
> searching for just 'stock'.
>
> So the problem I am trying to address with KeywordTokenizerFactory, is to
> prevent my multi word synonyms from getting broken down into single words.
>
> Thanks
> Zac
>
> -Original Message-
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Sunday, February 05, 2012 8:07 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Multi word synonyms
>
> I'm not quite sure what you're trying to do with KeywordTokenizerFactory
> in your SynonymFilter definition, but if I use the defaults, then the
> all-phrase form works just fine.
>
> So the question is "what problem are you trying to address by using
> KeywordTokenizerFactory?"
>
> Best
> Erick
>
> On Sun, Feb 5, 2012 at 8:21 AM, O. Klein  wrote:
> > Your query analyzer will tokenize "simple sirup" into "simple" and "sirup",
> > and won't match on "simple syrup" in the synonyms.txt.
> >
> > So you have to change the query analyzer into KeywordTokenizerFactory
> > as well.
> >
> > It might be an idea to make a field for synonyms only, with this tokenizer,
> > and another field to search on, and use dismax. Never tried this though.
> >
> > --
> > View this message in context:
> > http://lucene.472066.n3.nabble.com/Multi-word-synonyms-tp3716292p3717215.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
>
>
>


Re: SolrCloud - Sorting Problem

2012-11-30 Thread Chris Hostetter
: Background: Basically, I have added a new feature to Solr after I got the
: source code. Similar to the way we get "score" in the result set, I am now able
: to get position (or ranking) information of each document in the list, i.e.
: if there are 5 documents in the result set, each of them has its position
: information if you add "fl=*,position" to the query.

w/o more information about how/where you add this information, it's going 
to be really hard to give you suggestions on how to fix your problem.

: solr is on a cloud (as a master), the result set is some kinda shuffled and
: position information is incorrect.

In general the thing to keep in mind is that when doing a distributed 
query, each node is responsible for providing data about the results, and 
then a single node (whichever one your client/browser is connected to) 
acts as an aggregator to merge that information.

How that merging happens is specific to the information, and in most cases 
multiple (pipelined) requests are made to the individual shards.

for example: a request to search for X, sorted by Y, and faceting on field 
Z requires two requests to every shard: the first request gets the 
docIds and value of "Y" for the first N docs from each shard, as well as 
the top ranking facet values and their counts for field Z.  Then the 
aggregator looks at the Y values to figure out which docs from which shards 
should be in the final result, and it looks at the facet values to see 
which ones should be in the final result, and then it issues a second 
request to each shard in which it asks for the "fl" of those specific 
docs, and the final counts for those specific facet values, and 
then the final response is built up and returned to the client.

so depending on what you really mean in terms of "position" in an 
aggregated request like this, you need to make sure your custom code is 
running in the right place -- that may mean having logic that runs on the 
individual shards, as well as merge logic on the aggregator, or it may 
mean logic that *only* runs on the aggregator, based on information 
already available.

the details of what you are trying to do, and how you are currently 
attempting to do it, matter a lot.


-Hoss


Re: predefined variables usable in schema.xml ?

2012-11-30 Thread T. Kuro Kurosaka

Sorry, correction.
${solr.core.instanceDir} is working in a sense.  It is replaced by the 
core name, rather than a directory path.

Earlier, at startup time, Solr prints out:
INFO: Creating SolrCore 'collection1' using instanceDir: solr/collection1
But judging from the error message I get, ${solr.core.instanceDir} is 
replaced by the value "collection1"  (no "solr/").


I was hoping that ${solr.core.instanceDir} would be replaced by the 
absolute path to the examples/core/collection1 directory.


On 11/30/12 2:41 PM, T. Kuro Kurosaka wrote:
I tried to use ${solr.core.instanceDir} in schema.xml with Solr 4.0, 
where every deployment is multi-core, and it didn't work.
It must be that the description of the pre-defined properties on the 
CoreAdmin wiki page is wrong, or it only works in solrconfig.xml, 
perhaps?


On 11/28/12 5:17 PM, T. Kuro Kurosaka wrote:

Thank you, Hoss.

I found this SolrWiki page talks about pre-defined properties such as 
solr.core.instanceDir:

http://wiki.apache.org/solr/CoreAdmin

I tried to use ${solr.core.instanceDir} in the default single-core 
schema.xml, and it didn't work.
Is this page wrong, or are these properties available only in 
multi-core deployments?


On 11/27/12 2:27 PM, Chris Hostetter wrote:
: The default solrconfig.xml seems to suggest ${solr.data.dir} can be used.
: So I am hoping there is another pre-defined variable like this that points to
: the solr core directory.

there's nothing special about solr.data.dir ... it's used in the example
configs as a convenient way to let you override it on the command line
when running the example, otherwise it defaults to the empty string which
triggers the default dataDir logic (ie: ./data in the instanceDir)...

<dataDir>${solr.data.dir:}</dataDir>

: <charFilter class="com.basistech.rlp.solr.RCLUNormalizeCharFilterFactory"
:             rlpContext="solr/conf/rlp-context-rclu.xml"/>
:
: This only works if Solr is started from $SOLR_HOME/example, as it is relative
: to the current working directory.

if your factories are using the SolrResourceLoader.openResource to load
those files then you can change that to just be
'rlpContext="rlp-context-rclu.xml"' and it will just plain work -- the
SolrResourceLoader is SolrCloud/ZooKeeper aware, and in standalone mode
checks the conf dir, the classpath, and as a last resort attempts to
resolve it as a relative path -- if your custom factories just call
"new File(rlpContext)" on the string, then you're stuck using absolute
paths, or needing to define system properties at runtime.


-Hoss








Re: Exceptions in branch_4x log

2012-11-30 Thread Mark Miller

On Nov 30, 2012, at 5:04 PM, Shawn Heisey  wrote:

> The other exceptions in the log look more serious. 

My guess would be that the sizeOf exception is leading to the others.

- Mark


Re: predefined variables usable in schema.xml ?

2012-11-30 Thread T. Kuro Kurosaka
I tried to use ${solr.core.instanceDir} in schema.xml with Solr 4.0, 
where every deployment is multi-core, and it didn't work.
It must be that the description of the pre-defined properties on the 
CoreAdmin wiki page is wrong, or it only works in solrconfig.xml, perhaps?


On 11/28/12 5:17 PM, T. Kuro Kurosaka wrote:

Thank you, Hoss.

I found this SolrWiki page talks about pre-defined properties such as 
solr.core.instanceDir:

http://wiki.apache.org/solr/CoreAdmin

I tried to use ${solr.core.instanceDir} in the default single-core 
schema.xml, and it didn't work.
Is this page wrong, or are these properties available only in 
multi-core deployments?


On 11/27/12 2:27 PM, Chris Hostetter wrote:
: The default solrconfig.xml seems to suggest ${solr.data.dir} can be used.
: So I am hoping there is another pre-defined variable like this that points to
: the solr core directory.

there's nothing special about solr.data.dir ... it's used in the example
configs as a convenient way to let you override it on the command line
when running the example, otherwise it defaults to the empty string which
triggers the default dataDir logic (ie: ./data in the instanceDir)...

<dataDir>${solr.data.dir:}</dataDir>

: <charFilter class="com.basistech.rlp.solr.RCLUNormalizeCharFilterFactory"
:             rlpContext="solr/conf/rlp-context-rclu.xml"/>
:
: This only works if Solr is started from $SOLR_HOME/example, as it is relative
: to the current working directory.

if your factories are using the SolrResourceLoader.openResource to load
those files then you can change that to just be
'rlpContext="rlp-context-rclu.xml"' and it will just plain work -- the
SolrResourceLoader is SolrCloud/ZooKeeper aware, and in standalone mode
checks the conf dir, the classpath, and as a last resort attempts to
resolve it as a relative path -- if your custom factories just call
"new File(rlpContext)" on the string, then you're stuck using absolute
paths, or needing to define system properties at runtime.


-Hoss






Re: Exceptions in branch_4x log

2012-11-30 Thread Shawn Heisey

On 11/30/2012 2:24 PM, Mark Miller wrote:

You are using a local filesystem and not something like NFS?

If you are not replicating, I'm surprised this would happen and doubt it's the 
bug mentioned in the other reply.

It could just be a bug we have to defend against - if a file is not there 
because the index is changing as we are counting, we should probably just 
continue on.


Yes, it's a local filesystem, ext4 on LVM, scsi driver is mpt2sas -- 
Dell hardware RAID1.


Linux bigindy5 2.6.32-279.14.1.el6.centos.plus.x86_64 #1 SMP Wed Nov 7 
00:40:45 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux


I've checked out the latest revision from SVN and I'll file a new bug if 
that particular exception happens again.


The other exceptions in the log look more serious.  This is a dev box, 
so it doesn't see all that much traffic, but so far I have not noticed 
any actual problems.


Thanks,
Shawn



Re: Edismax query parser and phrase queries

2012-11-30 Thread Jack Krupansky
I don’t have a simple answer for your stated issue, but maybe part of that is 
because I’m not so sure what the exact problem/goal is. I mean, what’s so 
special about phrase queries for your app that they need distinct processing 
from individual terms?

And, ultimately, what goal are you trying to achieve? Such as, how will the 
outcome of the query affect what users see and do.

-- Jack Krupansky

From: Tantius, Richard 
Sent: Friday, November 30, 2012 8:44 AM
To: solr-user@lucene.apache.org 
Subject: Edismax query parser and phrase queries

Hi,

we are using the edismax query parser and execute queries on specific fields by 
using the qf option. Like others, we are facing the problem that we do not want 
explicit phrase queries to be performed on some of the qf fields, and we also 
require additional search fields for those kinds of queries.

We tried to expand explicit phrases in a query by implementing some 
pre-processing logic, which did not seem to be very convenient.

So for example (let's assume qf="title text", we want phrase queries to be 
performed on the additional fields "titleAlt textAlt"): q="ran away from home" 
Cat Dog -transformTo-> q=( titleAlt:"ran away from home" OR textAlt:"ran away 
from home" ) Cat Dog. Unfortunately this gets rather complicated if logical 
operators are involved within the query. Is there some kind of best practice; 
should we for example extend the query parser, or stick to our pre-processing 
approach?

 

Regards,

Richard.

 

Richard Tantius
Software Engineer 



Gotenstr. 7-9
53175 Bonn
Tel.:+49 (0)228 / 4 22 86 - 38 
Fax.:   +49 (0)228 / 4 22 86 - 538
E-Mail:   r.tant...@binserv.de 
Web:  www.binserv.de
   www.binforcepro.de

Managing director: Rüdiger Jakob
Registry court: Siegburg HRB 6765
Company headquarters: Pfarrer-Wichert-Str. 35, 53639 Königswinter
This e-mail, including any attached files, contains confidential and/or 
legally protected information. If you are not the intended recipient and have 
received this e-mail in error, you may neither use the contents of this e-mail 
nor open any attached files, nor copy or pass on/distribute anything. Please 
notify the sender and delete this e-mail and any attached files immediately. 
Thank you!

 

 


Re: Exceptions in branch_4x log

2012-11-30 Thread Shawn Heisey

On 11/30/2012 2:24 PM, Markus Jelsma wrote:

Hi, try updating your checkout, I think that's fixed now.
https://issues.apache.org/jira/browse/SOLR-4117


Thank you, that's the most common problem.  I'll let you know how it 
turns out.


There are still other problems in the log.  Anyone have any idea about 
those?


Thanks,
Shawn



Re: Exceptions in branch_4x log

2012-11-30 Thread Mark Miller
You are using a local filesystem and not something like NFS?

If you are not replicating, I'm surprised this would happen and doubt it's the 
bug mentioned in the other reply.

It could just be a bug we have to defend against - if a file is not there 
because the index is changing as we are counting, we should probably just 
continue on.

- Mark

On Nov 30, 2012, at 4:15 PM, Shawn Heisey  wrote:

> This is branch_4x, checked out 2012-11-28.  Here is my solr log, created by 
> log4j at WARN level:
> 
> http://dl.dropbox.com/u/97770508/solr-2012-11-30.log
> 
> There are a bunch of unusual exceptions in here.  Most of them appear to be 
> related to getting information from the mbeans handler, specifically getting 
> the size of the index from the replication handler ... it appears that for 
> this exception, the index gets updated while the stats are being gathered, 
> and files in the index disappear.
> 
> The other exceptions look like they are I/O related.  I don't know why they 
> are happening.  Can anyone shed any light on the situation?  What other 
> information should I include for troubleshooting?
> 
> For the disappearing index files, I am guessing that some kind of 
> synchronization may need to be added so that the index cannot change while 
> the replication handler is gathering the size information.  I am actually not 
> using replication, the only reason I have the handler configured is so that I 
> can get the size of my indexes.  See SOLR-3990.
> 
> https://issues.apache.org/jira/browse/SOLR-3990
> 
> I think I need to file jira issues for these problems, but I would like to 
> understand them better before doing that.  Can anyone offer any insight?
> 
> Thanks,
> Shawn
> 



Re: Grouping by a date field

2012-11-30 Thread sdanzig
Hey, great advice Amit, Jack, and Chris.  It's been a while since I got such
a nice array of options!  My response... yes, Amit, I thought of your way
before posting... I was just thinking, eh, there must be a way in SOLR,
since it was so easy to do the facets.  So I wanted an alternative first
before something that seems fairly ugly.. storing redundant data like that.

Jack, I was thinking of a group.func query, but I was looking for some way
to manipulate a string in some way.  I'm a bit surprised SOLR doesn't
provide this yet either.  I'm sure there's a reason.  The solution you
provided is pretty impressive, and I think I along with a bunch of other
subscribers at the very least appreciate having such a nice working example
of how to do a group.func that way.

Chris, yeah, it's for all of time, but good insight that I appreciate..
facets are a little different, yes.

I think the way I'm going to go for now, since I realize I'm doing a query
sorted by this date anyway, is just detecting day changes in the result set
while iterating through them, and grouping them that way.  Might've wanted
to have SOLR group them if there was a clean n' easy solution, but there
apparently isn't one.

- Scott



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Grouping-by-a-date-field-tp4023318p4023563.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Exceptions in branch_4x log

2012-11-30 Thread Markus Jelsma
Hi, try updating your checkout, I think that's fixed now.
https://issues.apache.org/jira/browse/SOLR-4117 
 
-Original message-
> From:Shawn Heisey 
> Sent: Fri 30-Nov-2012 22:21
> To: solr-user@lucene.apache.org
> Subject: Exceptions in branch_4x log
> 
> This is branch_4x, checked out 2012-11-28.  Here is my solr log, created 
> by log4j at WARN level:
> 
> http://dl.dropbox.com/u/97770508/solr-2012-11-30.log
> 
> There are a bunch of unusual exceptions in here.  Most of them appear to 
> be related to getting information from the mbeans handler, specifically 
> getting the size of the index from the replication handler ... it 
> appears that for this exception, the index gets updated while the stats 
> are being gathered, and files in the index disappear.
> 
> The other exceptions look like they are I/O related.  I don't know why 
> they are happening.  Can anyone shed any light on the situation?  What 
> other information should I include for troubleshooting?
> 
> For the disappearing index files, I am guessing that some kind of 
> synchronization may need to be added so that the index cannot change 
> while the replication handler is gathering the size information.  I am 
> actually not using replication, the only reason I have the handler 
> configured is so that I can get the size of my indexes.  See SOLR-3990.
> 
> https://issues.apache.org/jira/browse/SOLR-3990
> 
> I think I need to file jira issues for these problems, but I would like 
> to understand them better before doing that.  Can anyone offer any insight?
> 
> Thanks,
> Shawn
> 
> 


Exceptions in branch_4x log

2012-11-30 Thread Shawn Heisey
This is branch_4x, checked out 2012-11-28.  Here is my solr log, created 
by log4j at WARN level:


http://dl.dropbox.com/u/97770508/solr-2012-11-30.log

There are a bunch of unusual exceptions in here.  Most of them appear to 
be related to getting information from the mbeans handler, specifically 
getting the size of the index from the replication handler ... it 
appears that for this exception, the index gets updated while the stats 
are being gathered, and files in the index disappear.


The other exceptions look like they are I/O related.  I don't know why 
they are happening.  Can anyone shed any light on the situation?  What 
other information should I include for troubleshooting?


For the disappearing index files, I am guessing that some kind of 
synchronization may need to be added so that the index cannot change 
while the replication handler is gathering the size information.  I am 
actually not using replication, the only reason I have the handler 
configured is so that I can get the size of my indexes.  See SOLR-3990.


https://issues.apache.org/jira/browse/SOLR-3990

I think I need to file jira issues for these problems, but I would like 
to understand them better before doing that.  Can anyone offer any insight?


Thanks,
Shawn



Re: DefaultSolrParams ?

2012-11-30 Thread Chris Hostetter

: I use it like this:
: SolrParams params = req.getParams();
: String q = params.get(CommonParams.Q).trim();
: 
: The exception is from the second line if "q" is empty.
: I can see "q.alt=*:*" in my defaults within params.
: 
: So why is it not picking up "q.alt" if "q" is empty?

You're talking about some sort of custom Solr plugin that you have, correct?

when you are accessing a SolrParams object, there is nothing magic about 
"q" and "q.alt" -- params.get() will only return the value specified for 
the param name you ask about.  The logic for using "q.alt" (aka: 
"DisMaxParams.ALTQ") if "q" doesn't exist in the params (or is blank) has 
always been a specific feature of the DisMaxQParser.

So if you are suddenly getting an NPE when q is missing, perhaps the 
problem is that in your old configs there was a default "q" containing the 
empty string, and now that's gone?
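
In a plugin, the usual defensive pattern is to do that fallback yourself -- a
minimal untested sketch (using CommonParams.Q and DisMaxParams.ALTQ):

SolrParams params = req.getParams();
String q = params.get(CommonParams.Q);
if (q == null || q.trim().length() == 0) {
  // fall back to q.alt, or to a match-all default if that is absent too
  q = params.get(DisMaxParams.ALTQ, "*:*");
}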


-Hoss


Re: Grouping by a date field

2012-11-30 Thread Chris Hostetter

: What's the performance impact of doing this?

the function approach should have slower query times compared to the "new 
field containing day" approach because it has to do the computation for 
every doc at query time, but the field approach is less flexible because 
you have to know in advance that you want to use it that way -- the classic 
"precompute work when indexing to save time during query" trade-off.

A key thing to keep in mind though with both of these suggestions is 
whether you really want a single group for every day, for all of time...

: >> second.  I'm able to easily use this field to create day-based facets, but
: >> not groups.  Advice please?

if you compare this to range/date faceting then there is a start/end date, 
and documents which fall outside of those dates get included in 
before/after ranges -- this wouldn't happen if you used either of the two 
suggestions above.  

The closest analog available in grouping would be to 
use a series of "group.query" params, specifying the ranges of values you 
wanted -- this is essentially what range faceting is doing under the 
covers.

if the goal is something like "group by day for the last 7 days, and 
everything else should be in a single 'older' bucket", then sending 8 
group.query params containing range queries should work fine.
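
A sketch of what those 8 params could look like from SolrJ (untested; the
field name "datetime" is taken from the original question, and the '}' makes
each upper bound exclusive):

SolrQuery q = new SolrQuery("*:*");
q.set("group", true);
// day boundaries, newest first
String[] b = { "NOW/DAY+1DAY", "NOW/DAY", "NOW/DAY-1DAY", "NOW/DAY-2DAYS",
               "NOW/DAY-3DAYS", "NOW/DAY-4DAYS", "NOW/DAY-5DAYS", "NOW/DAY-6DAYS" };
// one bucket per day for the last 7 days...
for (int i = 0; i + 1 < b.length; i++) {
  q.add("group.query", "datetime:[" + b[i + 1] + " TO " + b[i] + "}");
}
// ...plus a single bucket for everything older
q.add("group.query", "datetime:[* TO " + b[b.length - 1] + "}");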


-Hoss


Blacklight 4.0.0 released!

2012-11-30 Thread Jessie Keck
Apologies for the cross-post. 

Blacklight 4.0.0 was just released yesterday evening.  One of the most notable 
changes in this release is a switch to using Twitter Bootstrap for our UI 
component.  We have taken a fairly generic approach which will allow 
implementers to take full advantage of the features Bootstrap provides 
(including drop-in Bootswatch themes).  You can see the new Bootstrap UI for 
Blacklight at our demo site ( http://demo.projectblacklight.org/ ).

Other notable changes are:
- Removing dependency on RSolr::Ext, which allows us to leverage new Solr 
features as they come out.  One such feature (Pivot Facets) is supported in this 
release.
- Updated blacklight-jetty submodule to solr 4.0. (note that we expect to 
remain compatible with 3.x and 1.4 moving forward)
- Drop support for ruby 1.8.

In addition to the core release we have upgraded most (if not all) of the 
plugins under the projectblacklight Github organization to work with the 4.0.0 
release.

For more information about what this release contains as well as an upgrade 
guide please see our wiki:
https://github.com/projectblacklight/blacklight/wiki/Blacklight-4.0-release-notes-and-upgrade-guide

Very special thanks to the developers in the Blacklight community that did the 
heavy lifting on this release: 
Chris Beer (Stanford)
Simon Lamb (Hull)
James Stuart (Columbia)
Justin Coyne (MediaShelf)

As always, please feel free to contact us via email ( 
blacklight-developm...@googlegroups.com ) or on IRC ( 
http://webchat.freenode.net/?channels=blacklight )

- Jessie Keck
Software Developer
Stanford University

Re: [Solrj 4.0] No group response

2012-11-30 Thread Chris Hostetter

: query.setParam(GroupParams.GROUP_MAIN, true);
...
: GroupResponse groupResponse = response.getGroupResponse(); // null
: 
: Search result is ok, QueryResponse contains docs I searched for. But group
: response is always null. Did I miss something, some magic parameter for
: enabling group response?

By using GROUP_MAIN = true, you've told the grouping code you want it to 
flatten the grouping results and return them in the format of a single 
DocList result -- so there is no GroupResponse included...

https://wiki.apache.org/solr/FieldCollapsing

"We can optionally use the results of a group command as the "main" result 
(i.e. a single flat document list that would normally be produced by a 
non-grouped query request) by adding the parameter group.main=true. 
Although this result format does not have as much information, it may be 
easier for existing solr clients to parse."

If you want the "rich" grouping response, you need to use 
GROUP_MAIN=false.
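
Roughly, in SolrJ terms (an untested sketch of the same code with GROUP_MAIN
dropped, reading the rich format back via GroupResponse/GroupCommand):

SolrQuery query = new SolrQuery(queryString);
query.setParam(GroupParams.GROUP, true);
query.setParam(GroupParams.GROUP_FIELD, "group_field");
query.setParam(GroupParams.GROUP_TOTAL_COUNT, true);
// GROUP_MAIN is left at its default (false), so the rich format comes back

QueryResponse response = solrServer.query(query);
GroupResponse groupResponse = response.getGroupResponse(); // no longer null
for (GroupCommand cmd : groupResponse.getValues()) {
  System.out.println(cmd.getName() + ": ngroups=" + cmd.getNGroups());
}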


-Hoss


Re: [Solrj 4.0] No group response

2012-11-30 Thread Kissue Kissue
Here is how I have previously used grouping. Note I am using Solr 3.5:

SolrQuery query = new SolrQuery("");
query.setRows(GROUPING_LIMIT);
query.setParam("group", Boolean.TRUE);
query.setParam("group.field", "GROUP_FIELD");

This seems to work for me.



On Fri, Nov 30, 2012 at 1:17 PM, Roman Slavík  wrote:

> Hi guys,
>
> I have problem with grouping in Solr 4.0 using Solrj api. I need this:
> search some documents limited with solr query, group them by one field and
> return total count of groups.
> There is param 'group.ngroups' for adding groups count into group
> response. Sounds easy, so I wrote something like this:
>
> SolrQuery query = new SolrQuery().setQuery(queryString);
> query.addField("score");
>
> query.setParam(GroupParams.GROUP, true);
> query.setParam(GroupParams.GROUP_MAIN, true);
> query.setParam(GroupParams.GROUP_FIELD, "group_field");
> query.setParam(GroupParams.GROUP_LIMIT, "1");
> query.setParam(GroupParams.GROUP_TOTAL_COUNT, true);
>
> QueryResponse response = solrServer.query(query);
> // contains found docs
> GroupResponse groupResponse = response.getGroupResponse(); // null
>
> Search result is ok, QueryResponse contains docs I searched for. But group
> response is always null. Did I miss something, some magic parameter for
> enabling group response?
>
> Thanks for any advice
>
> Roman
>


Re: Regexp and speed

2012-11-30 Thread Robert Muir
On Fri, Nov 30, 2012 at 12:13 PM, Roman Chyla  wrote:

>
> The code here:
>
> https://github.com/romanchyla/montysolr/blob/solr-trunk/contrib/adsabs/src/test/org/adsabs/lucene/BenchmarkAuthorSearch.java
>
> The benchmark should probably not be called 'benchmark', do you think it
> may be too simplistic? Can we expect some bad surprises somewhere?
>
>
I think maybe a few surprises, since it extends LuceneTestCase and uses
RandomIndexWriter, newSearcher and so on, the benchmark results can be
confusing.

This stuff is fantastic to use for tests but for benchmarks may cause
confusion.

For example you might run it and it gets SimpleText codec, maybe wraps the
indexsearcher with slow things like ParallelReader, and maybe you get
horrific merge parameters and so on.


RE: Best way to increase boost to results that 'starts with' search keyword

2012-11-30 Thread Markus Jelsma
This issue adds the SpanFirstQuery to edismax.
https://issues.apache.org/jira/browse/SOLR-3925

It unfortunately cannot produce progressively higher boosts if the term is 
closer to the beginning.

 
 
-Original message-
> From:Jack Krupansky 
> Sent: Fri 30-Nov-2012 18:54
> To: solr-user@lucene.apache.org
> Subject: Re: Best way to increase boost to results that 'starts with' search 
> keyword
> 
> Two choices:
> 
> 1. You need the Lucene SpanFirstQuery, but the normal Solr query parsers 
> don't support it, so you need to roll your own.
> 2. Do a custom update processor that at index time inserts a special start 
> marker like "aaafirstaaa" at the beginning of each field that needs this 
> feature. Then, you can query for "aaafirstaaa accounting" to find documents 
> with accounting as the first term.
> 
> If you are ambitious, you could modify/extend the Solr query parser (or 
> edismax) to add a custom boost that uses SpanFirstQuery and each term or 
> phrase. Call it "boostInitial" or something like that.
> 
> The latter would be a great feature request for Solr. (And a nice start for 
> an enterprising developer seeking to progress along on the path to becoming 
> a committer!)
> 
> -- Jack Krupansky
> 
> -Original Message- 
> From: bbarani
> Sent: Friday, November 30, 2012 12:16 PM
> To: solr-user@lucene.apache.org
> Subject: Best way to increase boost to results that 'starts with' search 
> keyword
> 
> Hi,
> 
> I need to boost the document containing the search keyword in the first
> position of the indexed data, ex:
> 
> If I have the following data indexed,
> 
> Account number
> Data account and account number
> Information number account data account
> Account indicator
> 
> when users search for the keyword account, I want Solr to first bring in the
> documents that start with the search keyword, followed by other documents.
> 
> so the result should be
> 
> Account Number
> Account indicator
> Data account and account number
> Information number account data account
> 
> Is it possible to do this?
> 
> Thanks,
> BB
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Best-way-to-increase-boost-to-results-that-starts-with-search-keyword-tp4023502.html
> Sent from the Solr - User mailing list archive at Nabble.com. 
> 
> 


Re: Does SolrCloud support distributed IDFs?

2012-11-30 Thread Walter Underwood
Wow, an XPA user!

The distributed search merging and global IDF calculation that we used in 
Ultraseek XPA is described here:

http://wunderwood.org/most_casual_observer/2007/04/progressive_reranking.html

If you have per-term document frequencies and numdocs for each shard, you can 
calculate global IDF. It is always possible, though maybe slow, to get per-term 
document frequencies, because you can do a search for just that term and use 
the count of matches.
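
A back-of-the-envelope sketch of that merge, using Lucene's classic idf
formula (the ShardStats holder here is hypothetical, not a real API):

long globalDocFreq = 0, globalNumDocs = 0;
for (ShardStats s : shards) {        // sum the per-shard statistics
  globalDocFreq += s.docFreq(term);
  globalNumDocs += s.numDocs();
}
// Lucene DefaultSimilarity-style idf over the merged counts
double idf = 1.0 + Math.log(globalNumDocs / (double) (globalDocFreq + 1));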

The Ultraseek quality was additive, like bf. A multiplicative boost (like 
edismax) is much more stable over a range of boost values.

wunder
Former Ultraseek Architect
Search Guy, Chegg.com

On Nov 28, 2012, at 9:43 AM, Sandeep Mestry wrote:

> Dear All, can anyone suggest how long it will take to get the SOLR-1632 patch
> into Solr 4?
> 
> Also, it'd be good if someone has used any alternate method like Ultraseek
> XPA Java library to calculate the distributed ranking?
> 
> Many Thanks,
> Sandeep
> 
> 
> On 22 October 2012 13:23, Sascha SZOTT  wrote:
> 
>> Hi Mark,
>> 
>> 
>> Mark Miller wrote:
>> 
>>> Still waiting on that issue. I think Andrzej should just update it to
>>> trunk and commit - it's option and defaults to off. Go vote :)
>>> 
>> Sounds like the problem is already solved and the remaining work consists
>> of code integration? Can somebody estimate how much work that would be?
>> 
>> -Sascha
>> 

--
Walter Underwood
wun...@wunderwood.org





Re: Best way to increase boost to results that 'starts with' search keyword

2012-11-30 Thread Jack Krupansky

Two choices:

1. You need the Lucene SpanFirstQuery, but the normal Solr query parsers 
don't support it, so you need to roll your own.
2. Do a custom update processor that at index time inserts a special start 
marker like "aaafirstaaa" at the beginning of each field that needs this 
feature. Then, you can query for "aaafirstaaa accounting" to find documents 
with accounting as the first term.


If you are ambitious, you could modify/extend the Solr query parser (or 
edismax) to add a custom boost that uses SpanFirstQuery and each term or 
phrase. Call it "boostInitial" or something like that.


The latter would be a great feature request for Solr. (And a nice start for 
an enterprising developer seeking to progress along on the path to becoming 
a committer!)
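
For reference, the Lucene query such a parser would build is roughly this
(untested sketch; the field name and boost value are made up):

import org.apache.lucene.index.Term;
import org.apache.lucene.search.spans.SpanFirstQuery;
import org.apache.lucene.search.spans.SpanTermQuery;

// Match "account" only when it occurs at the very start of the field.
SpanTermQuery term = new SpanTermQuery(new Term("title", "account"));
SpanFirstQuery first = new SpanFirstQuery(term, 1); // spans must end by position 1
first.setBoost(5.0f); // combine this with the main query as the boost part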


-- Jack Krupansky

-Original Message- 
From: bbarani

Sent: Friday, November 30, 2012 12:16 PM
To: solr-user@lucene.apache.org
Subject: Best way to increase boost to results that 'starts with' search 
keyword


Hi,

I need to boost the document containing the search keyword in the first
position of the indexed data, ex:

If I have the following data indexed,

Account number
Data account and account number
Information number account data account
Account indicator

when users search for the keyword account, I want Solr to first bring in the
documents that start with the search keyword, followed by other documents.

so the result should be

Account Number
Account indicator
Data account and account number
Information number account data account

Is it possible to do this?

Thanks,
BB



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Best-way-to-increase-boost-to-results-that-starts-with-search-keyword-tp4023502.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: Regexp and speed

2012-11-30 Thread Roman Chyla
found also some 1M test


258033ms.  Building index of 100 docs
29703ms.  Verifying data integrity with 100 docs
1821ms.  Preparing 1 random queries
2867284ms.  Regex queries
18772ms.  Regexp queries (new style)
29257ms.  Wildcard queries
4920ms.  Boolean queries
Totals: [1749708, 1744494, 1749708, 1744494]


On Fri, Nov 30, 2012 at 12:13 PM, Roman Chyla  wrote:

> Hi,
>
> Some time ago we have done some measurement of the performance fo the
> regexp queries and found that they are VERY FAST! We can't be grateful
> enough, it saves many days/lives ;)
>
> This was an old lenovo x61 laptop, core2 due, 1.7GHz,no special memory
> allocation, SSD disk:
>
>
> 51459ms.  Building index of 10 docs
> 181175ms.  Verifying data integrity with 100 docs
> 315ms.  Preparing 1000 random queries
>
> 61167ms.  Regex queries - Stopping execution, # queries finished: 150
> 2795ms.  Regexp queries (new style)
> 3936ms.  Wildcard queries
> 777ms.  Boolean queries
> 893ms.  Boolean queries (truncated)
> 3596ms.  Span queries
> 91751ms.  Span queries (truncated)Stopping execution, # queries finished: 100
> 3937ms.  Payload queries
> 93726ms.  Payload queries (truncated)Stopping execution, # queries finished: 
> 100
> Totals: [4865, 18284, 18286, 18284, 18405, 287934, 44375, 18284, 2489]
>
> Examples of queries:
> 
> regex:bgiyodjrr, k\w* michael\w* jay\w* .*
> regexp:/bgiyodjrr, k\w* michael\w* jay\w* .*/
> wildcard:bgiyodjrr, k*1 michael*2 jay*3 *
> +n0:bgiyodjrr +n1:k +n2:michael +n3:jay
> +n0:bgiyodjrr +n1:k* +n2:m* +n3:j*
> spanNear([vectrfield:bgiyodjrr, vectrfield:k, vectrfield:michael, 
> vectrfield:jay], 0, true)
> spanNear([vectrfield:bgiyodjrr, SpanMultiTermQueryWrapper(vectrfield:k*), 
> SpanMultiTermQueryWrapper(vectrfield:m*), 
> SpanMultiTermQueryWrapper(vectrfield:j*)], 0, true)
> spanPayCheck(spanNear([vectrfield:bgiyodjrr, vectrfield:k, 
> vectrfield:michael, vectrfield:jay], 1, true), payloadRef: 
> b[0]=48;b[0]=49;b[0]=50;b[0]=51;)
> spanPayCheck(spanNear([vectrfield:bgiyodjrr, 
> SpanMultiTermQueryWrapper(vectrfield:k*), 
> SpanMultiTermQueryWrapper(vectrfield:m*), 
> SpanMultiTermQueryWrapper(vectrfield:j*)], 1, true), payloadRef: 
> b[0]=48;b[0]=49;b[0]=50;b[0]=51;)
>
>
> The code here:
>
> https://github.com/romanchyla/montysolr/blob/solr-trunk/contrib/adsabs/src/test/org/adsabs/lucene/BenchmarkAuthorSearch.java
>
> The benchmark should probably not be called 'benchmark', do you think it
> may be too simplistic? Can we expect some bad surprises somewhere?
>
> Thanks,
>
>   roman
>


Best way to increase boost to results that 'starts with' search keyword

2012-11-30 Thread bbarani
Hi,

I need to boost the document containing the search keyword in the first
position of the indexed data, ex:

If I have 3 data indexed as below,

Account number
Data account and account number
Information number account data account
Account indicator

when users searches for keyword account, I want solr to first bring in the
documents that starts with the search keyword followed by other documents.

so the result should be

Account Number
Account indicator
Data account and account number
Information number account data account

Is it possible to do this?

Thanks,
BB



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Best-way-to-increase-boost-to-results-that-starts-with-search-keyword-tp4023502.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Downloading files from the solr replication Handler

2012-11-30 Thread Alexandre Rafalovitch
What mime type do you get for binary files? Maybe the server is misconfigured for
that extension and sends them as text. Then they could be the markers.

Do they look like markers?

Regards,
Alex
On 30 Nov 2012 04:06, "Eva Lacy"  wrote:

> Doesn't make much sense if they are in binary files as well.
>
>
> On Thu, Nov 29, 2012 at 10:16 PM, Lance Norskog  wrote:
>
> > Maybe these are text encoding markers?
> >
> > - Original Message -
> > | From: "Eva Lacy" 
> > | To: solr-user@lucene.apache.org
> > | Sent: Thursday, November 29, 2012 3:53:07 AM
> > | Subject: Re: Downloading files from the solr replication Handler
> > |
> > | I tried downloading them with my browser and also with a c#
> > | WebRequest.
> > | If I skip the first and last 4 bytes it seems work fine.
> > |
> > |
> > | On Thu, Nov 29, 2012 at 2:28 AM, Erick Erickson
> > | wrote:
> > |
> > | > How are you downloading them? I suspect the issue is
> > | > with the download process rather than Solr, but I'm just guessing.
> > | >
> > | > Best
> > | > Erick
> > | >
> > | >
> > | > On Wed, Nov 28, 2012 at 12:19 PM, Eva Lacy  wrote:
> > | >
> > | > > Just to add to that, I'm using solr 3.6.1
> > | > >
> > | > >
> > | > > On Wed, Nov 28, 2012 at 5:18 PM, Eva Lacy  wrote:
> > | > >
> > | > > > I downloaded some configuration and data files directly from
> > | > > > solr in an
> > | > > > attempt to develop a backup solution.
> > | > > > I noticed there is some characters at the start and end of the
> > | > > > file
> > | > that
> > | > > > aren't in configuration files, I notice the same characters at
> > | > > > the
> > | > start
> > | > > > and end of the data files.
> > | > > > Anyone with any idea how I can download these files without the
> > | > > > extra
> > | > > > characters or predict how many there are going to be so I can
> > | > > > skip
> > | > them?
> > | > > >
> > | > >
> > | >
> > |
> >
>


Re: Permanently Full Old Generation...

2012-11-30 Thread Walter Underwood
We are running 1.6 update 37. That was released on the same day as your 
version, so it should have the same bug fixes. We use these options in 
production, it is very stable:

export CATALINA_OPTS="$CATALINA_OPTS -d64"
export CATALINA_OPTS="$CATALINA_OPTS -Xms4096m -Xmx6144m"
export CATALINA_OPTS="$CATALINA_OPTS -XX:MaxPermSize=256m"
export CATALINA_OPTS="$CATALINA_OPTS -XX:NewSize=2048m"
export CATALINA_OPTS="$CATALINA_OPTS -XX:+UseConcMarkSweepGC 
-XX:+DoEscapeAnalysis -XX:+UseCompressedOops"
export CATALINA_OPTS="$CATALINA_OPTS -XX:+UseParNewGC 
-XX:+CMSParallelRemarkEnabled"
export CATALINA_OPTS="$CATALINA_OPTS -verbose:gc -XX:+PrintGCDetails 
-XX:+PrintGCTimeStamps"
export CATALINA_OPTS="$CATALINA_OPTS -XX:-TraceClassUnloading"

We do indexing and searching on separate machines. At Netflix, I found that the 
indexing load had a big effect on search speed, so I've separated the functions 
since then.

wunder
Search Guy, Chegg.com

On Nov 30, 2012, at 8:43 AM, Andy Kershaw wrote:

> We are currently operating at reduced load which is why the ParNew
> collections are not a problem. I don't know how long they were taking
> before though. Thanks for the warning about index formats.
> 
> Our JVM is:
> 
> Java(TM) SE Runtime Environment (build 1.7.0_09-b05)
> Java HotSpot(TM) 64-Bit Server VM (build 23.5-b02, mixed mode)
> 
> We are currently running more tests but it takes a while before the issues
> become apparent.
> 
> Andy Kershaw
> 
> On 29 November 2012 18:31, Walter Underwood  wrote:
> 
>> Several suggestions.
>> 
>> 1. Adjust the traffic load for about 75% CPU. When you hit 100%, you are
>> already in an overload state and the variance of the response times goes
>> way up. You'll have very noisy benchmark data.
>> 
>> 2. Do not force manual GCs during a benchmark.
>> 
>> 3. Do not force merge (optimise). That is a very expensive operation and
>> will cause slowdowns.
>> 
>> 4. Make eden big enough to hold all data allocated during a request for
>> all simultaneous requests. All that stuff is garbage after the end of the
>> request. If eden fills up, it will be allocated from the tenured space and
>> cause that to grow unnecessarily. We use an 8GB heap and 2GB eden. I like
>> setting the size better than setting ratios.
>> 
>> 5. What version of the JVM are you using?
>> 
>> wunder
>> 
>> On Nov 29, 2012, at 10:15 AM, Shawn Heisey wrote:
>> 
>>> On 11/29/2012 10:44 AM, Andy Kershaw wrote:
 Annette is away until Monday so I am looking into this in the meantime.
 Looking at the times of the Full GC entries at the end of the log, I
>> think
 they are collections we started manually through jconsole to try and
>> reduce
 the size of the old generation. This only seemed to have an effect when
>> we
 reloaded the core first though.
 
 It is my understanding that the eden size is deliberately smaller to
>> keep
 the ParNew collection time down. If it takes too long then the node is
 flagged as down.
>>> 
>>> Your ParNew collections are taking less than 1 second (some WAY less
>> than one second) to complete and the CMS collections are taking far longer
>> -- 6 seconds seems to be a common number in the GC log.  GC is unavoidable
>> with Java, so if there has to be a collection, you definitely want it to be
>> on the young generation (ParNew).
>>> 
>>> Controversial idea coming up, nothing concrete to back it up.  This
>> means that you might want to wait for a committer to weigh in:  I have seen
>> a lot of recent development work relating to SolrCloud and shard stability.
>> You may want to check out branch_4x from SVN and build that, rather than
>> use 4.0.  I don't have any idea what the timeline for 4.1 is, but based on
>> what I saw for 3.x releases, it should be released relatively soon.
>>> 
>>> The above advice is a bad idea if you have to be able to upgrade from
>> one 4.1 snapshot to a later one without reindexing. There is a possibility
>> that the 4.1 index format will change before release and require a reindex,
>> it has happened at least twice already.
>>> 
>>> Thanks,
>>> Shawn
>>> 
>> 
>> --
>> Walter Underwood
>> wun...@wunderwood.org
>> 
>> 
>> 
>> 
> 
> 
> -- 
> Andy Kershaw
> 
> Technical Developer
> 
> ServiceTick Ltd
> 
> 
> 
> T: +44(0)1603 618326
> 
> M: +44 (0)7876 556833
> 
> 
> 
> Seebohm House, 2-4 Queen Street, Norwich, England, NR2 4SQ
> 
> www.ServiceTick.com 
> 
> www.SessionCam.com 
> 
> 
> 
> *This message is confidential and is intended to be read solely by the
> addressee. If you have received this message by mistake, please delete it
> and do not copy it to anyone else. Internet communications are not secure
> and may be intercepted or changed after they are sent. ServiceTick Ltd does
> not accept liability for any such changes.*

--
Walter Underwood
wun...@wunderwood.org





Re: Permanently Full Old Generation...

2012-11-30 Thread Andy Kershaw
We are currently operating at reduced load which is why the ParNew
collections are not a problem. I don't know how long they were taking
before though. Thanks for the warning about index formats.

Our JVM is:

Java(TM) SE Runtime Environment (build 1.7.0_09-b05)
Java HotSpot(TM) 64-Bit Server VM (build 23.5-b02, mixed mode)

We are currently running more tests but it takes a while before the issues
become apparent.

Andy Kershaw

On 29 November 2012 18:31, Walter Underwood  wrote:

> Several suggestions.
>
> 1. Adjust the traffic load for about 75% CPU. When you hit 100%, you are
> already in an overload state and the variance of the response times goes
> way up. You'll have very noisy benchmark data.
>
> 2. Do not force manual GCs during a benchmark.
>
> 3. Do not force merge (optimise). That is a very expensive operation and
> will cause slowdowns.
>
> 4. Make eden big enough to hold all data allocated during a request for
> all simultaneous requests. All that stuff is garbage after the end of the
> request. If eden fills up, it will be allocated from the tenured space and
> cause that to grow unnecessarily. We use an 8GB heap and 2GB eden. I like
> setting the size better than setting ratios.
>
> 5. What version of the JVM are you using?
>
> wunder
>
> On Nov 29, 2012, at 10:15 AM, Shawn Heisey wrote:
>
> > On 11/29/2012 10:44 AM, Andy Kershaw wrote:
> >> Annette is away until Monday so I am looking into this in the meantime.
> >> Looking at the times of the Full GC entries at the end of the log, I
> think
> >> they are collections we started manually through jconsole to try and
> reduce
> >> the size of the old generation. This only seemed to have an effect when
> we
> >> reloaded the core first though.
> >>
> >> It is my understanding that the eden size is deliberately smaller to
> keep
> >> the ParNew collection time down. If it takes too long then the node is
> >> flagged as down.
> >
> > Your ParNew collections are taking less than 1 second (some WAY less
> than one second) to complete and the CMS collections are taking far longer
> -- 6 seconds seems to be a common number in the GC log.  GC is unavoidable
> with Java, so if there has to be a collection, you definitely want it to be
> on the young generation (ParNew).
> >
> > Controversial idea coming up, nothing concrete to back it up.  This
> means that you might want to wait for a committer to weigh in:  I have seen
> a lot of recent development work relating to SolrCloud and shard stability.
>  You may want to check out branch_4x from SVN and build that, rather than
> use 4.0.  I don't have any idea what the timeline for 4.1 is, but based on
> what I saw for 3.x releases, it should be released relatively soon.
> >
> > The above advice is a bad idea if you have to be able to upgrade from
> one 4.1 snapshot to a later one without reindexing. There is a possibility
> that the 4.1 index format will change before release and require a reindex,
> it has happened at least twice already.
> >
> > Thanks,
> > Shawn
> >
>
> --
> Walter Underwood
> wun...@wunderwood.org
>
>
>
>


-- 
Andy Kershaw

Technical Developer

ServiceTick Ltd



T: +44(0)1603 618326

M: +44 (0)7876 556833



Seebohm House, 2-4 Queen Street, Norwich, England, NR2 4SQ

www.ServiceTick.com 

www.SessionCam.com 



*This message is confidential and is intended to be read solely by the
addressee. If you have received this message by mistake, please delete it
and do not copy it to anyone else. Internet communications are not secure
and may be intercepted or changed after they are sent. ServiceTick Ltd does
not accept liability for any such changes.*


Re: Replication fails in SolrCloud

2012-11-30 Thread Mark Miller

On Nov 30, 2012, at 11:01 AM, yayati  wrote:

> We have created a custom search component, where this error occurs in the
> inform method at the line
> .getResourceLoader().getConfigDir()));

Does your custom component try and get the config dir? What for?

- Mark


Re: Replication fails in SolrCloud

2012-11-30 Thread yayati
Hi Mark,

Please find detail stacktrace :


2012-11-30 19:32:58,260 [pool-2-thread-1] ERROR
apache.solr.core.CoreContainer - null:org.apache.solr.common.SolrException:
ZkSolrResourceLoader does not support getConfigDir() - likely, what you are
trying to do is not supported in ZooKeeper mode
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:721)
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:566)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:850)
at org.apache.solr.core.CoreContainer.load(CoreContainer.java:534)
at org.apache.solr.core.CoreContainer.load(CoreContainer.java:356)
at
org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:308)
at
org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:107)
at
org.apache.catalina.core.ApplicationFilterConfig.initFilter(ApplicationFilterConfig.java:277)
at
org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:258)
at
org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:382)
at
org.apache.catalina.core.ApplicationFilterConfig.<init>(ApplicationFilterConfig.java:103)
at
org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:4638)
at
org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5294)
at
org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150)
at
org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:895)
at
org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:871)
at
org.apache.catalina.core.StandardHost.addChild(StandardHost.java:615)
at
org.apache.catalina.startup.HostConfig.deployWAR(HostConfig.java:962)
at
org.apache.catalina.startup.HostConfig$DeployWar.run(HostConfig.java:1603)
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:722)
Caused by: org.apache.solr.common.cloud.ZooKeeperException:
ZkSolrResourceLoader does not support getConfigDir() - likely, what you are
trying to do is not supported in ZooKeeper mode
at
org.apache.solr.cloud.ZkSolrResourceLoader.getConfigDir(ZkSolrResourceLoader.java:100)

We have created a custom search component, where this error occurs in the
inform method at the line
.getResourceLoader().getConfigDir()));


For setup I did the following steps:

1. Add the _version_ field in schema.xml
2. On a separate ZooKeeper server, upload the Solr configuration.
3. Add the following parameters to the Tomcat startup:
   -Dsolr.solr.home=/home/live/solr  -DhostContext=solr
-DzkClientTimeout=2 -DzkHost=172.23.xx.xx:2181

4. solr.xml contains:



  
  





Let me know if you need more details.








--
View this message in context: 
http://lucene.472066.n3.nabble.com/Zookeeper-aware-Replication-in-SolrCloud-tp3479497p4023484.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Replication in SolrCloud

2012-11-30 Thread Mark Miller
Thanks for all the detailed info!

Yes, that is confusing. One of the sore points we have while supporting both 
std Solr and SolrCloud mode.

In SolrCloud, every node is a Master when thinking about std Solr replication. 
However, as you see on the cloud page, only one of them is a *leader*. A leader 
is different than a master.

Being a Master when it comes to the replication handler simply means you can 
replicate the index to other nodes - in SolrCloud we need every node to be 
capable of doing that. Each shard only has one leader, but every node in your 
cluster will be a replication master.

- Mark


On Nov 30, 2012, at 10:32 AM, Arkadi Colson  wrote:

> This is my setup for solrCloud 4.0 on Tomcat 7.0.33 and zookeeper 3.4.5
> 
> hosts:
> - solr01-dcg (first started)
> - solr01-gs (second started so becomes replicate)
> 
> collections:
> - smsc
> 
> shards:
> - mydoc
> 
> zookeeper:
> - on solr01-dcg
> - on solr01-gs
> 
> SOLR_OPTS="-Dsolr.solr.home=/opt/solr/ -Dport=8983 
> -Dcollection.configName=smsc -DzkClientTimeout=2 
> -DzkHost=solr01-dcg:2181,solr01-gs:2181"
> 
> solr.xml:
> 
> 
>   
<core name="mydoc" config="solrconfig.xml" collection="mydoc"/>
>   
> 
> 
> I upload the config to zookeeper:
> java -classpath .:/usr/local/tomcat/webapps/solr/WEB-INF/lib/* 
> org.apache.solr.cloud.ZkCLI -cmd upconfig -zkhost 
> solr01-dcg:2181,solr01-gs:2181 -confdir /opt/solr/conf -confname smsc
> 
> Linking the config to the collection:
> java -classpath .:/usr/local/tomcat/webapps/solr/WEB-INF/lib/* 
> org.apache.solr.cloud.ZkCLI -cmd linkconfig -collection mydoc -zkhost 
> solr01-dcg.intnet.smartbit.be:2181,solr01-gs.intnet.smartbit.be:2181 
> -confname smsc
> 
> cloud on both hosts:
> 
> 
> 
> solr01-dcg
> 
> 
> 
> solr01-gs:
> 
> 
> Any idea?
> 
> Thanks!
> 
> On 11/30/2012 03:15 PM, Mark Miller wrote:
>> On Nov 30, 2012, at 5:08 AM, Arkadi Colson 
>>  wrote:
>> 
>> 
>>> Hi
>>> 
>>> I've setup an simple 2 machine cloud with 1 shard, one replicator and 2 
>>> collections.Everything went fine. However when I look at the interface: 
>>> http://localhost:8983/solr/#/coll1/replication
>>>  is reporting the both machines are master. Did I do something wrong in my 
>>> config or isit a report for manual replication configuration? Can someone 
>>> else check this?
>>> 
>> How? You don't really give anything to look at :)
>> 
>> 
>>> Is it poossible to link 2 collections to the same conf in zookeeper?
>>> 
>>> 
>> Yes, that is no problem.
>> 
>> - Mark
>> 
>> 
>> 
>> 



Solr 4: Join Query

2012-11-30 Thread Vikash Sharma
Hi All,
I have my field definition in schema.xml like below







I need to create a separate record in Solr for each parent-child
relationship, such that if a child is the same across different parents it
gets stored only once.

For e.g.
 ---_Record 1
ABC
EMP001
DOC001
My Parent Doc

 ---_Record 2
DOC001


My Document Data


This will ensure that if any doc_id content is duplicated, the record is
inserted into Solr only once.

Lastly, I want the result as a join: if emp_id=EMP001, then both records
should be returned, as there is a relationship between the two records via
doc_id = id.

If I query:
http://localhost:8983/solr/select?q={!join%20from=doc_id%20to=id}emp_id:EMP001&wt=json

I expect both records to be returned, either one after another or
nested, but I only get the child records...


Please help..



Regards,
Vikash Sharma
vikash0...@gmail.com


DefaultSolrParams ?

2012-11-30 Thread Bernd Fehling
Dear list,

after going from 3.6 to 4.0 I see exceptions in my logs.
It turned out that somehow the "q"-parameter was empty.
With 3.6 the "q.alt" in the solrconfig.xml worked as fallback but now with 4.0 
I get exceptions.

I use it like this:
SolrParams params = req.getParams();
String q = params.get(CommonParams.Q).trim();

The exception is from the second line if "q" is empty.
I can see "q.alt=*:*" in my defaults within params.

So why is it not picking up "q.alt" if "q" is empty?

Regards
Bernd


Re: Grouping by a date field

2012-11-30 Thread Jack Krupansky
A POC will tell you - it is 90% driven by your particular environment, your 
particular schema, your particular data, and your particular queries (e.g., 
how many documents they match, how many days they match.)


But please do share with us your results after conducting your POC - which 
is an absolute requirement for any change like this to how an app uses 
Solr.


-- Jack Krupansky

-Original Message- 
From: Amit Nithian

Sent: Friday, November 30, 2012 12:04 AM
To: solr-user@lucene.apache.org
Subject: Re: Grouping by a date field

What's the performance impact of doing this?


On Thu, Nov 29, 2012 at 7:54 PM, Jack Krupansky 
wrote:



Or group by a function query which is the date field converted to
milliseconds divided by the number of milliseconds in a day.

Such as:

 q=*:*&group=true&group.func=rint(div(ms(date_dt),mul(24,mul(60,mul(60,1000)))))
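
Through SolrJ the same query looks roughly like this (a sketch; the server
variable and the date_dt field are assumptions):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.response.QueryResponse;
    import org.apache.solr.common.params.GroupParams;

    // Group by day: ms-since-epoch divided by ms-per-day, rounded to an int.
    SolrQuery query = new SolrQuery("*:*");
    query.set(GroupParams.GROUP, true);
    query.set(GroupParams.GROUP_FUNC,
        "rint(div(ms(date_dt),mul(24,mul(60,mul(60,1000)))))");
    QueryResponse rsp = server.query(query);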

-- Jack Krupansky

-Original Message- From: Amit Nithian
Sent: Thursday, November 29, 2012 10:29 PM
To: solr-user@lucene.apache.org
Subject: Re: Grouping by a date field


Why not create a new field that just contains the day component? Then you
can group by this field.


On Thu, Nov 29, 2012 at 12:38 PM, sdanzig  wrote:

I'm trying to create a SOLR query that groups/field collapses by date. I
have a field in yyyy-MM-dd'T'HH:mm:ss'Z' format, "datetime", and I'm looking
to group by just per day. When grouping on this field using
group.field=datetime in the query, SOLR responds with a group for every
second. I'm able to easily use this field to create day-based facets, but
not groups. Advice please?

- Scott



--
View this message in context:
http://lucene.472066.n3.nabble.com/Grouping-by-a-date-field-tp4023318.html
Sent from the Solr - User mailing list archive at Nabble.com.








Re: localHostContext should not contain a / ... why not?

2012-11-30 Thread Mark Miller
I really don't remember. Yes, you don't want it to start with a /, yes it's 
part of the node name, but the node name should have all / turned into _. 

I'd simply try it - enforce no starting / instead, turn / into _ for the node 
name…see what tests pass, do some manual testing…

That's all I've got.

- Mark

On Nov 29, 2012, at 11:20 PM, Chris Hostetter  wrote:

> 
> Can anyone shed some light on this code in ZkController...
> 
>if (localHostContext.contains("/")) {
>  throw new IllegalArgumentException("localHostContext ("
>  + localHostContext + ") should not contain a /");
>}
> 
> ...i don't really understand this limitation.  There's nothing in the servlet 
> spec that prevents a context path from containing '/' characters -- i can for 
> instance modify the jetty context file that ships with solr like so and jetty 
> will happily run solr rooted at http://localhost:8983/solr/hoss/man ...
> 
> 
> hossman@frisbee:~/lucene/dev$ svn diff solr/example/contexts/solr.xml
> Index: solr/example/contexts/solr.xml
> ===================================================================
> --- solr/example/contexts/solr.xml    (revision 1415493)
> +++ solr/example/contexts/solr.xml    (working copy)
> @@ -1,8 +1,8 @@
>  <?xml version="1.0"?>
>  <!DOCTYPE Configure PUBLIC "-//Jetty//Configure//EN" "http://www.eclipse.org/jetty/configure.dtd">
>  <Configure class="org.eclipse.jetty.webapp.WebAppContext">
> -  <Set name="contextPath">/solr</Set>
> +  <Set name="contextPath">/solr/hoss/man</Set>
>    <Set name="war"><SystemProperty name="jetty.home" default="."/>/webapps/solr.war</Set>
>    <Set name="defaultsDescriptor"><SystemProperty name="jetty.home"/>/etc/webdefault.xml</Set>
>    <Set name="tempDirectory"><SystemProperty name="jetty.home" default="."/>/solr-webapp</Set>
> -</Configure>
> \ No newline at end of file
> +</Configure>
> 
> 
> 
> My best guesses as to the intent of this code are:
> 
> 1) that it was really meant to ensure the localHostContext didn't *start* with 
> a redundant "/"
> 
> 2) that there is some reason why the nodeName shouldn't include slashes, and 
> the nodeName is built using the localHostContext, so the restriction 
> propagates.
> 
> If it's #1 it seems like a trivial bug with an easy fix. #2 doesn't really 
> make sense to me -- but it may just be my ZK ignorance: Aren't nodePaths in 
> ZK hierarchical by nature, so shouldn't allowing "/" be fine? is there some 
> reason introducing multiple "sub directories" (with a single child) in ZK for 
> a single solr node would be bad? ... if so then wouldn't a simple solution be 
> to URL encode the localHostContext (or escape the "/" in some other way) when 
> building the nodeName so that we can eliminate this limitation?
> 
> 
> 
> 
> -Hoss



Re: Downloading files from the solr replication Handler

2012-11-30 Thread Erick Erickson
Color me baffled. But these files are copied around all the time, so I'm
guessing an interaction between your servlet container and your request, which
is like saying "it must be magic". You can tell I'm in places where I'm
clueless

Sorry I can't be more help
Erick


On Fri, Nov 30, 2012 at 4:06 AM, Eva Lacy  wrote:

> Doesn't make much sense if they are in binary files as well.
>
>
> On Thu, Nov 29, 2012 at 10:16 PM, Lance Norskog  wrote:
>
> > Maybe these are text encoding markers?
> >
> > - Original Message -
> > | From: "Eva Lacy" 
> > | To: solr-user@lucene.apache.org
> > | Sent: Thursday, November 29, 2012 3:53:07 AM
> > | Subject: Re: Downloading files from the solr replication Handler
> > |
> > | I tried downloading them with my browser and also with a c#
> > | WebRequest.
> > | If I skip the first and last 4 bytes it seems to work fine.
> > |
> > |
> > | On Thu, Nov 29, 2012 at 2:28 AM, Erick Erickson
> > | wrote:
> > |
> > | > How are you downloading them? I suspect the issue is
> > | > with the download process rather than Solr, but I'm just guessing.
> > | >
> > | > Best
> > | > Erick
> > | >
> > | >
> > | > On Wed, Nov 28, 2012 at 12:19 PM, Eva Lacy  wrote:
> > | >
> > | > > Just to add to that, I'm using solr 3.6.1
> > | > >
> > | > >
> > | > > On Wed, Nov 28, 2012 at 5:18 PM, Eva Lacy  wrote:
> > | > >
> > | > > > I downloaded some configuration and data files directly from
> > | > > > solr in an
> > | > > > attempt to develop a backup solution.
> > | > > > I noticed there are some characters at the start and end of the
> > | > > > file that aren't in the configuration files; I notice the same
> > | > > > characters at the start and end of the data files.
> > | > > > Anyone with any idea how I can download these files without the
> > | > > > extra
> > | > > > characters or predict how many there are going to be so I can
> > | > > > skip
> > | > them?
> > | > > >
> > | > >
> > | >
> > |
> >
>


Re: inconsistent number of results returned in solr cloud

2012-11-30 Thread Erick Erickson
Just glad it's resolved

Erick


On Thu, Nov 29, 2012 at 7:46 PM, Buttler, David  wrote:

> Sorry, yes, I had been using the BETA version.  I have deleted all of
> that, replaced the jars with the released versions (reduced my core count),
> and now I have consistent results.
> I guess I missed that JIRA ticket, sorry for the false alarm.
> Dave
>
>
> -Original Message-
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Friday, November 23, 2012 4:25 AM
> To: solr-user@lucene.apache.org
> Subject: Re: inconsistent number of results returned in solr cloud
>
> Dave:
>
> I should have asked this first: what version of Solr are you using? I'm not
> sure whether it was fixed in BETA or not (it certainly is in the 4.0 GA
> release). There was a problem with adding a doclist via solrj, here's one
> related JIRA, although it wasn't the main fix:
> https://issues.apache.org/jira/browse/SOLR-3001. I suspect that's the
> "known problem" Mark mentioned.
>
> Because what you're seeing _sure_ sounds similar
>
> Best
> Erick
>
>
> On Mon, Nov 19, 2012 at 12:49 PM, Buttler, David 
> wrote:
>
> > Answers inline below
> >
> > -Original Message-
> > From: Erick Erickson [mailto:erickerick...@gmail.com]
> > Sent: Saturday, November 17, 2012 6:40 AM
> > To: solr-user@lucene.apache.org
> > Subject: Re: inconsistent number of results returned in solr cloud
> >
> > Hmmm, first an aside. If by "commit after every batch of documents "
> > you mean after every call to server.add(doclist), there's no real need
> > to do that unless you're striving for really low latency. the usual
> > recommendation is to use commitWithin when adding and commit only at
> > the very end of the run. This shouldn't actually be germane to your
> > issue, just an FYI.
> >
> > DB> Good point.  The code for committing docs to solr is fairly old.
> > DB> I
> > will update it since I don't have a latency requirement.
> >
> > So you're saying that the inconsistency is permanent? By that I mean
> > it keeps coming back inconsistently for minutes/hours/days?
> >
> > DB> Yes, it is permanent.  I have collections that have been up for
> > DB> weeks,
> > and are still returning inconsistent results, and I haven't been
> > adding any additional documents.
> > DB> Related to this, I seem to have a discrepancy between the number
> > DB> of
> > documents I think I am sending to solr, and the number of documents it
> > is reporting.  I have tried reducing the number of shards for one of
> > my small collections, so I deleted all references to this collections,
> > and reloaded it. I think I have 260 documents submitted (counted from a
> hadoop job).
> >  Solr returns a count of ~430 (it varies), and the first returned
> > document is not consistent.
> >
> > I guess if I were trying to test this I'd need to know how you added
> > subsequent collections. In particular what you did re: zookeeper as
> > you added each collection.
> >
> > DB> These are my steps
> > DB> 1. Create the collection via the HTTP API:
> > http://<host>:<port>/solr/admin/collections?action=CREATE&name=<collection>&numShards=6&collection.configName=<config_name>
> > DB> 2. Relaunch one of my JVM processes, bootstrapping the collection:
> > DB> java -Xmx16g -Dcollection.configName=<config_name> -Djetty.port=<port>
> > -DzkHost=<zk_hosts> -Dsolr.solr.home=<solr_home> -DnumShards=6
> > -Dbootstrap_confdir=conf -jar start.jar
> > DB> load data
> >
> > DB> Let me know if something is unclear.  I can run through the
> > DB> process
> > again and document it more carefully.
> > DB>
> > DB> Thanks for looking at it,
> > DB> Dave
> >
> > Best
> > Erick
> >
> >
> > On Fri, Nov 16, 2012 at 2:55 PM, Buttler, David 
> wrote:
> >
> > > My typical way of adding documents is through SolrJ, where I commit
> > > after every batch of documents (where the batch size is
> > > configurable)
> > >
> > > I have now tried committing several times, from the command line
> > > (curl) with and without openSearcher=true.  It does not affect
> anything.
> > >
> > > Dave
> > >
> > > -Original Message-
> > > From: Mark Miller [mailto:markrmil...@gmail.com]
> > > Sent: Friday, November 16, 2012 11:04 AM
> > > To: solr-user@lucene.apache.org
> > > Subject: Re: inconsistent number of results returned in solr cloud
> > >
> > > How did you do the final commit? Can you try a lone commit (with
> > > openSearcher=true) and see if that affects things?
> > >
> > > Trying to determine if this is a known issue or not.
> > >
> > > - Mark
> > >
> > > On Nov 16, 2012, at 1:34 PM, "Buttler, David" 
> wrote:
> > >
> > > > Hi all,
> > > > I buried an issue in my last post, so let me pop it up.
> > > >
> > > > I have a cluster with 10 collections on it.  The first collection
> > > > I
> > > loaded works perfectly.  But every subsequent collection returns an
> > > inconsistent number of results for each query.  The queries can be
> > > simply *:*, or more complex facet queries.  If I go to individual
> > > cores and
> > issue
> > > the query, with distrib=false, I get a

Re: Replication fails in SolrCloud

2012-11-30 Thread Mark Miller
Need more information about your setup and config.

Longer stack traces would be helpful as well.

- Mark

On Nov 30, 2012, at 12:35 AM, yayati  wrote:

> Hi All,
> 
> I also got a similar error while moving my solr 3.6 based application to solr
> cloud. While setting up solrcloud I got this error:
> SolrCore Initialization Failures
> 
>ns01:
> org.apache.solr.common.SolrException:org.apache.solr.common.SolrException:
> ZkSolrResourceLoader does not support getConfigDir() - likely, what you are
> trying to do is not supported in ZooKeeper mode
>ns02:
> org.apache.solr.common.SolrException:org.apache.solr.common.SolrException:
> ZkSolrResourceLoader does not support getConfigDir() - likely, what you are
> trying to do is not supported in ZooKeeper mode 
> 
> 
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Zookeeper-aware-Replication-in-SolrCloud-tp3479497p4023398.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: Replication in SolrCloud

2012-11-30 Thread Mark Miller

On Nov 30, 2012, at 5:08 AM, Arkadi Colson  wrote:

> Hi
> 
> I've set up a simple 2-machine cloud with 1 shard, one replica and 2 
> collections. Everything went fine. However when I look at the interface: 
> http://localhost:8983/solr/#/coll1/replication is reporting that both machines 
> are masters. Did I do something wrong in my config or is it a report for manual 
> replication configuration? Can someone else check this?

How? You don't really give anything to look at :)

> 
> Is it possible to link 2 collections to the same conf in zookeeper?
> 

Yes, that is no problem.

- Mark



Re: SOLR4 cluster - strange CPU spike on slave

2012-11-30 Thread Erick Erickson
right, so here's what I'd check for.

Your logs should show a replication pretty much coincident with the spike.
Note: the replication should complete just before the spike.

Or you can just turn replication off and fire it manually to try to force
the situation at will, see:
http://wiki.apache.org/solr/SolrReplication#HTTP_API. (but note that you'll
have to wait until the index has changed on the master to see any action).
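
For example (host, port and core are placeholders; adjust to your setup):

    http://slave_host:8983/solr/replication?command=disablepoll
    http://slave_host:8983/solr/replication?command=fetchindex
    http://slave_host:8983/solr/replication?command=enablepoll

disablepoll stops the slave's automatic polling, fetchindex forces an
immediate replication, and enablepoll turns polling back on.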

So you should be able to create your spike at will. And this will be pretty
normal. When replication happens, a new searcher is opened, caches are
filled, autowarming is done, all kinds of stuff like that. During this
period, the _old_ searcher is still open, which will both cause the CPU to
be busier and require additional memory. Once the new searcher is warmed,
new queries go to it, and when the old searcher has finished serving all
the queries it shuts down and all the resources are freed. Which is why
commits are expensive operations.

All of which means that so far I don't think there's a problem, this is
just normal Solr operation. If you're seeing responsiveness problems when
serving queries you probably want to throw more hardware (particularly
memory) at the problem.

But when thinking about memory allocating to the JVM, _really_ read Uwe's
post here:
http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

Best
Erick


On Thu, Nov 29, 2012 at 2:39 AM, John Nielsen  wrote:

> Yup you read it right.
>
> We originally intended to do all our indexing to varnish02, replicate to
> varnish01 and then search from varnish01 (through a fail-over ip which
> would switch the reader to varnish02 in case of trouble).
>
> When I saw the spikes, I tried to eliminate possibilities by starting
> searching from varnish02, leaving varnish01 with nothing to do but to
> receive replication data. This did not remove the spikes. As soon as this
> spike is fixed, I will start searching from varnish01 again. These sort of
> debug antics are only possible because, although we do have customers using
> this, we are still in our beta phase.
>
> Varnish01 never receives any manual commit orders. Varnish02 does from time
> to time.
>
> Oh, and I accidentally misinformed you before. (damn secondary language) We
> are actually seeing the spikes on both servers. I was just focusing on
> varnish01 because I use it to eliminate possibilities.
>
> It just occurred to me now; We tried switching off our feeder/index tool
> for 24 hours, and we didn't see any spikes during this period, so receiving
> replication data certainly has something to do with it.
>
> Med venlig hilsen / Best regards
>
> *John Nielsen*
> Programmer
>
>
>
> *MCB A/S*
> Enghaven 15
> DK-7500 Holstebro
>
> Kundeservice: +45 9610 2824
> p...@mcb.dk
> www.mcb.dk
>
>
>
> On Thu, Nov 29, 2012 at 3:20 AM, Erick Erickson  >wrote:
>
> > Am I reading this right? All you're doing on varnish1 is replicating to
> it?
> > You're not searching or indexing? I'm sure I'm misreading this.
> >
> >
> > "The spike, which only lasts for a couple of minutes, sends the disks
> > racing" This _sounds_ suspiciously like segment merging, especially the
> > "disks racing" bit. Or possibly replication. Neither of which make much
> > sense. But is there any chance that somehow multiple commits are being
> > issued? Of course if varnish1 is a slave, that shouldn't be happening
> > either.
> >
> > And the whole bit about nothing going to the logs is just bizarre. I'm
> > tempted to claim hardware gremlins, especially if you see nothing similar
> > on varnish2. Or some other process is pegging the machine. All of which
> is
> > a way of saying "I have no idea"
> >
> > Yours in bewilderment,
> > Erick
> >
> >
> >
> > On Wed, Nov 28, 2012 at 6:15 AM, John Nielsen  wrote:
> >
> > > I apologize for the late reply.
> > >
> > > The query load is more or less stable during the spikes. There are
> always
> > > fluctuations, but nothing on the order of magnitude that could explain
> > this
> > > spike. In fact, the latest spike occured last night when there were
> > almost
> > > noone using it.
> > >
> > > To test a hunch of mine, I tried to deactivate all caches by commenting
> > out
> > > all cache entries in solrconfig.xml. It still spikes, so I dont think
> it
> > > has anything to do with cache warming or hits/misses or anything of the
> > > sort.
> > >
> > > One interesting thing GC though. This is our latest spike with cpu load
> > > (this server has 8 cores, so a load higher than 8 is potentially
> > > troublesome):
> > >
> > > 2012.Nov.27 19:58:18    2.27
> > > 2012.Nov.27 19:57:17    4.06
> > > 2012.Nov.27 19:56:18    8.95
> > > 2012.Nov.27 19:55:17   19.97
> > > 2012.Nov.27 19:54:17   32.27
> > > 2012.Nov.27 19:53:18    1.67
> > > 2012.Nov.27 19:52:17    1.6
> > > 2012.Nov.27 19:51:18    1.77
> > > 2012.Nov.27 19:50:17    1.89
> > >
> > > This is what the GC was doing around that time:
> > >
> > > 2012-11-27T19:50:04.933

Re: Inconsistent search results.

2012-11-30 Thread Sohail Aboobaker
Hi,

Thank you for your help. The issue is now resolved after using the analysis
tool as suggested by Jack and Chris. We used the following filters in the
end for this field:

  








  

WordDelimiterFilterFactory does the trick for splitting tokens into our
words appropriately.
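
The exact chain did not survive archiving; a representative field type using
WordDelimiterFilterFactory looks roughly like this, with illustrative
attribute values only (not necessarily the settings used here):

    <fieldType name="text_split" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <!-- split on case changes, digits and intra-word delimiters -->
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1" generateNumberParts="1"
                splitOnCaseChange="1" catenateWords="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>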

Thanks to everyone for helping.

Regards,
Sohail Aboobaker.


Edismax query parser and phrase queries

2012-11-30 Thread Tantius, Richard
Hi,
we are using the edismax query parser and execute queries on specific fields by
using the qf option. Like others, we are facing the problem that we do not want
explicit phrase queries to be performed on some of the qf fields, and we also
require additional search fields for those kinds of queries.
We tried to expand explicit phrases in a query by implementing some
pre-processing logic, which did not seem to be very convenient.
So for example (let's assume qf="title text", and we want phrase queries to be
performed on the additional fields "titleAlt textAlt"): q="ran away from home"
Cat Dog -transformTo-> q=( titleAlt:"ran away from home" OR textAlt:"ran away
from home" ) Cat Dog. Unfortunately this gets rather complicated if logical
operators are involved in the query. Is there some kind of best practice;
should we for example extend the query parser, or stick to our pre-processing
approach?
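
A simplified sketch of that kind of phrase expansion (it deliberately ignores
the operator complications mentioned above; userQuery is the raw query string):

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // Rewrite every explicit phrase onto the *Alt fields.
    Pattern phrase = Pattern.compile("\"([^\"]+)\"");
    Matcher m = phrase.matcher(userQuery);
    StringBuffer sb = new StringBuffer();
    while (m.find()) {
      String p = m.group(1);
      m.appendReplacement(sb, Matcher.quoteReplacement(
          "( titleAlt:\"" + p + "\" OR textAlt:\"" + p + "\" )"));
    }
    m.appendTail(sb);
    String rewritten = sb.toString();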

Regards,
Richard.

Richard Tantius
Software Engineer


Gotenstr. 7-9
53175 Bonn
Tel.:+49 (0)228 / 4 22 86 - 38
Fax.:   +49 (0)228 / 4 22 86 - 538
E-Mail:   r.tant...@binserv.de
Web:  www.binserv.de
   www.binforcepro.de

Managing Director: Rüdiger Jakob
Registration court: Siegburg HRB 6765
Company headquarters: Pfarrer-Wichert-Str. 35, 53639 Königswinter
This e-mail, including any attached files, contains confidential and/or
legally protected information. If you are not the intended recipient and have
received this e-mail in error, you may neither use the contents of this e-mail
nor open any attached files, nor copy or forward/distribute anything. Please
notify the sender and delete this e-mail and any attached files immediately.
Thank you!




Re: multiple filter query with seperate result sets (in one call)

2012-11-30 Thread Erick Erickson
You might look into joins. Be aware that the sweet spot for joins is when
the field you're joining on doesn't have a huge number of unique values per
document.

But that's about all I can think of offhand

Best
Erick


On Thu, Nov 29, 2012 at 1:29 AM, ninaddesai82  wrote:

> Thanks Erick for replying,
>
> Well, I am actually trying to build an autosuggestion; however, the
> functionality I need is a little bit tricky.
> So, just to give you an idea -
>
> I have certain generic attributes (say category, city, etc.).
> When the user types, I want autosuggest to populate, but while doing that I
> want only results to autopopulate which satisfy the filter of my preselected
> attributes (the category and city which the user had already selected).
>
> So e.g. the user selected category food and city San Francisco,
> and then he tries to type, let's say, "app".
>
> I might have 50 auto-population results, out of which I want to search in
> solr how many return results pertaining to the above attributes. (Otherwise,
> even if "apple store" is a valid auto-population, if it's not present in
> food and San Francisco, it won't be a valid auto-population.)
>
> So, I am trying to do a background search in solr, where I could find the
> validity of the auto-population phrase results and only return the valid
> ones. But to do this, I will have to spawn 50 searches (solr), and then
> logic to check how many results are being returned.
> Instead I was hoping that if I can do only one search for all those 50
> criteria in solr and get grouped results, it will be awesome and
> optimized.
>
> Any other idea you have in mind?
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/multiple-filter-query-with-seperate-result-sets-in-one-call-tp4022912p4023184.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


[Solrj 4.0] No group response

2012-11-30 Thread Roman Slavík

Hi guys,

I have a problem with grouping in Solr 4.0 using the Solrj API. I need this:
search some documents limited by a solr query, group them by one field,
and return the total count of groups.
There is a param 'group.ngroups' for adding the group count to the group
response. Sounds easy, so I wrote something like this:


SolrQuery query = new SolrQuery().setQuery(queryString);
query.addField("score");

query.setParam(GroupParams.GROUP, true);
query.setParam(GroupParams.GROUP_MAIN, true);
query.setParam(GroupParams.GROUP_FIELD, "group_field");
query.setParam(GroupParams.GROUP_LIMIT, "1");
query.setParam(GroupParams.GROUP_TOTAL_COUNT, true);

QueryResponse response = solrServer.query(query); // contains found docs

GroupResponse groupResponse = response.getGroupResponse(); // null

Search result is ok, QueryResponse contains docs I searched for. But 
group response is always null. Did I miss something, some magic 
parameter for enabling group response?


Thanks for any advice

Roman


Replication happening before replicateAfter event

2012-11-30 Thread Duncan Irvine
Hi All,
  I'm a bit new to the whole solr world and am having a slight problem with
replication.  I'm attempting to configure a master/slave scenario with bulk
updates happening periodically. I'd like to insert a large batch of docs to
the master, then invoke an optimize and have it only then replicate to the
slave.

At present I can create the master index, which seems to go to plan.
Watching the updateHandler, I see records being added, indexed and
auto-committed every so often. If I query the master while I am inserting,
and auto-commits have happened, I see 0 records. Then, when I commit at the
end, they all appear at once. This is as I'd expect.

What doesn't seem to be working right is that I've configured replication
to "replicateAfter" "startup" and "optimize" with a pollInterval of 60s;
however the slave is replicating and serving the "uncommitted" data
(although presumably post-auto-commit).
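
For reference, the relevant solrconfig.xml sections look roughly like this
(paraphrased from the description above; masterUrl is a placeholder):

    <!-- master -->
    <requestHandler name="/replication" class="solr.ReplicationHandler">
      <lst name="master">
        <str name="replicateAfter">startup</str>
        <str name="replicateAfter">optimize</str>
      </lst>
    </requestHandler>

    <!-- slave -->
    <requestHandler name="/replication" class="solr.ReplicationHandler">
      <lst name="slave">
        <str name="masterUrl">http://master_host:8983/solr/replication</str>
        <str name="pollInterval">00:00:60</str>
      </lst>
    </requestHandler>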

According to my master, I have:

Version: 0
Gen: 1
Size: 1.53GB
replicateAfter: optimize, startup

And, at present, my slave says:
Master:
  Version: 0
  Gen: 1
  Size: 1.53GB
Slave:
  Version: 1354275651817
  Gen: 52
  Size: 1.39GB

Which is a bit odd.
If I query the slave, I get results and as the slave polls I gradually get
more and more.

Obviously, I can disable polling and enable it programmatically once I'm
ready, but I was hoping to avoid that.

Does anyone have any thoughts?

Cheers,
  Duncan.


Re: solr asp.net integration challenge

2012-11-30 Thread Paul Tester
Yes, we do use pagination; we show 10 or 15 results, but the user has an
option to select them all (the count of the query result which is returned by
solr). When he uses this functionality we need all the selected doc IDs (or
original primary keys from the database) in the asp.net application as fast
as possible. So I'm looking for a solution where the results (only the IDs)
are shared between solr on tomcat and the asp.net application in IIS other
than over HTTP.



On Fri, Nov 30, 2012 at 2:48 AM, Gora Mohanty  wrote:

> On 27 November 2012 21:19, Paul Tester  wrote:
> > Hi all,
> >
> > At our company we have an asp.net webapplication hosted in IIS 7.5. This
> > application has a search module which is using solr. For communication
> > with the solr instance we use a third-party plugin. For every search we
> show
> > the total count of the results and also 10 or 15 records. What we're now
> > trying to achieve is that the user can select all the records from his
> > search, which involves that all the doc ids should be available in the
> > asp.net application in IIS as fast as possible. Our problem is that the
> > count of his search easily can contain 1.000.000 records (or even more),
> > which takes way too long to transport them to the application via a json
> > result over http. So I'm looking for an alternative solution which is way
> > faster.
>
> Retrieving, and displaying, all of a million records is definitely
> going to be slow. Are you not paginating your displayed results?
> If so, you could fetch results from Solr in smaller batches, keeping
> a small window of pages around the current one.
>
> Regards,
> Gora
>
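
To illustrate the batching on the Solr side, a minimal SolrJ sketch (URL,
field name and batch size are assumptions; a .NET client would do the
equivalent over HTTP):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;
    import org.apache.solr.common.SolrDocument;

    HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr");
    SolrQuery q = new SolrQuery("*:*");
    q.setFields("id");          // fetch only the unique key, not whole documents
    int batch = 10000;
    q.setRows(batch);
    long numFound = Long.MAX_VALUE;  // corrected after the first page
    for (int start = 0; start < numFound; start += batch) {
      q.setStart(start);
      QueryResponse rsp = server.query(q);
      numFound = rsp.getResults().getNumFound();
      for (SolrDocument doc : rsp.getResults()) {
        String id = (String) doc.getFieldValue("id");
        // hand the id off to the application here
      }
    }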


Re: Benchmarking Solr 3.3 vs. 4.0

2012-11-30 Thread Daniel Exner

Shawn Heisey wrote:
[..]


For best results, you'll want to ensure that Solr4 is working completely
from scratch, that it has never seen a 3.3 index, so that it will use
its own native format.
That's what I did in the second run. Thanks for clarifying that this is
in fact better. :)



It may be a good idea to look into the example
Solr4 config/schema and see whether there are improvements you can
make.  One note: the updateLog feature in the update handler config will
generally cause performance to be lower.  The features that require
updateLog would make this less of an apples to apples comparison, so I
wouldn't enable it unless I knew I needed it.

I'll have a look at the updateLog feature. But I'm pretty sure it's disabled.


Unless the lines are labelled wrong in the legend, the graph does show
higher CPU usage during the push, but lower CPU usage during the
optimize and most of the rest of the time.
Slightly, but I was expecting higher latency as well. Also, the raw data
shows the box is unable to deliver CPU stats to the PerfMon plugin because
of high load. Perhaps I was expecting bigger changes, but if you say what I
see is OK, I'm fine.

Can you comment on high CPU load even at low QPS rates?
Is there some parameter to force lower load while testing at the cost of 
higher latencies for better comparison?




The graph shows that Solr4 has lower latency than 3.3 during both the
push and the optimize, as well as most of the rest of the time.  The
latency numbers however are a lot higher than I would expect, seeming to
average out at around 100 seconds (10^5 ms).  That is terrible
performance from both versions.  On my own Solr installation, which is
distributed and has 78 million documents, I have a median latency of 8
milliseconds and a 95th percentile latency of 248 milliseconds.
OK, I should relabel the y-axis because the data is in fact 1000 times too
high. So latency is more like 10 ms, which is quite good for high QPS rates.




Is this a 64-bit platform with a 64-bit Java?  How much memory have you
allocated for the java heap?  How big is the index?


The VM I am using is an openSUSE 10.3 (i586), so no 64-bit Java at all 
(but production is using it).

Tomcat Java parameters are:
"-Xms1024m -Xmx1024m -XX:PermSize=256m -XX:MaxPermSize=256m 
-XX:+UseParallelGC -XX:ParallelGCThreads=4 -XX:GCTimeRatio=10"


Number of docs is 266249 for both indices, which is quite small, but I
may be able to use a much larger index and a much better machine in the
near future.


Greetings
Daniel Exner
--
Daniel Exner
Softwaredevelopment & Applicationsupport
ESEMOS GmbH


Re: Downloading files from the solr replication Handler

2012-11-30 Thread Eva Lacy
Doesn't make much sense if they are in binary files as well.


On Thu, Nov 29, 2012 at 10:16 PM, Lance Norskog  wrote:

> Maybe these are text encoding markers?
>
> - Original Message -
> | From: "Eva Lacy" 
> | To: solr-user@lucene.apache.org
> | Sent: Thursday, November 29, 2012 3:53:07 AM
> | Subject: Re: Downloading files from the solr replication Handler
> |
> | I tried downloading them with my browser and also with a c#
> | WebRequest.
> | If I skip the first and last 4 bytes it seems to work fine.
> |
> |
> | On Thu, Nov 29, 2012 at 2:28 AM, Erick Erickson
> | wrote:
> |
> | > How are you downloading them? I suspect the issue is
> | > with the download process rather than Solr, but I'm just guessing.
> | >
> | > Best
> | > Erick
> | >
> | >
> | > On Wed, Nov 28, 2012 at 12:19 PM, Eva Lacy  wrote:
> | >
> | > > Just to add to that, I'm using solr 3.6.1
> | > >
> | > >
> | > > On Wed, Nov 28, 2012 at 5:18 PM, Eva Lacy  wrote:
> | > >
> | > > > I downloaded some configuration and data files directly from
> | > > > solr in an
> | > > > attempt to develop a backup solution.
> | > > > I noticed there are some characters at the start and end of the
> | > > > file that aren't in the configuration files; I notice the same
> | > > > characters at the start and end of the data files.
> | > > > Anyone with any idea how I can download these files without the
> | > > > extra
> | > > > characters or predict how many there are going to be so I can
> | > > > skip
> | > them?
> | > > >
> | > >
> | >
> |
>