date:20100804

On Wed, Aug 4, 2010 at 3:27 PM, Stanislaw wrote:

> Hi all!
> I cant load my custom queries from the external file, as written here:
> https://issues.apache.org/jira/browse/SOLR-784
>
> This option is seems to be not implemented in current version 1.4.1 of
> Solr.
> It was deleted or it comes first with new version?
>
>
That patch was never committed so it is not available in any release.

-- 
Regards,
Shalin Shekhar Mangar.

Re: analysis tool vs. reality

On Wed, Aug 4, 2010 at 7:52 PM, Robert Muir  wrote:

> I think I agree with Justin here, I think the way analysis tool highlights
> 'matches' is extremely misleading, especially considering it completely
> ignores queryparsing.
>
> it would be better if it put your text in a memoryindex and actually parsed
> the query w/ queryparser, ran it, and used the highlighter to try to show
> any matches.
>
>
+1

-- 
Regards,
Shalin Shekhar Mangar.

Re: Is there a better for solor server side loadbalance?

2010/8/4 Chengyang 

> The default solr solution is client side loadbalance.
> Is there a solution provide the server side loadbalance?
>
>
No. Most of us stick a HTTP load balancer in front of multiple Solr servers.

-- 
Regards,
Shalin Shekhar Mangar.

DIH and Cassandra

2010-08-04 Thread Mark

Is it possible to use DIH with Cassandra either out of the box or with 
something more custom? Thanks

Re: enhancing auto complete

2010-08-04 Thread Avlesh Singh

I preferred to answer this question privately earlier. But I have received
innumerable requests to unveil the architecture. For the benefit of all, I
am posting it here (after hiding as much info as I should, in my company's
interest).

The context: Auto-suggest feature on http://askme.in

*Solr setup*: Underneath are some of the salient features -

   1. TermsComponent is NOT used.
   2. The index is made up of 4 fields of the following types -
   "autocomplete_full", "autocomplete_token", "string" and "text".
   3. "autocomplete_full" uses KeywordTokenizerFactory and
   EdgeNGramFilterFactory. "autocomplete_token" uses WhitespaceTokenizerFactory
   and EdgeNGramFilterFactory. Both of these are Solr text fields with standard
   filters like LowerCaseFilterFactory etc applied during querying and
   indexing.
   4. Standard DataImportHandler and a bunch of sql procedures are used to
   "derive" all suggestable phrases from the system and index them in the above
   mentioned fields.

*Controller setup*: The controller (to handle suggest queries) is a typical
JAVA servlet using Solr as its backend (connecting via solrj). Based on the
incoming query string, a lucene query is created. It is BooleanQuery
comprising of TermQuery across all the above mentioned fields. The boost
factor to each of these term queries would determine (to an extent) what
kind of matches do you prefer to show up first. JSON is used as the data
exchange format.

*Frontend setup*: It is a home grown JS to address some specific use cases
of the project in question. One simple exercise with Firebug will spill all
the beans. However, I strongly recommend using jQuery to build (and extend)
the UI component.

Any help beyond this is available, but off the list.

Cheers
Avlesh
@avlesh  | http://webklipper.com

On Tue, Aug 3, 2010 at 10:04 AM, Bhavnik Gajjar <
bhavnik.gaj...@gatewaynintec.com> wrote:

>  Whoops!
>
> table still not looks ok :(
>
> trying to send once again
>
>
> loremLorem ipsum dolor sit amet
> Hieyed ddi lorem ipsum dolor
> test lorem ipsume
> test xyz lorem ipslili
>
> lorem ipLorem ipsum dolor sit amet
> Hieyed ddi lorem ipsum dolor
> test lorem ipsume
> test xyz lorem ipslili
>
> lorem ipsltest xyz lorem ipslili
>
> On 8/3/2010 10:00 AM, Bhavnik Gajjar wrote:
>
> Avlesh,
>
> Thanks for responding
>
> The table mentioned below looks like,
>
> lorem   Lorem ipsum dolor sit amet
>  Hieyed ddi lorem ipsum
> dolor
>  test lorem ipsume
>  test xyz lorem ipslili
>
> lorem ip   Lorem ipsum dolor sit amet
>  Hieyed ddi lorem ipsum
> dolor
>  test lorem ipsume
>  test xyz lorem ipslili
>
> lorem ipsl test xyz lorem ipslili
>
>
> Yes, [http://askme.in] looks good!
>
> I would like to know its designs/solr configurations etc.. Can you
> please provide me detailed views of it?
>
> In [http://askme.in], there is one thing to be noted. Search text like,
> [business c] populates [Business Centre] which looks OK but, [Consultant
> Business] looks bit odd. But, in general the pointer you suggested is
> great to start with.
>
> On 8/2/2010 8:39 PM, Avlesh Singh wrote:
>
>
>  From whatever I could read in your broken table of sample use cases, I think
>
>
>  you are looking for something similar to what has been done here 
> -http://askme.in; if this is what you are looking do let me know.
>
> Cheers
> Avlesh
> @avlesh   | 
> http://webklipper.com
>
> On Mon, Aug 2, 2010 at 8:09 PM, Bhavnik 
> Gajjar  wrote:
>
>
>
>
>  Hi,
>
> I'm looking for a solution related to auto complete feature for one
> application.
>
> Below is a list of texts from which auto complete results would be
> populated.
>
> Lorem ipsum dolor sit amet
> tincidunt ut laoreet
> dolore eu feugiat nulla facilisis at vero eros et
> te feugait nulla facilisi
> Claritas est etiam processus
> anteposuerit litterarum formas humanitatis
> fiant sollemnes in futurum
> Hieyed ddi lorem ipsum dolor
> test lorem ipsume
> test xyz lorem ipslili
>
> Consider below table. First column describes user entered value and
> second column describes expected result (list of auto complete terms
> that should be populated from Solr)
>
> lorem
> *Lorem* ipsum dolor sit amet
> Hieyed ddi *lorem* ipsum dolor
> test *lorem *ipsume
> test xyz *lorem *ipslili
> lorem ip
> *Lorem ip*sum dolor sit amet
> Hieyed ddi *lorem ip*sum dolor
> test *lorem ip*sume
> test xyz *lorem ip*slili
> lorem

Re: Setting up apache solr in eclipse with Tomcat

2010-08-04 Thread Hando420


Thanks man i haven't tried this but where do put that xml configuration. Is
it to the web.xml in solr?

Cheers,
Hando
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Setting-up-apache-solr-in-eclipse-with-Tomcat-tp1021673p1023188.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Setting up apache solr in eclipse with Tomcat

2010-08-04 Thread jayendra patil

The sole home is configured in the web.xml of the application which points
to the folder having the conf files and the data directory

   solr/home
   D:/multicore
   java.lang.String


Regards,
Jayendra

On Wed, Aug 4, 2010 at 12:21 PM, Hando420  wrote:

>
> Thanks man i haven't tried this but where do put that xml configuration. Is
> it to the web.xml in solr?
>
> Cheers,
> Hando
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Setting-up-apache-solr-in-eclipse-with-Tomcat-tp1021673p1023188.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>

can't use strdist as functionquery?

2010-08-04 Thread solr-user


I want to sort my results by how closely a given resultset field matches a
given string.

For example, say I am searching for a given product, and the product can be
found in many cities including "seattle".  I want to sort the results so
that results from city of "seattle" are at the top, and all other results
below that

I thought that I could do so by using strdist as a functionquery (I am using
solr 1.4 so I cant directly sort on strdist) but am having problems with the
syntax of the query because functionqueries require double quotes and so
does strdist.

My current query, which fails with an NPE, looks something like this:

http://localhost:8080/solr/select?q=(product:"foo")
_val_:"strdist("seattle",city,edit)"&sort=score%20asc&fl=product, city,
score

I have tried various types of URL encoding (ie using %22 instead of double
quotes in the strdist function), but no success.

Any ideas??  Is there a better way to accomplish this sorting??

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/can-t-use-strdist-as-functionquery-tp1023390p1023390.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Setting up apache solr in eclipse with Tomcat

2010-08-04 Thread Hando420


Thanks now its clear and works fine.

Regards,
Hando
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Setting-up-apache-solr-in-eclipse-with-Tomcat-tp1021673p1023404.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Sharing index files between multiple JVMs and replication

2010-08-04 Thread Kelly Taylor

Is anybody else encountering these same issues; IF having a similar setup?  And 
is there a way to configure certain Solr web-apps as read-only (basically dummy 
instances) so that index changes are not allowed?

- Original Message 
From: Kelly Taylor 
To: solr-user@lucene.apache.org
Sent: Tue, August 3, 2010 5:48:11 PM
Subject: Re: Sharing index files between multiple JVMs and replication

Yes, they are on a common file server, and I've been sharing the same index 
directory between the Solr JVMs. But I seem to be hitting a wall when 
attempting 

to use just one instance for changing the index.

With Solr replication disabled, I stream updates to the one instance, and this 
process hangs whenever there are additional Solr JVMs started up with the same 
configuration in solrconfig.xml  -  So I then tried, to no avail, using a 
different configuration, solrconfig-readonly.xml where the updateHandler was 
commmented out, all /update* requestHandlers removed, mainIndex locktype of 
none, etc.

And with Solr replication enabled, the Slave seems to hang, or at least report 
unusually long time estimates for the current running replication process to 
complete. 

-Kelly

- Original Message 
From: Lance Norskog 
To: solr-user@lucene.apache.org
Sent: Tue, August 3, 2010 4:56:58 PM
Subject: Re: Sharing index files between multiple JVMs and replication

Are these files on a common file server? If you want to share them
that way, it actually does work just to give them all the same index
directory, as long as only one of them changes it.

On Tue, Aug 3, 2010 at 4:38 PM, Kelly Taylor  wrote:
> Is there a way to share index files amongst my multiple Solr web-apps, by
> configuring only one of the JVMs as an indexer, and the remaining, as 
read-only
> searchers?
>
> I'd like to configure in such a way that on startup of the read-only 
searchers,
> missing cores/indexes are not created, and updates are not handled.
>
> If I can get around the files being locked by the read-only instances, I 
should
> be able to scale wider in a given environment, as well as have less replicated
> copies of my master index (Solr 1.4 Java Replication).
>
> Then once the commit is issued to the slave, I can fire off a RELOAD script 
for
> each of my read-only cores.
>
> -Kelly
>
>
>
>
>

-- 
Lance Norskog
goks...@gmail.com

Re: analysis tool vs. reality

2010-08-04 Thread Chris Hostetter


: I think I agree with Justin here, I think the way analysis tool highlights
: 'matches' is extremely misleading, especially considering it completely
: ignores queryparsing.

it really only attempts to identify when there is overlap between 
analaysis at query time and at indexing time so you can easily spot when 
one analyzer or the other "breaks" things so that they no longer line up 
(or when it "fiexes" things so they start to line up)

Even if we eliminated that highlighting as missleading, people would still 
do it in thier minds, it would just be harder -- it doesn't change the 
underlying fact that analysis is only part of the picture.

: it would be better if it put your text in a memoryindex and actually parsed
: the query w/ queryparser, ran it, and used the highlighter to try to show
: any matches.

Thta level of "query explanation" really only works if the user gives us a 
full document (all fields, not just one) and a full query string, and all 
of the possible query params -- because the query parser (either implicit 
because of config, or explicitly specified by the user) might change it's 
behavior based on those other params.

I agree with you: debugging functionality along hte lines of what you are 
describing would be *VASTLY* more useful then what we've got right now, 
and is something i breifly looked into doing before as an extension of the 
existing DebugComponent...

   https://issues.apache.org/jira/browse/SOLR-1749

...the problems i encountered trying to do it as a debug component on 
a "real" Solr request seem like they would also be problems for a 
MemoryIndex based "admin tool" approach like what you suggest -- but if 
you've got ideas on working arround them i am 100% interested.

Independent of how we might create a better "QueryPasrser + Analyssis 
Explanation" tool / debug component is hte question of what we can do to 
make it more clear what exactly the analysis.jsp page is doing and what 
people can infer from that page.  As i said, i don't think removing the 
"match" highlighting will actaully reduce confusion, but perhaps there is 
verbage/disclaimers that could be added to make it more clear?



-Hoss

Re: analysis tool vs. reality

2010-08-04 Thread Robert Muir

On Wed, Aug 4, 2010 at 1:45 PM, Chris Hostetter wrote:

>
> it really only attempts to identify when there is overlap between
> analaysis at query time and at indexing time so you can easily spot when
> one analyzer or the other "breaks" things so that they no longer line up
> (or when it "fiexes" things so they start to line up)
>

It attempts badly, because it only "works" in the most trivial of cases
(e.g. doesnt reflect the interaction of queryparser with multiword synonyms
or worddelimiterfilter).

Since Solr includes these non-trivial analysis components *in the example*
it means that this 'highlight matches' doesnt actually even really work at
all.

Someone is gonna use this thing when they dont understand why analysis isnt
doing what they want, i.e. the cases like I outlined above.

For the trivial cases where it does "work" the 'highlight matches' isnt
useful anyway, so in its current state its completely unnecessary.

> Even if we eliminated that highlighting as missleading, people would still
> do it in thier minds, it would just be harder -- it doesn't change the
> underlying fact that analysis is only part of the picture.
>

I'm not suggesting that. I'm suggesting fixing the highlighting so its not
misleading. There are really only two choices:
1. remove the current highlighting
2. fix it.

in its current state its completely useless and misleading, except for very
trivial cases, in which you dont need it anyway.

>
> : it would be better if it put your text in a memoryindex and actually
> parsed
> : the query w/ queryparser, ran it, and used the highlighter to try to show
> : any matches.
>
> Thta level of "query explanation" really only works if the user gives us a
> full document (all fields, not just one) and a full query string, and all
> of the possible query params -- because the query parser (either implicit
> because of config, or explicitly specified by the user) might change it's
> behavior based on those other params.
>

thats true, but I dont see why the user couldnt be allowed to provide just
this.
I'd bet money a lot of people are using this thing with a specific
query/document in mind anyway!

> people can infer from that page.  As i said, i don't think removing the
> "match" highlighting will actaully reduce confusion, but perhaps there is
> verbage/disclaimers that could be added to make it more clear?
>

 As i said before, I think i disagree with you. I think for stuff like this
the technicals are less important, whats important is this is a misleading
checkbox that really confuses users.

I suggest disabling it entirely, you are only going to remove confusion.

-- 
Robert Muir
rcm...@gmail.com

Re: analysis tool vs. reality

2010-08-04 Thread Robert Muir

Furthermore, I would like to add its not just the highlight matches
functionality that is horribly broken here, but the output of the analysis
itself is misleading.

lets say i take 'textTight' from the example, and add the following synonym:

this is broken => broke

the query time analysis is wrong, as it clearly shows synonymfilter
collapsing "this is broken" to broke, but in reality with the qp for that
field, you are gonna get 3 separate tokenstreams and this will never
actually happen (because the qp will divide it up on whitespace first)

So really the output from 'Query Analyzer' is completely bogus.

On Wed, Aug 4, 2010 at 1:57 PM, Robert Muir  wrote:

>
>
> On Wed, Aug 4, 2010 at 1:45 PM, Chris Hostetter 
> wrote:
>
>>
>> it really only attempts to identify when there is overlap between
>> analaysis at query time and at indexing time so you can easily spot when
>> one analyzer or the other "breaks" things so that they no longer line up
>> (or when it "fiexes" things so they start to line up)
>>
>
> It attempts badly, because it only "works" in the most trivial of cases
> (e.g. doesnt reflect the interaction of queryparser with multiword synonyms
> or worddelimiterfilter).
>
> Since Solr includes these non-trivial analysis components *in the example*
> it means that this 'highlight matches' doesnt actually even really work at
> all.
>
> Someone is gonna use this thing when they dont understand why analysis isnt
> doing what they want, i.e. the cases like I outlined above.
>
> For the trivial cases where it does "work" the 'highlight matches' isnt
> useful anyway, so in its current state its completely unnecessary.
>
>
>> Even if we eliminated that highlighting as missleading, people would still
>> do it in thier minds, it would just be harder -- it doesn't change the
>> underlying fact that analysis is only part of the picture.
>>
>
> I'm not suggesting that. I'm suggesting fixing the highlighting so its not
> misleading. There are really only two choices:
> 1. remove the current highlighting
> 2. fix it.
>
> in its current state its completely useless and misleading, except for very
> trivial cases, in which you dont need it anyway.
>
>
>>
>> : it would be better if it put your text in a memoryindex and actually
>> parsed
>> : the query w/ queryparser, ran it, and used the highlighter to try to
>> show
>> : any matches.
>>
>> Thta level of "query explanation" really only works if the user gives us a
>> full document (all fields, not just one) and a full query string, and all
>> of the possible query params -- because the query parser (either implicit
>> because of config, or explicitly specified by the user) might change it's
>> behavior based on those other params.
>>
>
> thats true, but I dont see why the user couldnt be allowed to provide just
> this.
> I'd bet money a lot of people are using this thing with a specific
> query/document in mind anyway!
>
>
>> people can infer from that page.  As i said, i don't think removing the
>> "match" highlighting will actaully reduce confusion, but perhaps there is
>> verbage/disclaimers that could be added to make it more clear?
>>
>
>  As i said before, I think i disagree with you. I think for stuff like this
> the technicals are less important, whats important is this is a misleading
> checkbox that really confuses users.
>
> I suggest disabling it entirely, you are only going to remove confusion.
>
>
> --
> Robert Muir
> rcm...@gmail.com
>



-- 
Robert Muir
rcm...@gmail.com

Re: Best solution to avoiding multiple query requests

2010-08-04 Thread Ken Krugler


Hi Geert-Jan,

On Aug 4, 2010, at 5:30am, Geert-Jan Brits wrote:

Field Collapsing (currently as patch) is exactly what you're looking  
for

imo.

http://wiki.apache.org/solr/FieldCollapsing


Thanks for the ref, good stuff.

I think it's close, but if I understand this correctly, then I could  
get (using just top two, versus top 10 for simplicity) results that  
looked like


"dog training" (faceted field value A)
"super dog" (faceted field value B)

but if the actual faceted field value/hit counts were:

C (10)
D (8)
A (2)
B (1)

Then what I'd want is the top hit for "dog AND facet field:C",  
followed by "dog AND facet field:D".


Used field collapsing would improve the probability that if I asked  
for the top 100 hits, I'd find entries for each of my top N faceted  
field values.


Thanks again,

-- Ken

I've got a situation where the key result from an initial search  
request
(let's say for "dog") is the list of values from a faceted field,  
sorted by

hit count.

For the top 10 of these faceted field values, I need to get the top  
hit for
the target request ("dog") restricted to that value for the faceted  
field.


Currently this is 11 total requests, of which the 10 requests  
following the
initial query can be made in parallel. But that's still a lot of  
requests.


So my questions are:

1. Is there any magic query to handle this with Solr as-is?

2. if not, is the best solution to create my own request handler?

3. And in that case, any input/tips on developing this type of custom
request handler?

Thanks,

-- Ken



Ken Krugler
+1 530-210-6378
http://bixolabs.com
e l a s t i c   w e b   m i n i n g

Re: DIH and Cassandra

2010-08-04 Thread Andrei Savu

DIH only works with relational databases and XML files [1], you need
to write custom code in order to index data from Cassandra.

It should be pretty easy to map documents from Cassandra to Solr.
There are a lot of client libraries available [2] for Cassandra.

[1] http://wiki.apache.org/solr/DataImportHandler
[2] http://wiki.apache.org/cassandra/ClientOptions

On Wed, Aug 4, 2010 at 6:41 PM, Mark  wrote:
> Is it possible to use DIH with Cassandra either out of the box or with
> something more custom? Thanks
>

-- 
Indekspot -- http://www.indekspot.com -- Managed Hosting for Apache Solr

Re: DIH and Cassandra

2010-08-04 Thread Andrei Savu

DIH only works with relational databases and XML files [1], you need
to write custom code in order to index data from Cassandra.

It should be pretty easy to map documents from Cassandra to Solr.
There are a lot of client libraries available [2] for Cassandra.

[1] http://wiki.apache.org/solr/DataImportHandler
[2] http://wiki.apache.org/cassandra/ClientOptions

On Wed, Aug 4, 2010 at 6:41 PM, Mark  wrote:
> Is it possible to use DIH with Cassandra either out of the box or with
> something more custom? Thanks
>

-- 
Indekspot -- http://www.indekspot.com -- Managed Hosting for Apache Solr

Re: Is there a better for solor server side loadbalance?

2010-08-04 Thread Andrei Savu

Check this article [1] that explains how to setup haproxy to do load
balacing. The steps are the same even if you are not using Drupal.  By
using this approach you can easily add more replicas without changing
the application configuration files.

You should also check SolrCloud [2] which does automatic load
balancing and fail-over for queries. This branch is still under
development.

[1] 
http://davehall.com.au/blog/dave/2010/03/13/solr-replication-load-balancing-haproxy-and-drupal
[2] http://wiki.apache.org/solr/SolrCloud

2010/8/4 Chengyang :
- Hide quoted text -
> The default solr solution is client side loadbalance.
> Is there a solution provide the server side loadbalance?
>
>

-- 
Indekspot -- http://www.indekspot.com -- Managed Hosting for Apache Solr

Solrj ContentStreamUpdateRequest Slow

2010-08-04 Thread Tod

I'm running a slight variation of the example code referenced below and 
it takes a real long time to finally execute.  In fact it hangs for a 
long time at solr.request(up) before finally executing.  Is there 
anything I can look at or tweak to improve performance?


I am also indexing a local pdf file, there are no firewall issues, solr 
is running on the same machine, and I tried the actual host name in 
addition to localhost but nothing helps.



Thanks - Tod

http://wiki.apache.org/solr/ContentStreamUpdateRequestExample

Re: Best solution to avoiding multiple query requests

2010-08-04 Thread Geert-Jan Brits

If I understand correctly: you want to sort your collapsed results by 'nr of
collapsed results'/ hits.

It seems this can't be done out-of-the-box using this patch (I'm not
entirely sure, at least it doesn't follow from the wiki-page. Perhaps best
is to check the jira-issues to make sure this isn't already available now,
but just not updated on the wiki)

Also I found a blogpost (from the patch creator afaik) with in the comments
someone with the same issue + some pointers.
http://blog.jteam.nl/2009/10/20/result-grouping-field-collapsing-with-solr/

hope that helps,
Geert-jan

2010/8/4 Ken Krugler 

> Hi Geert-Jan,
>
>
> On Aug 4, 2010, at 5:30am, Geert-Jan Brits wrote:
>
>  Field Collapsing (currently as patch) is exactly what you're looking for
>> imo.
>>
>> http://wiki.apache.org/solr/FieldCollapsing
>>
>
> Thanks for the ref, good stuff.
>
> I think it's close, but if I understand this correctly, then I could get
> (using just top two, versus top 10 for simplicity) results that looked like
>
> "dog training" (faceted field value A)
> "super dog" (faceted field value B)
>
> but if the actual faceted field value/hit counts were:
>
> C (10)
> D (8)
> A (2)
> B (1)
>
> Then what I'd want is the top hit for "dog AND facet field:C", followed by
> "dog AND facet field:D".
>
> Used field collapsing would improve the probability that if I asked for the
> top 100 hits, I'd find entries for each of my top N faceted field values.
>
> Thanks again,
>
> -- Ken
>
>
>  I've got a situation where the key result from an initial search request
>>> (let's say for "dog") is the list of values from a faceted field, sorted
>>> by
>>> hit count.
>>>
>>> For the top 10 of these faceted field values, I need to get the top hit
>>> for
>>> the target request ("dog") restricted to that value for the faceted
>>> field.
>>>
>>> Currently this is 11 total requests, of which the 10 requests following
>>> the
>>> initial query can be made in parallel. But that's still a lot of
>>> requests.
>>>
>>> So my questions are:
>>>
>>> 1. Is there any magic query to handle this with Solr as-is?
>>>
>>> 2. if not, is the best solution to create my own request handler?
>>>
>>> 3. And in that case, any input/tips on developing this type of custom
>>> request handler?
>>>
>>> Thanks,
>>>
>>> -- Ken
>>>
>>
> 
> Ken Krugler
> +1 530-210-6378
> http://bixolabs.com
> e l a s t i c   w e b   m i n i n g
>
>
>
>
>

Re: Best solution to avoiding multiple query requests

2010-08-04 Thread Ken Krugler

Hi Geert-jan,

On Aug 4, 2010, at 12:04pm, Geert-Jan Brits wrote:

If I understand correctly: you want to sort your collapsed results
by 'nr of

collapsed results'/ hits.

It seems this can't be done out-of-the-box using this patch (I'm not
entirely sure, at least it doesn't follow from the wiki-page.
Perhaps best
is to check the jira-issues to make sure this isn't already
available now,

but just not updated on the wiki)

Also I found a blogpost (from the patch creator afaik) with in the
comments

someone with the same issue + some pointers.
http://blog.jteam.nl/2009/10/20/result-grouping-field-collapsing-with-solr/

Yup, that's the one -
http://blog.jteam.nl/2009/10/20/result-grouping-field-collapsing-with-solr/comment-page-1/#comment-1249

So with some modifications to that patch, it could work...thanks for
the info!

-- Ken

2010/8/4 Ken Krugler

Hi Geert-Jan,

On Aug 4, 2010, at 5:30am, Geert-Jan Brits wrote:

Field Collapsing (currently as patch) is exactly what you're
looking for

imo.

http://wiki.apache.org/solr/FieldCollapsing

Thanks for the ref, good stuff.

I think it's close, but if I understand this correctly, then I
could get
(using just top two, versus top 10 for simplicity) results that
looked like

"dog training" (faceted field value A)
"super dog" (faceted field value B)

but if the actual faceted field value/hit counts were:

C (10)
D (8)
A (2)
B (1)

Then what I'd want is the top hit for "dog AND facet field:C",
followed by

"dog AND facet field:D".

Used field collapsing would improve the probability that if I asked
for the
top 100 hits, I'd find entries for each of my top N faceted field
values.

Thanks again,

-- Ken

I've got a situation where the key result from an initial search
request
(let's say for "dog") is the list of values from a faceted field,
sorted

by
hit count.

For the top 10 of these faceted field values, I need to get the
top hit

for
the target request ("dog") restricted to that value for the faceted
field.

Currently this is 11 total requests, of which the 10 requests
following

the
initial query can be made in parallel. But that's still a lot of
requests.

So my questions are:

1. Is there any magic query to handle this with Solr as-is?

2. if not, is the best solution to create my own request handler?

3. And in that case, any input/tips on developing this type of
custom

request handler?

Thanks,

-- Ken

Ken Krugler
+1 530-210-6378
http://bixolabs.com
e l a s t i c w e b m i n i n g

Indexing boolean value


Im trying to index a boolean location, but for some reason it does not show
up in my indexed data.

data-config.xml







OFFICIALLOCATION is a MSSQL database field of type 'bit'


schema.xml




(im not sure why I would use http://lucene.472066.n3.nabble.com/Indexing-boolean-value-tp1023708p1023708.html
Sent from the Solr - User mailing list archive at Nabble.com.

RE: Indexing fieldvalues with dashes and spaces

Your schema.xml setting for the field is probably tokenizing the punctuation. 
Change the field type to one that doesn't tokenize on punctuation; e.g. use 
"text_ws" and not "text"

-Original Message-
From: PeterKerk [mailto:vettepa...@hotmail.com] 
Sent: Wednesday, August 04, 2010 3:36 PM
To: solr-user@lucene.apache.org
Subject: Indexing fieldvalues with dashes and spaces

Im having issues with indexing field values containing spaces and dashes.
For example: Im trying to index province names of the Netherlands. Some 
province names contain a "-":
Zuid-Holland
Noord-Holland

my data-config has this:

When I check what has been indexed, I have this:

−

0
0
−

on
0
*:*
2.2
10

−
 −  Nijmegen −  Tuin 
Cafe  1 Gelderland −  
Fotoreportage  −  Gemeentehuis 
 2010-08-04T19:11:51.796Z
Gemeentehuis Nijmegen  −  Utrecht −  Tuin 
Cafe Danszaal  2 Utrecht −  Fotoreportage 
Exclusieve huur  −  Gemeentehuis 
 2010-08-04T19:11:51.796Z
Gemeentehuis Utrecht  −  Bloemendaal −  Strand 
Cafe Danszaal  3 Zuid-Holland
−

Exclusieve huur
Live muziek

−

Strand & Zee

2010-08-04T19:11:51.812Z
Beachclub Vroeger   

So we see that the full field has been indexed:
Zuid-Holland

BUT, when I check the facets via
http://localhost:8983/solr/db/select/?wt=json&indent=on&q=*:*&fl=id,title,city,score,features,official,services&facet=true&facet.field=theme&facet.field=features&facet.field=province&facet.field=services

I get this (snippet):
"facet_counts":{
  "facet_queries":{},
  "facet_fields":{
"theme":[
 "Gemeentehuis",2,
 "&",1,   < a
 "Strand",1,
 "Zee",1],
"features":[
 "cafe",3,
 "danszaal",2,
 "tuin",2,
 "strand",1],
"province":[
 "gelderland",1,
 "holland",1,
 "utrecht",1,
 "zuid",1, <  b
 "zuidholland",1],
"services":[
 "exclusiev",2,
 "fotoreportag",2, <  c
 "huur",2,
 "live",1,  <  d
 "muziek",1]},

There several weird things happen which I have indicated with <===

a. the full field value is "Strand & Zee", but now one facet is "&"
b. the full field value is "Zuid-Holland", but now "zuid" is a separate facet 
c. the full field value is "fotoreportage", but somehow the last character has 
been truncated d. the full field value "live muziek", but now "live" and 
"muziek" have become separate facets

What can I do about this?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Indexing-fieldvalues-with-dashes-and-spaces-tp1023699p1023699.html
Sent from the Solr - User mailing list archive at Nabble.com.

RE: Indexing boolean value

I could be wrong, but I thought bit was an integer. Try changing fieldtype to 
integer.

-Original Message-
From: PeterKerk [mailto:vettepa...@hotmail.com] 
Sent: Wednesday, August 04, 2010 3:42 PM
To: solr-user@lucene.apache.org
Subject: Indexing boolean value

Im trying to index a boolean location, but for some reason it does not show up 
in my indexed data.

data-config.xml

OFFICIALLOCATION is a MSSQL database field of type 'bit'

schema.xml

(im not sure why I would use http://lucene.472066.n3.nabble.com/Indexing-boolean-value-tp1023708p1023708.html
Sent from the Solr - User mailing list archive at Nabble.com.

RE: Indexing fieldvalues with dashes and spaces


I changed  values to text_ws 

Now I only seem to have problems with fieldvalues that hold spacessee
below:

   
   
   
   
   

It has now become:

 "facet_counts":{
  "facet_queries":{},
  "facet_fields":{
"theme":[
 "Gemeentehuis",2,
 "&",1,   < still & is created as separate facet
 "Strand",1,
 "Zee",1],
"features":[
 "Cafe",3,
 "Danszaal",2,
 "Tuin",2,
 "Strand",1],
"province":[
 "Gelderland",1,
 "Utrecht",1,
 "Zuid-Holland",1], < this is now correct
"services":[
 "Exclusieve",2,
 "Fotoreportage",2,
 "huur",2,
 "Live",1, < "Live muziek" is split and separate facets are 
created
 "muziek",1]},
  "facet_dates":{}}}
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Indexing-fieldvalues-with-dashes-and-spaces-tp1023699p1023787.html
Sent from the Solr - User mailing list archive at Nabble.com.

RE: Indexing fieldvalues with dashes and spaces

2010-08-04 Thread Markus Jelsma

You shouldn't fetch faceting results from analyzed fields, it will mess with 
your results. Search on analyzed fields but don't retrieve values from them. 
 
-Original message-
From: PeterKerk 
Sent: Wed 04-08-2010 22:15
To: solr-user@lucene.apache.org; 
Subject: RE: Indexing fieldvalues with dashes and spaces


I changed  values to text_ws 

Now I only seem to have problems with fieldvalues that hold spacessee
below:

  
  
  
  
  

It has now become:

"facet_counts":{
 "facet_queries":{},
 "facet_fields":{
"theme":[
"Gemeentehuis",2,
"&",1,   < still & is created as separate facet
"Strand",1,
"Zee",1],
"features":[
"Cafe",3,
"Danszaal",2,
"Tuin",2,
"Strand",1],
"province":[
"Gelderland",1,
"Utrecht",1,
"Zuid-Holland",1], < this is now correct
"services":[
"Exclusieve",2,
"Fotoreportage",2,
"huur",2,
"Live",1, < "Live muziek" is split and separate facets are created
"muziek",1]},
 "facet_dates":{}}}
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Indexing-fieldvalues-with-dashes-and-spaces-tp1023699p1023787.html
Sent from the Solr - User mailing list archive at Nabble.com.

RE: Indexing boolean value


Hi,

I tried that already, so that would make this:




(still not sure what copyField does though)

But even that wont work. I also dont see the officallocation columns indexed
in the documents:
http://localhost:8983/solr/db/select/?q=*%3A*&version=2.2&start=0&rows=10&indent=on


-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Indexing-boolean-value-tp1023708p1023811.html
Sent from the Solr - User mailing list archive at Nabble.com.

RE: Indexing fieldvalues with dashes and spaces


Sorry, but Im a newbie to Solr...how would I change my schema.xml to match
your requirements?

And what do you mean by "it will mess with your results"? What will happen
then?
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Indexing-fieldvalues-with-dashes-and-spaces-tp1023699p1023824.html
Sent from the Solr - User mailing list archive at Nabble.com.

RE: Indexing fieldvalues with dashes and spaces

Echoing Markus - use the tokenized field to return results, but have a 
duplicate field of fieldtype="string" to show the untokenized results. E.g. 
facet on that field.

-Original Message-
From: Markus Jelsma [mailto:markus.jel...@buyways.nl] 
Sent: Wednesday, August 04, 2010 4:18 PM
To: solr-user@lucene.apache.org
Subject: RE: Indexing fieldvalues with dashes and spaces

You shouldn't fetch faceting results from analyzed fields, it will mess with 
your results. Search on analyzed fields but don't retrieve values from them. 
 
-Original message-
From: PeterKerk 
Sent: Wed 04-08-2010 22:15
To: solr-user@lucene.apache.org; 
Subject: RE: Indexing fieldvalues with dashes and spaces


I changed  values to text_ws 

Now I only seem to have problems with fieldvalues that hold spacessee
below:

  
  
  
  
  

It has now become:

"facet_counts":{
 "facet_queries":{},
 "facet_fields":{
"theme":[
"Gemeentehuis",2,
"&",1,   < still & is created as separate facet
"Strand",1,
"Zee",1],
"features":[
"Cafe",3,
"Danszaal",2,
"Tuin",2,
"Strand",1],
"province":[
"Gelderland",1,
"Utrecht",1,
"Zuid-Holland",1], < this is now correct
"services":[
"Exclusieve",2,
"Fotoreportage",2,
"huur",2,
"Live",1, < "Live muziek" is split and separate facets are created
"muziek",1]},
 "facet_dates":{}}}
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Indexing-fieldvalues-with-dashes-and-spaces-tp1023699p1023787.html
Sent from the Solr - User mailing list archive at Nabble.com.

RE: Indexing boolean value

Copyfield copies the field so you can have multiple versions. Useful to dump 
all fields into one "super" field you can search on, for perf reasons.

If the column isn't being indexed, I'd suggest the problem is in DIH. No 
suggestions as to why, I'm afraid.

-Original Message-
From: PeterKerk [mailto:vettepa...@hotmail.com] 
Sent: Wednesday, August 04, 2010 4:22 PM
To: solr-user@lucene.apache.org
Subject: RE: Indexing boolean value

Hi,

I tried that already, so that would make this:

(still not sure what copyField does though)

But even that wont work. I also dont see the officallocation columns indexed in 
the documents:
http://localhost:8983/solr/db/select/?q=*%3A*&version=2.2&start=0&rows=10&indent=on

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Indexing-boolean-value-tp1023708p1023811.html
Sent from the Solr - User mailing list archive at Nabble.com.

RE: Indexing fieldvalues with dashes and spaces

2010-08-04 Thread Markus Jelsma

Hmm, you should first read a bit more on schema design on the wiki and learn 
about indexing and querying Solr.

 

The copyField directive is what is commonly used in a faceted navigation 
system, search on analyzed fields, show faceting results using the primitive 
string field type. With copyField, you can, well, copy the field from one to 
another without it being analyzed by the first - so no chaining is possible, 
which is good. 

 

Let's say you have a city field you want to navigate with, but also search in, 
then you would have an analyzed field for search and a string field for 
displaying the navigation.

 

But, check the wiki on this subject.
 
-Original message-
From: PeterKerk 
Sent: Wed 04-08-2010 22:23
To: solr-user@lucene.apache.org; 
Subject: RE: Indexing fieldvalues with dashes and spaces


Sorry, but Im a newbie to Solr...how would I change my schema.xml to match
your requirements?

And what do you mean by "it will mess with your results"? What will happen
then?
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Indexing-fieldvalues-with-dashes-and-spaces-tp1023699p1023824.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: DIH and Cassandra

On Wed, Aug 4, 2010 at 9:11 PM, Mark  wrote:

> Is it possible to use DIH with Cassandra either out of the box or with
> something more custom? Thanks
>

It will take some modifications but DIH is built to create denormalized
documents so it is possible.

Also see https://issues.apache.org/jira/browse/SOLR-853

-- 
Regards,
Shalin Shekhar Mangar.

RE: Indexing fieldvalues with dashes and spaces