Re: Restricting search results by field value

2012-12-06 Thread Way Cool
Grouping should work:
group=true&group.field=source_id&group.limit=3&group.main=true
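For example (host and query term are placeholders, not from the thread), the full request could look like:

    http://localhost:8983/solr/select?q=some+query&group=true&group.field=source_id&group.limit=3&group.main=true

With group.main=true the grouped hits come back as a single flat document list, capped at group.limit docs per source_id, so no separate group structure has to be unpacked on the client.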

On Thu, Dec 6, 2012 at 2:35 AM, Tom Mortimer  wrote:

> Sounds like it's worth a try! Thanks Andre.
> Tom
>
> On 5 Dec 2012, at 17:49, Andre Bois-Crettez  wrote:
>
> > If you do grouping on source_id, it should be enough to request 3 times
> > more documents than you need, then reorder and drop the bottom.
> >
> > Is a 3x overhead acceptable ?
> >
> >
> >
> > On 12/05/2012 12:04 PM, Tom Mortimer wrote:
> >> Hi everyone,
> >>
> >> I've got a problem where I have docs with a source_id field, and there
> can be many docs from each source. Searches will typically return docs from
> many sources. I want to restrict the number of docs from each source in
> results, so there will be no more than (say) 3 docs from source_id=123 etc.
> >>
> >> Field collapsing is the obvious approach, but I want to get the results
> back in relevancy order, not grouped by source_id. So it looks like I'll
> have to fetch more docs than I need to and re-sort them. It might even be
> better to count source_ids in the client code and drop excess docs that
> way, but the potential overhead is large.
> >>
> >> Is there any way of doing this in Solr without hacking in a custom
> Lucene Collector? (which doesn't look all that straightforward).
> >>
> >> cheers,
> >> Tom
> >>
> >>
> >> --
> >> André Bois-Crettez
> >>
> >> Search technology, Kelkoo
> >> http://www.kelkoo.com/
> >
> > Kelkoo SAS
> > Société par Actions Simplifiée
> > Au capital de € 4.168.964,30
> > Siège social : 8, rue du Sentier 75002 Paris
> > 425 093 069 RCS Paris
> >
> > This message and its attachments are confidential and intended solely
> > for their addressees. If you are not the intended recipient of this
> > message, please delete it and notify the sender.
>
>
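For reference, the client-side fallback discussed above (over-fetch, then drop excess docs per source while keeping the relevance order) could be sketched roughly as below. The source_id field name comes from the thread; the helper class and the cap value are illustrative only, not code from any message.

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    import org.apache.solr.common.SolrDocument;
    import org.apache.solr.common.SolrDocumentList;

    public class SourceCapper {

        /** Keep at most maxPerSource docs per source_id, preserving the
         *  relevance order of the over-fetched result list. */
        public static List<SolrDocument> capPerSource(SolrDocumentList docs,
                                                      int maxPerSource) {
            Map<Object, Integer> perSource = new HashMap<Object, Integer>();
            List<SolrDocument> kept = new ArrayList<SolrDocument>();
            for (SolrDocument doc : docs) {
                Object source = doc.getFieldValue("source_id");
                int seen = perSource.containsKey(source) ? perSource.get(source) : 0;
                if (seen < maxPerSource) {
                    kept.add(doc);
                    perSource.put(source, seen + 1);
                }
            }
            return kept;
        }
    }

Trimming the result afterwards to the page size gives the 3x-over-fetch behaviour Andre describes.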


Re: Couple issues with edismax in 3.5

2012-03-01 Thread Way Cool
Thanks Ahmet! That's good to know someone else has also tried making phrase
queries to fix the multi-word synonym issue. :-)


On Thu, Mar 1, 2012 at 1:42 AM, Ahmet Arslan  wrote:

> > I don't think mm will help here because it defaults to 100%
> > already by the
> > following code.
>
> Default behavior of mm has changed recently. So it is a good idea to
> explicitly set it to 100%. Then all of the search terms must match.
>
> > Regarding multi-word synonym, what is the best way to handle
> > it now? Make
> > it as a phrase with " or adding -  in between?
> > I don't like index time expansion because it adds lots of
> > noises.
>
> The Solr wiki advises using them at index time, for various reasons.
>
> "... The recommended approach for dealing with synonyms like this, is to
> expand the synonym when indexing..."
>
>
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory
>
> However, index-time synonyms have their own problems as well. If you add a new
> synonym, you need to re-index the documents that contain the newly added
> synonym.
>
> Also, highlighting highlights whole phrases. For example, if you have:
>us, united states
> then searching for states will highlight both united and states.
> Not sure, but this seems to be fixed by LUCENE-3668.
>
> I was thinking of having a query expansion module handle multi-word
> synonyms at query time only, either using o.a.l.search.Query manipulation
> or String manipulation, similar to Lukas' posting here:
> http://www.searchworkings.org/forum/-/message_boards/view_message/146097
>
>
>
>
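For reference, the index-time expansion the wiki recommends is a SynonymFilterFactory in the field type's index analyzer; a minimal sketch (the field type name, file name, and analyzer chain are the stock examples, not taken from this thread):

    # synonyms.txt
    us, united states

    <!-- schema.xml: expand synonyms at index time only -->
    <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
                ignoreCase="true" expand="true"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>

As noted above, any change to synonyms.txt then requires re-indexing the affected documents.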


Re: Couple issues with edismax in 3.5

2012-02-29 Thread Way Cool
Thanks Ahmet for your reply.

I don't think mm will help here because it already defaults to 100%, per the
following code.

  if (parsedUserQuery != null && doMinMatched) {
    String minShouldMatch = solrParams.get(DMP.MM, "100%");
    if (parsedUserQuery instanceof BooleanQuery) {
      U.setMinShouldMatch((BooleanQuery) parsedUserQuery, minShouldMatch);
    }
  }
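If you do want to pin it regardless of the version's default, mm can be set explicitly, either as a request parameter (&mm=100%25, URL-encoded) or in the handler defaults in solrconfig.xml. A hedged sketch of the latter (the handler name is the stock one, not from this thread):

    <requestHandler name="/select" class="solr.SearchHandler">
      <lst name="defaults">
        <str name="defType">edismax</str>
        <str name="mm">100%</str>
      </lst>
    </requestHandler>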

Regarding multi-word synonyms, what is the best way to handle them now? Make
them a phrase with " or add - in between?
I don't like index-time expansion because it adds a lot of noise.

That's good to know that Analysis.jsp does not perform actual query parsing. I
was hoping edismax could do something similar to the analysis tool, because it
shows everything I need for multi-word synonyms.

Thanks.

On Wed, Feb 29, 2012 at 1:23 AM, Ahmet Arslan  wrote:

> > 1. Search for 4X6 generated the following parsed query:
> > +DisjunctionMaxQueryid:4 id:x id:6)^1.2) | ((name:4
> > name:x
> > name:6)^1.025) )
> > while the search for "4 X 6" (with space in between)
> > generated the query
> > below: (I like this one)
> > +((DisjunctionMaxQuery((id:4^1.2 | name:4^1.025)
> > +((DisjunctionMaxQuery((id:x^1.2 | name:x^1.025)
> > +((DisjunctionMaxQuery((id:6^1.2 | name:6^1.025)
> >
> > Is that really intentional? The first query is pretty weird
> > because it will
> > return all of the docs with one of 4, x, 6.
>
> Minimum Should Match (mm) parameter is used to control how many search
> terms should match. For example, you can set it to &mm=100%.
>
> Also, you can tweak relevancy by setting the phrase fields (pf) parameter.
>
> > Any easy way we can force "4X6" search to be the same as "4
> > X 6"?
> >
> > 2. Issue with multi words synonym because edismax separates
> > keywords to
> > multiple words via the line below:
> > clauses = splitIntoClauses(userQuery, false);
> > and seems like edismax doesn't quite respect fieldType at
> > query time, for
> > example, handling stopWords differently than what's
> > specified in schema.
> >
> > For example: I have the following synonym:
> > AAA BBB, AAABBB, AAA-BBB, CCC DDD
> >
> > When I search for "AAA-BBB", it works, however search for
> > "CCC DDD" was not
> > returning results containing AAABBB. What is interesting is
> > that
> > admin/analysis.jsp is returning great results.
>
> The query string is tokenized (on whitespace) before it reaches the
> analyzer. https://issues.apache.org/jira/browse/LUCENE-2605
> That's why multi-word synonyms are not advised at query time.
>
> Analysis.jsp does not perform actual query parsing.
>


Couple issues with edismax in 3.5

2012-02-28 Thread Way Cool
Hi, Guys,

I am having the following issues with edismax:

1. Search for 4X6 generated the following parsed query:
+DisjunctionMaxQueryid:4 id:x id:6)^1.2) | ((name:4 name:x
name:6)^1.025) )
while the search for "4 X 6" (with space in between)  generated the query
below: (I like this one)
+((DisjunctionMaxQuery((id:4^1.2 | name:4^1.025)
+((DisjunctionMaxQuery((id:x^1.2 | name:x^1.025)
+((DisjunctionMaxQuery((id:6^1.2 | name:6^1.025)

Is that really intentional? The first query is pretty weird because it will
return all docs that match any one of 4, x, or 6.

Any easy way we can force "4X6" search to be the same as "4 X 6"?

2. Issue with multi-word synonyms, because edismax separates keywords into
multiple words via the line below:
clauses = splitIntoClauses(userQuery, false);
It also seems like edismax doesn't quite respect the fieldType at query time,
for example handling stopwords differently than what's specified in the schema.

For example: I have the following synonym:
AAA BBB, AAABBB, AAA-BBB, CCC DDD

When I search for "AAA-BBB" it works; however, searching for "CCC DDD" does
not return results containing AAABBB. What is interesting is that
admin/analysis.jsp returns great results.


Thanks,

YH


Re: Boost Exact matches on Specific Fields

2011-09-28 Thread Way Cool
I would give str_category more weight than ts_category, because we want
str_category to win when there are "exact" matches (you converted to
lowercase).

On Mon, Sep 26, 2011 at 10:23 PM, Balaji S  wrote:

> Hi
>
>   You mean to say copy the String field to a Text field or the reverse .
> This is the approach I am currently following
>
> Step 1: Created a FieldType
>
>
>  sortMissingLast="true" omitNorms="true">
>
>
>
>
>
> 
>
> Step 2 :  stored="true"/>
>
> Step 3 : 
>
> And in the SOLR Query planning to q=hospitals&qf=body^4.0 title^5.0
> ts_category^10.0 str_category^8.0
>
>
> The One Question I have here is All the above mentioned fields will have
> "Hospital" present in them , will the above approach work to get the exact
> match on the top and bring "Hospitalization" below in the results
>
>
> Thanks
> Balaji
>
>
> On Tue, Sep 27, 2011 at 9:38 AM, Way Cool  wrote:
>
> > If I were you, probably I will try defining two fields:
> > 1. ts_category as a string type
> > 2. ts_category1 as a text_en type
> > Make sure copy ts_category to ts_category1.
> >
> > You can use the following as qf in your dismax:
> > qf=body^4.0 title^5.0 ts_category^10.0 ts_category1^5.0
> > or something like that.
> >
> > YH
> > http://thetechietutorials.blogspot.com/
> >
> >
> > On Mon, Sep 26, 2011 at 2:06 PM, balaji  wrote:
> >
> > > Hi all
> > >
> > >I am new to SOLR and have a doubt on Boosting the Exact Terms to the
> > top
> > > on a Particular field
> > >
> > > For ex :
> > >
> > > I have a text field names ts_category and I want to give more boost
> > to
> > > this field rather than other fields, SO in my Query I pass the
> following
> > in
> > > the QF params "qf=body^4.0 title^5.0 ts_category^21.0" and also sort on
> > > SCORE desc
> > >
> > > When I do a search against "Hospitals" . I get "Hospitalization
> > > Management , Hospital Equipment & Supplies " on Top rather than the
> exact
> > > matches of "Hospitals"
> > >
> > >  So It would be great , If I could be helped over here
> > >
> > >
> > > Thanks
> > > Balaji
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > > Thanks in Advance
> > > Balaji
> > >
> > > --
> > > View this message in context:
> > >
> >
> http://lucene.472066.n3.nabble.com/Boost-Exact-matches-on-Specific-Fields-tp3370513p3370513.html
> > > Sent from the Solr - User mailing list archive at Nabble.com.
> > >
> >
>


Re: Searching multiple fields

2011-09-28 Thread Way Cool
It would be nice if we could have a "dissum" in addition to dismax. ;-)

On Tue, Sep 27, 2011 at 9:26 AM, lee carroll
wrote:

> see
>
>
> http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/Similarity.html
>
>
>
> On 27 September 2011 16:04, Mark  wrote:
> > I thought that a similarity class will only affect the scoring of a
> single
> > field.. not across multiple fields? Can anyone else chime in with some
> > input? Thanks.
> >
> > On 9/26/11 9:02 PM, Otis Gospodnetic wrote:
> >>
> >> Hi Mark,
> >>
> >> Eh, I don't have Lucene/Solr source code handy, but I *think* for that
> >> you'd need to write custom Lucene similarity.
> >>
> >> Otis
> >> 
> >> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> >> Lucene ecosystem search :: http://search-lucene.com/
> >>
> >>
> >>> 
> >>> From: Mark
> >>> To: solr-user@lucene.apache.org
> >>> Sent: Monday, September 26, 2011 8:12 PM
> >>> Subject: Searching multiple fields
> >>>
> >>> I have a use case where I would like to search across two fields but I
> do
> >>> not want to weight a document that has a match in both fields higher
> than a
> >>> document that has a match in only 1 field.
> >>>
> >>> For example.
> >>>
> >>> Document 1
> >>> - Field A: "Foo Bar"
> >>> - Field B: "Foo Baz"
> >>>
> >>> Document 2
> >>> - Field A: "Foo Blarg"
> >>> - Field B: "Something else"
> >>>
> >>> Now when I search for "Foo" I would like document 1 and 2 to be
> similarly
> >>> scored however document 1 will be scored much higher in this use case
> >>> because it matches in both fields. I could create a third field and use
> >>> copyField directive to search across that but I was wondering if there
> is an
> >>> alternative way. It would be nice if we could search across some sort
> of
> >>> "virtual field" that will use both underlying fields but not actually
> >>> increase the size of the index.
> >>>
> >>> Thanks
> >>>
> >>>
> >>>
> >
>


Re: Boost Exact matches on Specific Fields

2011-09-26 Thread Way Cool
If I were you, I would probably try defining two fields:
1. ts_category as a string type
2. ts_category1 as a text_en type
Make sure to copy ts_category to ts_category1.

You can use the following as qf in your dismax:
qf=body^4.0 title^5.0 ts_category^10.0 ts_category1^5.0
or something like that.

YH
http://thetechietutorials.blogspot.com/


On Mon, Sep 26, 2011 at 2:06 PM, balaji  wrote:

> Hi all
>
>I am new to SOLR and have a doubt on Boosting the Exact Terms to the top
> on a Particular field
>
> For ex :
>
> I have a text field names ts_category and I want to give more boost to
> this field rather than other fields, SO in my Query I pass the following in
> the QF params "qf=body^4.0 title^5.0 ts_category^21.0" and also sort on
> SCORE desc
>
> When I do a search against "Hospitals" . I get "Hospitalization
> Management , Hospital Equipment & Supplies " on Top rather than the exact
> matches of "Hospitals"
>
>  So It would be great , If I could be helped over here
>
>
> Thanks
> Balaji
>
>
>
>
>
>
>
> Thanks in Advance
> Balaji
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Boost-Exact-matches-on-Specific-Fields-tp3370513p3370513.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
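In schema.xml terms, the two-field setup suggested above could look roughly like this (only the field names come from the thread; the types and attributes are assumptions):

    <field name="ts_category"  type="string"  indexed="true" stored="true"/>
    <field name="ts_category1" type="text_en" indexed="true" stored="true"/>
    <copyField source="ts_category" dest="ts_category1"/>

With the string field given the highest qf boost, a whole-value match on the category then outranks partial matches coming from the tokenized copy.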


Any plans to support function queries on score?

2011-09-26 Thread Way Cool
Hi, guys,

Do you have any plans to support function queries on the score field? For
example, sort=floor(product(score, 100)+0.5) desc?

So far I am getting the following error:
undefined field score

I can't use a subquery in this case because I am trying to use secondary
sorting; however, I am open to that if someone has successfully used
another field to boost the results.

Thanks,

YH
http://thetechietutorials.blogspot.com/


Re: Solr Faceting & DIH

2011-08-29 Thread Way Cool
I think you need to set up an entity hierarchy, with product as a top-level
entity and attribute as another entity under product; otherwise records #2
and #3 will override the first one (a data-config sketch follows the quoted
message below).

On Mon, Aug 29, 2011 at 3:52 PM, Aaron Bains  wrote:

> Hello,
>
> I am trying to set up Solr faceting on products by using the
> DataImportHandler to import data from my database. I have set up my
> data-config.xml with the proper queries and schema.xml with the fields.
> After the import/index is complete, I can only search for one productid
> record in Solr. For example, of the three productid '10100039' records there
> are, I am only able to search for one of those. Should I somehow disable
> unique ids? What is the best way of doing this?
>
> Below is the schema I am trying to index:
>
> +---+-+-++
> | productid | attributeid | valueid | categoryid |
> +---+-+-++
> |  10100039 |  331100 |1580 |  1 |
> |  10100039 |  331694 |1581 |  1 |
> |  10100039 |33113319 | 1537370 |  1 |
> |  10100040 |  331100 |1580 |  1 |
> |  10100040 |  331694 | 1540230 |  1 |
> |  10100040 |33113319 | 1537370 |  1 |
> +---+-+-++
>
> Thanks!
>
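A data-config.xml sketch of the parent/child entity layout suggested above might look like the following; the column names come from the table in the quoted message, but the table name, the SQL, and the field mapping are assumptions:

    <document>
      <entity name="product"
              query="SELECT DISTINCT productid, categoryid FROM product_attributes">
        <field column="productid"  name="productid"/>
        <field column="categoryid" name="categoryid"/>
        <entity name="attribute"
                query="SELECT attributeid, valueid FROM product_attributes
                       WHERE productid = '${product.productid}'">
          <field column="attributeid" name="attributeid"/>
          <field column="valueid"     name="valueid"/>
        </entity>
      </entity>
    </document>

attributeid and valueid would then be declared multiValued in schema.xml, and productid can stay the uniqueKey, so the three rows collapse into one document instead of overwriting each other.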


When are you planning to release SolrCloud feature with ZooKeeper?

2011-08-18 Thread Way Cool
Hi, guys,

When are you planning to release the SolrCloud feature with ZooKeeper
currently in trunk? The new admin interface looks great. Great job.

Thanks,

YH


Re: Problem with xinclude in solrconfig.xml

2011-08-10 Thread Way Cool
Sorry for the spam. I just figured it out. Thanks.

On Wed, Aug 10, 2011 at 2:17 PM, Way Cool  wrote:

> Hi, Guys,
>
> Based on the document below, I should be able to include a file under the
> same directory by specifying relative path via xinclude in solrconfig.xml:
> http://wiki.apache.org/solr/SolrConfigXml
>
> However I am getting the following error when I use relative path (absolute
> path works fine though):
> SEVERE: org.xml.sax.SAXParseException: Error attempting to parse XML file
>
> Any ideas?
>
> Thanks,
>
> YH
>


Problem with xinclude in solrconfig.xml

2011-08-10 Thread Way Cool
Hi, Guys,

Based on the document below, I should be able to include a file under the
same directory by specifying relative path via xinclude in solrconfig.xml:
http://wiki.apache.org/solr/SolrConfigXml

However I am getting the following error when I use relative path (absolute
path works fine though):
SEVERE: org.xml.sax.SAXParseException: Error attempting to parse XML file

Any ideas?

Thanks,

YH


Re: Is there anyway to sort differently for facet values?

2011-08-05 Thread Way Cool
That's right. It would work if I already knew these values ahead of time;
however, I want to use business rules to control the display order for
different search terms. Maybe I have to code it myself. Thanks everyone.

On Fri, Aug 5, 2011 at 12:25 AM, Jayendra Patil <
jayendra.patil@gmail.com> wrote:

> you can give it a try with the facet.sort.
>
> We had such a requirement for sorting facets in an order determined by
> another field and had to resort to a very crude way to get through it.
> We prepended the facet values with the order in which they had to be
> displayed ... and used facet.sort to sort alphabetically.
>
> e.g. prepend Small -> 0_Small, Medium -> 1_Medium, Large -> 2_Large, XL ->
> 3_XL
>
> You would need to handle the display part though.
>
> Surely not the best way, but worked for us.
>
> Regards,
> Jayendra
>
> On Thu, Aug 4, 2011 at 4:38 PM, Sethi, Parampreet
>  wrote:
> > It can be achieved by creating own (app specific) custom comparators for
> > fields defined in schema.xml and having an extra attribute to specify the
> > comparator class in the field tag itself. But it will require changes in
> the
> > Solr to support this feature. (Not sure if it's feasible though just
> > throwing an idea.)
> >
> > -param
> >
> > On 8/4/11 4:29 PM, "Jonathan Rochkind"  wrote:
> >
> >> No, it can not. It just sorts "alphabetically", actually by raw
> byte-order.
> >>
> >> No other facet sorting functionality is available, and it would be
> >> tricky to implement in a performant way because of the way lucene
> >> works.  But it would certainly be useful to me too if someone could
> >> figure out a way to do it.
> >>
> >> On 8/4/2011 2:43 PM, Way Cool wrote:
> >>> Thanks Eric for your reply. I am aware of facet.sort, but I haven't
> used it.
> >>> I will try that though.
> >>>
> >>> Can it handle the values below in the correct order?
> >>> Under 10
> >>> 10 - 20
> >>> 20 - 30
> >>> Above 30
> >>>
> >>> Or
> >>> Small
> >>> Medium
> >>> Large
> >>> XL
> >>> ...
> >>>
> >>> My second question is that if Solr can't do that for the values above
> by
> >>> using facet.sort. Is there any other ways in Solr?
> >>>
> >>> Thanks in advance,
> >>>
> >>> YH
> >>>
> >>> On Wed, Aug 3, 2011 at 8:35 PM, Erick Erickson >wrote:
> >>>
> >>>> have you looked at the facet.sort parameter? The "index" value is what
> I
> >>>> think you want.
> >>>>
> >>>> Best
> >>>> Erick
> >>>> On Aug 3, 2011 7:03 PM, "Way Cool"  wrote:
> >>>>> Hi, guys,
> >>>>>
> >>>>> Is there anyway to sort differently for facet values? For example,
> >>>> sometimes
> >>>>> I want to sort facet values by their values instead of # of docs, and
> I
> >>>> want
> >>>>> to be able to have a predefined order for certain facets as well. Is
> that
> >>>>> possible in Solr we can do that?
> >>>>>
> >>>>> Thanks,
> >>>>>
> >>>>> YH
> >
> >
>


Re: What's the best way (practice) to do index distribution at this moment? Hadoop? rsyncd?

2011-08-05 Thread Way Cool
 I will look at that. Thanks Shalin!

On Fri, Aug 5, 2011 at 1:39 AM, Shalin Shekhar Mangar <
shalinman...@gmail.com> wrote:

> On Fri, Aug 5, 2011 at 12:22 AM, Way Cool  wrote:
>
> > Hi, guys,
> >
> > What's the best way (practice) to do index distribution at this moment?
> > Hadoop? or rsyncd (back to 3 years ago ;-)) ?
> >
> >
> See http://wiki.apache.org/solr/SolrReplication
>
> --
> Regards,
> Shalin Shekhar Mangar.
>
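The replication Shalin points to is configured per core in solrconfig.xml; a minimal master/slave sketch (the host name and poll interval are placeholders):

    <!-- on the master -->
    <requestHandler name="/replication" class="solr.ReplicationHandler">
      <lst name="master">
        <str name="replicateAfter">commit</str>
        <str name="confFiles">schema.xml,stopwords.txt</str>
      </lst>
    </requestHandler>

    <!-- on each slave -->
    <requestHandler name="/replication" class="solr.ReplicationHandler">
      <lst name="slave">
        <str name="masterUrl">http://master-host:8983/solr/replication</str>
        <str name="pollInterval">00:00:60</str>
      </lst>
    </requestHandler>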


Re: What's the best way (practice) to do index distribution at this moment? Hadoop? rsyncd?

2011-08-04 Thread Way Cool
Yes, I am talking about the replication feature. I remember trying rsync 3
years ago with Solr 1.2. Just not sure whether anyone has done anything
better than that during the last 3 years. ;-) Personally I am thinking about
using Hadoop and ZooKeeper. Has anyone tried those features?
I found a couple of links below, but no success with them yet.
http://wiki.apache.org/solr/SolrCloud
http://wiki.apache.org/solr/DeploymentofSolrCoreswithZookeeper

Thanks for your reply Jonathan.

On Thu, Aug 4, 2011 at 2:31 PM, Jonathan Rochkind  wrote:

> I'm not sure what you mean by "index distribution", that could possibly
> mean several things.
>
> But Solr has had a replication feature built into it from 1.4, that can
> probably handle the same use cases as rsync, but better.  So that may be
> what you want.
>
> There are certainly other experiments going on involving various kinds of
> scaling distribution, that I'm not familiar with, including the sharding
> feature, that I'm not very familiar with. I don't know if anyone's tried to
> do anything with hadoop.
>
>
>
>
> On 8/4/2011 2:52 PM, Way Cool wrote:
>
>> Hi, guys,
>>
>> What's the best way (practice) to do index distribution at this moment?
>> Hadoop? or rsyncd (back to 3 years ago ;-)) ?
>>
>> Thanks,
>>
>> Yugang
>>
>>


What's the best way (practice) to do index distribution at this moment? Hadoop? rsyncd?

2011-08-04 Thread Way Cool
Hi, guys,

What's the best way (practice) to do index distribution at this moment?
Hadoop? or rsyncd (back to 3 years ago ;-)) ?

Thanks,

Yugang


Re: Is there anyway to sort differently for facet values?

2011-08-04 Thread Way Cool
Thanks Erick for your reply. I am aware of facet.sort, but I haven't used it.
I will try that though.

Can it handle the values below in the correct order?
Under 10
10 - 20
20 - 30
Above 30

Or
Small
Medium
Large
XL
...

My second question: if Solr can't do that for the values above using
facet.sort, is there any other way to do it in Solr?

Thanks in advance,

YH

On Wed, Aug 3, 2011 at 8:35 PM, Erick Erickson wrote:

> have you looked at the facet.sort parameter? The "index" value is what I
> think you want.
>
> Best
> Erick
> On Aug 3, 2011 7:03 PM, "Way Cool"  wrote:
> > Hi, guys,
> >
> > Is there anyway to sort differently for facet values? For example,
> sometimes
> > I want to sort facet values by their values instead of # of docs, and I
> want
> > to be able to have a predefined order for certain facets as well. Is that
> > possible in Solr we can do that?
> >
> > Thanks,
> >
> > YH
>
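For a small, fixed set of buckets like the size example above, facet.query is another option besides prefixing: each facet.query comes back under facet_queries in the order it was requested, so the display order is simply the order you list them in. A sketch (the size field is hypothetical):

    &facet=true
    &facet.query=size:Small
    &facet.query=size:Medium
    &facet.query=size:Large
    &facet.query=size:XL

The client still maps the keys to labels, but no re-sorting or prefix stripping is needed.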


Is there anyway to sort differently for facet values?

2011-08-03 Thread Way Cool
Hi, guys,

Is there any way to sort facet values differently? For example, sometimes
I want to sort facet values by their values instead of by # of docs, and I
want to be able to have a predefined order for certain facets as well. Is
that possible in Solr?

Thanks,

YH


Re: fetcher no agents listed in 'http.agent.name' property

2011-07-07 Thread Way Cool
Cool. Glad it worked out.

On Thu, Jul 7, 2011 at 11:22 AM, serenity keningston <
serenity.kenings...@gmail.com> wrote:

> Thank you very much, I never tried to modify the config files from
> /runtime/local/conf .
>
> In Nutch-0.9, we will just modify from /conf  directory. I
> appreciate your time and help.
>
> Merci
>
> On Thu, Jul 7, 2011 at 12:05 PM, Way Cool  wrote:
>
> > Just make sure you did change the files under
> > /runtime/local/conf if you are running from runtime/local.
> >
> > On Thu, Jul 7, 2011 at 8:34 AM, serenity keningston <
> > serenity.kenings...@gmail.com> wrote:
> >
> > > Hello Friends,
> > >
> > >
> > > I am experiencing this error message " fetcher no agents listed in '
> > > http.agent.name' property" when I am trying to crawl with Nutch 1.3
> > > I referred other mails regarding the same error message and tried to
> > change
> > > the nutch-default.xml and nutch-site.xml file details with
> > >
> > > <property>
> > >   <name>http.agent.name</name>
> > >   <value>My Nutch Spider</value>
> > >   <description>EMPTY</description>
> > > </property>
> > >
> > > I also filled out the other property details without blank and still
> > > getting
> > > the same error. May I know my mistake ?
> > >
> > >
> > > Serenity
> > >
> >
>


Re: fetcher no agents listed in 'http.agent.name' property

2011-07-07 Thread Way Cool
Just make sure you did change the files under
/runtime/local/conf if you are running from runtime/local.

On Thu, Jul 7, 2011 at 8:34 AM, serenity keningston <
serenity.kenings...@gmail.com> wrote:

> Hello Friends,
>
>
> I am experiencing this error message " fetcher no agents listed in '
> http.agent.name' property" when I am trying to crawl with Nutch 1.3
> I referred other mails regarding the same error message and tried to change
> the nutch-default.xml and nutch-site.xml file details with
>
> <property>
>   <name>http.agent.name</name>
>   <value>My Nutch Spider</value>
>   <description>EMPTY</description>
> </property>
>
> I also filled out the other property details without blank and still
> getting
> the same error. May I know my mistake ?
>
>
> Serenity
>


Re: A beginner problem

2011-07-05 Thread Way Cool
You can follow the links below to set up Nutch and Solr:
http://thetechietutorials.blogspot.com/2011/06/solr-and-nutch-integration.html

http://thetechietutorials.blogspot.com/2011/06/how-to-build-and-start-apache-solr.html
http://wiki.apache.org/nutch/RunningNutchAndSolr

Of course, more details will be helpful for troubleshooting your env issue.
:-)

Have fun!

On Tue, Jul 5, 2011 at 11:49 AM, Chris Hostetter
wrote:

> : follow a receipe.  So I went to the the solr site, downloaded solr and
> : tried to follow the tutorial.  In the  "example" folder of solr, using
> : "java -jar start.jar " I got:
> :
> : 2011-07-04 13:22:38.439:INFO::Logging to STDERR via
> org.mortbay.log.StdErrLog
> : 2011-07-04 13:22:38.893:INFO::jetty-6.1-SNAPSHOT
> : 2011-07-04 13:22:38.946:INFO::Started SocketConnector@0.0.0.0:8983
>
> if that is everything you got in the logs, then i suspect:
>  a) you download a source release (ie: has "*-src-*" in it's name) in
> which the solr.war app has not yet been compiled)
>  b) you did not run "ant example" to build solr and setup the example
> instance.
>
> If i'm wrong, then yes please more details would be helpful: what exact
> URL did you download?
>
> -Hoss
>


Re: Apache Nutch and Solr Integration

2011-07-05 Thread Way Cool
Sorry, Serenity, somehow I don't see the attachment.

On Tue, Jul 5, 2011 at 11:23 AM, serenity keningston <
serenity.kenings...@gmail.com> wrote:

> Please find attached screenshot
>
>
> On Tue, Jul 5, 2011 at 11:53 AM, Way Cool  wrote:
>
>> Can you let me know when and where you were getting the error? A
>> screen-shot
>> will be helpful.
>>
>> On Tue, Jul 5, 2011 at 8:15 AM, serenity keningston <
>> serenity.kenings...@gmail.com> wrote:
>>
>> > Hello Friends,
>> >
>> >
>> > I am a newbie to Solr and trying to integrate Apache Nutch 1.3 and Solr
>> 3.2
>> > . I did the steps explained in the following two URL's :
>> >
>> > http://wiki.apache.org/nutch/RunningNutchAndSolr
>> >
>> >
>> >
>> http://thetechietutorials.blogspot.com/2011/06/how-to-build-and-start-apache-solr.html
>> >
>> >
>> > I downloaded both the softwares, however, I am getting error (*solrUrl
>> is
>> > not set, indexing will be skipped..*) when I am trying to crawl using
>> > Cygwin.
>> >
>> > Can anyone please help me out to fix this issue ?
>> > Else any other website suggesting for Apache Nutch and Solr integration
>> > would be greatly helpful.
>> >
>> >
>> >
>> > Thanks & Regards,
>> > Serenity
>> >
>>
>
>


Re: Dynamic Facets

2011-07-05 Thread Way Cool
Thanks Erik and Darren.
A pre-faceting component (run after the query) would be ideal, even though
there may be a little performance penalty. :-) I will try to implement one if
no one has done so.

Darren, I did look at the taxonomy faceting thread. My main concern is that
I want dynamic facets to be returned, because I don't know what facets I can
specify as part of the query ahead of time, and there are too many search
terms. ;-)

Thanks for help.

On Tue, Jul 5, 2011 at 11:49 AM,  wrote:

>
> You can issue a new facet search as you drill down from your UI.
> You have to specify the fields you want to facet on and they can be
> dynamic.
>
> Take a look at recent threads here on taxonomy faceting for help.
> Also, look here[1]
>
> [1] http://wiki.apache.org/solr/SimpleFacetParameters
>
> On Tue, 5 Jul 2011 11:15:51 -0600, Way Cool 
> wrote:
> > Hi, guys,
> >
> > We have more than 1000 attributes scattered around 700K docs. Each doc
> > might
> > have about 50 attributes. I would like Solr to return up to 20 facets
> for
> > every searches, and each search can return facets dynamically depending
> on
> > the matched docs. Anyone done that before? That'll be awesome if the
> facets
> > returned will be changed after we drill down facets.
> >
> > I have looked at the following docs:
> > http://wiki.apache.org/solr/SimpleFacetParameters
> >
>
> http://www.lucidimagination.com/devzone/technical-articles/faceted-search-solr
> >
> > Wondering what's the best way to accomplish that. Any advice?
> >
> > Thanks,
> >
> > YH
>
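Short of writing the pre-faceting component discussed above, one client-side approximation is a two-pass query: fetch the top matches, count which attribute fields they actually carry, then re-issue the query faceting on the 20 most common ones. A rough SolrJ sketch; the attribute_names field (a stored, multiValued list of each document's attribute field names) is an assumption, not something from this thread:

    import java.util.ArrayList;
    import java.util.Collection;
    import java.util.Collections;
    import java.util.Comparator;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;
    import org.apache.solr.common.SolrDocument;

    public class DynamicFacets {

        public static QueryResponse searchWithDynamicFacets(String userQuery) throws Exception {
            SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");

            // Pass 1: look only at the top matches to see which attributes they carry.
            SolrQuery probe = new SolrQuery(userQuery);
            probe.setRows(100);
            probe.setFields("attribute_names");   // hypothetical stored, multiValued field
            QueryResponse first = server.query(probe);

            // Count attribute occurrences among the matched docs.
            Map<String, Integer> counts = new HashMap<String, Integer>();
            for (SolrDocument doc : first.getResults()) {
                Collection<Object> names = doc.getFieldValues("attribute_names");
                if (names == null) continue;
                for (Object name : names) {
                    Integer c = counts.get(name.toString());
                    counts.put(name.toString(), c == null ? 1 : c + 1);
                }
            }

            // Keep the 20 most frequent attributes.
            List<Map.Entry<String, Integer>> byCount =
                    new ArrayList<Map.Entry<String, Integer>>(counts.entrySet());
            Collections.sort(byCount, new Comparator<Map.Entry<String, Integer>>() {
                public int compare(Map.Entry<String, Integer> a, Map.Entry<String, Integer> b) {
                    return b.getValue() - a.getValue();
                }
            });

            // Pass 2: the real query, faceting only on those attribute fields.
            SolrQuery full = new SolrQuery(userQuery);
            full.setFacet(true);
            full.setFacetMinCount(1);
            for (int i = 0; i < Math.min(20, byCount.size()); i++) {
                full.addFacetField(byCount.get(i).getKey());
            }
            return server.query(full);
        }
    }

The facet set then changes automatically as filters are added, since each drill-down re-runs both passes against the narrowed result set.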


Dynamic Facets

2011-07-05 Thread Way Cool
Hi, guys,

We have more than 1000 attributes scattered across 700K docs. Each doc might
have about 50 attributes. I would like Solr to return up to 20 facets for
every search, and each search can return facets dynamically depending on
the matched docs. Has anyone done that before? It would be awesome if the
facets returned changed as we drill down on facets.

I have looked at the following docs:
http://wiki.apache.org/solr/SimpleFacetParameters
http://www.lucidimagination.com/devzone/technical-articles/faceted-search-solr

Wondering what's the best way to accomplish that. Any advice?

Thanks,

YH


Re: Apache Nutch and Solr Integration

2011-07-05 Thread Way Cool
Can you let me know when and where you were getting the error? A screen-shot
will be helpful.

On Tue, Jul 5, 2011 at 8:15 AM, serenity keningston <
serenity.kenings...@gmail.com> wrote:

> Hello Friends,
>
>
> I am a newbie to Solr and trying to integrate Apache Nutch 1.3 and Solr 3.2
> . I did the steps explained in the following two URL's :
>
> http://wiki.apache.org/nutch/RunningNutchAndSolr
>
>
> http://thetechietutorials.blogspot.com/2011/06/how-to-build-and-start-apache-solr.html
>
>
> I downloaded both the softwares, however, I am getting error (*solrUrl is
> not set, indexing will be skipped..*) when I am trying to crawl using
> Cygwin.
>
> Can anyone please help me out to fix this issue ?
> Else any other website suggesting for Apache Nutch and Solr integration
> would be greatly helpful.
>
>
>
> Thanks & Regards,
> Serenity
>


Re: Getting started with Velocity

2011-07-01 Thread Way Cool
By default, browse is using the following config:

 
   explicit

   
   velocity

   browse
   layout
   Solritas

   edismax
   *:*
   10
   *,score
   
 text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0 manu^1.1 cat^1.4
   
   text,features,name,sku,id,manu,cat
   3

   
  text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0 manu^1.1 cat^1.4
   

   on
   cat
   manu_exact
   ipod
   GB
   1
   cat,inStock
   price
   0
   600
   50
   after
   manufacturedate_dt
   NOW/YEAR-10YEARS
   NOW
   +1YEAR
   before
   after


   
   on
   text features name
   0
   name
 
 
   spellcheck
 
 
  

while the normal search is using the following:


 
   explicit
   10
 
.

Just make sure the fields referenced in the browse handler are also defined in
your documents; otherwise change the handler to not use dismax. :-)


On Fri, Jul 1, 2011 at 12:51 PM, Chip Calhoun  wrote:

> I'm a Solr novice, so I hope I'm missing something obvious. When I run a
> search in the Admin view, everything works fine. When I do the same search
> in http://localhost:8983/solr/browse , I invariably get "0 results found".
> What am I missing? Are these not supposed to be searching the same index?
>
> Thanks,
> Chip
>


Re: How to avoid double counting for facet query

2011-06-14 Thread Way Cool
I just checked SolrQueryParser.java in the 3.2.0 source. It looks like Yonik
Seeley's changes for LUCENE-996
(https://issues.apache.org/jira/browse/LUCENE-996) are not in.
I will check trunk later. Thanks!

On Tue, Jun 14, 2011 at 5:34 PM, Way Cool  wrote:

> I already checked out facet range query. By the way, I did put the
> facet.range.include as below:
> lower
>
> Couple things I don't like though are:
> 1. It returns the following without end values (I have to re-calculate the
> end values) :
> 
> 20
> 3
> 
> 50.0
> 0.0
> 600.0
> 0
>
> 2. I can't specify custom ranges of values, for example, 1,2,3,4,5,...10,
> 15, 20, 30,40,50,60,80,90,100,200, ..., 600, 800, 900, 1000, 2000, ... etc.
>
> Thanks.
>
>
> On Tue, Jun 14, 2011 at 3:50 PM, Chris Hostetter  > wrote:
>
>>
>> : You can use exclusive range queries which are denoted by curly brackets.
>>
>> that will solve the problem of making the fq exclude a bound, but
>> for the range facet counts you'll want to pay attention to look at
>> facet.range.include...
>>
>> http://wiki.apache.org/solr/SimpleFacetParameters#facet.range.include
>>
>>
>> -Hoss
>>
>
>


Re: How to avoid double counting for facet query

2011-06-14 Thread Way Cool
I already checked out range faceting. By the way, I did set
facet.range.include as below:
lower

Couple things I don't like though are:
1. It returns the following without end values (I have to re-calculate the
end values) :

20
3

50.0
0.0
600.0
0

2. I can't specify custom ranges of values, for example, 1,2,3,4,5,...10,
15, 20, 30,40,50,60,80,90,100,200, ..., 600, 800, 900, 1000, 2000, ... etc.

Thanks.

On Tue, Jun 14, 2011 at 3:50 PM, Chris Hostetter
wrote:

>
> : You can use exclusive range queries which are denoted by curly brackets.
>
> that will solve the problem of making the fq exclude a bound, but
> for the range facet counts you'll want to pay attention to look at
> facet.range.include...
>
> http://wiki.apache.org/solr/SimpleFacetParameters#facet.range.include
>
>
> -Hoss
>


Re: Modifying Configuration from a Browser

2011-06-14 Thread Way Cool
+1, good idea! I was thinking of writing a web interface to change the
contents of elevate.xml and feed it back to the Solr core.
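For the reload step mentioned in the reply quoted below, the CoreAdmin handler can be hit directly once the new elevate.xml is in place (the core name is a placeholder, and this assumes a multicore setup with solr.xml):

    http://localhost:8983/solr/admin/cores?action=RELOAD&core=core0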

On Tue, Jun 14, 2011 at 1:51 PM, Markus Jelsma
wrote:

> There is no API. Upload and restart the core is the way to go.
>
> > Does anyone have any examples of modifying a configuration file, like
> > "elevate.xml" from a browser? Is there an API that would help for this?
> >
> > If nothing exists for this, I am considering implementing something that
> > would change the "elevate.xml" file then reload the core. Or is there a
> > better approach for dynamic configuration?
> >
> > Thank you.
>


Re: How to avoid double counting for facet query

2011-06-14 Thread Way Cool
That's good to know. From the ticket, it looks like the fix will be in 4.0
then?

Currently I can see that {} and [] work, but not combined, in Solr 3.1. I will
try 3.2 soon. Thanks.

On Tue, Jun 14, 2011 at 2:07 PM, Ahmet Arslan  wrote:

> > You sure Solr supports that?
> > I am getting exceptions by doing that. Ahmet, do you
> > remember where you see
> > that document? Thanks.
>
> I tested it with trunk.
> https://issues.apache.org/jira/browse/SOLR-355
> https://issues.apache.org/jira/browse/LUCENE-996
>
>


Re: How to avoid double counting for facet query

2011-06-14 Thread Way Cool
Are you sure Solr supports that?
I am getting exceptions when doing that. Ahmet, do you remember where you saw
that documented? Thanks.



On Tue, Jun 14, 2011 at 1:58 PM, Way Cool  wrote:

> Thanks! That's what I was trying to find.
>
>
> On Tue, Jun 14, 2011 at 1:48 PM, Ahmet Arslan  wrote:
>
>> > 23
>> > 1
>> > 
>> > ...
>> > *
>> >
>> > As you notice, the number of the results is 23, however an
>> > extra doc was
>> > found in the 160-200 range.
>> >
>> > Any way I can avoid double counting issue?
>>
>> You can use exclusive range queries which are denoted by curly brackets.
>>
>> price:[110 TO 160}
>> price:[160 TO 200}
>>
>
>


Re: How to avoid double counting for facet query

2011-06-14 Thread Way Cool
Thanks! That's what I was trying to find.

On Tue, Jun 14, 2011 at 1:48 PM, Ahmet Arslan  wrote:

> > 23
> > 1
> > 
> > ...
> > *
> >
> > As you notice, the number of the results is 23, however an
> > extra doc was
> > found in the 160-200 range.
> >
> > Any way I can avoid double counting issue?
>
> You can use exclusive range queries which are denoted by curly brackets.
>
> price:[110 TO 160}
> price:[160 TO 200}
>


How to avoid double counting for facet query

2011-06-14 Thread Way Cool
Hi, guys,

I fixed the Solr search UI (solr/browse) to display the price range facet
values, as described at
http://thetechietutorials.blogspot.com/2011/06/fix-price-facet-display-in-solr-search.html:

   - Under 50 (1331)
   - [50.0 TO 100] (133)
   - [100.0 TO 150] (31)
   - [150.0 TO 200] (7)
   - [200.0 TO 250] (2)
   - [250.0 TO 300] (5)
   - [300.0 TO 350] (3)
   - [350.0 TO 400] (6)
   - [400.0 TO 450] (1)
   - 600.0+ (1)

However, I am having a double-counting issue.

Here is the URL that returns only docs whose prices are between 110.0 and
160.0, plus the price facets:
http://localhost:8983/solr/select/?q=Shakespeare&version=2.2&rows=0&fq=price:[110.0+TO+160]&facet.query=price:[110%20TO%20160]&facet.query=price:[160%20TO%20200]&facet.field=price

The response is as below:

23
1
...

As you can see, the total number of results is 23; however, an extra doc was
counted in the 160-200 range.

Is there any way I can avoid this double-counting issue? Or has anyone run
into similar issues?

Thanks,

YH
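
Following the exclusive-bound suggestion from the replies above, the buckets stop overlapping once the upper bound is excluded; per the follow-ups in this thread, this needs a Solr version that accepts mixed [ ... } bounds (it did not work in 3.1):

    fq=price:[110 TO 160}
    &facet.query=price:[110 TO 160}
    &facet.query=price:[160 TO 200}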


FYI: How to build and start Apache Solr admin app from source with Maven

2011-06-10 Thread Way Cool
Hi, guys,

FYI: here is a link on how to build and start the Apache Solr admin app from
source with Maven, just in case you might be interested:
http://thetechietutorials.blogspot.com/2011/06/how-to-build-and-start-apache-solr.html

Have fun.

YH