Re: Search and Entity structure

2012-10-26 Thread adityab
Hi Vijith,

See if this solution solves your problem. There might be other ways; this is
the one I have off the top of my head at this hour.

You probably have an ID for each qualification. Then express the relation
using dotted notation.
1 = MBA, 2 = LEAD, etc.

 
1.A 
2.B 
 


The format is X.Y, where X is the qualification ID and Y is the grade value. If
you have an ID for the grade value too, then use it for Y instead of the actual
grade value.
So in Solr, when searching for MBA with grade A, you can query "q=grade:1.A".
This should give you the result.
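
A rough sketch of that encoding, in Python purely for illustration. The
"grade" field name comes from the query above; the ID table and the helper
names are made up, and the sketch assumes the field is indexed as a
non-tokenized string so that a value like 1.A stays a single term.

```python
# Hypothetical ID table; only 1 = MBA and 2 = LEAD come from the example above.
QUALIFICATION_IDS = {"MBA": 1, "LEAD": 2}


def grade_token(qualification, grade):
    """Encode a (qualification, grade) pair as an X.Y token."""
    return "%d.%s" % (QUALIFICATION_IDS[qualification], grade)


def grade_query(qualification, grade):
    """Build the q parameter that matches one qualification/grade pair."""
    return "grade:" + grade_token(qualification, grade)


# Values to store in a multi-valued "grade" field for one person.
tokens = [grade_token("MBA", "A"), grade_token("LEAD", "B")]
print(tokens)                    # ['1.A', '2.B']
print(grade_query("MBA", "A"))   # grade:1.A
```

Keeping the pair in one token is what prevents a match on "MBA" from one
qualification combining with a grade from another.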






--
View this message in context: 
http://lucene.472066.n3.nabble.com/Search-and-Entity-structure-tp4015890p4015996.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: DIH nested entities don't work

2012-10-26 Thread mroosendaal
Hi,

I tried giving the pdt_id from the subentity a specific value and it worked.
Only now every product has the same value.

I tried a different subentity with the construct
subentity.pdt_id='${entity.pdt_id}' 
with the same result as above: all products had a songtitle with the same
value.

So what am I doing wrong?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/DIH-nested-entities-don-t-work-tp4015514p4015997.html


Re: Facet date/range + facet.mincount + distributed search issue

2012-10-26 Thread Dovao Jimenez, Oscar
Dear all,

When I use date/range faceting on a date-typed field in a distributed search
across schema-compatible cores, facet.mincount=1 returns a reduced number of
facet values (over 500 expected, 5 retrieved). I wonder whether facet.mincount
is supported on distributed searches, as it seems to work well on single-core
ones. I'm using the Solr 4.0 full release, by the way.

Please let me know if you need any other details about this issue.

Many thanks in advance,

Oscar



[Solr boost] Date boost for certain query set

2012-10-26 Thread Alessandro Benedetti
Hi guys,
I have been fighting with the boost factors in my edismax request handler:


   edismax
 (idente:2)^0.1
 recip(ms(NOW,data),3.16e-11,1,1)
   

I'm playing with bq (boost query) and bf (boost function).
Is it possible to combine bq and bf in certain ways?
For example:
I want one boost function to start affecting a document only when a particular
boost query matches, and another boost function to start affecting it only
when a second boost query matches.
Cumulatively, of course.
Is it possible to associate bq and bf in this way?
In simple words, I want a bq to activate a specific bf.

Cheers

-- 
---
Alessandro Benedetti

Sourcesense - making sense of Open Source: http://www.sourcesense.com


Re: Occasional Solr performance issues

2012-10-26 Thread Dotan Cohen
On Wed, Oct 24, 2012 at 4:33 PM, Walter Underwood  wrote:
> Please consider never running "optimize". That should be called "force merge".
>

Thanks. I have been letting the system run for about two days already
without an optimize. I will let it run a week, then merge to see the
effect.

-- 
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com


Index-time field boosting

2012-10-26 Thread vsl
Hi,
I have a problem with index time boosting. I created 4  new fields:







The first field, alltext, is used as the default field for searching. I copy
the other 3 fields to it using the copyField directive.

Unfortunately, the boosting mechanism does not work when searching: the
score is always the same value.

For example: search term = messi
1. title: messi, text: ronaldo
2. title: ronaldo, text: messi
Expected result: the first entry should have a higher score.
Current: both have the same score.

Could you tell me where the problem is?

BR
Pawel




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Index-time-field-boosting-tp4016010.html


how solr to boost term value at the start of ther field?

2012-10-26 Thread YooKyuseok
I am Kyuseok from the Republic of Korea.
Nice to meet you.

I have been searching for 'solr boost term position' and struggling with this
issue, and I still haven't solved it.

I run a search server for video content, and my client requires that content
matching the search term at the start of the field be displayed first.

So I want to rank results like below:

*search word: indiana
*result docs:
1. indiana Jones
2. Jones indiana
3. Jones and the others in indiana
(because of the position of 'indiana')

Can anyone help me solve this problem?
I am using the Solr search engine.

Please help me.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/how-solr-to-boost-term-value-at-the-start-of-ther-field-tp4015965.html


Filtering HTML content in Solr 4.0.0

2012-10-26 Thread Pratyul Kapoor
Hi,

I am using Solr 4.0.0. I have HTML content as the description of a product.
If I index it without any filtering, I get errors on search.
How can I filter HTML content?

Pratyul


Re: [Solr boost] Date boost for certain query set

2012-10-26 Thread Alessandro Benedetti
I've made some progress.
I'm writing a function to do this sort of clustered boosting:
product(recip(ms(NOW,data),3.16e-11,1,1),exists(query(field:value)))
I multiply the boost by exists(): for a specific value of some field, the
exists function will return 0 or 1, which cancels or applies the date boost.
But the function I wrote has the wrong syntax; I need to correct the
"exists" part.
Any hints?
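
To make the intended arithmetic concrete, here is a small numeric sketch in
Python. It does not address the Solr syntax question. It relies on
recip(x,m,a,b) being a/(m*x+b); the function names and the sample ages are
only illustrative.

```python
MS_PER_YEAR = 365 * 24 * 3600 * 1000


def recip(x, m, a, b):
    """Reciprocal function as used in the boost above: a / (m*x + b)."""
    return a / (m * x + b)


def gated_date_boost(age_ms, matches):
    """Date boost multiplied by a 0/1 gate, mirroring product(recip(...), exists(...))."""
    gate = 1 if matches else 0
    return recip(age_ms, 3.16e-11, 1, 1) * gate


# A brand-new matching document gets the full boost, a one-year-old one
# roughly half of it, and a non-matching document gets none.
print(gated_date_boost(0, True))
print(gated_date_boost(MS_PER_YEAR, True))
print(gated_date_boost(MS_PER_YEAR, False))
```

With m = 3.16e-11 the product m*x is close to 1 for a document that is one
year old, which is why the boost halves at about that age.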




-- 
---
Alessandro Benedetti

Sourcesense - making sense of Open Source: http://www.sourcesense.com


Re: Filtering HTML content in Solr 4.0.0

2012-10-26 Thread Rafał Kuć
Hello!

You're trying to put the HTML into the XML sent to Solr, right? You should
use proper XML escaping (and UTF-8 encoding) to do that. For example, look
at the utf8-example.xml file from the exampledocs directory that comes with
Solr and you'll see something like this:

tag with escaped chars: &lt;nicetag/&gt;

As you can see, the < and > are properly encoded as &lt; and &gt;.
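
A small sketch of that escaping step, using Python's standard library. The
field names "id" and "description" and the helper name are assumptions made
for the example; they are not taken from the poster's schema.

```python
from xml.sax.saxutils import escape


def solr_add_xml(doc_id, description_html):
    """Build a minimal <add> message with the HTML escaped as XML text."""
    return (
        "<add><doc>"
        '<field name="id">%s</field>'
        '<field name="description">%s</field>'
        "</doc></add>"
    ) % (escape(doc_id), escape(description_html))


# The markup characters <, > and & are replaced by entities, so the
# resulting message is well-formed XML.
print(solr_add_xml("1", "<p>Fast & light</p>"))
```

An XML library that serializes text nodes would do the same escaping
automatically; the explicit call just shows what has to happen.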

-- 
Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

> Hi,

> I am using Solr 4.0.0. I have a HTML content as description of a product.
> If I index it without any filtering it is giving errors on search.
> How can I filter an HTML content.

> Pratyul



Re: Filtering HTML content in Solr 4.0.0

2012-10-26 Thread Rogério Pereira Araújo

I think you will have to write an UpdateProcessor to strip out html tags.

http://wiki.apache.org/solr/UpdateRequestProcessor

As of Solr 4.0 you can also use scripting languages like Python, Ruby and
JavaScript to write scripts for use as update processors.





RE: how solr to boost term value at the start of ther field?

2012-10-26 Thread Markus Jelsma
Hi,

One trick is to index a special token at the beginning of the content and do a 
phrase query for your terms and the special token with little or no slop. You 
can also use Lucene's SpanFirstQuery but it's not yet exposed in Solr. There's 
a patch for trunk exposing the SpanFirstQuery in Solr's Edismax query parser.

It cannot do progressively smaller boosts to words further from the start by 
itself.

https://issues.apache.org/jira/browse/SOLR-3925
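
A minimal sketch of the special-token trick in Python, for illustration
only. The marker string, the field name "title_start" and the boost value
are all hypothetical; the marker has to survive analysis as a single token
and must not occur in real content.

```python
START = "xxstartxx"  # hypothetical marker token


def with_start_marker(title):
    """Value to index: the marker followed by the original text."""
    return "%s %s" % (START, title)


def starts_with_query(field, terms, boost=10):
    """Phrase query that matches documents whose field begins with the terms."""
    return '%s:"%s %s"^%d' % (field, START, terms, boost)


print(with_start_marker("indiana Jones"))            # xxstartxx indiana Jones
print(starts_with_query("title_start", "indiana"))   # title_start:"xxstartxx indiana"^10
```

The boosted phrase clause would be added next to the normal query, so that
documents starting with the term rank above other matches.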

Cheers,
Markus

 


Re: Filtering HTML content in Solr 4.0.0

2012-10-26 Thread Rafał Kuć
Hello!

You don't need a custom update request processor: there is a char
filter dedicated to stripping HTML tags from your content, so that only
the relevant parts of it are indexed:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.HTMLStripCharFilterFactory

However, you first need to properly send it to Solr for indexing. 

-- 
Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

> I think you will have to write an UpdateProcessor to strip out html tags.

> http://wiki.apache.org/solr/UpdateRequestProcessor

> As per Solr 4.0 you can also use scripting languages like Python, Ruby and
> Javascript to write scripts for use as updateprocessors too.




Re: Index-time field boosting

2012-10-26 Thread Otis Gospodnetic
Hi,

Can you show us the configuration for your request handler from
solrconfig.xml?

Otis
--
Performance Monitoring - http://sematext.com/spm
On Oct 26, 2012 8:33 AM, "vsl"  wrote:

> Hi,
> I have a problem with index time boosting. I created 4  new fields:
>
>  stored="false"
> multiValued="true" omitNorms="false"/>
>
>  stored="true"
> multiValued="true"  boost="5.0" omitNorms="false"/>
>  multiValued="true"  boost="3.0" omitNorms="false" />
>  required="true"  boost="2.0" omitNorms="false"/>
>
> The first field: alltext is used as default field for searching. I copy
> other 3 fields to it using copyField directive.
>
> Unfortunately while searching boosting mechaism does not work. Always
> scoring property equals to the same value.
>
> For example: search term = messi
> 1. title: messi, text: ronaldo
> 2. title ronaldo, text messi
> Expected result: the first entry should have higher scoring
> Current: Both have the same scoring.
>
> Could you tell me where the problem is?
>
> BR
> Pawel
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Index-time-field-boosting-tp4016010.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: DIH update?

2012-10-26 Thread Billy Newman
Sorry, to be more specific I am referring to partial document update,
which I believe is new to Solr 4.  Also I am using a URLDataSource and
I cannot use the delta-import feature, nor is it what I am looking
for.

I.E.

DIH - creates:

id: 12345
first: hello

Runs again and pulls in the same doc id
id: 12345
second: world

I would like to end up with
id: 12345
first: hello
second: world

But instead the DIH just replaces entire doc so I lose the 'first' field.

Would also be nice to be able to 'add' value to multiValued fields
when a doc with the same id comes into the DIH.

Again I do not believe that this functionality exists (I cannot get it
to work with my simple example).  Just wondering if anyone had thought
about plans in the future.

Thanks,
Billy


On Thu, Oct 25, 2012 at 9:46 PM, Gora Mohanty  wrote:
> On 26 October 2012 08:51, Billy Newman  wrote:
>> Any plans on adding update functionality to DIH?
>
> What do you mean by "update functionality"?
>
> Re-running an import with changed values for a
> document with an existing ID will update values
> in the Solr index.
>
> If you mean adding new documents, please take
> a look at delta import.
>
> Regards,
> Gora


Re: Occasional Solr performance issues

2012-10-26 Thread Dotan Cohen
I spoke too soon! Whereas three days ago, when the index was new, 500
records could be written to it in <3 seconds, that operation is now
taking a minute and a half, sometimes longer. I ran optimize() but
that did not help the writes. What can I do to improve the write
performance?

Even opening the Logging tab of the Solr instance is taking quite a
long time. In fact, I just left it for 20 minutes and it still hasn't
come back with anything. I do have an SSH window open on the server
hosting Solr and it doesn't look overloaded at all:

$ date && du -sh data/ && uptime && free -m
Fri Oct 26 13:15:59 UTC 2012
578M    data/
 13:15:59 up 4 days, 17:59,  1 user,  load average: 0.06, 0.12, 0.22
             total       used       free     shared    buffers     cached
Mem:         14980       3237      11743          0        284
-/+ buffers/cache:        729      14250
Swap:            0          0          0


-- 
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com


Re: Index-time field boosting

2012-10-26 Thread vsl
Hi,
this is my request handler from solrconfig.xml:



 
 
   explicit
   10
   text
 





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Index-time-field-boosting-tp4016010p4016049.html


Re: DIH nested entities don't work

2012-10-26 Thread Gora Mohanty
On 26 October 2012 13:14, mroosendaal  wrote:
> Hi,
>
> I tried giving the pdt_id from the subentity a specific value and it worked.
> Only now every product has the same value.
[...]

No offence, but it is difficult to try to help you if you provide partial
information, and keep trying different things instead of proceeding
systematically.

Please share with us your complete Solr schema.xml, and the DIH
configuration file. If these are too long, you can use pastebin.com
and provide links. As you are not using field definitions inside your
entities, please share with us the columns that are returned by
the "select *" that you are doing on each entity.

If the SELECTs work and the column names in the DB match those in
schema.xml (independent of case), the import of data into Solr will
work.

Regards,
Gora


Re: Best way to commit data to Solr

2012-10-26 Thread Erick Erickson
Here's a great blog explaining one major difference between
3.x and 4.x:
http://www.searchworkings.org/blog/-/blogs/gimme-all-resources-you-have-i-can-use-them!/

In a nutshell, 3.x blocks on segment merges (which can be triggered by commits).

I've heard anecdotal accounts that pushing your RAM buffer size much above
256M doesn't help; you may want to lower it just to see if there's any
difference.

But the key would be looking at CPU utilization on your server. If
that's near 100% (when not merging in 3.x), you can't go any faster.

FWIW,
Erick

On Fri, Oct 26, 2012 at 2:05 AM, adityab  wrote:
> thanks for the replies,
>
> the reason we have the custom program is we need to gather data from
> different sources, massage it and then post to solr. Our program is
> multithreaded with 10 threads fetching data and putting it in blocking Queue
> and 5 threads posting the data from Queue to Solr. Each thread writes 1000
> documents at a time.
> We have observed that with 1024 rambuffer set on master we are able to
> publish 20M documents in around 3.5 to 4.5 hrs
> Every 1000 document posted to Solr on an average takes 3 seconds (based on
> QT)
>
> Any advice on speeding up the process.
>
> I will try doing commit at the end to see if there is any difference.
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Best-way-to-commit-data-to-Solr-tp4015921p4015992.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Problem with loading dictionary for Hunspell

2012-10-26 Thread Rob Koeling
I'm trying to employ the HunspellStemFilterFactory, but have trouble
loading a dictionary.

I  downloaded the .dic and .aff file for en_GB, en_US and nl_NL from
the OpenOffice site, but they all give me the same error message.

When I use them AS IS, I get the error message:

Oct 26, 2012 2:39:37 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.RuntimeException: Unable to load hunspell data! [dictionary=en_GB.dic,affix=en_GB.aff]
    at org.apache.solr.analysis.HunspellStemFilterFactory.inform(HunspellStemFilterFactory.java:87)
    at org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:551)

Caused by: java.text.ParseException: The first non-comment line in the affix file must be a 'SET charset', was: 'FLAG num'
    at org.apache.lucene.analysis.hunspell.HunspellDictionary.getDictionaryEncoding(HunspellDictionary.java:280)
    at org.apache.lucene.analysis.hunspell.HunspellDictionary.<init>(HunspellDictionary.java:112)
    at org.apache.solr.analysis.HunspellStemFilterFactory.inform(HunspellStemFilterFactory.java:85)
    ... 32 more
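
The condition named in that message can be checked outside Solr with a
small Python sketch. It only mirrors what the message describes (the first
line that is neither blank nor a comment must be a SET directive); treating
lines that start with '#' as comments and the helper names are assumptions,
and this is not the Lucene code.

```python
def first_directive(aff_lines):
    """Return the first line that is neither blank nor a '#' comment."""
    for line in aff_lines:
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            return stripped
    return None


def has_leading_set(aff_lines):
    """True if the first directive in the affix file is a SET line."""
    directive = first_directive(aff_lines)
    return directive is not None and directive.split()[0] == "SET"


# An affix file that opens with FLAG, like the one in the error above.
print(has_leading_set(["# comment", "FLAG num", "SET UTF-8"]))   # False
print(has_leading_set(["# comment", "", "SET UTF-8", "FLAG num"]))  # True
```

To check a real file, pass the lines read from the .aff file with the
encoding it was saved in.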


When I add the following first line to both the .dic and the .aff file:
 SET UTF-8

The error message changes into:

Oct 26, 2012 10:16:42 AM org.apache.solr.common.SolrException log
SEVERE: java.lang.RuntimeException: Unable to load hunspell data! [dictionary=en_GB.dic,affix=en_GB.aff]
    at org.apache.solr.analysis.HunspellStemFilterFactory.inform(HunspellStemFilterFactory.java:87)
    at org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:551)

Caused by: java.nio.charset.IllegalCharsetNameException: 'UTF-8'
    at java.nio.charset.Charset.checkName(Charset.java:284)
    at java.nio.charset.Charset.lookup2(Charset.java:458)
    at java.nio.charset.Charset.lookup(Charset.java:437)
    at java.nio.charset.Charset.forName(Charset.java:502)
    at org.apache.lucene.analysis.hunspell.HunspellDictionary.getJavaEncoding(HunspellDictionary.java:293)
    at org.apache.lucene.analysis.hunspell.HunspellDictionary.<init>(HunspellDictionary.java:113)
    at org.apache.solr.analysis.HunspellStemFilterFactory.inform(HunspellStemFilterFactory.java:85)
    ... 32 more



I am aware of a similar issue that was raised on this list in December 2011,
which was escalated to JIRA
(https://issues.apache.org/jira/browse/SOLR-2934), but am not sure if
that was ever resolved. Or am I just missing something? In either
case, could anyone who has working dictionary files share them with me
(any old language, as long as it works)?

I am using Solr 3.6.1 on a Mac running OSX 10.7.5

  - Rob


Re: DIH update?

2012-10-26 Thread Gora Mohanty
On 26 October 2012 18:45, Billy Newman  wrote:
> Sorry, to be more specific I am referring to partial document update,
> which I believe is new to Solr 4.  Also I am using a URLDataSource and
> I cannot use the delta-import feature, nor is it what I am looking
> for.
[...]

Haven't yet had occasion to try this out, so I could
be wrong, but I do not think that DIH supports partial
document updates.

I think that the way to do partial document updates is
through curl, or SolrJ. This seems to be the blog post
that people refer to for this feature:
http://solr.pl/en/2012/07/09/solr-4-0-partial-documents-update/
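
As a hedged sketch of what such a partial update body looks like, here is a
small Python helper that builds the JSON for one document. The "set" and
"add" modifiers are the Solr 4 atomic update operations; the helper name is
made up, and the id and field values reuse the example from earlier in this
thread. As far as I know, atomic updates also need the update log enabled
and the other fields stored.

```python
import json


def atomic_update(doc_id, set_fields=None, add_fields=None):
    """Build a one-document atomic update body for Solr's JSON update format."""
    doc = {"id": doc_id}
    for name, value in (set_fields or {}).items():
        doc[name] = {"set": value}   # replace the field's value
    for name, value in (add_fields or {}).items():
        doc[name] = {"add": value}   # append to a multiValued field
    return json.dumps([doc])


# Adds second=world to document 12345 without resending 'first'.
print(atomic_update("12345", set_fields={"second": "world"}))
```

The resulting string would be posted to the update handler as JSON, for
example with curl or from a SolrJ-free script.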

Regards,
Gora


Re: Occasional Solr performance issues

2012-10-26 Thread Shawn Heisey

On 10/26/2012 7:16 AM, Dotan Cohen wrote:

> I spoke too soon! Whereas three days ago when the index was new 500
> records could be written to it in <3 seconds, now that operation is
> taking a minute and a half, sometimes longer. I ran optimize() but
> that did not help the writes. What can I do to improve the write
> performance?
>
> Even opening the Logging tab of the Solr instance is taking quite a
> long time. In fact, I just left it for 20 minutes and it still hasn't
> come back with anything. I do have an SSH window open on the server
> hosting Solr and it doesn't look overloaded at all:
>
> $ date && du -sh data/ && uptime && free -m
> Fri Oct 26 13:15:59 UTC 2012
> 578M    data/
>  13:15:59 up 4 days, 17:59,  1 user,  load average: 0.06, 0.12, 0.22
>              total       used       free     shared    buffers     cached
> Mem:         14980       3237      11743          0        284
> -/+ buffers/cache:        729      14250
> Swap:            0          0          0


Taking all the information I've seen so far, my bet is on either cache 
warming or heap/GC trouble as the source of your problem.  It's now 
specific information gathering time.  Can you gather all the following 
information and put it into a web paste page, such as pastie.org, and 
reply with the link?  I have gathered the same information from my test 
server and created a pastie example. http://pastie.org/5118979


On the dashboard of the GUI, it lists all the jvm arguments. Include those.

Click Java Properties and gather the "java.runtime.version" and 
"java.specification.vendor" information.


After one of the long update times, pause/stop your indexing 
application.  Click on your core in the GUI, open Plugins/Stats, and 
paste the following bits with a header to indicate what each section is:

CACHE->filterCache
CACHE->queryResultCache
CORE->searcher

Thanks,
Shawn



Re: SolrCloud and distributed search

2012-10-26 Thread Bill Au
I am currently using one master with multiple slaves so I do have high
availability for searching now.

My index does fit on a single machine and a single query does not take too
long to execute.  But I do want to take advantage of high availability of
indexing and real time replication.  So it looks like I can set up
SolrCloud with only 1 shard (ie numShards=1).

In this case, is SolrCloud still using distributed search behind the
scenes?  Will MoreLikeThis work?

Does using SolrCloud with only 1 shard make any sense at all?

Bill

On Thu, Oct 25, 2012 at 4:29 PM, Tomás Fernández Löbbe <
tomasflo...@gmail.com> wrote:

> It also provides high availability for indexing and searching.
>
> On Thu, Oct 25, 2012 at 4:43 PM, Bill Au  wrote:
>
> > So I guess one would use SolrCloud for the same reasons as distributed
> > search:
> >
> > When an index becomes too large to fit on a single system, or when a
> single
> > query takes too long to execute.
> >
> > Bill
> >
> > On Thu, Oct 25, 2012 at 3:38 PM, Shawn Heisey  wrote:
> >
> > > On 10/25/2012 1:29 PM, Bill Au wrote:
> > >
> > >> Is SolrCloud using distributed search behind the scene?  Does it have
> > the
> > >> same limitations (for example, doesn't support MoreLikeThis)
> distributed
> > >> search has?
> > >>
> > >
> > > Yes and yes.
> > >
> > >
> >
>


Re: DIH nested entities don't work

2012-10-26 Thread mroosendaal
None taken:-)

Here's the info

OS: Linux
Java: 1.6
Oracle driver: ojdbc14.jar

*view structure:*
END_FRG_PRODUCTS_VW
pdt_id
pdt_title
stock_availability
pdt_pce_bolprice
gpc_id
offer_type
search_rank
search_title

END_FRG_FEATURES_VW
pdt_id
pdt_features

END_FRG_SONGS_VW
pdt_id
songtitle
te_beluisteren

there are # products
some have 1 or more features
some have 1 or more songs

*data-config.xml:*

:1521/ENDDEV" user=""
password="pw"/>








*schema.xml*
http://pastebin.com/TtSQuhCX

*what I've tried*
* importing only the products --> worked
* importing only the songs (as the root entity) --> worked
* importing products and songs (as a subentity) --> every product gets the
same songtitle but no 'te_beluisteren'-field
* importing products with features and songs --> same as previous.

Hope you can point me in the right direction.

Thanks,
Maarten



--
View this message in context: 
http://lucene.472066.n3.nabble.com/DIH-nested-entities-don-t-work-tp4015514p4016077.html


lukeall.jar for Solr4r?

2012-10-26 Thread Carrie Coy
Where can I get a copy of Luke capable of reading Solr4 indexes?  My 
lukeall-4.0.0-ALPHA.jar no longer works.


Thx,
Carrie Coy


Re: SolrCloud and distributed search

2012-10-26 Thread Erick Erickson
Yes, I think SolrCloud makes sense with a single shard for exactly
this reason, NRT and multiple replicas. I don't know how you'd get NRT
on multiple machines without it.

But do be aware of: https://issues.apache.org/jira/browse/SOLR-3971
"A collection that is created with numShards=1 turns into a
numShards=2 collection after starting up a second core and not
specifying numShards."

Erick



Re: SolrCloud and distributed search

2012-10-26 Thread Tomás Fernández Löbbe
You should still use some kind of load balancer for searches, unless you
use the CloudSolrServer (SolrJ) which includes the load balancing.
Tomás



Re: DIH nested entities don't work

2012-10-26 Thread Gora Mohanty
On 26 October 2012 20:01, mroosendaal  wrote:
> None taken:-)
>
> Here's the info
[...]

The DIH configuration, and schema look fine. I have
not used the new caching setup in 4.0 enough to
comment on it.

> *what i;ve tried*
> * importing only the products --> worked
> * importing only the songs (as the root entity) --> worked
> * importing products and songs (as a subentity) --> every product gets the
> same songtitle but no 'te_beluisteren'-field
[...]

This is an indication that the SELECT is failing for the sub-entity.
I cannot quite see why that would be the case, but is it possible
that Oracle column names are case sensitive? I am not familiar
with Oracle.

Could you manually try the sub-entity SELECT, e.g.,
select * from END_FRG_FEATURES_VW where PDT_ID=XX
where XX is some product ID that you know has related
features?

I am guessing that every product getting the same song
title is an artifact of caching. I believe that you said that
if you took out the caching, the song title was also not
indexed.

Regards,
Gora


Re: SolrCloud and distributed search

2012-10-26 Thread Bill Au
I am thinking of using a load balancer for both indexing and querying to
spread both the indexing and querying load across all the machines.

Bill



Re: Occasional Solr performance issues

2012-10-26 Thread Dotan Cohen
On Fri, Oct 26, 2012 at 4:02 PM, Shawn Heisey  wrote:
>
> Taking all the information I've seen so far, my bet is on either cache
> warming or heap/GC trouble as the source of your problem.  It's now specific
> information gathering time.  Can you gather all the following information
> and put it into a web paste page, such as pastie.org, and reply with the
> link?  I have gathered the same information from my test server and created
> a pastie example. http://pastie.org/5118979
>
> On the dashboard of the GUI, it lists all the jvm arguments. Include those.
>
> Click Java Properties and gather the "java.runtime.version" and
> "java.specification.vendor" information.
>
> After one of the long update times, pause/stop your indexing application.
> Click on your core in the GUI, open Plugins/Stats, and paste the following
> bits with a header to indicate what each section is:
> CACHE->filterCache
> CACHE->queryResultCache
> CORE->searcher
>
> Thanks,
> Shawn
>

Thank you Shawn. The information is here:
http://pastebin.com/aqEfeYVA

-- 
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com


Re: SolrCloud and distributed search

2012-10-26 Thread Tomás Fernández Löbbe
If you are going to use SolrJ, CloudSolrServer is even better than a
round-robin load balancer for indexing, because it will send the documents
straight to the shard leader (you save one internal request). If not,
round-robin should be fine.

Tomás

On Fri, Oct 26, 2012 at 12:27 PM, Bill Au  wrote:

> I am thinking of using a load balancer for both indexing and querying to
> spread both the indexing and querying load across all the machines.
>
> Bill
>
> On Fri, Oct 26, 2012 at 10:48 AM, Tomás Fernández Löbbe <
> tomasflo...@gmail.com> wrote:
>
> > You should still use some kind of load balancer for searches, unless you
> > use the CloudSolrServer (SolrJ) which includes the load balancing.
> > Tomás
> >
> > On Fri, Oct 26, 2012 at 11:46 AM, Erick Erickson <
> erickerick...@gmail.com
> > >wrote:
> >
> > > Yes, I think SolrCloud makes sense with a single shard for exactly
> > > this reason, NRT and multiple replicas. I don't know how you'd get NRT
> > > on multiple machines without it.
> > >
> > > But do be aware of: https://issues.apache.org/jira/browse/SOLR-3971
> > > "A collection that is created with numShards=1 turns into a
> > > numShards=2 collection after starting up a second core and not
> > > specifying numShards."
> > >
> > > Erick
> > >
> > > On Fri, Oct 26, 2012 at 10:14 AM, Bill Au  wrote:
> > > > I am currently using one master with multiple slaves so I do have
> high
> > > > availability for searching now.
> > > >
> > > > My index does fit on a single machine and a single query does not
> take
> > > too
> > > > long to execute.  But I do want to take advantage of high
> availability
> > of
> > > > indexing and real time replication.  So it looks like I can set up
> > > > SolrCloud with only 1 shard (ie numShards=1).
> > > >
> > > > In this case is SolrCloud still using distributed search behind the
> > > > screen?  Will MoreLikeThis work?
> > > >
> > > > Does using SolrCloud with only 1 shard make any sense at all?
> > > >
> > > > Bill
> > > >
> > > > On Thu, Oct 25, 2012 at 4:29 PM, Tomás Fernández Löbbe <
> > > > tomasflo...@gmail.com> wrote:
> > > >
> > > >> It also provides high availability for indexing and searching.
> > > >>
> > > >> On Thu, Oct 25, 2012 at 4:43 PM, Bill Au 
> wrote:
> > > >>
> > > >> > So I guess one would use SolrCloud for the same reasons as
> > distributed
> > > >> > search:
> > > >> >
> > > >> > When an index becomes too large to fit on a single system, or
> when a
> > > >> single
> > > >> > query takes too long to execute.
> > > >> >
> > > >> > Bill
> > > >> >
> > > >> > On Thu, Oct 25, 2012 at 3:38 PM, Shawn Heisey 
> > > wrote:
> > > >> >
> > > >> > > On 10/25/2012 1:29 PM, Bill Au wrote:
> > > >> > >
> > > >> > >> Is SolrCloud using distributed search behind the scene?  Does
> it
> > > have
> > > >> > the
> > > >> > >> same limitations (for example, doesn't support MoreLikeThis)
> > > >> distributed
> > > >> > >> search has?
> > > >> > >>
> > > >> > >
> > > >> > > Yes and yes.
> > > >> > >
> > > >> > >
> > > >> >
> > > >>
> > >
> >
>


Re: SolrCloud and distributed search

2012-10-26 Thread Yonik Seeley
On Fri, Oct 26, 2012 at 10:14 AM, Bill Au  wrote:
> I am currently using one master with multiple slaves so I do have high
> availability for searching now.
>
> My index does fit on a single machine and a single query does not take too
> long to execute.  But I do want to take advantage of high availability of
> indexing and real time replication.  So it looks like I can set up
> SolrCloud with only 1 shard (ie numShards=1).
>
> In this case is SolrCloud still using distributed search behind the
> screen?  Will MoreLikeThis work?
>
> Does using SolrCloud with only 1 shard make any sense at all?

Yep.

Just pass distrib=false when querying.
At some point we need to optimize the case when a single node will
satisfy the query - and not just in the single shard case.
For upcoming custom hashing, if the requested hash range lies
completely in the shard, we shouldn't go distributed.
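
A minimal sketch of what passing distrib=false looks like when building the request URL by hand. The host, the collection name "collection1", and the helper name are assumptions made for illustration only; nothing is sent over the network.

```python
from urllib.parse import urlencode


def build_select_url(base_url, query, distrib=None):
    """Build a /select URL; add distrib only when explicitly requested."""
    params = {"q": query}
    if distrib is not None:
        # Solr expects the literal strings "true" / "false".
        params["distrib"] = "true" if distrib else "false"
    return base_url + "/select?" + urlencode(params)


# Hypothetical single-shard collection queried on one node only.
url = build_select_url("http://localhost:8983/solr/collection1", "*:*", distrib=False)
print(url)
```

With distrib=false the node that receives the request answers from its own index only, which is why this suits the single-shard case.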

-Yonik
http://lucidworks.com


Re: Search and Entity structure

2012-10-26 Thread v vijith
Thanks for the response. This workaround would be difficult to
implement. Also, I'm finding it very difficult to understand that Solr
doesn't provide this feature for searching.


On Fri, Oct 26, 2012 at 9:42 AM, adityab  wrote:
> Hi Vijith,
>
> See if this solution solves your problem. There might be other ways this is
> the one i have on top of my mind at this hour.
>
> You might be having and ID for each qualification. then have the relation
> using dotted notation.
> 1 = MBA, 2 = LEAD etc.
>
> 
> 1.A
> 2.B
> 
> 
>
> the format is like X.Y where X is QualificationID and Y is grade value. If
> you have ID for Grade value too then use it in Y instead of actual Grade
> value.
> So in solr when searching for MBA with A Grad you can query "q=grade:1.A" .
> This should give you the result.
>
>
>
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Search-and-Entity-structure-tp4015890p4015996.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Search and Entity structure

2012-10-26 Thread Gora Mohanty
On 25 October 2012 23:48, v vijith  wrote:
> Dear All,
>
> Apologize for lengthy email 
>
> SOLR Version: 4
>
> Im a newbie to SOLR and have gone through tutorial but could not get a
> solution. The below requirement doesnt seem to be impossible but I
> think Im missing the obvious.
>
> In my RDBMS, there is a Qualification table and an Employee table. An
> employee can have many qualifications. The qualification can have
> following attributes - GradeName and Grade. The search using sql query
> to achieve my requirement is as below
>
> select * from qualification a, employee b where a.empid= b.empid and
> a.gradename='MBA' and b.grade='A';
>
> This will return me the employee along with the dept who has the grade
> as MBA and has grade of A.
>
> Employee: 2 records
> -
> Empid: 1
> Name: John
> Location: California
>
> Qualifications:
> Gradeid: 1
> Empid: 1
> Name: MBA
> Grade: B
>
> Gradeid: 2
> Empid: 1
> Name: LEAD
> Grade: A
> 
>
> Empid: 2
> Name: George
> Location: Nevada
>
> Qualifications:
> Gradeid: 3
> Empid: 2
> Name: MBA
> Grade: A
>
> Gradeid: 4
> Empid: 2
> Name: Graduate
> Grade: C

Stop thinking of Solr in terms of RDBMS. Instead, flatten out your
data. Thus, in your example, you could have a schema with the
following fields:
doc_id name location qualification grade
doc_id is a unique identifier for Solr. If you want to retain Empid
and Gradeid you could also add these.

and the following entries
1  John    California  MBA       B
2  John    California  Lead      A
3  George  Nevada      MBA       A
4  George  Nevada      Graduate  C

Searching for qualification:MBA and grade:A will then give you only
record 3.
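
As a rough illustration of why the flattened layout works, here is a small Python sketch that models the four entries as plain dictionaries and filters them the way the combined query would. It only imitates exact-match filtering in memory and is not how Solr evaluates the query; the search() helper is made up for this example.

```python
# One flattened record per (employee, qualification) pair, as in the table above.
docs = [
    {"doc_id": 1, "name": "John", "location": "California", "qualification": "MBA", "grade": "B"},
    {"doc_id": 2, "name": "John", "location": "California", "qualification": "Lead", "grade": "A"},
    {"doc_id": 3, "name": "George", "location": "Nevada", "qualification": "MBA", "grade": "A"},
    {"doc_id": 4, "name": "George", "location": "Nevada", "qualification": "Graduate", "grade": "C"},
]


def search(records, **criteria):
    """Return the records whose fields equal every given criterion."""
    return [r for r in records if all(r.get(k) == v for k, v in criteria.items())]


matches = search(docs, qualification="MBA", grade="A")
print([m["doc_id"] for m in matches])  # [3]
```

Because qualification and grade sit in the same record, John's MBA/B and Lead/A cannot be mixed up into a false MBA/A match.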

Regards,
Gora


Re: Search and Entity structure

2012-10-26 Thread v vijith
The schema content that I have put in is

   
   
   
   
   
 EMPID

The dataconfig file is






With this as well, when I try, I get the entity as below -


3Viktor
2
George
C
4
PM
1John
B2LEAD


The issue is that, employee George has 2 qualifications but is not
shown in the result. This is due to unique id I believe. Can you
provide some help?



On Fri, Oct 26, 2012 at 8:46 PM, Gora Mohanty  wrote:
> On 25 October 2012 23:48, v vijith  wrote:
>> Dear All,
>>
>> Apologize for lengthy email 
>>
>> SOLR Version: 4
>>
>> Im a newbie to SOLR and have gone through tutorial but could not get a
>> solution. The below requirement doesnt seem to be impossible but I
>> think Im missing the obvious.
>>
>> In my RDBMS, there is a Qualification table and an Employee table. An
>> employee can have many qualifications. The qualification can have
>> following attributes - GradeName and Grade. The search using sql query
>> to achieve my requirement is as below
>>
>> select * from qualification a, employee b where a.empid= b.empid and
>> a.gradename='MBA' and b.grade='A';
>>
>> This will return me the employee along with the dept who has the grade
>> as MBA and has grade of A.
>>
>> Employee: 2 records
>> -
>> Empid: 1
>> Name: John
>> Location: California
>>
>> Qualifications:
>> Gradeid: 1
>> Empid: 1
>> Name: MBA
>> Grade: B
>>
>> Gradeid: 2
>> Empid: 1
>> Name: LEAD
>> Grade: A
>> 
>>
>> Empid: 2
>> Name: George
>> Location: Nevada
>>
>> Qualifications:
>> Gradeid: 3
>> Empid: 2
>> Name: MBA
>> Grade: A
>>
>> Gradeid: 4
>> Empid: 2
>> Name: Graduate
>> Grade: C
>
> Stop thinking of Solr in terms of RDBMS. Instead, flatten out your
> data. Thus, in your example, you could have a schema with the
> following fields:
> doc_id name location qualification grade
> doc_id is a unique identifier for Solr. If you want to retain Empid
> and Gradeid you could also add these.
>
> and the following entries
> 1 John California MBA B
> 2 John California Lead A
> 3 George Nevada MBA A
> 4 George Nevada Graduate C
>
> Searching for qualification:MBA and grade:A will then give you only
> record 3.
>
> Regards,
> Gora


Re: Occasional Solr performance issues

2012-10-26 Thread Shawn Heisey

On 10/26/2012 9:41 AM, Dotan Cohen wrote:

>> On the dashboard of the GUI, it lists all the jvm arguments. Include those.
>>
>> Click Java Properties and gather the "java.runtime.version" and
>> "java.specification.vendor" information.
>>
>> After one of the long update times, pause/stop your indexing application.
>> Click on your core in the GUI, open Plugins/Stats, and paste the following
>> bits with a header to indicate what each section is:
>> CACHE->filterCache
>> CACHE->queryResultCache
>> CORE->searcher
>>
>> Thanks,
>> Shawn
>
> Thank you Shawn. The information is here:
> http://pastebin.com/aqEfeYVA



Warming doesn't seem to be a problem here -- all your warm times are 
zero, so I am going to take a guess that it may be a heap/GC issue.  I 
would recommend starting with the following additional arguments to your 
JVM.  Since I have no idea how solr gets started on your server, I don't 
know where you would add these:


-Xmx4096M -Xms4096M -XX:NewRatio=1 -XX:+UseParNewGC 
-XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled


This allocates 4GB of RAM to java, sets up a larger than normal Eden 
space in the heap, and uses garbage collection options that usually fare 
better in a server environment than the default. Java memory management 
options are like religion to some people ... I may start a flamewar with 
these recommendations. ;)  The best I can tell you about these choices: 
They made a big difference for me.
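
For a rough idea of what -XX:NewRatio=1 means for a 4096M heap: NewRatio is the ratio of old generation size to young generation size, and Eden lives inside the young generation. The small Python sketch below just does that arithmetic; the function name is made up, and the NewRatio=2 line is only there for comparison.

```python
def generation_sizes_mb(heap_mb, new_ratio):
    """Split a fixed heap into (young, old) sizes in MB.

    NewRatio is old/young, so young = heap / (NewRatio + 1).
    """
    young = heap_mb / (new_ratio + 1)
    return young, heap_mb - young


# -Xmx4096M -Xms4096M -XX:NewRatio=1: young and old generations are equal.
print(generation_sizes_mb(4096, 1))
# A NewRatio of 2 for comparison: old generation is twice the young one.
print(generation_sizes_mb(4096, 2))
```

The young generation figure includes the survivor spaces, so Eden itself is somewhat smaller than the number shown.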


I would also recommend switching to a Sun/Oracle jvm.  I have heard that 
previous versions of Solr were not happy on variants like OpenJDK, I 
have no idea whether that might still be the case with 4.0.  If you 
choose to do this, you probably have package choices in Ubuntu.  I know 
that in Debian, the package is called sun-java6-jre ... Ubuntu is 
probably something similar. Debian has a CLI command 
'update-java-alternatives' that will quickly switch between different 
java implementations that are installed.  Hopefully Ubuntu also has 
this.  If not, you might need the following command instead to switch 
the main java executable:


update-alternatives --config java

Thanks,
Shawn



Re: [Solr boost] Date boost for certain query set

2012-10-26 Thread Jack Krupansky
The "exists" function returns a boolean - which you can use in an "if" 
function:


if(exists(query(field:value)),recip(ms(NOW,data),3.16e-11,1,1),0)

See:
http://wiki.apache.org/solr/FunctionQuery#exists
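
To see what numbers this function produces, here is a small Python sketch of the same arithmetic. Solr's recip(x,m,a,b) evaluates to a/(m*x+b); the helper names and the idea of passing the document age in milliseconds directly are assumptions made for the example, standing in for ms(NOW,data) and exists(query(field:value)).

```python
def recip(x, m, a, b):
    """Same formula as Solr's recip function: a / (m*x + b)."""
    return a / (m * x + b)


MS_PER_YEAR = 365.25 * 24 * 60 * 60 * 1000  # roughly 3.16e10 ms


def date_boost(age_ms, matches_query):
    """if(exists(query(...)), recip(ms(NOW,data),3.16e-11,1,1), 0)"""
    return recip(age_ms, 3.16e-11, 1, 1) if matches_query else 0


print(date_boost(0, True))                # brand-new matching document
print(date_boost(MS_PER_YEAR, True))      # matching document about a year old
print(date_boost(MS_PER_YEAR, False))     # non-matching document gets no date boost
```

With m = 3.16e-11 the product m*x is close to 1 after one year, so the boost decays from 1 for a new document to about one half after a year.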

-- Jack Krupansky

-Original Message- 
From: Alessandro Benedetti

Sent: Friday, October 26, 2012 8:37 AM
To: solr-user@lucene.apache.org
Subject: Re: [Solr boost] Date boost for certain query set

I've made some progress.
I'm writing a function for this sort of clustered boosting:
product(recip(ms(NOW,data),3.16e-11,1,1),exists(query(field:value)))
I multiply the date boost by the exists function, which for a specific
value of some field will return 0 or 1, and this would cancel or apply
the date boost.
But the function I wrote has incorrect syntax; I need to correct the
"exists" part.
Any hint?

2012/10/26 Alessandro Benedetti 


Hi guys,
I was fighting with boost factor in my edismax request handler :


   edismax
 (idente:2)^0.1
 recip(ms(NOW,data),3.16e-11,1,1)
   

I'm playing with bq (boost query) and bf (boost function).
Is it possible to combine bq and bf in certain ways?
For example:
I want that when  matches,  will start to affect this document,
and I want that when  matches,  will start to affect this
document.
Of course, in a cumulative way.
Is it possible to associate bq and bf in this way?
In simple words, I want a bq to activate a specific bf.

Cheers

--
---
Alessandro Benedetti

Sourcesense - making sense of Open Source: http://www.sourcesense.com





--
---
Alessandro Benedetti

Sourcesense - making sense of Open Source: http://www.sourcesense.com 



Re: Index-time field boosting

2012-10-26 Thread Otis Gospodnetic
Hi,

Have a look at http://search-lucene.com/?q=extendeddismax.  Use "qf"
param in edismax to assign weights.

Otis
--
Search Analytics - http://sematext.com/search-analytics/index.html
Performance Monitoring - http://sematext.com/spm/index.html


On Fri, Oct 26, 2012 at 9:19 AM, vsl  wrote:
> Hi,
> this is my request handler from solrconfig.xml:
>
>
> 
>
>  
>explicit
>10
>text
>  
>
> 
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Index-time-field-boosting-tp4016010p4016049.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Get metadata for query

2012-10-26 Thread Otis Gospodnetic
Hi,

No... but you could simply query your index, get all the fields you
need and process them to get what you need.

Otis
--
Search Analytics - http://sematext.com/search-analytics/index.html
Performance Monitoring - http://sematext.com/spm/index.html


On Fri, Oct 26, 2012 at 10:19 AM, Torben Honigbaum
 wrote:
> Hi everybody,
>
> with http://localhost:8983/solr/admin/luke it's possible to get metadata for 
> all indices. But is there a way to get only the metadata for a special query? 
> I want to query all documents which are in a special category. For the query 
> I need the metadata containing a list of all fields of the documents.
>
> Thank you
> Torben


edismax bq, ignore tf/idf?

2012-10-26 Thread Ryan McKinley
Hi-

I am trying to add a setting that will boost results based on
existence in different buckets.  Using edismax, I added the bq
parameter:

location:A^5 location:B^3

I want this to put everything in location A above everything in
location B.  This mostly works, BUT depending on the number of matches
for each location, location:B can get a higher final score.

Is there a way to ignore tf/idf when boosting this location?

location from a field type:
 class="solr.StrField"  omitNorms="true"


Thanks for any pointers!

ryan


Re: edismax bq, ignore tf/idf?

2012-10-26 Thread Jack Krupansky

How about a boost function, "bf" or "boost"?

bf=if(exists(query(location:A)),5,if(exists(query(location:B)),3,0))

Use bf if you want to add to the score, boost if you want to multiply the 
score


-- Jack Krupansky

-Original Message- 
From: Ryan McKinley

Sent: Friday, October 26, 2012 6:14 PM
To: solr-user@lucene.apache.org
Subject: edismax bq, ignore tf/idf?

Hi-

I am trying to add a setting that will boost results based on
existence in different buckets.  Using edismax, I added the bq
parameter:

location:A^5 location:B^3

I want this to put everything in location A above everything in
location B.  This mostly works, BUT depending on the number of matches
for each location, location:B can get a higher final score.

Is there a way to ignore tf/idf when boosting this location?

location from a field type:
class="solr.StrField"  omitNorms="true"


Thanks for any pointers!

ryan 



Re: edismax bq, ignore tf/idf?

2012-10-26 Thread Chris Hostetter
: How about a boost function, "bf" or "boost"?
: 
: bf=if(exists(query(location:A)),5,if(exists(query(location:B)),3,0))

Right ... assuming you only want to ignore tf/idf on these fields in this 
specific context, function queries are the way to go -- otherwise you could 
just use a per-field similarity to ignore tf/idf.

I would suggest, however, that instead of using "exists(query())" you 
consider the "tf()" function ... 

bf=if(tf(location,A),5,0)&bf=if(tf(location,B),3,0)

s/bf/boost/g && s/0/1/g if you want multiplicative boosts.
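
A small Python sketch of the difference between the additive and multiplicative variants, under the simplifying assumption that the boost value is just added to or multiplied with a base relevance score (query normalization and everything else Solr does is ignored here). The function names and the sample base score are made up.

```python
# Constant per-bucket values, independent of tf/idf.
BUCKETS = {"A": 5, "B": 3}


def bucket_value(location, neutral):
    """Fixed value for the location, or the neutral element for other locations."""
    return BUCKETS.get(location, neutral)


def additive_score(base_score, location):
    """bf-style: the function value is added, so the neutral value is 0."""
    return base_score + bucket_value(location, 0)


def multiplicative_score(base_score, location):
    """boost-style: the function value is multiplied, so the neutral value is 1."""
    return base_score * bucket_value(location, 1)


for location in ("A", "B", "C"):
    print(location, additive_score(2.0, location), multiplicative_score(2.0, location))
```

That is the reason for swapping 0 for 1 in the multiplicative form: a document in neither location should keep its score rather than be zeroed out. Note that neither variant by itself guarantees that every location A document sorts above every location B document; that still depends on how far apart the base scores are.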


-Hoss


Re: SolrJ missing CollectionAdmin Api to create new collections dynamically

2012-10-26 Thread Chris Hostetter

: I can't find a good way to create a new Collection with SolrJ.
: I need to create my Collections dynamically and at the moment the only way I
: see is to call the CollectionAdmin with a HTTP Call directly to any of my
: SolrServers.
: 
: I don't like this because I think its a better way only to communicate through
: the CloudSolrServer connected to the zookeeper Servers and my application dont

There may not be any specific "convenience methods" for doing collection 
creation requests (patches welcome!) but the 
SolrServer.request(SolrRequest) method should be capable of making a  
request to the collections admin API.

I think it would be something along the lines of...

UpdateRequest req = new UpdateRequest("/admin/collections");
req.setParam("action","CREATE");
req.setParam("name","mycollection");
...
myCloudServer.request(req);

-Hoss


Re: DIH throws NullPointerException when using dataimporter.functions.escapeSql with parent entities

2012-10-26 Thread Lance Norskog
Which database rows cause the problem? The bug report talks about fields with 
an empty string. Do your rows have empty string values?

- Original Message -
| From: "Dominik Siebel" 
| To: solr-user@lucene.apache.org
| Sent: Monday, October 22, 2012 3:15:29 AM
| Subject: Re: DIH throws NullPointerException when using
| dataimporter.functions.escapeSql with parent entities
| 
| That's what I thought.
| I'm just curious that nobody else seems to have this problem although
| I found the exact same issue description in the issue tracker
| (https://issues.apache.org/jira/browse/SOLR-2141) which goes back to
| October 2010 and is flagged as "Resolved: Cannot Reproduce".
| 
| 
| 2012/10/20 Lance Norskog :
| > If it worked before and does not work now, I don't think you are
| > doing anything wrong :)
| >
| > Do you have a different version of your JDBC driver?
| > Can you make a unit test with a minimal DIH script and schema?
| > Or, scan through all of the JIRA issues against the DIH from your
| > old Solr capture date.
| >
| >
| > - Original Message -
| > | From: "Dominik Siebel" 
| > | To: solr-user@lucene.apache.org
| > | Sent: Thursday, October 18, 2012 11:22:54 PM
| > | Subject: Fwd: DIH throws NullPointerException when using
| > | dataimporter.functions.escapeSql with parent entities
| > |
| > | Hi folks,
| > |
| > | I am currently migrating our Solr servers from a 4.0.0 nightly
| > | build (approx. November 2011, which worked very well) to the
| > | newly released 4.0.0 and am running into some issues concerning
| > | the existing DataImportHandler configurations. Maybe you have an
| > | idea where I am going wrong here.
| > |
| > | The following lines are a highly simplified excerpt from one of
| > | the
| > | problematic imports:
| > |
| > | 
| > |
| > | 
| > |
| > | 
| > |
| > | While this configuration has worked without any problem for over
| > | half a year now, when upgrading to 4.0.0-BETA AND 4.0.0 the import
| > | throws the following stack trace and exits:
| > |
| > |  SEVERE: Exception while processing: path document :
| > | null:org.apache.solr.handler.dataimport.DataImportHandlerException:
| > | java.lang.NullPointerException
| > |
| > | which is caused by
| > |
| > | Caused by: java.lang.NullPointerException
| > | at org.apache.solr.handler.dataimport.EvaluatorBag$1.evaluate(EvaluatorBag.java:79)
| > |
| > | In other words: The EvaluatorBag doesn't seem to resolve the
| > | given
| > | path.name variable properly and returns null.
| > |
| > | Does anyone have any idea?
| > | Appreciate your input!
| > |
| > | Regards
| > | Dom
| > |
| 


Re: edismax bq, ignore tf/idf?

2012-10-26 Thread Ryan McKinley
thanks!


On Fri, Oct 26, 2012 at 4:20 PM, Chris Hostetter
 wrote:
> : How about a boost function, "bf" or "boost"?
> :
> : bf=if(exists(query(location:A)),5,if(exists(query(location:B)),3,0))
>
> Right ... assuming you only want to ignore tf/idf on these fields in this
> specifc context, function queries are the way to go -- otherwise you could
> just use a per-field similarity to ignore tf/idf.
>
> I would suggest however that instead of using the "exists(query())"
> consider the "tf()" function ...
>
> bf=if(tf(location,A),5,0)&bf=if(tf(location,B),3,0)
>
> s/bf/boost/g && s/0/1/g if you wnat mutiplicitive boosts.
>
>
> -Hoss


Re: Index-time field boosting

2012-10-26 Thread Chris Hostetter

: I have a problem with index time boosting. I created 4  new fields:

I think you are misunderstanding the meaning of index time boosting vs 
query time boosting.

First of all, this is not meaningful syntax in your schema.xml...

:   

...there is no "boost" property that can be set on a  in your 
schema.xml.

You can set boosts on the field values of individual documents when you 
index them -- but these boosts should vary per document.  It makes almost 
no sense to use the same index time boost of X for field Y on every 
document you index, because the relative effect on the query scores would 
be exactly the same for every document.

The goal you are describing sounds exactly like *query* time boosting -- 
you are querying for a word in multiple fields, and you want matches in 
some fields (title) to score higher than matches in other fields (text) 
regardless of document.

As Otis points out: you can use something like dismax/edismax to configure 
this to happen automatically, or you can be explicit in your query string 
with the lucene qparser (ie q=title:messi^5 text:messi^3)
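
The two query-time options can be sketched as follows. This only builds the parameter strings; the helper names are made up, and the field names and weights are the ones from the example above.

```python
from urllib.parse import urlencode


def explicit_query(term, field_boosts):
    """Lucene qparser style: repeat the term per field with its own boost."""
    return " ".join(f"{field}:{term}^{boost}" for field, boost in field_boosts.items())


def edismax_params(term, field_boosts):
    """edismax style: keep q plain and put the per-field weights into qf."""
    qf = " ".join(f"{field}^{boost}" for field, boost in field_boosts.items())
    return {"defType": "edismax", "q": term, "qf": qf}


boosts = {"title": 5, "text": 3}
print(explicit_query("messi", boosts))             # title:messi^5 text:messi^3
print(urlencode(edismax_params("messi", boosts)))  # URL-encoded request parameters
```

The edismax form has the advantage that qf can live in the request handler defaults, so clients only send the plain search term.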



-Hoss


Re: Get metadata for query

2012-10-26 Thread Lance Norskog
Ah, there's the problem- what is a fast way to fetch all fields in a 
collection, including dynamic fields?

- Original Message -
| From: "Otis Gospodnetic" 
| To: solr-user@lucene.apache.org
| Sent: Friday, October 26, 2012 3:05:04 PM
| Subject: Re: Get metadata for query
| 
| Hi,
| 
| No... but you could simply query your index, get all the fields you
| need and process them to get what you need.
| 
| Otis
| --
| Search Analytics - http://sematext.com/search-analytics/index.html
| Performance Monitoring - http://sematext.com/spm/index.html
| 
| 
| On Fri, Oct 26, 2012 at 10:19 AM, Torben Honigbaum
|  wrote:
| > Hi everybody,
| >
| > with http://localhost:8983/solr/admin/luke it's possible to get
| > metadata for all indices. But is there a way to get only the
| > metadata for a special query? I want to query all documents which
| > are in a special category. For the query I need the metadata
| > containing a list of all fields of the documents.
| >
| > Thank you
| > Torben
| 


Any way to by pass the checking on QueryElevationComponent

2012-10-26 Thread James Ji
Hi there

We are currently working on having Solr read its files from HDFS. We extended
some of the classes so as to avoid modifying the original Solr code and
stay compatible with future releases. So here comes the question: I
found that in QueryElevationComponent there is a piece of code checking whether
elevate.xml exists on the local file system. I am wondering if there is a way
to bypass this?
QueryElevationComponent.inform(){

  File fC = new File(core.getResourceLoader().getConfigDir(), f);
  File fD = new File(core.getDataDir(), f);
  if (fC.exists() == fD.exists()) {
    throw new SolrException(SolrException.ErrorCode.SERVER_ERROR,
        "QueryElevationComponent missing config file: '" + f + "\n"
        + "either: " + fC.getAbsolutePath() + " or " + fD.getAbsolutePath()
        + " must exist, but not both.");
  }
  if (fC.exists()) {
    exists = true;
    log.info("Loading QueryElevation from: " + fC.getAbsolutePath());
    Config cfg = new Config(core.getResourceLoader(), f);
    elevationCache.put(null, loadElevationMap(cfg));
  }

}

-- 
Jiayu (James) Ji,

***

Cell: (312)823-7393
Website: https://sites.google.com/site/jiayuji/

***


Re: Search and Entity structure

2012-10-26 Thread Lance Norskog
A side point: in fact, the connection between MBA and grade is not lost. The 
values in a multi-valued field are stored in order. You can have separate 
multi-valued fields with matching entries, and the values will be fetched in 
order and you can match them by counting. This is not database-ish, but it is a 
permanent feature.
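
A minimal sketch of matching by position, using John's data from the thread. The document is modelled as a plain dictionary with two parallel multi-valued fields; the field names and the helper are assumptions for illustration.

```python
# Two multi-valued fields whose entries line up by position.
doc = {
    "name": "John",
    "qualification": ["MBA", "LEAD"],
    "grade": ["B", "A"],
}


def grade_for(document, qualification):
    """Return the grade stored at the same position as the qualification."""
    for name, grade in zip(document["qualification"], document["grade"]):
        if name == qualification:
            return grade
    return None


print(grade_for(doc, "MBA"))   # B
print(grade_for(doc, "LEAD"))  # A
```

The pairing happens on the client after the document is fetched; a query such as qualification:MBA AND grade:A would still match this document, because the query itself does not know about positions.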

Lance

- Original Message -
| From: "v vijith" 
| To: solr-user@lucene.apache.org
| Sent: Friday, October 26, 2012 12:50:29 PM
| Subject: Re: Search and Entity structure
| 
| The schema content that I have put in is
| 
|
|
|
|
|
|  EMPID
| 
| The dataconfig file is
| 
| 
| 
| 
| 
| 
| With this as well, when I try, I get the entity as below -
| 
| 
| 3Viktor
| 2
| George
| C
| 4
| PM
| 1John
| B2LEAD
| 
| 
| The issue is that, employee George has 2 qualifications but is not
| shown in the result. This is due to unique id I believe. Can you
| provide some help?
| 
| 
| 
| On Fri, Oct 26, 2012 at 8:46 PM, Gora Mohanty 
| wrote:
| > On 25 October 2012 23:48, v vijith  wrote:
| >> Dear All,
| >>
| >> Apologize for lengthy email 
| >>
| >> SOLR Version: 4
| >>
| >> Im a newbie to SOLR and have gone through tutorial but could not
| >> get a
| >> solution. The below requirement doesnt seem to be impossible but I
| >> think Im missing the obvious.
| >>
| >> In my RDBMS, there is a Qualification table and an Employee table.
| >> An
| >> employee can have many qualifications. The qualification can have
| >> following attributes - GradeName and Grade. The search using sql
| >> query
| >> to achieve my requirement is as below
| >>
| >> select * from qualification a, employee b where a.empid= b.empid
| >> and
| >> a.gradename='MBA' and b.grade='A';
| >>
| >> This will return me the employee along with the dept who has the
| >> grade
| >> as MBA and has grade of A.
| >>
| >> Employee: 2 records
| >> -
| >> Empid: 1
| >> Name: John
| >> Location: California
| >>
| >> Qualifications:
| >> Gradeid: 1
| >> Empid: 1
| >> Name: MBA
| >> Grade: B
| >>
| >> Gradeid: 2
| >> Empid: 1
| >> Name: LEAD
| >> Grade: A
| >> 
| >>
| >> Empid: 2
| >> Name: George
| >> Location: Nevada
| >>
| >> Qualifications:
| >> Gradeid: 3
| >> Empid: 2
| >> Name: MBA
| >> Grade: A
| >>
| >> Gradeid: 4
| >> Empid: 2
| >> Name: Graduate
| >> Grade: C
| >
| > Stop thinking of Solr in terms of RDBMS. Instead, flatten out your
| > data. Thus, in your example, you could have a schema with the
| > following fields:
| > doc_id name location qualification grade
| > doc_id is a unique identifier for Solr. If you want to retain Empid
| > and Gradeid you could also add these.
| >
| > and the following entries
| > 1 John California MBA B
| > 2 John California Lead A
| > 3 George Nevada MBA A
| > 4 George Nevada Graduate C
| >
| > Searching for qualification:MBA and grade:A will then give you only
| > record 3.
| >
| > Regards,
| > Gora
| 


Re: Get metadata for query

2012-10-26 Thread Jack Krupansky

I'm not sure I understand the real question here. What is the "metadata"?

I mean, q=x&fl=* gives you all the (stored) fields for documents matching 
the query.


What else is there?
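
If the goal is just the list of field names that occur in the matching documents, one client-side approach along the lines Otis suggested is to collect the keys of the returned documents. The sketch below assumes the response documents have already been parsed into dictionaries; the sample documents and field names are invented.

```python
def field_names(docs):
    """Union of the field names present in any of the given documents."""
    names = set()
    for doc in docs:
        names.update(doc.keys())
    return sorted(names)


# Hypothetical documents as they might come back for q=category:special&fl=*
sample_docs = [
    {"id": "1", "category": "special", "title": "first"},
    {"id": "2", "category": "special", "price_f": 9.5},
]
print(field_names(sample_docs))
```

This only sees stored fields, and it has to read every matching document, so it can be slow for large result sets.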

-- Jack Krupansky

-Original Message- 
From: Lance Norskog

Sent: Friday, October 26, 2012 9:42 PM
To: solr-user@lucene.apache.org
Subject: Re: Get metadata for query

Ah, there's the problem- what is a fast way to fetch all fields in a 
collection, including dynamic fields?


- Original Message -
| From: "Otis Gospodnetic" 
| To: solr-user@lucene.apache.org
| Sent: Friday, October 26, 2012 3:05:04 PM
| Subject: Re: Get metadata for query
|
| Hi,
|
| No... but you could simply query your index, get all the fields you
| need and process them to get what you need.
|
| Otis
| --
| Search Analytics - http://sematext.com/search-analytics/index.html
| Performance Monitoring - http://sematext.com/spm/index.html
|
|
| On Fri, Oct 26, 2012 at 10:19 AM, Torben Honigbaum
|  wrote:
| > Hi everybody,
| >
| > with http://localhost:8983/solr/admin/luke it's possible to get
| > metadata for all indices. But is there a way to get only the
| > metadata for a special query? I want to query all documents which
| > are in a special category. For the query I need the metadata
| > containing a list of all fields of the documents.
| >
| > Thank you
| > Torben
| 



Re: SolrJ missing CollectionAdmin Api to create new collections dynamically

2012-10-26 Thread Markus.Mirsberger

Yes thanks.
But how can I check the status of a collection? The STATUS action does not 
exist in the CollectionAdmin, only in the CoreAdmin.
At the moment, is the only way to get this information somehow through 
the ZkStateReader?


Regards,
Markus

On 27.10.2012 06:37, Chris Hostetter wrote:

: I can't find a good way to create a new Collection with SolrJ.
: I need to create my Collections dynamically and at the moment the only way I
: see is to call the CollectionAdmin with a HTTP Call directly to any of my
: SolrServers.
:
: I don't like this because I think its a better way only to communicate through
: the CloudSolrServer connected to the zookeeper Servers and my application dont

There may not be any specific "convinience methods" for doing collection
creation requests (patches welcome!) but the
SolrServer.request(SolrRequest) method should be capable of making a
request to the collections admin API.

I think it would be something along the lines of...

UpdateRequest req = new UpdateRequest("/admin/collections");
req.setParam("action","CREATE");
req.setParam("name","mycollection");
...
myCloudServer.request(req);

-Hoss




Re: Search and Entity structure

2012-10-26 Thread Gora Mohanty
On 27 October 2012 07:55, Lance Norskog  wrote:
> A side point: in fact, the connection between MBA and grade is not lost. The 
> values in a multi-valued field are stored in order. You can have separate 
> multi-valued fields with matching entries, and the values will be fetched in 
> order and you can match them by counting. This is not database-ish, but it is 
> a permanent feature.
[...]

Yes, thanks for pointing that out.

Somehow that solution has always seemed a
bit "hacky" to me, but that is probably just
personal bias.

Regards,
Gora


Re: Search and Entity structure

2012-10-26 Thread Gora Mohanty
On 27 October 2012 01:20, v vijith  wrote:
[...]
> The dataconfig file is
> 
> 
> 
> 
> 
[...]

The SELECT in the nested entity "qualification" should fetch
all qualifications for the given employee. How to do that is
database dependent, e.g., one would use something like
group_concat() in mysql. After collecting multiple qualifications
in a single string, one can use a transformer to break the
string at the separator used in group_concat(), and populate
the desired Solr field with the pieces.
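
The splitting step can be sketched in Python as below. This is only an illustration of the idea, not DIH code: the row layout, the column name QUALIFICATIONS, and the comma separator are assumptions. Inside DIH itself, a RegexTransformer with a splitBy attribute is, as far as I know, the usual way to do the same thing.

```python
def split_concatenated(row, column, separator=","):
    """Split a group_concat()-style string into a list of values."""
    value = row.get(column)
    if not value:
        return []
    return [part.strip() for part in value.split(separator) if part.strip()]


# Hypothetical row as returned by a query that uses group_concat().
row = {"EMPID": 2, "NAME": "George", "QUALIFICATIONS": "MBA,Graduate"}
solr_doc = {
    "EMPID": row["EMPID"],
    "NAME": row["NAME"],
    "qualification": split_concatenated(row, "QUALIFICATIONS"),
}
print(solr_doc)
```

The separator should be a character that cannot appear inside a qualification name; otherwise the values will be split in the wrong places.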

Depending on your expertise, it might be easier to do this
through a Solr XML document, or SolrJ.

Regards,
Gora


Re: DIH nested entities don't work

2012-10-26 Thread mroosendaal
Hi,

I've tried the subselection query and it works fine. It's weekend now:-) so
what i'm going to do is remove the cache option. I'll also try a different
jdbc driver and a few jdbc driver options.

Thanks,
Maarten



--
View this message in context: 
http://lucene.472066.n3.nabble.com/DIH-nested-entities-don-t-work-tp4015514p4016273.html
Sent from the Solr - User mailing list archive at Nabble.com.