Re: unable to facet range query

2013-12-12 Thread Ahmet Arslan
Hi,

If you don't use QEC, just remove it from the configuration file. If you need
it, change queryFieldType from integer to text_sw or something like that.

http://wiki.apache.org/solr/QueryElevationComponent#queryFieldType


The word 'promotions' is not a numeric value; that's why you are getting the
exception.
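
For example, a minimal sketch of the component definition with a text-based
queryFieldType (assuming the schema defines a field type named text_general):

<searchComponent name="elevator" class="solr.QueryElevationComponent">
  <str name="queryFieldType">text_general</str>
  <str name="config-file">elevate.xml</str>
</searchComponent>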



On Thursday, December 12, 2013 8:43 AM, Nutan nutanshinde1...@gmail.com wrote:
My schema has:
<field name="contents" type="text" indexed="true" stored="true"
multiValued="false"/>
<field name="id" type="integer" indexed="true" stored="true" required="true"
multiValued="false"/>

This is the field I want to facet on:
<field name="id" type="integer" indexed="true" stored="true" required="true"
multiValued="false"/>
<fieldType name="integer" class="solr.IntField" omitNorms="true"
positionIncrementGap="0"/>

I replaced the above fieldType with this:
<fieldType name="integer" class="solr.TrieIntField" precisionStep="0"
positionIncrementGap="0"/>

But now this shows an error in the elevate component. My elevate.xml is:

<?xml version="1.0" encoding="UTF-8" ?>
<elevate>
  <query text="promotions">
    <doc id="2" />
    <doc id="7" />
  </query>
</elevate>

<searchComponent name="elevator" class="solr.QueryElevationComponent">
  <str name="queryFieldType">integer</str>
  <str name="config-file">elevate.xml</str>
</searchComponent>

*Logs:*
Caused by: org.apache.solr.common.SolrException: Error initializing
QueryElevationComponent.
    at
org.apache.solr.handler.component.QueryElevationComponent.inform(QueryElevationComponent.java:218)
    at
org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:592)
    at org.apache.solr.core.SolrCore.init(SolrCore.java:801)
    ... 13 more
Caused by: org.apache.solr.common.SolrException: Invalid Number: promotions
    at
org.apache.solr.analysis.TrieTokenizer.reset(TrieTokenizerFactory.java:122)

I read that range queries are for numeric fields; isn't IntField a numeric
one? What other datatypes support range queries?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/unable-to-facet-range-query-tp4106305.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Solr hardware memory question

2013-12-12 Thread Toke Eskildsen
On Thu, 2013-12-12 at 02:46 +0100, Joel Bernstein wrote:
 Curious how many documents per shard you were planning?

350-500 million, optimized to a single segment as the data are not
changing.

 The number of documents per shard and field type will drive the amount
 of RAM needed to sort and facet. 

Very true. It makes a lot of sense to separate RAM requirements for the
Lucene/Solr structures and OS-caching.

It seems that Gil is working on about the same project as we are, so I
will elaborate in this thread:

We would like to perform some sort of grouping on URL, so that the same
page harvested at different points in time is only displayed once. This
is probably the heaviest functionality, as the cardinality of the field
will be near the number of documents.

For plain(er) faceting, things like MIME type, harvest date and site
seem relevant. Those fields have lower cardinality and they are
single-valued, so the memory requirements are something like
#docs*log2(#unique_values) bits.
With 500M documents and 1000 values, that is 600MB. With 20 shards, we
are looking at 12GB per simple facet field.
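(Worked through: 500M docs × log2(1000 values) ≈ 500M × 10 bits ≈ 5×10^9 bits ≈
600MB per shard; 20 shards × 600MB = 12GB per facet field.)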

Regards,
Toke Eskildsen





Re: Constantly increasing time of full data import

2013-12-12 Thread michallos
One more stack trace which is active during indexing. This call task is also
executed on the same single threaded executor as registering new searcher:

searcherExecutor-48-thread-1 prio=10 tid=0x7f24c0715000 nid=0x3de6
runnable [0x7f24b096d000]
   java.lang.Thread.State: RUNNABLE
at
org.apache.lucene.search.MultiTermQueryWrapperFilter.getDocIdSet(MultiTermQueryWrapperFilter.java:111)
at
org.apache.lucene.search.ConstantScoreQuery$ConstantWeight.scorer(ConstantScoreQuery.java:131)
at
org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer(BooleanQuery.java:311)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:621)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:297)
at
org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1494)
at
org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1363)
at
org.apache.solr.search.SolrIndexSearcher.access$000(SolrIndexSearcher.java:118)
at
org.apache.solr.search.SolrIndexSearcher$3.regenerateItem(SolrIndexSearcher.java:465)
at org.apache.solr.search.LRUCache.warm(LRUCache.java:188)
at
org.apache.solr.search.SolrIndexSearcher.warm(SolrIndexSearcher.java:2035)
at org.apache.solr.core.SolrCore$4.call(SolrCore.java:1676)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)

   Locked ownable synchronizers:
- 0x7f2880335d38 (a 
java.util.concurrent.ThreadPoolExecutor$Worker)

Maybe warming queries are blocking the commit? But... why does it increase
during not-so-high load (1000-2000 requests per hour), and not during very
low load?

Best,
Michał 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Constantly-increasing-time-of-full-data-import-tp4103873p4106318.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Change Velocity Template Directory in Solr 4.6

2013-12-12 Thread Ahmet Arslan
Hi Olson,

You are correct: the v.base_dir parameter is not used at all after SOLR-4882.
{corename}/conf/velocity is the only option.
The solr.allow.unsafe.resourceloading system property does not affect this behavior.

The wiki needs an update (Confluence does not mention the v.base_dir parameter).
Do you want to add your findings to the Velocity wiki page?

P.S. If you don't have a wiki account, anyone can create one. But to edit the
wiki, your username must be added to the wiki contributors group, which is done
by sending an e-mail to the solr-user mailing list.



On Wednesday, December 11, 2013 10:49 PM, O. Olson olson_...@yahoo.it wrote:
Thank you iorixxx. Yes, when I run: 

 java -Dsolr.allow.unsafe.resourceloading=true -jar start.jar

When I then load the root of my site, I get: 

ERROR - 2013-12-11 14:36:03.434; org.apache.solr.common.SolrException;
null:java.io.IOException: Unable to find resource 'browse.vm'
    at
org.apache.solr.response.VelocityResponseWriter.getTemplate(VelocityResponseWriter.java:174)
    at
org.apache.solr.response.VelocityResponseWriter.write(VelocityResponseWriter.java:50)

stacktrace truncated


In the above case, in the solrconfig.xml I have set: 

<str name="v.base_dir">MyVMTemplates</str>

And my velocity templates are in /corename/conf/MyVMTemplates. If you look
at the VelocityResponseWriter at
http://svn.apache.org/viewvc/lucene/dev/branches/lucene_solr_4_6/solr/contrib/velocity/src/java/org/apache/solr/response/VelocityResponseWriter.java?revision=1541081&view=markup
nowhere does it use v.base_dir. So it seems that you need to name the
velocity template directory "velocity". (I tried setting it to
/corename/conf/velocity and it works without any errors.)

Thank you,
O. O.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Change-Velocity-Template-Directory-in-Solr-4-6-tp4105381p4106232.html

Sent from the Solr - User mailing list archive at Nabble.com.



subscribe for this maillist

2013-12-12 Thread Gabriel Zhang
I want to subscribe for this solr mailing list.

Thanks and Best Regards,

Gabriel Zhang


Re: subscribe for this maillist

2013-12-12 Thread Rafał Kuć
Hello!

To subscribe please send a mail to solr-user-subscr...@lucene.apache.org

-- 
Regards,
 Rafał Kuć
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/


 I want to subscribe for this solr mailing list.

 Thanks and Best Regards,

 Gabriel Zhang



Re: Getting Solr Document Attributes from a Custom Function

2013-12-12 Thread Mukundaraman valakumaresan
Hi

Thanks a lot, that helps

Regards
Mukund


On Thu, Dec 12, 2013 at 1:18 AM, Kydryavtsev Andrey werde...@yandex.ru wrote:

 As far as I know (not 100% sure actually), function queries don't work with
 multivalued fields. Why do you need multivalued fields here? Your price
 and numberOfCities don't look multivalued. At least you can try to
 use some tricky format like "50;40;20" to index the multivalued field as
 single-valued and then parse it into a list of values in the function.

 11.12.2013, 11:13, Mukundaraman valakumaresan muk...@8kmiles.com:
  Hi Kydryavtsev
 
  Thanks a lot, it works, but how do I pass multivalued field values to a
  function query?
 
  Can it be passed as a String array?
 
  Thanks & Regards
  Mukund
 
  On Tue, Dec 10, 2013 at 12:05 PM, Kydryavtsev Andrey werde...@yandex.ru
 wrote:
 
   You can implement it in this way:
   Index the number of cities as a new int field (like
   <field name="numberOfCities">2</field>) and implement a user function like

   customFunction(price, numberOfCities, 1, 2000, 5)

   A custom parser should parse this into a list of value sources. From the
   first two field sources we can get the per-doc value for those particular
   fields; the other three will be ConstValueSource instances - just
   constants - so we can access all 5 values and implement the custom formula
   per doc id. Find examples in ValueSourceParser and Solr functions like
   DefFunction or MinFloatFunction.
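
A minimal sketch of that ValueSourceParser approach, written against the Solr
4.x APIs (hypothetical class name; constants parsed in the order maxprice,
minprice, totalcities to match the formula quoted below; treat it as an
outline, not the thread's actual code):

import java.io.IOException;
import java.util.Map;
import org.apache.lucene.index.AtomicReaderContext;
import org.apache.lucene.queries.function.FunctionValues;
import org.apache.lucene.queries.function.ValueSource;
import org.apache.lucene.queries.function.docvalues.FloatDocValues;
import org.apache.solr.search.FunctionQParser;
import org.apache.solr.search.SyntaxError;
import org.apache.solr.search.ValueSourceParser;

// Registered in solrconfig.xml with:
//   <valueSourceParser name="customFunction" class="...CustomFunctionParser"/>
public class CustomFunctionParser extends ValueSourceParser {
  @Override
  public ValueSource parse(FunctionQParser fp) throws SyntaxError {
    final ValueSource price  = fp.parseValueSource(); // per-doc field source
    final ValueSource cities = fp.parseValueSource(); // per-doc field source
    final float maxPrice     = fp.parseFloat();       // constant
    final float minPrice     = fp.parseFloat();       // constant
    final float totalCities  = fp.parseFloat();       // constant

    return new ValueSource() {
      @Override
      public FunctionValues getValues(Map context, AtomicReaderContext ctx)
          throws IOException {
        final FunctionValues p = price.getValues(context, ctx);
        final FunctionValues c = cities.getValues(context, ctx);
        return new FloatDocValues(this) {
          @Override
          public float floatVal(int doc) {
            // (maxprice - price)/(maxprice - minprice) + cities/totalcities
            return (maxPrice - p.floatVal(doc)) / (maxPrice - minPrice)
                 + c.floatVal(doc) / totalCities;
          }
        };
      }
      @Override public String description() { return "customFunction"; }
      @Override public boolean equals(Object o) { return o == this; }
      @Override public int hashCode() { return System.identityHashCode(this); }
    };
  }
}

It could then be used for sorting, e.g.
sort=customFunction(price,numberOfCities,1,2000,5) asc.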
 
   10.12.2013, 09:31, Mukundaraman valakumaresan muk...@8kmiles.com:
   Hi Hoss,
 
   Thanks a lot for your response. The actual problem is,
 
   For every record that I query, I have to execute a formula and sort
 the
   records based on the value of the formula.
   The formula has elements from the record.
 
   For e.g., for the following document, I need to apply the formula
   (maxprice - solrprice) / (maxprice - minprice) + count(cities)/totalcities,
   where maxprice, minprice and totalcities will be available at run time.

   So for the following record, it has to execute as (1 - *5000*)/(1-2000)
   + *2*/5   (where 5000 and 2, which are in bold, are from the document)

   <doc>
     <field name="id">apartment_1</field>
     <field name="name">Casa Grande</field>
     <field name="locality">chennai</field>
     <field name="locality">bangalore</field>
     <field name="price">5000</field>
   </doc>
 
    Thanks & Regards
   Mukund
 
   On Tue, Dec 10, 2013 at 12:22 AM, Chris Hostetter
   hossman_luc...@fucit.orgwrote:
Smells like an XY problem ...
 
Can you please describe what your end goal is in writing a custom
function, and what you would do with things like the "name" field inside
your function?
 
In general, accessing stored field values for indexed documents can be
prohibitively expensive; it rather defeats the entire point of the
inverted index data structure.  If you help us understand what your goal
is, people may be able to offer performant suggestions.
 
https://people.apache.org/~hossman/#xyproblem
XY Problem
 
Your question appears to be an XY Problem ... that is: you are dealing
with X, you are assuming Y will help you, and you are asking about Y
without giving more details about the X so that we can understand the
full issue.  Perhaps the best solution doesn't involve Y at all?
See Also: http://www.perlmonks.org/index.pl?node_id=542341
 
: Date: Mon, 9 Dec 2013 20:24:15 +0530
: From: Mukundaraman valakumaresan muk...@8kmiles.com
: Reply-To: solr-user@lucene.apache.org
: To: solr-user@lucene.apache.org
: Subject: Getting Solr Document Attributes from a Custom Function
:
: Hi All,
:
: I have a written a custom solr function and I would like to read a
property
: of the document inside my custom function. Is it possible to get
 that
using
: Solr?
:
: For eg. inside the floatVal method, I would like to get the value
 of
   the
: attribute name
:
: public class CustomValueSource extends ValueSource {
:
:   @Override
:   public FunctionValues getValues(Map context,
:       AtomicReaderContext readerContext) throws IOException {
:     return new FloatDocValues(this) {
:       @Override
:       public float floatVal(int doc) {
:         /***
:          * getDocument(doc).getAttribute("name")
:          */
:       }
:     };
:   }
: }
:
: Thanks & Regards
: Mukund
:
 
-Hoss
http://www.lucidworks.com/



Re: unable to facet range query

2013-12-12 Thread Nutan
I removed the QEC, but faceting still gives this error:

Unable to range facet on
field:id{type=integer,properties=indexed,stored,omitNorms,required,
required=true}



--
View this message in context: 
http://lucene.472066.n3.nabble.com/unable-to-facet-range-query-tp4106305p4106335.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Cloud graph gone after manually editing clusterstate.json

2013-12-12 Thread Stefan Matheis
Michael

that only shows that the http request is a success .. the white page might be
caused by
a) an invalid json structure -- which should be easy to check
b) missing information inside the clusterstate -- therefore it would be good to
know the difference between the original file and your modified one.

-Stefan 


On Wednesday, December 11, 2013 at 5:06 PM, michael.boom wrote:

 I had a look, but all looks fine there too:
 
 [Wed Dec 11 2013 17:04:41 GMT+0100 (CET)] runRoute get #/~cloud
 GET tpl/cloud.html?_=1386777881244
 200 OK
 57ms 
 GET /solr/zookeeper?wt=json&_=1386777881308
 200 OK
 509ms 
 GET /solr/zookeeper?wt=json&path=%2Flive_nodes&_=1386777881822
 200 OK
 62ms 
 GET
 /solr/zookeeper?wt=json&detail=true&path=%2Fclusterstate.json&_=1386777881886
 200 OK
 84ms 
 
 
 
 
 -
 Thanks,
 Michael
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Cloud-graph-gone-after-manually-editing-clusterstate-json-tp4106142p4106172.html
 Sent from the Solr - User mailing list archive at Nabble.com 
 (http://Nabble.com).
 
 




RE: Solr hardware memory question

2013-12-12 Thread Hoggarth, Gil
Thanks for this - I haven't any previous experience with utilising SSDs in the
way you suggest, so I guess I need to start learning! And thanks for the
Danish-webscale URL, it looks like very informative reading. (Yes, I think we're
working in similar industries with similar constraints and expectations.)

Compiliing my answers into one email,  Curious how many documents per shard 
you were planning? The number of documents per shard and field type will drive 
the amount of a RAM needed to sort and facet.
- Number of documents per shard, I think about 200 million. That's a bit of a 
rough estimate based on other Solrs we run though. Which I think means we hold 
a lot of data for each document, though I keep arguing to keep this to the 
truly required minimum. We also have many facets, some of which are pretty 
large (I'm stretching my understanding here but I think most documents have 
many 'entries' in many facets so these really hit us performance-wise.)

I try to keep a 1-to-1 ratio of Solr nodes to CPUs with a few spare for the
operating system. I utilise MMapDirectory to manage memory via the OS. So at
this moment I'm guessing that we'll have 56 Solr-dedicated CPUs across 2
physical 32-CPU servers and _hopefully_ 256GB RAM on each. This would give 28
shards and each would have 5GB java memory (in Tomcat), leaving 126GB on each
server for the OS and MMap. (I believe the Solr theory for this doesn't
accurately work out, but we can accept the edge cases where this will fail.)

I can also see that our hardware requirements will depend on usage as well as
the volume of data, and I've been pondering how best we can structure our
index/es to facilitate a long-term service (which means that, given it's a lot
of data, I need to structure the data so that new usage doesn't require
re-indexing). But at this early stage, as people say, we need to prototype,
test, profile etc., and to do that I need the hardware to run the trials
(policy dictates that I buy the production hardware now, before profiling - I
get to control much of the design and construction so I don't argue with this!)

Thanks for all the comments everyone, all very much appreciated :)
Gil


-Original Message-
From: Toke Eskildsen [mailto:t...@statsbiblioteket.dk] 
Sent: 11 December 2013 12:02
To: solr-user@lucene.apache.org
Subject: Re: Solr hardware memory question

On Tue, 2013-12-10 at 17:51 +0100, Hoggarth, Gil wrote:
 We're probably going to be building a Solr service to handle a dataset 
 of ~60TB, which for our data and schema typically gives a Solr index 
 size of 1/10th - i.e., 6TB. Given there's a general rule about the 
 amount of hardware memory required should exceed the size of the Solr 
 index (exceed to also allow for the operating system etc.), how have 
 people handled this situation?

By acknowledging that it is cheaper to buy SSDs instead of trying to compensate 
for slow spinning drives with excessive amounts of RAM. 

Our plan for an estimated 20TB of indexes out of 372TB of raw web data is to
use SSDs controlled by a single machine with 512GB of RAM (or was it 256GB?
I'll have to ask the hardware guys):
https://sbdevel.wordpress.com/2013/12/06/danish-webscale/

As always YMMV, and the numbers you quote elsewhere indicate that your queries
are quite complex. You might want to do a bit of profiling to see if they are
heavy enough to make the CPU the bottleneck.

Regards,
Toke Eskildsen, State and University Library, Denmark




Re: Equivalent of SQL JOIN in SOLR across multiple cores

2013-12-12 Thread bijalcm
I had gone through the link - http://wiki.apache.org/solr/Join - and it says
there is a limitation in JOIN: the resulting documents contain fields from only
one of the two cores (the "to" core), not a merge of fields from both.

I have used below query
http://localhost:8983/solr/coreTO/select?q={!join from=docId to=id
fromIndex=coreFROM}query

I want a consolidation of fields from multiple cores; there are two fields
in common across all cores.

I have data stored in normalized form across 3 cores on the same JVM. I want
to merge and select multiple fields depending on a WHERE clause/common fields
in each core.

Any help would be appreciated!





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Equivalent-of-SQL-JOIN-in-SOLR-across-multiple-cores-tp4106152p4106344.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Cloud graph gone after manually editing clusterstate.json

2013-12-12 Thread michael.boom
Hi guys, thanks for the replies!

The json was valid - I validated it - and the only diff between the files was
my edit.

But actually, it got fixed by itself - when I got to work today, everything
was working as it should.
Maybe it was something on my machine or browser; I can't put a finger on the
problem.



-
Thanks,
Michael
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Cloud-graph-gone-after-manually-editing-clusterstate-json-tp4106142p4106350.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SolrCloud and MoreLikeThis: SOLR-788

2013-12-12 Thread Furkan KAMACI
Hi;

SOLR-4414 has no patch. Its related issue has patches, but it seems fixed
since Solr 4.1.

Thanks;
Furkan KAMACI




2013/12/12 gf80 giuseppe_fe...@hotmail.com

 Hi guys,

 could you kindly help me to apply patch for MoreLikeThis on solrcloud.
 I'm using Solr 4.6 and I'm using solrcloud with 10 shards.
 The problem is described here
 https://issues.apache.org/jira/browse/SOLR-4414
 but I think that it was solved but not already delivered in Solr4.6.

 Thanks a lot in advance,
 Giuseppe

 P.S. Rakudten: Did you figure out the problem applying patch? Tx



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/SolrCloud-and-MoreLikeThis-SOLR-788-tp4022581p4106258.html
 Sent from the Solr - User mailing list archive at Nabble.com.



RE: SolrCloud and MoreLikeThis: SOLR-788

2013-12-12 Thread gf80
Hi, 
thanks for the answer, I think that you mean the issue SOLR-788, isn't it? If
yes, I think that it's solved as you say, but I see Fix Version/s: 4.1, 5.0 -
so is it possible that it's not already delivered in Solr4.6?
However I think that even solving the related issue, my problem is not solved :(.

I am just trying to find a workaround; for example, is there any way to ask
solrcloud "where is the document with this id?" If yes, I can try to
customize morelikethis to ask this question before asking for MLT on the owner
shard. Obviously, I am assuming that after selecting the right shard, the MLT
answer includes documents in other shards.

Let me know if you have any suggestion, thanks in advance,
-giuseppe

 
Date: Thu, 12 Dec 2013 03:51:04 -0800
From: ml-node+s472066n4106355...@n3.nabble.com
To: giuseppe_fe...@hotmail.com
Subject: Re: SolrCloud and MoreLikeThis: SOLR-788

Hi;

SOLR-4414 has no patch. Its related issue has patches, but it seems fixed
since Solr 4.1.

Thanks;
Furkan KAMACI

2013/12/12 gf80 [hidden email]

 Hi guys,

 could you kindly help me to apply patch for MoreLikeThis on solrcloud.
 I'm using Solr 4.6 and I'm using solrcloud with 10 shards.
 The problem is described here
 https://issues.apache.org/jira/browse/SOLR-4414
 but I think that it was solved but not already delivered in Solr4.6.

 Thanks a lot in advance,
 Giuseppe

 P.S. Rakudten: Did you figure out the problem applying patch? Tx

 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/SolrCloud-and-MoreLikeThis-SOLR-788-tp4022581p4106258.html
 Sent from the Solr - User mailing list archive at Nabble.com.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrCloud-and-MoreLikeThis-SOLR-788-tp4022581p4106364.html
Sent from the Solr - User mailing list archive at Nabble.com.

RE: Solr hardware memory question

2013-12-12 Thread Toke Eskildsen
On Thu, 2013-12-12 at 11:10 +0100, Hoggarth, Gil wrote:
 Thanks for this - I haven't any previous experience with utilising SSDs
 in the way you suggest, so I guess I need to start learning!

There's a bit of a divide in the Lucene/Solr world on this. Everybody
agrees that SSDs in themselves are great for Lucene/Solr searches
compared to a spinning-drive solution. How much better is another
matter, and the issue gets confusing when RAM caching is factored in.

Some are also very concerned about the reliability of SSDs and the write
performance degradation without TRIM (you need to have a quite specific
setup to have TRIM enabled on a server with SSDs in RAID). Guessing that
your 6TB index is not heavily updated, the TRIM part should not be one
of your worries though.

At Statsbiblioteket, we have been using SSDs for our search servers
since 2008. That was back when random write performance was horrible and
a large drive was 64GB. As you have probably guessed, we are very much
in the SSD camp.

We have done some testing and for simple searches (i.e. a lot of IO and
comparatively little CPU usage), we have observed that SSDs + 10% index
size RAM for caching deliver something like 80% of pure RAM speed.
https://sbdevel.wordpress.com/2013/06/06/memory-is-overrated/

Your mileage will surely vary.

 [...] leaving 126GB on each server for the OS and MMap. [...]

So about the same as your existing 3TB setup? Seems like you will get
the same performance then. I must say that 1 minute response times would
be very hard to sell at our library, even for a special search only used
by a small and dedicated audience. Even your goal of 20 seconds seems
adverse to exploratory search.

May I be so frank as to suggest a course of action? Buy one ½ TB Samsung
840 EVO SSD, fill it with indexes and test it in a machine with 32GB of
RAM, thus matching the 1/20 index size RAM that your servers will have.
Such a drive costs £250 on Amazon and the experiment would spare you for
a lot of speculation and time.

Next, conclude that SSDs are the obvious choice and secure the 840 for
your workstation with reference to further testing.

 I can also see that our hardware requirements will also depend on usage
 as well as the volume of data, and I've been pondering how best we can
 structure our index/es to facilitate a long term service (which means
 that, given it's a lot of data, I need to structure the data so that
 new usage doesn't require re-indexing.)

We definitely have this problem too. We have resigned ourselves to re-indexing
the data after some months of real-world usage.

Regards,
Toke Eskildsen, State and University Library, Denmark



Sudden Solr crush after commit

2013-12-12 Thread Manuel Le Normand
In the last few days one of my Tomcat servlets, running only a Solr instance,
crashed unexpectedly twice.

Low memory usage, nothing written in the tomcat log, and the last thing
happening in the solr log is 'end_commit_flush' followed by 'UnInverted
multi-valued field' for the fields faceted during the newSearcher run.
Right after this, the tomcat crashed leaving no trace.

Has anyone experienced a similar issue before?

Thanks,
Manu


Re: SolrCloud and MoreLikeThis: SOLR-788

2013-12-12 Thread Furkan KAMACI
Hi;

Yes, I am talking about SOLR-788. It says 4.1 there, so it means that it was
fixed in 4.1. On the other hand, some patches are applied both for ongoing
versions and trunk. 5.0 is the trunk version of Solr.

For your other question: what do you mean with "where is the document with
this id?" If you want to learn the shard that a document belongs to, you can
do that:
http://localhost:8983/solr/collection1/select?q=*%3A*&fl=url%2C+%5Bshard%5D&wt=json&indent=true

Thanks;
Furkan KAMACI


2013/12/12 gf80 giuseppe_fe...@hotmail.com

 Hi,
 thanks for the answer, I think that you mean the issue SOLR-788, isn't it?
 If yes, I think that it's solved as you say, but I see Fix Version/s: 4.1,
 5.0 - so is it possible that it's not already delivered in Solr4.6?
 However I think that even solving the related issue, my problem is not
 solved :(.

 I am just trying to find a workaround; for example, is there any way to ask
 solrcloud "where is the document with this id?" If yes, I can try to
 customize morelikethis to ask this question before asking for MLT on the
 owner shard. Obviously, I am assuming that after selecting the right shard,
 the MLT answer includes documents in other shards.

 Let me know if you have any suggestion, thanks in advance,
 -giuseppe



Re: [Solr Wiki] Your wiki account data

2013-12-12 Thread tedswiss
Sorry, your first email hit the spam box !



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Fwd-Solr-Wiki-Your-wiki-account-data-tp4104901p4106383.html
Sent from the Solr - User mailing list archive at Nabble.com.


Metrics in monitoring SolrCloud

2013-12-12 Thread michael.boom
Hi,

I'm trying to add SolrCloud to our internal monitoring tools and I wonder if
anybody else proceeded in this direction and could maybe provide some tips.
I would want to be able to get from SolrCloud:
1. The status for each collection - meaning can it serve queries or not.  
2. Average query time per collection
3. Nr of requests per second/min for each collection

Would i need to implement some solr plugins for this, or is the information
already existing?

Thanks!



-
Thanks,
Michael
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Metrics-in-monitoring-SolrCloud-tp4106384.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: SolrCloud and MoreLikeThis: SOLR-788

2013-12-12 Thread gf80
great, thanks very much for your kind support; with this query I can perform a
sort of workaround to the SOLR-4414 issue. What do you think, am I wrong?

Anyway, I am new to solr and solrcloud too, but I have been having a full
immersion with it for a few days to index a very large volume of documents. So
any hint is appreciated; for instance I don't know if the choice to use 10
shards on the same server (30GB RAM) is good and how much the number of shards
impacts indexing time.

Tx a lot,
-giuseppe

Date: Thu, 12 Dec 2013 06:38:09 -0800
From: ml-node+s472066n4106382...@n3.nabble.com
To: giuseppe_fe...@hotmail.com
Subject: Re: SolrCloud and MoreLikeThis: SOLR-788

Hi;

Yes, I am talking about SOLR-788. It says 4.1 there, so it means that it was
fixed in 4.1. On the other hand, some patches are applied both for ongoing
versions and trunk. 5.0 is the trunk version of Solr.

For your other question: what do you mean with "where is the document with
this id?" If you want to learn the shard that a document belongs to, you can
do that:
http://localhost:8983/solr/collection1/select?q=*%3A*&fl=url%2C+%5Bshard%5D&wt=json&indent=true

Thanks;
Furkan KAMACI



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrCloud-and-MoreLikeThis-SOLR-788-tp4022581p4106385.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Configurable collectors for custom ranking

2013-12-12 Thread Peter Keegan
Regarding my original goal, which is to perform a math function using the
scaled score and a field value, and sort on the result, how does this fit
in? Must I implement another custom PostFilter with a higher cost than the
scale PostFilter?

Thanks,
Peter


On Wed, Dec 11, 2013 at 4:01 PM, Peter Keegan peterlkee...@gmail.com wrote:

 Thanks very much for the guidance. I'd be happy to donate a working
 solution.

 Peter


  On Wed, Dec 11, 2013 at 3:53 PM, Joel Bernstein joels...@gmail.com wrote:

 SOLR-5020 has the commit info, it's mainly changes to SolrIndexSearcher I
 believe. They might apply to 4.3.
  I think as long as you have the finish method that's all you'll need. If you
 can get this working it would be excellent if you could donate back the
 Scale PostFilter.


 On Wed, Dec 11, 2013 at 3:36 PM, Peter Keegan peterlkee...@gmail.com
 wrote:

  This is what I was looking for, but the DelegatingCollector 'finish'
 method
  doesn't exist in 4.3.0 :(   Can this be patched in and are there any
 other
  PostFilter dependencies on 4.5?
 
  Thanks,
  Peter
 
 
  On Wed, Dec 11, 2013 at 3:16 PM, Joel Bernstein joels...@gmail.com
  wrote:
 
   Here is one approach to use in a postfilter
  
   1) In the collect() method call score for each doc. Use the scores to
   create your scaleInfo.
   2) Keep a bitset of the hits and a priorityQueue of your top X
 ScoreDocs.
   3) Don't delegate any documents to lower collectors in the collect()
   method.
   4) In the finish method create a score mapping (use the hppc
   IntFloatOpenHashMap) with your top X docIds pointing to their score,
  using
   the priorityQueue created in step 2. Then iterate the bitset (also
  created
   in step 2) sending down each doc to the lower collectors, retrieving
 and
   scaling the score from the score map. If the document is not in the
 score
   map then send down 0.
  
    You'll have to set up a dummy scorer to feed to the lower collectors. The
   CollapsingQParserPlugin has an example of how to do this.
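
A compressed sketch of steps 1-4 above (hypothetical names; Solr 4.5+ where
DelegatingCollector.finish() exists; a single-segment index is assumed for
brevity, so segment-local and global doc ids coincide; see
CollapsingQParserPlugin for a complete dummy-scorer example):

import java.io.IOException;
import org.apache.lucene.search.Scorer;
import org.apache.lucene.util.OpenBitSet;
import com.carrotsearch.hppc.IntFloatOpenHashMap;
import org.apache.solr.search.DelegatingCollector;

public class ScalingCollector extends DelegatingCollector {
  private final OpenBitSet hits = new OpenBitSet();            // step 2
  private final IntFloatOpenHashMap scores = new IntFloatOpenHashMap();
  private float min = Float.MAX_VALUE, max = -Float.MAX_VALUE;

  @Override
  public void collect(int doc) throws IOException {
    float score = scorer.score();                              // step 1
    hits.set(doc);
    scores.put(doc, score);  // simplified: keeps every hit, not just top X
    min = Math.min(min, score);
    max = Math.max(max, score);
    // step 3: nothing is delegated from collect()
  }

  @Override
  public void finish() throws IOException {
    DummyScorer fake = new DummyScorer();                      // step 4
    delegate.setScorer(fake);
    for (int doc = hits.nextSetBit(0); doc != -1; doc = hits.nextSetBit(doc + 1)) {
      float raw = scores.containsKey(doc) ? scores.lget() : 0f;
      fake.score = (max > min) ? (raw - min) / (max - min) : 0f; // scale to 0..1
      delegate.collect(doc);
    }
    super.finish(); // forwards finish() to any delegating collectors below
  }

  // Minimal stand-in scorer so the lower collectors can call score().
  private static class DummyScorer extends Scorer {
    float score;
    DummyScorer() { super(null); }
    @Override public float score() { return score; }
    @Override public int freq() { return 0; }
    @Override public int docID() { return 0; }
    @Override public int nextDoc() { return 0; }
    @Override public int advance(int target) { return 0; }
    @Override public long cost() { return 0; }
  }
}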
  
  
  
  
   On Wed, Dec 11, 2013 at 2:05 PM, Peter Keegan peterlkee...@gmail.com
   wrote:
  
Hi Joel,
   
I thought about using a PostFilter, but the problem is that the
 'scale'
function must be done after all matching docs have been scored but
  before
adding them to the PriorityQueue that sorts just the rows to be
  returned.
Doing the 'scale' function wrapped in a 'query' is proving to be too
  slow
when it visits every document in the index.
   
In the Collector, I can see how to get the field values like this:
   
   
  
 
  indexSearcher.getSchema().getField("myfield").getType().getValueSource(SchemaField,
 QParser).getValues()
   
But, 'getValueSource' needs a QParser, which isn't available.
And I can't create a QParser without a SolrQueryRequest, which isn't
available.
   
Thanks,
Peter
   
   
On Wed, Dec 11, 2013 at 1:48 PM, Joel Bernstein joels...@gmail.com
 
wrote:
   
 Peter,

 It sounds like you could achieve what you want to do in a
 PostFilter
rather
 then extending the TopDocsCollector. Is there a reason why a
  PostFilter
 won't work for you?

 Joel


 On Tue, Dec 10, 2013 at 3:24 PM, Peter Keegan 
  peterlkee...@gmail.com
 wrote:

  Quick question:
  In the context of a custom collector, how does one get the
 values
  of
   a
  field of type 'ExternalFileField'?
 
  Thanks,
  Peter
 
 
  On Tue, Dec 10, 2013 at 1:18 PM, Peter Keegan 
   peterlkee...@gmail.com
  wrote:
 
   Hi Joel,
  
   This is related to another thread on function query matching (
  
 

   
  
 
 http://lucene.472066.n3.nabble.com/Function-query-matching-td4099807.html#a4105513
  ).
   The patch in SOLR-4465 will allow me to extend
 TopDocsCollector
  and
  perform
   the 'scale' function on only the documents matching the main
  dismax
  query.
   As you mention, it is a slightly intrusive design and requires
   that I
   manage my own PriorityQueue (and a local duplicate of
 HitQueue),
   but
  should
   work. I think a better design would hide the PQ from the
 plugin.
  
   Thanks,
   Peter
  
  
   On Sun, Dec 8, 2013 at 5:32 PM, Joel Bernstein 
  joels...@gmail.com
   
  wrote:
  
   Hi Peter,
  
   I've been meaning to revisit configurable ranking collectors,
  but
   I
   haven't
   yet had a chance. It's on the shortlist of things I'd like to
   tackle
   though.
  
  
  
   On Fri, Dec 6, 2013 at 4:17 PM, Peter Keegan 
peterlkee...@gmail.com
   wrote:
  
I looked at SOLR-4465 and SOLR-5045, where it appears that
  there
is
 a
   goal
to be able to do custom sorting and ranking in a
 PostFilter.
  So
far,
  it
looks like only custom aggregation can be implemented in
PostFilter
   (5045).

Re: SolrCloud and MoreLikeThis: SOLR-788

2013-12-12 Thread Furkan KAMACI
Hi;

Yes, you can do it that way. On the other hand, you can start a new thread
about your second question. I can help you decide the shard size and
other parameters. However, you should know that it depends on your system
and your needs.

Thanks;
Furkan KAMACI


2013/12/12 gf80 giuseppe_fe...@hotmail.com

 great, thanks very much for your kind support; with this query I can
 perform a sort of workaround to the SOLR-4414 issue. What do you think, am
 I wrong?

 Anyway, I am new to solr and solrcloud too, but I have been having a full
 immersion with it for a few days to index a very large volume of documents.
 So any hint is appreciated; for instance I don't know if the choice to use
 10 shards on the same server (30GB RAM) is good and how much the number of
 shards impacts indexing time.

 Tx a lot,
 -giuseppe




 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/SolrCloud-and-MoreLikeThis-SOLR-788-tp4022581p4106385.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Sudden Solr crush after commit

2013-12-12 Thread Furkan KAMACI
Hi Manuel;

Faceting on multi-valued fields via UnInvertedField was introduced in Solr 1.4,
and you can check it here: https://issues.apache.org/jira/browse/SOLR-475 Could
you give more details about your system (i.e. Solr version) and other
parameters?

Thanks;
Furkan KAMACI




2013/12/12 Manuel Le Normand manuel.lenorm...@gmail.com

 In the last few days one of my Tomcat servlets, running only a Solr instance,
 crashed unexpectedly twice.

 Low memory usage, nothing written in the tomcat log, and the last thing
 happening in the solr log is 'end_commit_flush' followed by 'UnInverted
 multi-valued field' for the fields faceted during the newSearcher run.
 Right after this, the tomcat crashed leaving no trace.

 Has anyone experienced a similar issue before?

 Thanks,
 Manu



Re: unable to facet range query

2013-12-12 Thread Ahmet Arslan
Hi,

Did you reindex after schema change? 
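
For reference, the old solr.IntField does not support range faceting; after
switching to solr.TrieIntField and reindexing, a request along these lines
should work (host, core and range values are illustrative assumptions):

http://localhost:8983/solr/select?q=*:*&facet=true&facet.range=id&facet.range.start=0&facet.range.end=1000&facet.range.gap=100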




On Thursday, December 12, 2013 11:51 AM, Nutan nutanshinde1...@gmail.com 
wrote:
I removed the QEC, but faceting still gives this error:

Unable to range facet on
field:id{type=integer,properties=indexed,stored,omitNorms,required,
required=true}



--
View this message in context: 
http://lucene.472066.n3.nabble.com/unable-to-facet-range-query-tp4106305p4106335.html

Sent from the Solr - User mailing list archive at Nabble.com.



Re: Metrics in monitoring SolrCloud

2013-12-12 Thread Ahmet Arslan
Hi Michael,

Not sure about 1) per collection, but Solr exposes 2) and 3) via:

http://wiki.apache.org/solr/SolrJmx

https://cwiki.apache.org/confluence/display/solr/MBean+Request+Handler
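
For example, per-handler statistics (including average response times and
request counts) can be fetched with something like this (default core name
assumed):

http://localhost:8983/solr/collection1/admin/mbeans?stats=true&wt=json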




On Thursday, December 12, 2013 4:47 PM, michael.boom my_sky...@yahoo.com 
wrote:
Hi,

I'm trying to add SolrCloud to our internal monitoring tools and I wonder if
anybody else proceeded in this direction and could maybe provide some tips.
I would want to be able to get from SolrCloud:
1. The status for each collection - meaning can it serve queries or not.  
2. Average query time per collection
3. Nr of requests per second/min for each collection

Would i need to implement some solr plugins for this, or is the information
already existing?

Thanks!



-
Thanks,
Michael
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Metrics-in-monitoring-SolrCloud-tp4106384.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: CollapsingQParserPlugin scores incorrectly in Solr 4.6.0 when multiple sort criteria are used

2013-12-12 Thread Joel Bernstein
Hi,

This is a known issue resolved in SOLR-5408. It's fixed in trunk and 4x, and
if there is a 4.6.1 it will be in there; if not, it will be in Solr 4.7.

https://issues.apache.org/jira/browse/SOLR-5408

Joel


On Wed, Dec 11, 2013 at 11:36 PM, Umesh Prasad umesh.i...@gmail.com wrote:

 The issue occurs in a single-segment index also ..

 sort: score desc,floSalesRank asc
 response: {
   "numFound": 21461,
   "start": 0,
   "maxScore": 4.4415073,
   "docs": [
     {
       "floSalesRank": 0,
       "score": 0.123750895,
       "[docid]": 9208
     },
     ...




 On Thu, Dec 12, 2013 at 9:50 AM, Umesh Prasad umesh.i...@gmail.com
 wrote:

  Hi All,
  I am using the new CollapsingQParserPlugin for grouping and found that it
  works incorrectly when I use multiple sort criteria.
 
 
 
 
  http://localhost:8080/solr/toys/select/?q=car%20and%20toys&version=2.2&start=0&rows=10&indent=on&sort=score%20desc,floSalesRank%20asc&facet=on&facet.field=store_path&facet.mincount=1&bq=store_path:%22mgl/ksc/gcv%22^10&wt=json&fl=score,floSalesRank,[docid]&bq=id:STFDCHZM3552AHXE^1000&fq={!collapse%20field=item_id}

  sort: score desc,floSalesRank asc
  fl: score,floSalesRank,[docid]
  start: 0
  q: car and toys
  facet.field: store_path
  fq: {!collapse field=item_id}
 
 
  response:
  {
    "numFound": 21461,
    "start": 0,
    "maxScore": 4.447499,
    "docs": [
      {
        "floSalesRank": 0,
        "score": 0.12396862,
        "[docid]": 9703
      },
      ...
 
 
  I found a bug opened for the same issue:
  https://issues.apache.org/jira/browse/SOLR-5408

  The bug is closed, but I am not really sure that the fix works, especially
  for multi-segment indexes.

  I am using Solr 4.6.0 and my index contains 4 segments.
  I am using Solr 4.6.0 and my index contains 4 segments ..
 
  Have anyone else faced the same issue ?
 
  ---
   Thanks & Regards
  Umesh Prasad
 



 --
 ---
  Thanks & Regards
 Umesh Prasad




-- 
Joel Bernstein
Search Engineer at Heliosearch


Re: Configurable collectors for custom ranking

2013-12-12 Thread Joel Bernstein
The sorting is going to happen in the lower-level collectors. You need a
value source that returns the score of the document being collected.

Here is how you can make this happen:

1) Create an object in your PostFilter that simply holds the current score.
Place this object in the SearchRequest context map. Update object.score as
you pass the docs and scores to the lower collectors.

2) Create a value source that checks the SearchRequest context for the
object that's holding the current score. Use this object to return the
current score when called. For example, if you give the value source a
handle called "score", a compound function call will look like this:
sum(score(), field(x))
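
A minimal sketch of those two pieces (hypothetical names; it assumes the
PostFilter has put the holder into the SolrQueryRequest context under the key
"scoreHolder" and updates holder.score before each delegate.collect() call;
the value source would still need a small ValueSourceParser registering it
under the handle "score"):

import java.io.IOException;
import java.util.Map;
import org.apache.lucene.index.AtomicReaderContext;
import org.apache.lucene.queries.function.FunctionValues;
import org.apache.lucene.queries.function.ValueSource;
import org.apache.lucene.queries.function.docvalues.FloatDocValues;
import org.apache.solr.request.SolrRequestInfo;

// 1) Mutable holder shared between the PostFilter and the value source.
class ScoreHolder {
  volatile float score;
}

// 2) Value source that returns whatever score the PostFilter last published.
public class CurrentScoreValueSource extends ValueSource {
  @Override
  public FunctionValues getValues(Map context, AtomicReaderContext ctx)
      throws IOException {
    final ScoreHolder holder = (ScoreHolder) SolrRequestInfo.getRequestInfo()
        .getReq().getContext().get("scoreHolder");
    return new FloatDocValues(this) {
      @Override
      public float floatVal(int doc) {
        return holder.score; // score of the doc currently being collected
      }
    };
  }
  @Override public String description() { return "score()"; }
  @Override public boolean equals(Object o) { return o == this; }
  @Override public int hashCode() { return System.identityHashCode(this); }
}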

Joel










On Thu, Dec 12, 2013 at 9:58 AM, Peter Keegan peterlkee...@gmail.com wrote:

 Regarding my original goal, which is to perform a math function using the
 scaled score and a field value, and sort on the result, how does this fit
 in? Must I implement another custom PostFilter with a higher cost than the
 scale PostFilter?

 Thanks,
 Peter


 On Wed, Dec 11, 2013 at 4:01 PM, Peter Keegan peterlkee...@gmail.com
 wrote:

  Thanks very much for the guidance. I'd be happy to donate a working
  solution.
 
  Peter
 
 
  On Wed, Dec 11, 2013 at 3:53 PM, Joel Bernstein joels...@gmail.com
 wrote:
 
  SOLR-5020 has the commit info, it's mainly changes to SolrIndexSearcher
 I
  believe. They might apply to 4.3.
  I think as long as you have the finish method that's all you'll need. If
 you
  can get this working it would be excellent if you could donate back the
  Scale PostFilter.
 
 
  On Wed, Dec 11, 2013 at 3:36 PM, Peter Keegan peterlkee...@gmail.com
  wrote:
 
   This is what I was looking for, but the DelegatingCollector 'finish'
  method
   doesn't exist in 4.3.0 :(   Can this be patched in and are there any
  other
   PostFilter dependencies on 4.5?
  
   Thanks,
   Peter
  
  
   On Wed, Dec 11, 2013 at 3:16 PM, Joel Bernstein joels...@gmail.com
   wrote:
  
Here is one approach to use in a postfilter
   
1) In the collect() method call score for each doc. Use the scores
 to
create your scaleInfo.
2) Keep a bitset of the hits and a priorityQueue of your top X
  ScoreDocs.
3) Don't delegate any documents to lower collectors in the collect()
method.
4) In the finish method create a score mapping (use the hppc
IntFloatOpenHashMap) with your top X docIds pointing to their score,
   using
the priorityQueue created in step 2. Then iterate the bitset (also
   created
in step 2) sending down each doc to the lower collectors, retrieving
  and
scaling the score from the score map. If the document is not in the
  score
map then send down 0.
   
 You'll have to set up a dummy scorer to feed to the lower collectors. The
CollapsingQParserPlugin has an example of how to do this.
   
   
   
   
On Wed, Dec 11, 2013 at 2:05 PM, Peter Keegan 
 peterlkee...@gmail.com
wrote:
   
 Hi Joel,

 I thought about using a PostFilter, but the problem is that the
  'scale'
 function must be done after all matching docs have been scored but
   before
 adding them to the PriorityQueue that sorts just the rows to be
   returned.
 Doing the 'scale' function wrapped in a 'query' is proving to be
 too
   slow
 when it visits every document in the index.

 In the Collector, I can see how to get the field values like this:


   
  
 
  indexSearcher.getSchema().getField("myfield").getType().getValueSource(SchemaField,
  QParser).getValues()

 But, 'getValueSource' needs a QParser, which isn't available.
 And I can't create a QParser without a SolrQueryRequest, which
 isn't
 available.

 Thanks,
 Peter


 On Wed, Dec 11, 2013 at 1:48 PM, Joel Bernstein 
 joels...@gmail.com
  
 wrote:

  Peter,
 
  It sounds like you could achieve what you want to do in a
  PostFilter
 rather
  then extending the TopDocsCollector. Is there a reason why a
   PostFilter
  won't work for you?
 
  Joel
 
 
  On Tue, Dec 10, 2013 at 3:24 PM, Peter Keegan 
   peterlkee...@gmail.com
  wrote:
 
   Quick question:
   In the context of a custom collector, how does one get the
  values
   of
a
   field of type 'ExternalFileField'?
  
   Thanks,
   Peter
  
  
   On Tue, Dec 10, 2013 at 1:18 PM, Peter Keegan 
peterlkee...@gmail.com
   wrote:
  
Hi Joel,
   
This is related to another thread on function query
 matching (
   
  
 

   
  
 
 http://lucene.472066.n3.nabble.com/Function-query-matching-td4099807.html#a4105513
   ).
The patch in SOLR-4465 will allow me to extend
  TopDocsCollector
   and
   perform
the 'scale' function on only the documents matching the main
   dismax
   query.
As you mention, it is a slightly intrusive design and
 requires
that I
  

Re: Solr hardware memory question

2013-12-12 Thread Michael Della Bitta
Hello, Gil,

I'm wondering if you've been in touch with the Hathi Trust people, because
I imagine your use cases are somewhat similar.

They've done some blogging around getting digitized texts indexed at scale,
which is what I assume you're doing:

http://www.hathitrust.org/blogs/Large-scale-Search

Michael Della Bitta

Applications Developer

o: +1 646 532 3062  | c: +1 917 477 7906

appinions inc.

“The Science of Influence Marketing”

18 East 41st Street

New York, NY 10017

t: @appinions https://twitter.com/Appinions | g+:
plus.google.com/appinionshttps://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts
w: appinions.com http://www.appinions.com/


On Thu, Dec 12, 2013 at 5:10 AM, Hoggarth, Gil gil.hogga...@bl.uk wrote:

 Thanks for this - I haven't any previous experience with utilising SSDs in
 the way you suggest, so I guess I need to start learning! And thanks for
 the Danish-webscale URL, looks like very informed reading. (Yes, I think
 we're working in similar industries with similar constraints and
 expectations).

  Compiling my answers into one email: "Curious how many documents per
  shard you were planning? The number of documents per shard and field type
  will drive the amount of RAM needed to sort and facet."
 - Number of documents per shard, I think about 200 million. That's a bit
 of a rough estimate based on other Solrs we run though. Which I think means
 we hold a lot of data for each document, though I keep arguing to keep this
 to the truly required minimum. We also have many facets, some of which are
 pretty large (I'm stretching my understanding here but I think most
 documents have many 'entries' in many facets so these really hit us
 performance-wise.)

 I try to keep a 1-to-1 ratio of Solr nodes to CPUs with a few spare for
 the operating system. I utilise MMapDirectory to manage memory via the OS.
  So at this moment I'm guessing that we'll have 56 Solr-dedicated CPUs across
 2 physical 32 CPU servers and _hopefully_ 256GB RAM on each. This would
 give 28 shards and each would have 5GB java memory (in Tomcat), leaving
 126GB on each server for the OS and MMap. (I believe the Solr theory for
 this doesn't accurately work out but we can accept the edge cases where
 this will fail.)

 I can also see that our hardware requirements will also depend on usage as
 well as the volume of data, and I've been pondering how best we can
 structure our index/es to facilitate a long term service (which means that,
 given it's a lot of data, I need to structure the data so that new usage
 doesn't require re-indexing.) But at this early stage, as people say, we
 need to prototype, test, profile etc. and to do that I need the hardware to
 run the trials (policy dictates that I buy the production hardware now,
 before profiling - I get to control much of the design and construction so
 I don't argue with this!)

 Thanks for all the comments everyone, all very much appreciated :)
 Gil


 -Original Message-
 From: Toke Eskildsen [mailto:t...@statsbiblioteket.dk]
 Sent: 11 December 2013 12:02
 To: solr-user@lucene.apache.org
 Subject: Re: Solr hardware memory question

 On Tue, 2013-12-10 at 17:51 +0100, Hoggarth, Gil wrote:
  We're probably going to be building a Solr service to handle a dataset
  of ~60TB, which for our data and schema typically gives a Solr index
  size of 1/10th - i.e., 6TB. Given there's a general rule about the
  amount of hardware memory required should exceed the size of the Solr
  index (exceed to also allow for the operating system etc.), how have
  people handled this situation?

 By acknowledging that it is cheaper to buy SSDs instead of trying to
 compensate for slow spinning drives with excessive amounts of RAM.

  Our plan for an estimated 20TB of indexes out of 372TB of raw web data is
  to use SSDs controlled by a single machine with 512GB of RAM (or was it
  256GB? I'll have to ask the hardware guys):
 https://sbdevel.wordpress.com/2013/12/06/danish-webscale/

  As always YMMV, and the numbers you quote elsewhere indicate that your
  queries are quite complex. You might want to do a bit of profiling to see
  if they are heavy enough to make the CPU the bottleneck.

 Regards,
 Toke Eskildsen, State and University Library, Denmark





Re: Solr Profiler

2013-12-12 Thread Michael Della Bitta
I've used VisualVM quite a bit, but not sure that it's going to top any of
the other products mentioned in this thread. It's free, though, so there's
that!

Michael Della Bitta

Applications Developer

o: +1 646 532 3062  | c: +1 917 477 7906

appinions inc.

“The Science of Influence Marketing”

18 East 41st Street

New York, NY 10017

t: @appinions https://twitter.com/Appinions | g+:
plus.google.com/appinionshttps://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts
w: appinions.com http://www.appinions.com/


On Thu, Dec 12, 2013 at 12:39 AM, Otis Gospodnetic 
otis.gospodne...@gmail.com wrote:

 Hi,

 Are you looking for a Java profiler?  Or a Solr monitoring tool?
 For a profiler I'd recommend YourKit -- http://www.yourkit.com/
 For Solr monitoring I'd recommend our SPM --
 http://sematext.com/spm/solr-performance-monitoring/index.html

 Otis
 --
 Performance Monitoring * Log Analytics * Search Analytics
  Solr & Elasticsearch Support * http://sematext.com/


 On Wed, Dec 11, 2013 at 3:46 PM, Monica Skidmore 
 monica.skidm...@careerbuilder.com wrote:

  We're trying to improve the speed of some custom Solr code we've written,
  and we'd like to use a profiler to help us focus our efforts.  However,
  we've tried both JProfiler and NewRelic, and we've found it challenging to
  configure them correctly to be able to tell where our bottlenecks really
  are.
 
  What profilers/configurations have people successfully used for Solr?
 
  Monica Skidmore
  Engineering Lead, Core Search
  CareerBuilder.com
 
 



RE: Load existing HDFS files into solr?

2013-12-12 Thread Tim Potter
Hi Chen,

I'm not aware of any direct integration between the two at this time. You might 
ping the Hive user list with this question too. That said, I've been thinking 
about whether it makes sense to build a Hive StorageHandler for Solr. That at 
least seems like a quick way to go. However, it might also be possible to just plug a 
Hive InputFormat into Mark's MapReduce/Solr stuff?

See: https://github.com/markrmiller/solr-map-reduce-example 

Cheers,

Timothy Potter
www.lucidworks.com


From: cynosure cynosure...@gmail.com
Sent: Thursday, December 12, 2013 12:11 AM
To: solr-user@lucene.apache.org
Subject: Load existing HDFS files into solr?

Folks,
Our current data is stored in Hive tables. Is there a way to have Solr
index the existing HDFS files directly, or do I have to import each Hive
table into Solr?
Can anyone point me to some reference?
Thank you very much!
Chen

solr OOM Crash

2013-12-12 Thread Sandra Scott
Hello,

We are experiencing unexplained OOM crashes. We have already seen it a few
times across our different Solr instances. The crash happens on only a
single shard of the collection.

Environment details:
1. Solr 4.3, running on tomcat.
2. 24 Shards.
3. Indexing rate of ~800 docs per minute.

Solrconfig.xml:
1. Merge factor 4
2. Soft commit every 10 min
3. Hardcommit every 30 min

Main findings:
1. Solr logs: No query failures prior to the OOM, but DOUBLE the amount of
soft and hard commits in comparison to other shards.
2. Analyzing the dump (VisualVM): Class byte[] takes 4GB out of the 5GB
allocated to the JVM, mainly referenced by CompressingStoredFieldsReader GC
roots (which, by looking at the code, we suspect were created due to
CompressingStoredFieldsWriter.merge).

Sub findings:
1. GC logs: Showed 108 GC fails prior to the crash.
2. CPU: Overall usage seems fine, but the % of CPU time spent in GC stays
high for 6 min before the OOM.
3. Memory: Half an hour before the OOM the usage slowly rises, until it
gets to 5.4GB.

Has anyone encountered a higher than normal commit rate that seems to
increase the merge rate and cause what I described?


Re: Configurable collectors for custom ranking

2013-12-12 Thread Peter Keegan
This is pretty cool, and worthy of adding to Solr in Action (v2) and the
other books. With function queries, flexible filter processing and caching,
custom collectors, and post filters, there's a lot of flexibility here.

Btw, the query times using a custom collector to scale/recompute scores are
excellent (will have to see how they compare to your outlined solution).

Thanks,
Peter


On Thu, Dec 12, 2013 at 11:13 AM, Joel Bernstein joels...@gmail.com wrote:

 The sorting is going to happen in the lower level collectors. You need a
 value source that returns the score of the document being collected.

 Here is how you can make this happen:

 1) Create an object in your PostFilter that simply holds the current score.
 Place this object in the SearchRequest context map. Update object.score as
 you pass the docs and scores to the lower collectors.

 2) Create a value source that checks the SearchRequest context for the
 object that's holding the current score. Use this object to return the
 current score when called. For example, if you give the value source a
 handle called score, a compound function call will look like this:
 sum(score(), field(x))

 Joel
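
A minimal sketch of those two pieces (class names invented for illustration;
based on the Lucene/Solr 4.x function-query APIs):

import java.util.Map;
import org.apache.lucene.index.AtomicReaderContext;
import org.apache.lucene.queries.function.FunctionValues;
import org.apache.lucene.queries.function.ValueSource;
import org.apache.lucene.queries.function.docvalues.FloatDocValues;

public class CurrentScoreValueSource extends ValueSource {

  // Shared holder: the PostFilter puts one of these in the request
  // context and updates it for each document it collects.
  public static class ScoreHolder {
    public float score;
  }

  private final ScoreHolder holder;

  public CurrentScoreValueSource(ScoreHolder holder) {
    this.holder = holder;
  }

  @Override
  public FunctionValues getValues(Map context, AtomicReaderContext readerContext) {
    return new FloatDocValues(this) {
      @Override
      public float floatVal(int doc) {
        // The score of the document currently being collected.
        return holder.score;
      }
    };
  }

  @Override public String description() { return "score()"; }
  @Override public boolean equals(Object o) { return this == o; }
  @Override public int hashCode() { return System.identityHashCode(this); }
}

With a value source like this registered under the handle score, a compound
function such as sum(score(), field(x)) sees the current document's score.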










 On Thu, Dec 12, 2013 at 9:58 AM, Peter Keegan peterlkee...@gmail.com
 wrote:

  Regarding my original goal, which is to perform a math function using the
  scaled score and a field value, and sort on the result, how does this fit
  in? Must I implement another custom PostFilter with a higher cost than
 the
  scale PostFilter?
 
  Thanks,
  Peter
 
 
  On Wed, Dec 11, 2013 at 4:01 PM, Peter Keegan peterlkee...@gmail.com
  wrote:
 
   Thanks very much for the guidance. I'd be happy to donate a working
   solution.
  
   Peter
  
  
   On Wed, Dec 11, 2013 at 3:53 PM, Joel Bernstein joels...@gmail.com
  wrote:
  
   SOLR-5020 has the commit info, it's mainly changes to
 SolrIndexSearcher
  I
   believe. They might apply to 4.3.
   I think as long you have the finish method that's all you'll need. If
  you
   can get this working it would be excellent if you could donate back
 the
   Scale PostFilter.
  
  
   On Wed, Dec 11, 2013 at 3:36 PM, Peter Keegan peterlkee...@gmail.com
   wrote:
  
This is what I was looking for, but the DelegatingCollector 'finish'
   method
doesn't exist in 4.3.0 :(   Can this be patched in and are there any
   other
PostFilter dependencies on 4.5?
   
Thanks,
Peter
   
   
On Wed, Dec 11, 2013 at 3:16 PM, Joel Bernstein joels...@gmail.com
 
wrote:
   
 Here is one approach to use in a postfilter

 1) In the collect() method call score for each doc. Use the scores
  to
 create your scaleInfo.
 2) Keep a bitset of the hits and a priorityQueue of your top X
   ScoreDocs.
 3) Don't delegate any documents to lower collectors in the
 collect()
 method.
 4) In the finish method create a score mapping (use the hppc
 IntFloatOpenHashMap) with your top X docIds pointing to their
 score,
using
 the priorityQueue created in step 2. Then iterate the bitset (also
created
 in step 2) sending down each doc to the lower collectors,
 retrieving
   and
 scaling the score from the score map. If the document is not in
 the
   score
 map then send down 0.

 You'll have to set up a dummy scorer to feed to the lower collectors. The
 CollapsingQParserPlugin has an example of how to do this.




 On Wed, Dec 11, 2013 at 2:05 PM, Peter Keegan 
  peterlkee...@gmail.com
 wrote:

  Hi Joel,
 
  I thought about using a PostFilter, but the problem is that the
   'scale'
  function must be done after all matching docs have been scored
 but
before
  adding them to the PriorityQueue that sorts just the rows to be
returned.
  Doing the 'scale' function wrapped in a 'query' is proving to be
  too
slow
  when it visits every document in the index.
 
  In the Collector, I can see how to get the field values like
 this:

  indexSearcher.getSchema().getField("myfield").getType()
      .getValueSource(SchemaField, QParser).getValues()
 
  But, 'getValueSource' needs a QParser, which isn't available.
  And I can't create a QParser without a SolrQueryRequest, which
  isn't
  available.
 
  Thanks,
  Peter
 
 
  On Wed, Dec 11, 2013 at 1:48 PM, Joel Bernstein 
  joels...@gmail.com
   
  wrote:
 
   Peter,
  
   It sounds like you could achieve what you want to do in a
   PostFilter
  rather
   then extending the TopDocsCollector. Is there a reason why a
PostFilter
   won't work for you?
  
   Joel
  
  
   On Tue, Dec 10, 2013 at 3:24 PM, Peter Keegan 
peterlkee...@gmail.com
   wrote:
  
Quick question:
In the context of a custom collector, how does one get the
   values
of
 a
field of type 'ExternalFileField'?
   
Thanks,

Re: Configurable collectors for custom ranking

2013-12-12 Thread Joel Bernstein
Thanks, I agree this is powerful stuff. One of the reasons that I haven't
gotten back to pluggable collectors is that I've been using PostFilters
instead.

When you start doing stuff with scores in postfilters you'll run into the
bug in SOLR-5416. This will affect you when you use facets in combination
with the QueryResultCache or tag-and-exclude faceting.

The patch in SOLR-5416 resolves this issue. You'll just need your
PostFilter to implement ScoreFilter and the SolrIndexSearcher will know how
to handle things.

The DelegatingCollector.finish() method is so new, these kinds of bugs are
still being cleaned out of the system. SOLR-5416 should be in Solr 4.7.









On Thu, Dec 12, 2013 at 12:54 PM, Peter Keegan peterlkee...@gmail.comwrote:

 This is pretty cool, and worthy of adding to Solr in Action (v2) and the
 other books. With function queries, flexible filter processing and caching,
 custom collectors, and post filters, there's a lot of flexibility here.

 Btw, the query times using a custom collector to scale/recompute scores are
 excellent (will have to see how they compare to your outlined solution).

 Thanks,
 Peter


 On Thu, Dec 12, 2013 at 11:13 AM, Joel Bernstein joels...@gmail.com
 wrote:

  The sorting is going to happen in the lower level collectors. You need a
  value source that returns the score of the document being collected.
 
  Here is how you can make this happen:
 
  1) Create an object in your PostFilter that simply holds the current
 score.
  Place this object in the SearchRequest context map. Update object.score
 as
  you pass the docs and scores to the lower collectors.
 
  2) Create a value source that checks the SearchRequest context for the
  object that's holding the current score. Use this object to return the
  current score when called. For example, if you give the value source a
  handle called score, a compound function call will look like this:
  sum(score(), field(x))
 
  Joel
 
 
 
 
 
 
 
 
 
 
  On Thu, Dec 12, 2013 at 9:58 AM, Peter Keegan peterlkee...@gmail.com
  wrote:
 
   Regarding my original goal, which is to perform a math function using
 the
   scaled score and a field value, and sort on the result, how does this
 fit
   in? Must I implement another custom PostFilter with a higher cost than
  the
   scale PostFilter?
  
   Thanks,
   Peter
  
  
   On Wed, Dec 11, 2013 at 4:01 PM, Peter Keegan peterlkee...@gmail.com
   wrote:
  
Thanks very much for the guidance. I'd be happy to donate a working
solution.
   
Peter
   
   
On Wed, Dec 11, 2013 at 3:53 PM, Joel Bernstein joels...@gmail.com
   wrote:
   
SOLR-5020 has the commit info, it's mainly changes to
  SolrIndexSearcher
   I
believe. They might apply to 4.3.
I think as long you have the finish method that's all you'll need.
 If
   you
can get this working it would be excellent if you could donate back
  the
Scale PostFilter.
   
   
On Wed, Dec 11, 2013 at 3:36 PM, Peter Keegan 
 peterlkee...@gmail.com
wrote:
   
 This is what I was looking for, but the DelegatingCollector
 'finish'
method
 doesn't exist in 4.3.0 :(   Can this be patched in and are there
 any
other
 PostFilter dependencies on 4.5?

 Thanks,
 Peter


 On Wed, Dec 11, 2013 at 3:16 PM, Joel Bernstein 
 joels...@gmail.com
  
 wrote:

  Here is one approach to use in a postfilter
 
  1) In the collect() method call score for each doc. Use the
 scores
   to
  create your scaleInfo.
  2) Keep a bitset of the hits and a priorityQueue of your top X
ScoreDocs.
  3) Don't delegate any documents to lower collectors in the
  collect()
  method.
  4) In the finish method create a score mapping (use the hppc
  IntFloatOpenHashMap) with your top X docIds pointing to their
  score,
 using
  the priorityQueue created in step 2. Then iterate the bitset
 (also
 created
  in step 2) sending down each doc to the lower collectors,
  retrieving
and
  scaling the score from the score map. If the document is not in
  the
score
  map then send down 0.
 
  You'll have setup a dummy scorer to feed to lower collectors.
 The
  CollapsingQParserPlugin has an example of how to do this.
 
 
 
 
  On Wed, Dec 11, 2013 at 2:05 PM, Peter Keegan 
   peterlkee...@gmail.com
  wrote:
 
   Hi Joel,
  
   I thought about using a PostFilter, but the problem is that
 the
'scale'
   function must be done after all matching docs have been scored
  but
 before
   adding them to the PriorityQueue that sorts just the rows to
 be
 returned.
   Doing the 'scale' function wrapped in a 'query' is proving to
 be
   too
 slow
   when it visits every document in the index.
  
   In the Collector, I can see how to get the field values like
  this:
  
  
 

   
  
 
 

Re: Metrics in monitoring SolrCloud

2013-12-12 Thread Furkan KAMACI
Hi Michael;

I've implemented a management console and dashboard for this kind of
purpose. Some time later I want to make it an open source project for
the people who need it. It is a more complicated but very flexible and
pluggable management console and dashboard. I suggest you look at the
Solr admin page. You can see what you can get there, and you should debug
the incoming and outgoing requests with a tool like Firebug so you can
understand what is going on behind the scenes. Of course you should read the
wiki to learn what Solr exposes via JMX and HTTP.

If you want a basic mechanism, start with a CloudSolrServer that connects to
Zookeeper. You can read the cluster state from there. If you poll it
periodically you can stay up to date with the current state of the
cluster. That's how I started. If you have any questions, feel free
to ask; I can answer them.

Thanks;
Furkan KAMACI


On Thursday, 12 December 2013, Ahmet Arslan iori...@yahoo.com
wrote:
 Hi Michael,

 Not sure about the collection status, but Solr exposes 2) and 3) via:

 http://wiki.apache.org/solr/SolrJmx

 https://cwiki.apache.org/confluence/display/solr/MBean+Request+Handler




 On Thursday, December 12, 2013 4:47 PM, michael.boom my_sky...@yahoo.com
wrote:
 Hi,

 I'm trying to add SolrCloud to our internal monitoring tools and I wonder
if
 anybody else proceeded in this direction and could maybe provide some
tips.
 I would want to be able to get from SolrCloud:
 1. The status for each collection - meaning can it serve queries or not.
 2. Average query time per collection
 3. Nr of requests per second/min for each collection

 Would I need to implement some Solr plugins for this, or does the
information already exist?

 Thanks!



 -
 Thanks,
 Michael
 --
 View this message in context:
http://lucene.472066.n3.nabble.com/Metrics-in-monitoring-SolrCloud-tp4106384.html
 Sent from the Solr - User mailing list archive at Nabble.com.




Re: Metrics in monitoring SolrCloud

2013-12-12 Thread Otis Gospodnetic
Hi Michael,

You may want to give http://sematext.com/spm/solr-performance-monitoring a
shot.  It's got the metrics you need + others + alerts + 
Ping if you want a Christmas discount. :)

Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/


On Thu, Dec 12, 2013 at 9:47 AM, michael.boom my_sky...@yahoo.com wrote:

 Hi,

 I'm trying to add SolrCloud to our internal monitoring tools and I wonder
 if
 anybody else proceeded in this direction and could maybe provide some tips.
 I would want to be able to get from SolrCloud:
 1. The status for each collection - meaning can it serve queries or not.
 2. Average query time per collection
 3. Nr of requests per second/min for each collection

 Would I need to implement some Solr plugins for this, or does the information
 already exist?

 Thanks!



 -
 Thanks,
 Michael
 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Metrics-in-monitoring-SolrCloud-tp4106384.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: solr OOM Crash

2013-12-12 Thread Otis Gospodnetic
Hi Sandra,

Not a direct answer, but if you are seeing this around merges, have you
tried relaxing the merge factor to, say, 10?
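
For reference, that setting lives in the indexConfig section of
solrconfig.xml; something like:

<indexConfig>
  <mergeFactor>10</mergeFactor>
</indexConfig>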

Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/


On Thu, Dec 12, 2013 at 12:10 PM, Sandra Scott scottsandr...@gmail.comwrote:

 Hello,

 We are experiencing unexplained OOM crashes. We have already seen it a few
 times across our different Solr instances. The crash happens on only a
 single shard of the collection.

 Environment details:
 1. Solr 4.3, running on tomcat.
 2. 24 Shards.
 3. Indexing rate of ~800 docs per minute.

 Solrconfig.xml:
 1. Merge factor 4
 2. Soft commit every 10 min
 3. Hardcommit every 30 min

 Main findings:
 1. Solr logs: No query failures prior to the OOM, but DOUBLE the amount of
 soft and hard commits in comparison to other shards.
 2. Analyzing the dump (VisualVM): Class byte[] takes 4GB out of the 5GB
 allocated to the JVM, mainly referenced by CompressingStoredFieldsReader GC
 roots (which, by looking at the code, we suspect were created due to
 CompressingStoredFieldsWriter.merge).

 Sub findings:
 1. GC logs: Showed 108 GC fails prior to the crash.
 2. CPU: Overall usage seems fine, but the % of CPU time spent in GC stays
 high for 6 min before the OOM.
 3. Memory: Half an hour before the OOM the usage slowly rises, until it
 gets to 5.4GB.

 Has anyone encountered a higher than normal commit rate that seems to
 increase the merge rate and cause what I described?



Re: Solr Cloud error with shard update

2013-12-12 Thread dboychuck
I have created a Jira issue here:
https://issues.apache.org/jira/browse/SOLR-5551



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Cloud-error-with-shard-update-tp4106260p4106448.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Change Velocity Template Directory in Solr 4.6

2013-12-12 Thread O. Olson
Thank you very much for the confirmation iorixxx. When I started this thread
on Dec. 6, I did not know about the confluence wiki
(https://cwiki.apache.org/confluence/display/solr/Apache+Solr+Reference+Guide).
I learned about it through another thread I started
(http://lucene.472066.n3.nabble.com/Use-of-Deprecated-Classes-SortableIntField-SortableFloatField-SortableDoubleField-tp4105762p4106001.html).
I think it is much more up to date and has a lot more information than the
official Solr Wiki, so I will be reading it before posting here in future.

Thank you again for your help.
O. O.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Change-Velocity-Template-Directory-in-Solr-4-6-tp4105381p4106467.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Re: LanguageIdentifierUpdateProcessor uses only firstValue() on multivalued fields

2013-12-12 Thread Trey Grainger
Hmm... haven't run into the case where null was returned in a multi-valued
scenario yet... I probably just haven't tested that case.  I likely need to
add a null check there - thanks for pointing it out.
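
A null-safe version of that loop might look like the following (a sketch
with hypothetical names, not the actual book code):

import java.util.Collection;
import org.apache.solr.common.SolrInputField;

public class FieldConcatUtil {
  // getValues() returns null when the field's value was null, so guard
  // before iterating; only String values are appended.
  static String concatStringValues(SolrInputField inputField) {
    StringBuilder sb = new StringBuilder();
    Collection<Object> values = inputField.getValues();
    if (values == null) {
      return "";
    }
    for (Object inputValue : values) {
      if (inputValue instanceof String) {
        sb.append((String) inputValue).append(' ');
      }
    }
    return sb.toString().trim();
  }
}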

-Trey


On Fri, Nov 29, 2013 at 6:10 AM, Müller, Stephan 
muel...@ponton-consulting.de wrote:

 Hello Trey, thank you for this example.

 We've solved it by omitting the multivalued field and passing the distinct
 string fields instead. Still, I'll go ahead and propose a patch so the language
 processor is able to concatenate multiple values by default. I think it's a
 reasonable feature (and I can't remember ever having contributed a patch to
 an open source project).
 My thoughts on the patch implementation are much the same as yours:
 iterating over getValues(). I'll have this discussed on the dev list and
 probably in JIRA.


 One thing: How do you guard against a possible NPE in line 129?
  for (final Object inputValue : inputField.getValues()) {

 SolrInputField.getValues() will return null if the associated value was
 null. It does not create an empty Collection.
 That, btw, seems to be a minor bug in the javadoc, which does not state that
 this method can return null.


 Regards,
 Stephan - srm

 [...]

  The langsToPrepend variable above will contain a set of languages,
 where
  detectLanguage was called separately for each value in the multivalued
  field.  If you just want to concatenate all the values and detect
  languages once (as opposed to only using the first value in the
  multivalued field, like it does today), just concatenate each of the
 input
  values in the first loop and call detectLanguage once at the end.
 
  I wrote code that does this for an example in the Solr in Action book.
   The particular example was detecting languages for each value in a
  multivalued field and then pre-pending the language to the text for the
  multivalued field (so the analyzer would know which stemmer to use, as
  they were being dynamically substituted in based upon the language).  The
  code is available here if you are interested:
  https://github.com/treygrainger/solr-in-action/blob/master/src/main/java/sia/ch14/MultiTextFieldLanguageIdentifierUpdateProcessor.java
 
  Good luck!
 
  -Trey
 
 
 
 
  On Wed, Nov 27, 2013 at 10:16 AM, Müller, Stephan  Mueller@ponton-
  consulting.de wrote:
 
I suspect that it is an oversight for a use case that was not
  considered.
I mean, it should probably either ignore or convert non text/string
values.
    OK, I'll see to it that I provide a patch against trunk. It actually ignores
    non-string values, but is unable to check the remaining values of a
    multivalued field.
  
Hmmm... are you using JSON input? I mean, how are the types being
 set?
Solr XML doesn't have a way to set the value types.
   
   No. It's a field with multivalued=true. That results in a
   SolrInputField where value (which is defined to be Object) actually
  holds a List.
   This list is populated with Integer, String, Date, you name it.
   I'm talking about the actual Java-Datatypes. The values in the list
   are probably set by this 3rdparty Textbodyprocessor thingy.
  
    Now the language processor just asks for field.getValue().
    This is delegated to the SolrInputField, which in turn calls
    firstValue(). Interestingly enough, firstValue() is already able to
    handle a Collection as its value.
    But if the value is a collection, it just returns the first element.
  
You could workaround it with an update processor that copied the
field
   and
massaged the multiple values into what you really want the language
detection to see. You could even implement that processor as a
JavaScript script with the stateless script update processor.
   
   Our workaround would be to not feed the multivalued field but only the
   String fields (which are also included in the multivalued field)
  
  
   Filing a Bug/Feature request and providing the patch will take some
    time as I haven't set up a fully working trunk in my IDEA installation.
   But I'm eager to do it :)
  
   Regards,
   Stephan
  
  
-- Jack Krupansky
   
-Original Message-
From: Müller, Stephan
Sent: Wednesday, November 27, 2013 5:02 AM
To: solr-user@lucene.apache.org
Subject: LanguageIdentifierUpdateProcessor uses only firstValue() on
multivalued fields
   
Hello,
   
this is a repost. This message was originally posted on the 'general' list,
but it was suggested that the 'user' list might be a better place to ask.
   
 Original Message 
Hi,
   
we are passing a multivalued field to the
LanguageIdentifierUpdateProcessor.
This multivalued field contains arbitrary types (Integer, String,
  Date).
   
Now, the
LanguageIdentifierUpdateProcessor.concatFields(SolrInputDocument
doc, String[] fields), which btw does not use the parameter fields,
is unable to parse all fields of the/a multivalued field. The call
Object content = 

Re: Prioritize search returns by URL path?

2013-12-12 Thread Jim Glynn
Thanks Chris. I think you've hit the nail on the head.

I understand your concern about prioritizing content simply by content type,
and generally I'd agree with you. However, our situation is a bit unusual.
We don't use our Wiki feature as true wikis. We publish only authoritative
content to them, and to our blogs, so those really are the things we want
returned first. The wikis most often contain the information we want our
customers to find.

Thanks again for the syntax help. We'll give it a try.

JRG



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Prioritize-search-returns-by-URL-path-tp4105023p4106481.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: FuzzyLookupFactory fwfsta.bin

2013-12-12 Thread Areek Zillur
This error is misleading. It tries to load the suggester index from the
storeDir parameter even on the first run, when the index has not yet been
created, and hence errors out. (It will create the index itself when a build
command is issued.)

I believe you will not see the error once the index is built for the
suggester, and the error on the first run does not have any consequences in
terms of functionality.
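
For reference, a build can also be triggered explicitly, assuming a request
handler is wired to this search component (the handler name /suggest here is
hypothetical):

http://localhost:8983/solr/collection1/suggest?q=t&spellcheck=true&spellcheck.dictionary=suggest&spellcheck.build=true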


On Wed, Dec 11, 2013 at 4:53 AM, Harun Reşit Zafer 
harun.za...@tubitak.gov.tr wrote:

 With the configuration below:

 <searchComponent class="solr.SpellCheckComponent" name="suggest">
   <lst name="spellchecker">
     <str name="name">suggest</str>
     <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
     <str name="lookupImpl">org.apache.solr.spelling.suggest.fst.FuzzyLookupFactory</str>
     <str name="storeDir">fuzzy_suggest_analyzing</str>
     <str name="buildOnCommit">true</str>
     <str name="suggestAnalyzerFieldType">text_tr</str>
     <str name="sourceLocation">suggestions.txt</str>
     <!-- Suggester properties -->
     <bool name="exactMatchFirst">true</bool>
     <bool name="preserveSep">false</bool>
   </lst>
   <str name="queryAnalyzerFieldType">lowercase</str>
 </searchComponent>

 I got the error:

 ...\solr-4.6.0\example\solr\collection1\data\fuzzy_suggest_analyzing\fwfsta.bin
 (The system cannot find the file specified)


 --
 Harun Reşit Zafer
 TÜBİTAK BİLGEM BTE
 Metin Madenciliği ve Kaynaştırma Sistemleri Bölümü
 T +90 262 675 3268
 W http://www.hrzafer.com






Unable to check Solr 4.6 SPLITSHARD command progress

2013-12-12 Thread binit
I have a big index, approx size 350 GB in a single shard, which I want to
split.

The SPLITSHARD command initiates successfully, as I can see in the logs. (It
times out, but from reading the forums here that is the expected behavior.)

The problem is it never completes even after a full day, and I doubt it is
actually running in the background, because the two newly created shard
folders stay at a constant size in KBs and do not change over time.

In ZooKeeper the two new shards are active and marked as under construction.
The logs do not show any error, the last log entry is like:
INFO  - 2013-12-12 14:48:36.110; org.apache.solr.update.SolrIndexSplitter;
SolrIndexSplitter: partition #0 range=8000-

Now, what I'm asking is whether it is possible to check the progress of this
command, even if it is as simple as watching files change in some
directory, or taking a thread dump - anything that assures me it is running
and progressing.
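
For the thread-dump route, standard JDK tooling should be enough; for
example (where <pid> is the servlet container's process id):

jstack <pid> | grep -i split

A SolrIndexSplitter thread showing up there would at least confirm that the
split is still running.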

I'd also be happy to know how the overseer performs the split. I also tried
looking into its code but was unable to figure out how it does so. Also, in
the ZooKeeper files related to the overseer I was unable to find anything
related to the split command.

Thanks,
Binit



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Unable-to-check-Solr-4-6-SPLITSHARD-command-progress-tp4106520.html
Sent from the Solr - User mailing list archive at Nabble.com.


custom group sort in solr

2013-12-12 Thread Parvesh Garg
Hi,

I want to use solr/lucene's grouping feature with some customisations,
like

   - sorting the groups based on average scores instead of max scores or
   some other complex computation over scores.
   - grouping articles based on some computation instead of a field value.

So far it seems like I have to write some code for it. Can someone please
point me in the right direction?

   - If I have to write a plugin, which files I need to check?
   - Which part of the code currently executes the grouping feature? Does
   it happen in solr or lucene? Is it SearchHandler?

Parvesh Garg
http://www.zettata.com


Re: custom group sort in solr

2013-12-12 Thread Mukundaraman valakumaresan
Hi

You may try writing a custom function and sorting your groups according to
the result of that custom function.

If that approach works for you, check out ValueSourceParser, ValueSource and
their descendant classes for a better understanding.
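
A skeletal example of such a parser (all names are hypothetical; a real
implementation would return a custom ValueSource that computes the desired
group score):

import org.apache.lucene.queries.function.ValueSource;
import org.apache.solr.search.FunctionQParser;
import org.apache.solr.search.SyntaxError;
import org.apache.solr.search.ValueSourceParser;

// Registered in solrconfig.xml as
//   <valueSourceParser name="groupscore" class="com.example.GroupScoreParser"/>
// and then usable in a sort expression, e.g. group.sort=groupscore(myfield) desc
public class GroupScoreParser extends ValueSourceParser {
  @Override
  public ValueSource parse(FunctionQParser fp) throws SyntaxError {
    // Placeholder: delegates to the parsed source; replace with a custom
    // ValueSource implementing the average-score (or other) computation.
    return fp.parseValueSource();
  }
}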

Thanks & Regards
Mukund


On Fri, Dec 13, 2013 at 10:54 AM, Parvesh Garg parv...@zettata.com wrote:

 Hi,

 I want to use solr/lucene's grouping feature with some customisations,
 like

 - sorting the groups based on average scores instead of max scores or
 some other complex computation over scores.
 - grouping articles based on some computation instead of a field value.

 So far it seems like I have to write some code for it. Can someone please
 point me in the right direction?

- If I have to write a plugin, which files I need to check?
- Which part of the code currently executes the grouping feature? Does
it happen in solr or lucene? Is it SearchHandler?

 Parvesh Garg
 http://www.zettata.com



Re: Updating shard range in Zookeeper

2013-12-12 Thread Manuel Le Normand
The ZooKeeper client for Eclipse is the tool you're looking for. You can edit
the clusterstate directly.
http://www.massedynamic.org/mediawiki/index.php?title=Eclipse_Plug-in_for_ZooKeeper

Another option is to use the bundled zkcli script (distributed with Solr
4.5 and above) to upload a new clusterstate with the new shard range.
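
For example (the zkhost and paths are illustrative; the script ships with
the Solr distribution's cloud-scripts):

zkcli.sh -zkhost localhost:2181 -cmd getfile /clusterstate.json clusterstate.json
(edit clusterstate.json locally, then)
zkcli.sh -zkhost localhost:2181 -cmd putfile /clusterstate.json clusterstate.json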

Good luck


Re: Sudden Solr crash after commit

2013-12-12 Thread Manuel Le Normand
Running Solr 4.3, sharded collection, Tomcat 7.0.39.
Faceting on multivalued fields works perfectly fine; I was describing this
log to emphasize the fact that the servlet failed right after a new searcher
was opened and the event listener finished running a warming facet query.