Difference between currency fieldType and float fieldType

2016-12-05 Thread Zheng Lin Edwin Yeo
Hi,

I would like to better understand the difference between the currency
fieldType and the float fieldType.

If I were to index a field that is a currency value by nature (e.g. amount)
into Solr, is it better to use the currency fieldType rather than the
float fieldType?

I found that with the float fieldType, if the amount is very large, the last
decimal place may get cut off in the index. For example, if the amount in
the original document is 800212.64, the number that is indexed in Solr is
800212.6.
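This is ordinary IEEE-754 float truncation rather than anything Solr-specific: a 32-bit float carries only about 7 significant decimal digits, and 800212.64 needs 8. A quick illustration in plain Java (a sketch only; the cents-in-a-long workaround at the end is one common alternative, not a Solr recommendation):

```java
public class FloatPrecision {
    public static void main(String[] args) {
        // The nearest 32-bit float to 800212.64 is exactly 800212.625,
        // which Java prints back as "800212.6" -- the truncation above.
        float amount = 800212.64f;
        System.out.println(amount);          // 800212.6
        System.out.println((double) amount); // 800212.625

        // One common workaround: store an exact integer number of cents
        // (e.g. in a long field) instead of a binary float.
        long cents = Math.round(800212.64d * 100);
        System.out.println(cents); // 80021264
    }
}
```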

Although using the currency fieldType would solve this issue, I found that
I am not able to facet on a currency fieldType. I need the facet so that I
can list the various amounts that are available based on the search
criteria.

As such, I would like to seek your recommendation on which fieldType best
fits my needs.

I'm using Solr 6.2.1

Regards,
Edwin


RE: Solr seems to reserve facet.limit results

2016-12-05 Thread Chris Hostetter


I think what you're seeing might be a result of the overrequesting done
in phase #1 of a distributed facet query.

The purpose of overrequesting is to mitigate the possibility of a 
constraint which should be in the topN for the collection as a whole 
sitting just outside the topN on every shard -- in which case it would 
never make it to the second phase of the distributed calculation.

The amount of overrequest is, by default, a multiplicative function of the 
user-specified facet.limit plus a fudge factor (IIRC: 10 + (1.5 * facet.limit))

If you're using an explicitly high facet.limit, you can try setting the 
overrequest ratio/count to 1.0/0 respectively to force Solr to only 
request the # of constraints you've specified from each shard, and then 
aggregate them...

https://lucene.apache.org/solr/6_3_0/solr-solrj/org/apache/solr/common/params/FacetParams.html#FACET_OVERREQUEST_RATIO
https://lucene.apache.org/solr/6_3_0/solr-solrj/org/apache/solr/common/params/FacetParams.html#FACET_OVERREQUEST_COUNT
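The effect of those two parameters can be sketched in a few lines (a back-of-the-envelope sketch only: `shardFacetLimit` and its constants follow the formula as recalled above, not Solr's actual source):

```java
public class Overrequest {
    // Per-shard facet limit during phase #1 of a distributed facet
    // request: shardLimit = (int)(ratio * facet.limit) + count,
    // with defaults ratio = 1.5 and count = 10 per the post above.
    static int shardFacetLimit(int facetLimit, double ratio, int count) {
        return (int) (ratio * facetLimit) + count;
    }

    public static void main(String[] args) {
        // With the defaults, facet.limit=100 asks each shard for 160.
        System.out.println(shardFacetLimit(100, 1.5, 10)); // 160
        // ratio=1.0, count=0 disables overrequesting entirely.
        System.out.println(shardFacetLimit(100, 1.0, 0));  // 100
    }
}
```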



One side note related to the work around you suggested...

: One simple solution, in my case would be, now just thinking of it, run 
: the query with no facets and no rows, get the numFound, and set that as 
: facet.limit for the actual query.

...that assumes that the number of facet constraints returned is limited 
by the total number of documents matching the query -- in general there is 
no such guarantee, because of multivalued fields (or faceting on tokenized 
fields), so this type of approach isn't a good idea as a generalized 
solution.
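A tiny illustration of why that guarantee fails (hypothetical multivalued "tags" field; nothing here is Solr API):

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class FacetConstraints {
    // Count distinct facet constraints across the matching docs for a
    // multivalued field -- this can exceed the number of matching docs.
    static int distinctConstraints(List<List<String>> matchingDocs) {
        Set<String> constraints = new LinkedHashSet<>();
        for (List<String> docValues : matchingDocs) {
            constraints.addAll(docValues);
        }
        return constraints.size();
    }

    public static void main(String[] args) {
        // Two matching docs, each with multiple values in "tags".
        List<List<String>> docs = List.of(
                List.of("solr", "search", "java"),
                List.of("solr", "lucene"));
        System.out.println("numFound=" + docs.size());                 // 2
        System.out.println("constraints=" + distinctConstraints(docs)); // 4
    }
}
```

So a facet.limit derived from numFound (2) would silently drop constraints.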



-Hoss
http://www.lucidworks.com/


Re: SOLR index help (SQL Anywhere 16, MS SQL 2014)

2016-12-05 Thread Erick Erickson
There are two basic choices, see Data Import Handler (DIH)
or roll-your-own solrJ client, see:
https://cwiki.apache.org/confluence/display/solr/Uploading+Structured+Data+Store+Data+with+the+Data+Import+Handler

https://lucidworks.com/blog/2012/02/14/indexing-with-solrj/

Best,
Erick

On Mon, Dec 5, 2016 at 10:41 AM, VenkatBR  wrote:
> Hello SOLR Experts,
>
> I am a newbie to SOLR and Java programming.
>
> I am trying to prototype SOLR at my client site. I was able to install SOLR on
> a Windows 2012 virtual machine, but now I am stuck: I am not sure how to index
> my data, which is in SQL Anywhere version 16 databases, MS SQL 2014 databases,
> and a lot of files on a network file system, into SOLR.
>
> -Can you please provide some guidance on how to go about it?
> -What kind of database driver do I need to install to access these databases?
> -Do I need to write a lot of stub Java code to integrate SOLR?
>
> please advise.
>
> Thanks,
> Venkat
>
>
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/SOLR-index-help-SQL-Anywhere-16-MS-SQL-2014-tp4308542.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Reserved characters in password used by Jetty (Solr)

2016-12-05 Thread Chris Hostetter

: I fixed the issue by URL encoding.  Here is a slim down version of my code
: (with the fix):
...
: // Gives back: http://username:password@server:port/solr/...
: String solrUrl = "http://" + username + ":" + password + "@" +
: getSolrServerName() + ":" + getSolrServerPort() + getSolrUpdatePathURI();
: 
: HttpSolrClient solrClient = new HttpSolrClient(solrUrl);

...if you're going to embed the user+pass in the URL you pass to 
HttpSolrClient then you're absolutely going to be required to URL-escape 
them yourself -- otherwise it's not a valid URL (at least, not in the way 
you want it to be).

The official way to use BasicAuth with SolrJ is to set the credentials on 
the SolrRequest object via the setBasicAuthCredentials method...

https://lucene.apache.org/solr/6_3_0/solr-solrj/org/apache/solr/client/solrj/SolrRequest.html#setBasicAuthCredentials-java.lang.String-java.lang.String-

https://cwiki.apache.org/confluence/display/solr/Basic+Authentication+Plugin

-Hoss
http://www.lucidworks.com/


Re: Reserved characters in password used by Jetty (Solr)

2016-12-05 Thread Steven White
Thanks Shawn.

I fixed the issue by URL-encoding. Here is a slimmed-down version of my code
(with the fix):

public HttpSolrClient getSolrClient() throws UnsupportedEncodingException
{
    // the next two lines are the fix
    String username = URLEncoder.encode(getSolrUserID(), "UTF-8");
    String password = URLEncoder.encode(getSolrPasswordClearText(), "UTF-8");

    // Gives back: http://username:password@server:port/solr/...
    String solrUrl = "http://" + username + ":" + password + "@" +
        getSolrServerName() + ":" + getSolrServerPort() + getSolrUpdatePathURI();

    HttpSolrClient solrClient = new HttpSolrClient(solrUrl);

    solrClient.setParser(new XMLResponseParser());

    return solrClient;
}

As you can see, I'm passing the username / password as part of the URL,
which appears to be the root of my issue, but I cannot figure out how to
set basic authentication on HttpSolrClient any other way. Do you?

A side note: the exception that was being thrown (see my original posting
on this topic) shows the URL, and with the URL the username and password, in
the log. This is bad from a security perspective. Should a security defect
be opened against Solr about this?

Steve



On Mon, Dec 5, 2016 at 10:45 AM, Shawn Heisey  wrote:

> On 12/5/2016 8:10 AM, Steven White wrote:
> > Hi everyone,
> >
> > I'm password protecting Solr using Jetty's realm.properties and noticed
> > that if the password has "@" Jetty throws an error and thus I cannot
> access
> > Solr:
> 
> > My question is, what are the reserved character list?  Are they listed
> > somewhere?
>
> The password is being included with the URL, so the restrictions are
> whatever's legal in a URL.  I am guessing that what is happening here is
> that the password is not being run through URI encoding.  Encoding the
> string should allow *any* character to be used, as long as it's valid
> UTF-8.
>
> As a possible workaround, you could try setting the password in SolrJ to
> the URI encoded version, which for the password you indicated would be:
>
> 81%23Mst%23Demo%4018
>
> If this works, which I think it probably will, then there's a bug.  I do
> not know whether the bug is in SolrJ or HttpClient.  One of them is not
> URI encoding the password before sending it.  It would be helpful if you
> shared your SolrJ code that sets the user/password, so we can determine
> where the bug is.
>
> I got the URI encoded version of the password by using the form at this
> URL:
>
> http://urldecode.org/
>
> Thanks,
> Shawn
>
>


Re: [ANN] InvisibleQueriesRequestHandler

2016-12-05 Thread Andrea Gazzarini
Hi Charlie,
Great to hear that! I have never worked on a Drupal / Hybris -> Solr
integration, so it seems things are more or less like the Magento scenario.
That means what I did could make sense and, most importantly, could be
useful for someone.

Best,
Andrea

On 5 Dec 2016 18:08, "Charlie Hull"  wrote:

> On 05/12/2016 09:18, Andrea Gazzarini wrote:
>
>> Hi guys,
>> I developed this handler [1] while doing some work on a Magento ->  Solr
>> project.
>>
>> If someone is interested (this is a post [2] where I briefly explain the
>> goal), or wants to contribute with some idea / improvement, feel free to
>> give me a shout or a feedback.
>>
>> Best,
>> Andrea
>>
>> [1] https://github.com/agazzarini/invisible-queries-request-handler
>> [2]
>> https://andreagazzarini.blogspot.it/2016/12/composing-and-
>> reusing-request-handlers.html
>>
>> We like this idea: we've seen plenty of systems where it's hard to change
> what the container system using Solr is doing (e.g. Hybris, Drupal...) so
> to be able to run multiple searches in Solr itself is very useful. Nice one!
>
> Charlie
>
>
> --
> Charlie Hull
> Flax - Open Source Enterprise Search
>
> tel/fax: +44 (0)8700 118334
> mobile:  +44 (0)7767 825828
> web: www.flax.co.uk
>
>


Re: [ANN] InvisibleQueriesRequestHandler

2016-12-05 Thread Andrea Gazzarini
Hi Erik,
interesting approach, but, please correct me if I didn't get you, this is
different because

- it requires some kind of control on the client side, and in Magento, for
example, you don't have that: it is not aware of grouped responses or
arbitrary facet queries (BTW, the Magento/Solr connector officially supports
Solr 3.6.2 :( )

- it always executes all group / facet queries without any conditional /
cascading logic between them

Again please correct me if I misunderstood your approach.

Thanks for the hint
Andrea

On 5 Dec 2016 19:14, "Erik Hatcher"  wrote:

Another technique for this is to use Grouping’s `group.query` a few times,
with exact to fuzzier types of queries and get it all back in one
response.  So you _can_ run multiple searches in a single query already :)

I’ve used a similar technique with faceting, using `facet.query` with exact
to fuzzier types of queries to get the counts.

Erik



> On Dec 5, 2016, at 9:08 AM, Charlie Hull  wrote:
>
> On 05/12/2016 09:18, Andrea Gazzarini wrote:
>> Hi guys,
>> I developed this handler [1] while doing some work on a Magento ->  Solr
>> project.
>>
>> If someone is interested (this is a post [2] where I briefly explain the
>> goal), or wants to contribute with some idea / improvement, feel free to
>> give me a shout or a feedback.
>>
>> Best,
>> Andrea
>>
>> [1] https://github.com/agazzarini/invisible-queries-request-handler
>> [2]
>> https://andreagazzarini.blogspot.it/2016/12/composing-
and-reusing-request-handlers.html
>>
> We like this idea: we've seen plenty of systems where it's hard to change
what the container system using Solr is doing (e.g. Hybris, Drupal...) so
to be able to run multiple searches in Solr itself is very useful. Nice one!
>
> Charlie
>
>
> --
> Charlie Hull
> Flax - Open Source Enterprise Search
>
> tel/fax: +44 (0)8700 118334
> mobile:  +44 (0)7767 825828
> web: www.flax.co.uk
>


SOLR index help (SQL Anywhere 16, MS SQL 2014)

2016-12-05 Thread VenkatBR
Hello SOLR Experts,

I am a newbie to SOLR and Java programming. 

I am trying to prototype SOLR at my client site. I was able to install SOLR on
a Windows 2012 virtual machine, but now I am stuck: I am not sure how to index
my data, which is in SQL Anywhere version 16 databases, MS SQL 2014 databases,
and a lot of files on a network file system, into SOLR.

-Can you please provide some guidance on how to go about it?
-What kind of database driver do I need to install to access these databases?
-Do I need to write a lot of stub Java code to integrate SOLR?

please advise.

Thanks,
Venkat





--
View this message in context: 
http://lucene.472066.n3.nabble.com/SOLR-index-help-SQL-Anywhere-16-MS-SQL-2014-tp4308542.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: [ANN] InvisibleQueriesRequestHandler

2016-12-05 Thread Erik Hatcher
Another technique for this is to use Grouping’s `group.query` a few times, with 
exact to fuzzier types of queries and get it all back in one response.  So you 
_can_ run multiple searches in a single query already :)

I’ve used a similar technique with faceting, using `facet.query` with exact 
to fuzzier types of queries to get the counts.

Erik



> On Dec 5, 2016, at 9:08 AM, Charlie Hull  wrote:
> 
> On 05/12/2016 09:18, Andrea Gazzarini wrote:
>> Hi guys,
>> I developed this handler [1] while doing some work on a Magento ->  Solr
>> project.
>> 
>> If someone is interested (this is a post [2] where I briefly explain the
>> goal), or wants to contribute with some idea / improvement, feel free to
>> give me a shout or a feedback.
>> 
>> Best,
>> Andrea
>> 
>> [1] https://github.com/agazzarini/invisible-queries-request-handler
>> [2]
>> https://andreagazzarini.blogspot.it/2016/12/composing-and-reusing-request-handlers.html
>> 
> We like this idea: we've seen plenty of systems where it's hard to change 
> what the container system using Solr is doing (e.g. Hybris, Drupal...) so to 
> be able to run multiple searches in Solr itself is very useful. Nice one!
> 
> Charlie
> 
> 
> -- 
> Charlie Hull
> Flax - Open Source Enterprise Search
> 
> tel/fax: +44 (0)8700 118334
> mobile:  +44 (0)7767 825828
> web: www.flax.co.uk
> 



Re: [ANN] InvisibleQueriesRequestHandler

2016-12-05 Thread Walter Underwood
We used to run that way, with an exact search first, then a broad search if 
there were no results.

It has an interesting failure mode. If the user misspells a word (about 10% of
queries do), and the misspelling matches a misspelled document, then you
are stuck. It will never show the correctly-spelled document.

For the very popular book Campbell Biology, if you searched for “cambell”,
it would show a book with Greek plays and one misspelled author. Oops.

We integrated fuzzy search into edismax. With that, we get the popular
book for misspelled queries.

You can find that patch in SOLR-629. I first implemented it for Solr 1.3, and
I’ve been updating it for years. Very useful, especially with the fast fuzzy
introduced in 4.x.

https://issues.apache.org/jira/browse/SOLR-629 
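For reference, the distance measure underlying fuzzy matching can be sketched in plain Java (classic Levenshtein; the fast fuzzy in 4.x uses Levenshtein automata internally, but the edit-distance idea is the same):

```java
public class Fuzzy {
    // Two-row Levenshtein edit distance: minimum number of single-char
    // insertions, deletions, or substitutions to turn a into b.
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1),
                                  prev[j - 1] + cost);
            }
            int[] t = prev; prev = cur; cur = t;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        // "cambell" is one edit (a missing 'p') away from "campbell",
        // so a fuzzy query allowing one edit would still match it.
        System.out.println(distance("cambell", "campbell")); // 1
    }
}
```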


wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Dec 5, 2016, at 9:08 AM, Charlie Hull  wrote:
> 
> On 05/12/2016 09:18, Andrea Gazzarini wrote:
>> Hi guys,
>> I developed this handler [1] while doing some work on a Magento ->  Solr
>> project.
>> 
>> If someone is interested (this is a post [2] where I briefly explain the
>> goal), or wants to contribute with some idea / improvement, feel free to
>> give me a shout or a feedback.
>> 
>> Best,
>> Andrea
>> 
>> [1] https://github.com/agazzarini/invisible-queries-request-handler
>> [2]
>> https://andreagazzarini.blogspot.it/2016/12/composing-and-reusing-request-handlers.html
>> 
> We like this idea: we've seen plenty of systems where it's hard to change 
> what the container system using Solr is doing (e.g. Hybris, Drupal...) so to 
> be able to run multiple searches in Solr itself is very useful. Nice one!
> 
> Charlie
> 
> 
> -- 
> Charlie Hull
> Flax - Open Source Enterprise Search
> 
> tel/fax: +44 (0)8700 118334
> mobile:  +44 (0)7767 825828
> web: www.flax.co.uk
> 



Re: [ANN] InvisibleQueriesRequestHandler

2016-12-05 Thread Charlie Hull

On 05/12/2016 09:18, Andrea Gazzarini wrote:

Hi guys,
I developed this handler [1] while doing some work on a Magento ->  Solr
project.

If someone is interested (this is a post [2] where I briefly explain the
goal), or wants to contribute with some idea / improvement, feel free to
give me a shout or a feedback.

Best,
Andrea

[1] https://github.com/agazzarini/invisible-queries-request-handler
[2]
https://andreagazzarini.blogspot.it/2016/12/composing-and-reusing-request-handlers.html

We like this idea: we've seen plenty of systems where it's hard to 
change what the container system using Solr is doing (e.g. Hybris, 
Drupal...) so to be able to run multiple searches in Solr itself is very 
useful. Nice one!


Charlie


--
Charlie Hull
Flax - Open Source Enterprise Search

tel/fax: +44 (0)8700 118334
mobile:  +44 (0)7767 825828
web: www.flax.co.uk



Re: Reserved characters in password used by Jetty (Solr)

2016-12-05 Thread Shawn Heisey
On 12/5/2016 8:10 AM, Steven White wrote:
> Hi everyone,
>
> I'm password protecting Solr using Jetty's realm.properties and noticed
> that if the password has "@" Jetty throws an error and thus I cannot access
> Solr:

> My question is, what are the reserved character list?  Are they listed
> somewhere?

The password is being included with the URL, so the restrictions are
whatever's legal in a URL.  I am guessing that what is happening here is
that the password is not being run through URI encoding.  Encoding the
string should allow *any* character to be used, as long as it's valid UTF-8.

As a possible workaround, you could try setting the password in SolrJ to
the URI encoded version, which for the password you indicated would be:

81%23Mst%23Demo%4018

If this works, which I think it probably will, then there's a bug.  I do
not know whether the bug is in SolrJ or HttpClient.  One of them is not
URI encoding the password before sending it.  It would be helpful if you
shared your SolrJ code that sets the user/password, so we can determine
where the bug is.

I got the URI encoded version of the password by using the form at this URL:

http://urldecode.org/
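The same encoded form can be reproduced with the JDK alone (a sketch; note that `URLEncoder` performs form-encoding, which also turns spaces into '+', so it only coincides with URI percent-encoding for passwords like this one):

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class EncodePassword {
    // Percent-encode the reserved characters: '#' -> %23, '@' -> %40.
    static String encode(String raw) {
        return URLEncoder.encode(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(encode("81#Mst#Demo@18")); // 81%23Mst%23Demo%4018
    }
}
```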

Thanks,
Shawn



Re: Queries regarding solr cache

2016-12-05 Thread Shawn Heisey
On 12/5/2016 6:44 AM, kshitij tyagi wrote:
>   - lookups:381
>   - hits:24
>   - hitratio:0.06
>   - inserts:363
>   - evictions:0
>   - size:345
>   - warmupTime:2932
>   - cumulative_lookups:294948
>   - cumulative_hits:15840
>   - cumulative_hitratio:0.05
>   - cumulative_inserts:277963
>   - cumulative_evictions:70078
>
>   How can I increase my hit ratio? I am not able to understand solr
>   caching mechanism clearly. Please help.

This means that out of the nearly 300,000 queries executed by that
handler, only five percent (15840) of them were found in the cache.  The
rest of them were not found in the cache at the moment they were made. 
Since these numbers come from the queryResultCache, this refers to the
"q" parameter.  The filterCache handles things in the fq parameter.  The
documentCache holds actual documents from your index and fills in stored
data in results so the document doesn't have to be fetched from the index.

Possible reasons:  1) Your users are rarely entering the same query more
than once.  2) Your client code is adding something unique to every
query (q parameter) so very few of them are the same.  3) You are
committing so frequently that the cache never has a chance to get large
enough to make a difference.

Here are some queryResultCache stats from one of my indexes:

class:org.apache.solr.search.FastLRUCache
version:1.0
description:Concurrent LRU Cache(maxSize=512, initialSize=512,
minSize=460, acceptableSize=486, cleanupThread=true,
autowarmCount=8,
regenerator=org.apache.solr.search.SolrIndexSearcher$3@1d172ac0)
src:$URL:
https://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_4_7/solr/core/src/java/org/apache/solr/search/FastLRUCache.java
lookups:   3496
hits:  3145
hitratio:  0.9
inserts:   335
evictions: 0
size:  338
warmupTime: 2209
cumulative_lookups:   12394606
cumulative_hits:  11247114
cumulative_hitratio:  0.91
cumulative_inserts:   1110375
cumulative_evictions: 409887

These numbers indicate that 91 percent of the queries made to this
handler were served from the cache.

Thanks,
Shawn



Reserved characters in password used by Jetty (Solr)

2016-12-05 Thread Steven White
Hi everyone,

I'm password protecting Solr using Jetty's realm.properties and noticed
that if the password has "@" Jetty throws an error and thus I cannot access
Solr:

java.lang.IllegalArgumentException: Illegal character in fragment at index
31:
http://SolrAdminUser:81#Mst#Demo@18@localhost:8983/solr/demo/update?wt=xml&version=2.2
at java.net.URI.create(URI.java:871)
at org.apache.http.client.methods.HttpPost.<init>(HttpPost.java:76)
at
org.apache.solr.client.solrj.impl.HttpSolrClient.createMethod(HttpSolrClient.java:414)

The password is "81#Mst#Demo@18"

My question is, what are the reserved character list?  Are they listed
somewhere?

PS: I'm going to cross post this question on the Jetty mailing list.

Thanks in advanced,

Steve


Load solr libraries from hdfs

2016-12-05 Thread Gintautas Sulskus
Hi,

Is it possible to add libraries to the Solr classpath from HDFS?
E.g.  ?
I have some custom libraries that currently have to be maintained across multiple
servers. It would be great to be able to store them in a single location.

Best,
Gin


Re: Queries regarding solr cache

2016-12-05 Thread kshitij tyagi
Hi Shawn,

Thanks for the reply:

here are the details for the queryResultCache (I am not using NOW in my
queries and most of the queries are common):


   - class:org.apache.solr.search.LRUCache
   - version:1.0
   - description:LRU Cache(maxSize=1000, initialSize=1000,
   autowarmCount=10,
   regenerator=org.apache.solr.search.SolrIndexSearcher$3@73380510)
   - src:null
   - stats:
  - lookups:381
  - hits:24
  - hitratio:0.06
  - inserts:363
  - evictions:0
  - size:345
  - warmupTime:2932
  - cumulative_lookups:294948
  - cumulative_hits:15840
  - cumulative_hitratio:0.05
  - cumulative_inserts:277963
  - cumulative_evictions:70078

  How can I increase my hit ratio? I am not able to understand the Solr
  caching mechanism clearly. Please help.



On Thu, Dec 1, 2016 at 8:19 PM, Shawn Heisey  wrote:

> On 12/1/2016 4:04 AM, kshitij tyagi wrote:
> > I am using Solr and serving huge number of requests in my application.
> >
> > I need to know how can I utilize caching in Solr.
> >
> > As of now, in the Admin UI, clicking Core Selector → [core name] → Plugins /
> Stats.
> >
> > I am seeing my hit ratio as 0 for all the caches. What does this mean
> and
> > how can this be optimized?
>
> If your hitratio is zero, then none of the queries related to that cache
> are finding matches.  This means that your client systems are never
> sending the same query twice.
>
> One possible reason for a zero hitratio is using "NOW" in date queries
> -- NOW changes every millisecond, and the actual timestamp value is what
> ends up in the cache.  This means that the same query with NOW executed
> more than once will actually be different from the cache's perspective.
> The solution is date rounding -- using things like NOW/HOUR or NOW/DAY.
> You could use NOW/MINUTE, but the window for caching would be quite small.
>
> 5000 entries for your filterCache is almost certainly too big.  Each
> filterCache entry tends to be quite large.  If the core has ten million
> documents in it, then each filterCache entry would be 1.25 million bytes
> in size -- the entry is a bitset of all documents in the core.  This
> includes deleted docs that have not yet been reclaimed by merging.  If a
> filterCache for an index that size (which is not all that big) were to
> actually fill up with 5000 entries, it would require over six gigabytes
> of memory just for the cache.
>
> The 1000 that you have on queryResultCache is also rather large, but
> probably not a problem.  There's also documentCache, which generally is
> OK to have sized at several thousand -- I have 16384 on mine.  If your
> documents are particularly large, then you probably would want to have a
> smaller number.
>
> It's good that your autowarmCount values are low.  High values here tend
> to make commits take a very long time.
>
> You do not need to send your message more than once.  The first repeat
> was after less than 40 minutes.  The second was after about two hours.
> Waiting a day or two for a response, particularly for a difficult
> problem, is not unusual for a mailing list.  I begain this reply as soon
> as I saw your message -- about 7:30 AM in my timezone.
>
> Thanks,
> Shawn
>
>
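Shawn's filterCache arithmetic above can be verified in a few lines (idealized bitset cost only, ignoring JVM object overhead and Solr's internal representation):

```java
public class FilterCacheMath {
    // One filterCache entry is (ideally) a bitset with one bit per
    // document in the core, including not-yet-merged deletions.
    static long entryBytes(long numDocs) {
        return numDocs / 8;
    }

    public static void main(String[] args) {
        long numDocs = 10_000_000L;
        long perEntry = entryBytes(numDocs);   // 1,250,000 bytes per entry
        long full = 5_000 * perEntry;          // cache filled to maxSize=5000
        System.out.println(perEntry + " bytes per entry");
        System.out.println(full / 1_000_000_000.0 + " GB if full"); // 6.25 GB
    }
}
```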


Re: Using DIH FileListEntityProcessor with SolrCloud

2016-12-05 Thread Erik Hatcher
Try the absolute path on your -Solr- server.   That's where DIH runs.  

   Erik

> On Dec 2, 2016, at 08:36, Chris Rogers  wrote:
> 
> Hi all,
> 
> A question regarding using the DIH FileListEntityProcessor with SolrCloud 
> (solr 6.3.0, zookeeper 3.4.8).
> 
> I get that the config in SolrCloud lives on the Zookeeper node (a different 
> server from the solr nodes in my setup).
> 
> With this in mind, where is the baseDir attribute in the 
> FileListEntityProcessor config relative to? I’m seeing the config in the Solr 
> GUI, and I’ve tried setting it as an absolute path on my Zookeeper server, 
> but this doesn’t seem to work… any ideas how this should be setup?
> 
> My DIH config is below:
> 
> <dataConfig>
>   <dataSource type="FileDataSource"/>
>   <document>
>     <entity processor="FileListEntityProcessor"
>             fileName=".*xml"
>             newerThan="'NOW-5YEARS'"
>             recursive="true"
>             rootEntity="false"
>             dataSource="null"
>             baseDir="/home/bodl-zoo-svc/files/">
>
>       <entity processor="XPathEntityProcessor"
>               forEach="/TEI" url="${f.fileAbsolutePath}"
>               transformer="RegexTransformer">
>         <field xpath="/TEI/teiHeader/fileDesc/titleStmt/title"/>
>         <field xpath="/TEI/teiHeader/fileDesc/publicationStmt/publisher"/>
>         <field xpath="/TEI/teiHeader/fileDesc/sourceDesc/msDesc/msIdentifier/altIdentifier/idno"/>
>       </entity>
>     </entity>
>   </document>
> </dataConfig>
> 
> 
> This same script worked as expected on a single solr node (i.e. not in 
> SolrCloud mode).
> 
> Thanks,
> Chris
> 
> --
> Chris Rogers
> Digital Projects Manager
> Bodleian Digital Library Systems and Services
> chris.rog...@bodleian.ox.ac.uk


Re: Using DIH FileListEntityProcessor with SolrCloud

2016-12-05 Thread Felipe Vinturini
Hi Chris,

I've never used the DIH, but maybe the "fileName" pattern is wrong?
 fileName=".*xml"
Should be:
 fileName="*.xml"

Regards,
Felipe.


On Mon, Dec 5, 2016 at 9:43 AM, Chris Rogers  wrote:

> Hi all,
>
> Just bumping my question again, as doesn’t seem to have been picked up by
> anyone. Any help would be much appreciated.
>
> Chris
>
> On 02/12/2016, 16:36, "Chris Rogers" 
> wrote:
>
> Hi all,
>
> A question regarding using the DIH FileListEntityProcessor with
> SolrCloud (solr 6.3.0, zookeeper 3.4.8).
>
> I get that the config in SolrCloud lives on the Zookeeper node (a
> different server from the solr nodes in my setup).
>
> With this in mind, where is the baseDir attribute in the
> FileListEntityProcessor config relative to? I’m seeing the config in the
> Solr GUI, and I’ve tried setting it as an absolute path on my Zookeeper
> server, but this doesn’t seem to work… any ideas how this should be setup?
>
> My DIH config is below:
>
> <dataConfig>
>   <dataSource type="FileDataSource"/>
>   <document>
>     <entity processor="FileListEntityProcessor"
>             fileName=".*xml"
>             newerThan="'NOW-5YEARS'"
>             recursive="true"
>             rootEntity="false"
>             dataSource="null"
>             baseDir="/home/bodl-zoo-svc/files/">
>
>       <entity processor="XPathEntityProcessor"
>               forEach="/TEI" url="${f.fileAbsolutePath}"
>               transformer="RegexTransformer">
>         <field xpath="/TEI/teiHeader/fileDesc/titleStmt/title"/>
>         <field xpath="/TEI/teiHeader/fileDesc/publicationStmt/publisher"/>
>         <field xpath="/TEI/teiHeader/fileDesc/sourceDesc/msDesc/msIdentifier/altIdentifier/idno"/>
>       </entity>
>     </entity>
>   </document>
> </dataConfig>
>
>
> This same script worked as expected on a single solr node (i.e. not in
> SolrCloud mode).
>
> Thanks,
> Chris
>
> --
> Chris Rogers
> Digital Projects Manager
> Bodleian Digital Library Systems and Services
> chris.rog...@bodleian.ox.ac.uk
>
>
>


Re: Using DIH FileListEntityProcessor with SolrCloud

2016-12-05 Thread Chris Rogers
Hi all,

Just bumping my question again, as doesn’t seem to have been picked up by 
anyone. Any help would be much appreciated.

Chris

On 02/12/2016, 16:36, "Chris Rogers"  wrote:

Hi all,

A question regarding using the DIH FileListEntityProcessor with SolrCloud 
(solr 6.3.0, zookeeper 3.4.8).

I get that the config in SolrCloud lives on the Zookeeper node (a different 
server from the solr nodes in my setup).

With this in mind, where is the baseDir attribute in the 
FileListEntityProcessor config relative to? I’m seeing the config in the Solr 
GUI, and I’ve tried setting it as an absolute path on my Zookeeper server, but 
this doesn’t seem to work… any ideas how this should be setup?

My DIH config is below:

<dataConfig>
  <dataSource type="FileDataSource"/>
  <document>
    <entity processor="FileListEntityProcessor"
            fileName=".*xml"
            newerThan="'NOW-5YEARS'"
            recursive="true"
            rootEntity="false"
            dataSource="null"
            baseDir="/home/bodl-zoo-svc/files/">

      <entity processor="XPathEntityProcessor"
              forEach="/TEI" url="${f.fileAbsolutePath}"
              transformer="RegexTransformer">
        <field xpath="/TEI/teiHeader/fileDesc/titleStmt/title"/>
        <field xpath="/TEI/teiHeader/fileDesc/publicationStmt/publisher"/>
        <field xpath="/TEI/teiHeader/fileDesc/sourceDesc/msDesc/msIdentifier/altIdentifier/idno"/>
      </entity>
    </entity>
  </document>
</dataConfig>

This same script worked as expected on a single solr node (i.e. not in 
SolrCloud mode).

Thanks,
Chris

--
Chris Rogers
Digital Projects Manager
Bodleian Digital Library Systems and Services
chris.rog...@bodleian.ox.ac.uk




Re: Solr custom document routing

2016-12-05 Thread SOLR4189
First of all, yes, you are right, we're trying to optimize querying, but not
"just" that. In our company we have reached the limit of resources that we can
give our servers (CPU and RAM). Returning to our example: fieldX=true is
all the documents that were indexed in the last week (like "news"; it may be
first_indexed_time:[NOW/DAY-7DAY TO *]), and fieldX=false is all the
documents that were first inserted into the system before the last 7 days (it
may be first_indexed_time:[* TO NOW/DAY-7DAY]). We also thought about two
collections (the first for "news" and the second for "old" items), but we have
a tf/idf problem between the two collections (the "news" collection is very
small relative to the "old" collection), since we are using Solr 4 and there
is no distributed IDF.

Second of all, we have already measured the performance. We did a naive
experiment: we created two collections, one small (all the new documents) and
one big (the other documents). We also created an alias that unites the two
collections. We saw that this architecture improved performance by 30% (query
time and throughput) compared to the case where we used only one collection.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-custom-document-routing-tp4308432p4308481.html
Sent from the Solr - User mailing list archive at Nabble.com.


[ANN] InvisibleQueriesRequestHandler

2016-12-05 Thread Andrea Gazzarini
Hi guys,
I developed this handler [1] while doing some work on a Magento ->  Solr
project.

If someone is interested (this is a post [2] where I briefly explain the
goal), or wants to contribute with some idea / improvement, feel free to
give me a shout or a feedback.

Best,
Andrea

[1] https://github.com/agazzarini/invisible-queries-request-handler
[2]
https://andreagazzarini.blogspot.it/2016/12/composing-and-reusing-request-handlers.html


Re: Solr seems to reserve facet.limit results

2016-12-05 Thread Toke Eskildsen
On Fri, 2016-12-02 at 12:17 +, Markus Jelsma wrote:
> I have not considered streaming as i am still completely unfamiliar
> with it and i don't yet know what problems it can solve.

Standard faceting requires all nodes to produce their version of the
full result and send it as one chunk, which is then merged at the
calling node (+ other stuff). For large results that comes with a
significant memory overhead.

Solr streaming is ... well, streaming: with practically the same memory
overhead whether you request 10K or 10 billion entries.

> One simple solution, in my case would be, now just thinking of it,
> run the query with no facets and no rows, get the numFound, and set
> that as facet.limit for the actual query.

That would work with your solution. Still, try issuing a "*:*"-search
and see if it breaks your very large facet request.

> Are there any examples / articles about consuming streaming facets
> with SolrJ? 

Sorry, I have little experience with SolrJ.

- Toke Eskildsen, State and University Library, Denmark