Indexing gets significantly slower after every batch commit

2015-05-21 Thread Angel Todorov
hi guys,

I'm crawling a file system folder and indexing 10 million docs, and I am
adding them in batches of 5000, committing every 50 000 docs. The problem I
am facing is that after each commit, the number of documents indexed per second
gets lower and lower.

If I do not commit at all, I can index those docs very quickly, and then I
commit once at the end, but once I start indexing docs _after_ that (for
example new files get added to the folder), indexing is also slowing down a
lot.

Is it normal that the SOLR indexing speed depends on the number of
documents that are _already_ indexed? I think it shouldn't matter if I
start from scratch or I index a document in a core that already has a
couple of million docs. Looks like SOLR is either doing something in a
linear fashion, or there is some magic config parameter that I am not aware
of.

I've read all perf docs, and I've tried changing mergeFactor,
autowarmCounts, and the buffer sizes - to no avail.

I am using SOLR 5.1

Thanks !
Angel


Re: Indexing gets significantly slower after every batch commit

2015-05-21 Thread Shawn Heisey
On 5/21/2015 2:07 AM, Angel Todorov wrote:
 I'm crawling a file system folder and indexing 10 million docs, and I am
 adding them in batches of 5000, committing every 50 000 docs. The problem I
 am facing is that after each commit, the documents per sec that are indexed
 gets less and less.
 
 If I do not commit at all, I can index those docs very quickly, and then I
 commit once at the end, but once i start indexing docs _after_ that (for
 example new files get added to the folder), indexing is also slowing down a
 lot.
 
 Is it normal that the SOLR indexing speed depends on the number of
 documents that are _already_ indexed? I think it shouldn't matter if i
 start from scratch or I index a document in a core that already has a
 couple of million docs. Looks like SOLR is either doing something in a
 linear fashion, or there is some magic config parameter that I am not aware
 of.
 
 I've read all perf docs, and I've tried changing mergeFactor,
 autowarmCounts, and the buffer sizes - to no avail.
 
 I am using SOLR 5.1

Have you changed the heap size?  If you use the bin/solr script to start
it and don't change the heap size with the -m option or another method,
Solr 5.1 runs with a default size of 512MB, which is *very* small.

I bet you are running into problems with frequent and then ultimately
constant garbage collection, as Java attempts to free up enough memory
to allow the program to continue running.  If that is what is happening,
then eventually you will see an OutOfMemoryError exception.  The
solution is to increase the heap size.  I would probably start with at
least 4G for 10 million docs.
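For example, with the bundled start script that would be something like:

    bin/solr start -m 4g

where -m sets both the minimum and maximum heap; 4g is just a starting point to
experiment with for 10 million docs, not a tuned value.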

Thanks,
Shawn



Search for numbers

2015-05-21 Thread Holger Rieß
Hi,

I am trying to search for numbers with a certain deviation. My parser is ExtendedDisMax.
A possible search expression could be 'twist drill 1.23 mm'. It will not match 
any documents, because the document contains the keywords 'twist drill', '1.2' 
and 'mm'.

In order to reach my goal, I've indexed all numbers as points with the 
solr.SpatialRecursivePrefixTreeFieldType.
For example '1.2' as <field name="feature_nr">1.2 0.0</field>.
A search with 'drill mm' and a filter query 'fq={!geofilt pt=0,1.23 
sfield=feature_nr d=5}' delivers the expected results.

Now I have two problems:
1. How can I get ExtendedDisMax, to 'replace' the value 1.2 with the 
'{!geofilt}' function?
  My first attempts were

- Build a field type in schema.xml and replace the field content with a regular 
expression
'... replacement="_query_:&quot;{!geofilt pt=0,$1 sfield=feature_nr d=5}&quot;"'.
The idea was to use a nested query. But edismax searches 
'feature_nr:_query_:{!geofilt pt=0,$1 sfield=feature_nr d=5}'.
No documents are found.

- Program a new parser that analyzes the query terms, finds all numbers and 
does the geospatial stuff. Added this parser in the 'appends' section of the 
'requestHandler' definition. But I can get this parser only to filter my 
results, not to extend them.

2. I want to calculate the distance (d) of the '{!geofilt}' function relative 
to the value, for example 1%.

Could there be a simple solution? 

Thank you in advance.
Holger


Re: Need help with Nested docs situation

2015-05-21 Thread Alessandro Benedetti
This scenario is a perfect fit to play with Solr Joins [1] .

As you observed, you would prefer to go with a query time join.
This kind of join can be done across collections.
You can have your deal collection and your product collection.
Every product will have one field, dealId, to match its parent deals.
When you add, remove or update a deal, you have to update all the related
products in the product index.

Then you can query over the products and get related parent deals in the
response.
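A minimal sketch of such a query, run against the deal collection and assuming a
product collection named "products" with fields dealId and name (note that this
form of cross-core join is not distributed, so both cores need to live in the
same Solr instance):

    q={!join from=dealId to=id fromIndex=products}name:cola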
Can you give me a little bit more details about your expected use case ?
Example of queries and a better explanation of the product previews ?

Cheers

[1] https://www.youtube.com/watch?v=-OiIlIijWH0&feature=youtu.be ,
http://blog.griddynamics.com/2013/09/solr-block-join-support.html

2015-05-20 18:56 GMT+01:00 Mikhail Khludnev mkhlud...@griddynamics.com:

 data scale and request rate can judge between block, plain joins and field
 collapsing.

 On Thu, Apr 30, 2015 at 1:07 PM, roySolr royrutten1...@gmail.com wrote:

  Hello,
 
  I have a situation and i'm a little bit stuck on the way how to fix it.
  For example the following data structure:
 
  *Deal*
  All Coca Cola 20% off
 
  *Products*
  Coca Cola light
  Coca Cola Zero 1L
  Coca Cola Zero 20CL
  Coca Cola 1L
 
  When somebody search to Cola discount i want the result of the deal
 with
  related products.
 
  Solution #1:
  I could index it with nested docs(solr 4.9). But the problem is when a
  product has some changes(let's say Zero gets a new name Extra Light)
 i
  have to re-index every deal with these products.
 
  Solution #2:
  I could make 2 collections, one with deals and one with products. A
 Product
  will get a parentid(dealid). Then i have to do 2 queries to get the
  information? When i have a resultpage with 10 deals i want to preview the
  first 2 products. That means a lot of queries but it's doesn't have the
  update problem from solution #1.
 
  Does anyone have a good solution for this?
 
  Thanks, any help is appreciated.
  Roy
 
 
 
 
 
  --
  View this message in context:
 
 http://lucene.472066.n3.nabble.com/Need-help-with-Nested-docs-situation-tp4203190.html
  Sent from the Solr - User mailing list archive at Nabble.com.
 



 --
 Sincerely yours
 Mikhail Khludnev
 Principal Engineer,
 Grid Dynamics

 http://www.griddynamics.com
 mkhlud...@griddynamics.com




-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?

William Blake - Songs of Experience -1794 England


Re: Solr suggester

2015-05-21 Thread Erick Erickson
right. File-based suggestions should be much faster to build, but it's
certainly the case with large indexes that you have to build it
periodically so they won't be completely up to date.

However, this stuff is way cool. AnalyzingInfixSuggester, for
instance, suggests entire fields rather than isolated words, returning
the original case, punctuation etc.
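Switching a suggester like the one quoted below to the infix lookup is mostly a
matter of the lookupImpl; a sketch (untested, and the indexPath value is just an
example name):

    <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
    <str name="indexPath">analyzing_infix_suggestions</str>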

The index-based spellcheck/suggest just reads terms from the indexed
fields which takes no time to build but suffers from reading _indexed_
terms, i.e. terms that have gone through the analysis process that may
have been stemmed, lowercased, all that.

On Thu, May 21, 2015 at 9:03 AM, jon kerling
jonkerl...@yahoo.com.invalid wrote:
 Hi Erick,
 I have read your blog and it is really helpful. I'm thinking about upgrading
 to Solr 5.1, but it won't solve all my problems with this issue; as you said,
 each build will have to read all docs and analyze its fields. The only
 advantage is that I can skip the default suggest.build on startup.
 Thank you for your reply.
 Jon Kerling.



  On Thursday, May 21, 2015 6:38 PM, Erick Erickson 
 erickerick...@gmail.com wrote:


  Frankly, the suggester is rather broken in Solr 4.x with large
 indexes. Building the suggester index (or FST) requires that _all_ the
 docs get read, the stored fields analyzed and added to the suggester.
 Unfortunately, this happens _every_ time you start Solr and can take
 many minutes whether or not you have buildOnStartup set to false, see:
 https://issues.apache.org/jira/browse/SOLR-6845.

 See: http://lucidworks.com/blog/solr-suggester/

 See inline.

 On Thu, May 21, 2015 at 6:12 AM, jon kerling
 jonkerl...@yahoo.com.invalid wrote:
 Hi,

 I'm using solr 4.10 and I'm trying to add autosuggest ability to my 
 application.
 I'm currently using this kind of configuration:

  <searchComponent name="suggest" class="solr.SuggestComponent">
    <lst name="suggester">
      <str name="name">mySuggester</str>
      <str name="lookupImpl">FuzzyLookupFactory</str>
      <str name="storeDir">suggester_fuzzy_dir</str>
      <str name="dictionaryImpl">DocumentDictionaryFactory</str>
      <str name="field">field2</str>
      <str name="weightField">weightField</str>
      <str name="suggestAnalyzerFieldType">text_general</str>
    </lst>
  </searchComponent>

  <requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
    <lst name="defaults">
      <str name="suggest">true</str>
      <str name="suggest.count">10</str>
      <str name="suggest.dictionary">mySuggester</str>
    </lst>
    <arr name="components">
      <str>suggest</str>
    </arr>
  </requestHandler>

 I wanted to know how the suggester index/file is being rebuilt.
 Is it supposed to have all the terms of the desired field in the suggester?
 Yes.
 If not, is it related to this kind of lookup implementation?
 If I use another lookup implementation which also suggests infix terms of
 fields, doesn't it have to hold all the terms of the field?
 Yes.

 When I call suggest.build, does it rebuild the suggester index/file from
 scratch, or is it just doing some sort of delta indexing of suggestions?
 Builds from scratch.

 Thank You,
 Jon





Re: Reindex of document leaves old fields behind

2015-05-21 Thread tuxedomoon
I'm relying on an autocommit of 60 secs.

I just ran the same test via my SolrJ client and the result was the same: the
SolrCloud query always returns the correct number of fields.

Is there a way to find out which shard and replica a particular document
lives on?
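For the shard part, one option (assuming the [shard] document transformer is
available in this version; collection name and id below are placeholders) is to
ask for it in the field list:

    /solr/collection1/select?q=id:1234&fl=id,[shard]&wt=json

which returns, per document, the URL of the shard/replica it was fetched from.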



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Reindex-of-document-leaves-old-fields-behind-tp4206710p4206908.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Reindex of document leaves old fields behind

2015-05-21 Thread Erick Erickson
My guess is that you're not committing from your SolrJ program. That's
automatic when you post.
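If the SolrJ side does need an explicit commit, it is a one-liner; a minimal
sketch, assuming an HttpSolrServer (or CloudSolrServer) named server and a
document named solrDoc:

    server.add(solrDoc);
    server.commit();                    // explicit hard commit
    // or: server.add(solrDoc, 60000);  // commitWithin 60 seconds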

Best,
Erick

On Thu, May 21, 2015 at 10:13 AM, tuxedomoon dancolem...@yahoo.com wrote:
 OK it is composite

 I've just used post.sh to index a test doc with 3 fields to leader 1 of my
 SolrCloud.  I then reindexed it with 1 field removed and the query on it
 shows 2 fields.   I repeated this a few times and always get the correct
 field count from Solr.

 I'm now wondering if SolrJ is somehow involved in performing an atomic
 update rather than replacement. I will  try the above test via SolrJ.



 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Reindex-of-document-leaves-old-fields-behind-tp4206710p4206886.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Solr suggester

2015-05-21 Thread jon kerling
Hi,

I'm using solr 4.10 and I'm trying to add autosuggest ability to my application.
I'm currently using this kind of configuration:

 <searchComponent name="suggest" class="solr.SuggestComponent">
   <lst name="suggester">
     <str name="name">mySuggester</str>
     <str name="lookupImpl">FuzzyLookupFactory</str>
     <str name="storeDir">suggester_fuzzy_dir</str>
     <str name="dictionaryImpl">DocumentDictionaryFactory</str>
     <str name="field">field2</str>
     <str name="weightField">weightField</str>
     <str name="suggestAnalyzerFieldType">text_general</str>
   </lst>
 </searchComponent>

 <requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
   <lst name="defaults">
     <str name="suggest">true</str>
     <str name="suggest.count">10</str>
     <str name="suggest.dictionary">mySuggester</str>
   </lst>
   <arr name="components">
     <str>suggest</str>
   </arr>
 </requestHandler>

I wanted to know how the suggester index/file is being rebuilt.
Is it supposed to have all the terms of the desired field in the suggester?
If not, is it related to this kind of lookup implementation?
If I use another lookup implementation which also suggests infix terms of
fields, doesn't it have to hold all the terms of the field?

When I call suggest.build, does it rebuild the suggester index/file from scratch,
or is it just doing some sort of delta indexing of suggestions?
 
Thank You,
Jon


Logic on Term Frequency Calculation : Bug or Functionality

2015-05-21 Thread ariya bala
Hi,

I am puzzled by the term frequency behaviour of the DefaultSimilarity
implementation.
I have suppressed the IDF by setting it to 1, so TF-IDF should in turn reflect
the same value as the term frequency.

Below are the inferences:
The rows originally marked in red were expected to give a hit count (term
frequency) of 2 but returned 1.
*Is it a bug, or is this the expected behaviour?*

Search Query: AAA BBB
Parsed Query: PhraseQuery(Contents:"aaa bbb"~5000)

[Table lost in the plain-text archive: eight test documents (AAA BBB, BBB AAA,
AAA AAA BBB, AAA BBB AAA, BBB AAA AAA, AAA BBB BBB, BBB AAA BBB, BBB BBB AAA)
with the slop settings used and the term frequencies observed at slop 0 and slop 2.]

*Am I missing something?!*


Cheers
*Ariya *


Re: [solr 5.1] Looking for full text + collation search field

2015-05-21 Thread Björn Keil
Thanks for the advice. I have tried the field type and it seems to do what it 
is supposed to in combination with a lower case filter.

However, that raises another slight problem:

German umlauts are supposed to be treated slightly differently for the purpose of
searching than for sorting. For sorting, a normal ICUCollationField with
standard rules should suffice*; for the purpose of searching I cannot just
replace an ü with a u: ü is supposed to equal ue, or, in terms of
RuleBasedCollators, there is a secondary difference.

The rules for the collator include:

 & ue , ü
 & ae , ä
 & oe , ö
 & ss , ß

(again, that applies to searching *only*, for the sorting the rule & a , ä
would apply, which is implied in the default rules.)

I can of course program a filter that does these rudimentary replacements 
myself, at best after the lower case filter but before the ASCIIFoldingFilter, 
I am just wondering if there isn't some way to use collation keys for full 
text search.
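A declarative alternative to hand-coding that filter might be a mapping char
filter in the search analyzer. A minimal sketch, assuming a mapping file named
mapping-german.txt (note that a char filter runs before the tokenizer rather
than between token filters):

    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-german.txt"/>

with mapping-german.txt containing lines such as:

    "ü" => "ue"
    "ä" => "ae"
    "ö" => "oe"
    "ß" => "ss"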




* even though Latin script and specifically German is my primary concern, I 
want some rudimentary support for all European languages, including ones that 
use Cyrillic and Greek script, special symbols in Icelandic that are not 
strictly Latin and ligatures like Æ, which collation keys could easily 
provide.





Ahmet Arslan iori...@yahoo.com.INVALID wrote on Wednesday, 20 May 2015 at 22:10:
Hi Bjorn,

solr.ICUCollationField is useful for *sorting*, and you cannot sort on 
tokenized fields.

Your example looks like a diacritics-insensitive search.
Please see : ASCIIFoldingFilterFactory

Ahmet



On Wednesday, May 20, 2015 2:53 PM, Björn Keil deeph...@web.de wrote:
Hello,

might anyone suggest a field type with which I may do both a full text
search (i.e. there is an analyzer including a tokenizer) and apply a
collation?

An example for what I want to do:
There is a field composer for which I passed the value Dvořák, Antonín.

I want the following queries to match:
composer:(antonín dvořák)
composer:dvorak
composer:dvorak, antonin

the latter case is possible using a solr.ICUCollationField, but that
type does not support an Analyzer and consequently no tokenizer, thus,
it is not helpful.

Unlike former versions of solr there do not seem to be
CollationKeyFilters which you may hang into the analyzer of a
solr.TextField... so I am a bit at a loss how I get *both* a tokenizer
and a collation at the same time.

Thanks for help,
Björn


Index optimize runs in background.

2015-05-21 Thread Modassar Ather
Hi,

I am using Solr-5.1.0. I have an indexer class which invokes
cloudSolrClient.optimize(true, true, 1). My indexer exits after the
invocation of optimize and the optimization keeps on running in the
background.
Kindly let me know if this is by design, and how I can make my indexer
wait until the optimization is over. Is there a configuration parameter I
need to set for this?

Please note that the same indexer with cloudSolrServer.optimize(true, true,
1) on Solr-4.10 used to wait till the optimize was over before exiting.

Thanks,
Modassar


Re: solr 5.x on glassfish/tomcat instead of jetty

2015-05-21 Thread Steven White
Hi TK,

Can you share the thread you found on this WAR topic?

Thanks,

Steve

On Wed, May 20, 2015 at 8:58 PM, TK Solr tksol...@sonic.net wrote:

 Never mind. I found that thread. Sorry for the noise.


 On 5/20/15, 5:56 PM, TK Solr wrote:

 On 5/20/15, 8:21 AM, Shawn Heisey wrote:

 As of right now, there is still a .war file. Look in the server/webapps
 directory for the .war, server/lib/ext for logging jars, and
 server/resources for the logging configuration. Consult your container's
 documentation to learn where to place these things. At some point in the
 future, such deployments will no longer be possible,

 While we are still at this subject, I have been aware there has been an
 anti-WAR movement in the tech but I don't quite understand where this
 movement is coming from.  Can someone point me to some website summarizing
 why WARs are bad?

 Thanks.





Is it possible to do term Search for the filtered result set

2015-05-21 Thread Danesh Kuruppu
Hi all,

Is it possible to do a terms search on a filtered result set? We can do a
terms search across all documents; can we do it only for a specified
filtered result set?

Let's say we have:

Doc1 -- type: A
 tags: T1 T2

Doc2 -- type: A
 tags: T1 T3

Doc3 -- type: B
 tags: T1 T4 T5

Can we do a terms search on tags only in type:A documents, so that it gives
results like:
T1 - 02
T2 - 01
T3 - 01

Is this possible? If so, can you please share documentation on this.
Thanks
Danesh


Price Range Faceting Based on Date Constraints

2015-05-21 Thread alexw
Hi,

I have a unique requirement to facet on product prices based on date
constraints, for which I have been thinking for a solution for a couple of
days now, but to no avail. The details are as follows:

1. Each product can have multiple prices, each price has a start-date and an
end-date.
2. At search time, we need to facet on price ranges ($0 - $5, $5-$20,
$20-$50...)
3. When faceting, a date is first determined. It can be either the current
system date or a future date (call it date X)
4. For each product, the price to be used for faceting has to meet the
following condition: start-date < date X, and date X < end-date, in other
words, date X has to fall within start-date and end-date.
5. My Solr version: 3.5

Hopefully I explained the requirement clearly. I have tried a single
multi-valued price field where each price value has the start date and end date
appended. I also tried one field per price, with the field name containing
both startdate and enddate. Neither approach seems to work. Can someone
please shed some light as to how the index should be designed and what the
facet query should look like?

Thanks in advance for your help!



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Price-Range-Faceting-Based-on-Date-Constraints-tp4206817.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SolrCloud Leader Election

2015-05-21 Thread Ramkumar R. Aiyengar
This shouldn't happen, but if it does, there's no good way currently for
Solr to automatically fix it. There are a couple of issues being worked on
to do that currently. But till then, your best bet is to restart the node
which you expect to be the leader (you can look at ZK to see who is at the
head of the queue it maintains). If you can't figure that out, safest is to
just stop/start all nodes in sequence, and if that doesn't work, stop all
nodes and start them back one after the other.
On 21 May 2015 00:24, Ryan Steele ryan.ste...@pgi.com wrote:

 My SolrCloud cluster isn't reassigning the collections leaders from downed
 cores--the downed cores are still listed as the leaders. The cluster has
 been in this state for a few hours and the logs continue to report "No
 registered leader was found after waiting for 4000ms". Is there a way to
 force it to reassign the leader?

 I'm running SolrCloud 5.0.
 I have 7 Solr nodes, 3 Zookeeper nodes, and 3739 collections.

 Thanks,
 Ryan




Re: Confused about whether Real-time Gets must be sent to leader?

2015-05-21 Thread Yonik Seeley
On Thu, May 21, 2015 at 3:15 PM, Timothy Potter thelabd...@gmail.com wrote:
 I'm seeing that RTG requests get routed to any active replica of the
 shard hosting the doc requested by /get ... I was thinking only the
 leader should handle that request since there's a brief window of time
 where the latest update may not be on the replica (albeit usually very
 brief) and the latest update is definitely on the leader.

There are different levels of consistency.
You are guaranteed that after an update completes, a RTG will retrieve
that version of the update (or later).
The fact that a replica gets the update after the leader is not
material to this guarantee since the update has not yet completed.

What can happen is that if you are doing multiple RTG requests, you
can see a later version of a document, then see a previous version
(because you're hitting different shards).  This will only be an issue
in certain types of use-cases.  Optimistic concurrency, for example,
will *not* be bothered by this phenomenon.

In the past, we've talked about an option to route search requests to
the leader.  But really, any type of server affinity would work to
ensure a monotonic view of a document's history.  Off the top of my
head, I'm not really sure what types of apps require it, but I'd be
interested in hearing about them.

-Yonik


Re: Reindex of document leaves old fields behind

2015-05-21 Thread tuxedomoon
a few further clues to this unresolved problem

1. I found one of my 5 zookeeper instances was down
2. I tried another reindex of a bad document but no change on the SOLR side
3. I deleted and reindexed the same doc, that worked (obviously, but at this
point I don't know what to expect)



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Reindex-of-document-leaves-old-fields-behind-tp4206710p4206946.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Price Range Faceting Based on Date Constraints

2015-05-21 Thread David Smiley
Another more modern option, very related to this, is to use DateRangeField in 
5.0.  You have full 64 bit precision.  More info is in the Solr Ref Guide.

If Alessandro sticks with RPT, then the best reference to give is this:
http://wiki.apache.org/solr/SpatialForTimeDurations

~ David
https://www.linkedin.com/in/davidwsmiley

 On May 21, 2015, at 11:49 AM, Holger Rieß holger.ri...@werkzeug-eylert.de 
 wrote:
 
 Give geospatial search a chance. Use the 
 'SpatialRecursivePrefixTreeFieldType' field type, set 'geo' to false.
 The date is located on the X-axis, prices on the Y axis.
 For every price you get a horizontal line between start and end date. Index a 
 rectangle with height 0.001( 1 cent) and width 'end date - start date'.
 
 Find all prices that are valid on a given day or in a given date range with 
 the 'geofilt' function.
 
 The field type could look like (not tested):
 
 <fieldType name="price_date_range" class="solr.SpatialRecursivePrefixTreeFieldType"
   geo="false" distErrPct="0.025" maxDistErr="0.09" units="degrees"
   worldBounds="1 0 366 1" />
 
 Faceting can possibly be done with a facet query for each of your price
 ranges.
 For example day 20, price range 0-5$, rectangle: <field name="pdr">20.0 0.0
 21.0 5.0</field>.
 
 Regards Holger
 



Re: Price Range Faceting Based on Date Constraints

2015-05-21 Thread alexw
Thanks Holger and Alessandro, SpatialRecursivePrefixTreeFieldType  is a new
concept to me, and I need some time to dig into it and see how it can help
solve my problem.

Alex Wang
Technical Architect
Crossview, Inc.
C: (647) 409-3066
aw...@crossview.com

On Thu, May 21, 2015 at 11:50 AM, Holger Rieß [via Lucene] 
ml-node+s472066n4206868...@n3.nabble.com wrote:

 Give geospatial search a chance. Use the
 'SpatialRecursivePrefixTreeFieldType' field type, set 'geo' to false.
 The date is located on the X-axis, prices on the Y axis.
 For every price you get a horizontal line between start and end date.
 Index a rectangle with height 0.001( 1 cent) and width 'end date - start
 date'.

 Find all prices that are valid on a given day or in a given date range
 with the 'geofilt' function.

 The field type could look like (not tested):

  <fieldType name="price_date_range" class="solr.SpatialRecursivePrefixTreeFieldType"
    geo="false" distErrPct="0.025" maxDistErr="0.09" units="degrees"
    worldBounds="1 0 366 1" />

  Faceting can possibly be done with a facet query for each of your price
  ranges.
  For example day 20, price range 0-5$, rectangle: <field name="pdr">20.0
  0.0 21.0 5.0</field>.

 Regards Holger










--
View this message in context: 
http://lucene.472066.n3.nabble.com/Price-Range-Faceting-Based-on-Date-Constraints-tp4206817p4206951.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Is it possible to search for the empty string?

2015-05-21 Thread Chris Hostetter

: Subject: Re: Is it possible to search for the empty string?
: 
: Not out of the box.
: 
: Fields are parsed into tokens and queries search on tokens. An empty 
: string has no tokens for that field and a missing field has no tokens 
: for that field.

that's a misleading oversimplification of what *normally* happens.

it is absolutely possible to have documents with fields whose indexed 
terms consist of the empty string, and to search for those empty 
strings -- the most trivial way being with a simple StrField -- but using 
TextField with some creative analyzers it's also very possible.


$ curl 
'http://localhost:8983/solr/techproducts/select?q=*:*&facet=true&facet.field=foo_s&wt=json&indent=true&omitHeader=true'
{
  "response":{"numFound":3,"start":0,"docs":[
      {
        "id":"foo_blank",
        "foo_s":"",
        "_version_":1501816569733316608},
      {
        "id":"foo_non_blank",
        "foo_s":"bar",
        "_version_":1501816583564034048},
      {
        "id":"foo_missing",
        "_version_":1501816591383265280}]
  },
  "facet_counts":{
    "facet_queries":{},
    "facet_fields":{
      "foo_s":[
        "",1,
        "bar",1]},
    "facet_dates":{},
    "facet_ranges":{},
    "facet_intervals":{},
    "facet_heatmaps":{}}}

$ curl 
'http://localhost:8983/solr/techproducts/select?q=foo_s:""&wt=json&indent=true&omitHeader=true'
{
  "response":{"numFound":1,"start":0,"docs":[
      {
        "id":"foo_blank",
        "foo_s":"",
        "_version_":1501816569733316608}]
  }}

$ curl 
'http://localhost:8983/solr/techproducts/select?q=foo_s:*&wt=json&indent=true&omitHeader=true'
{
  "response":{"numFound":2,"start":0,"docs":[
      {
        "id":"foo_blank",
        "foo_s":"",
        "_version_":1501816569733316608},
      {
        "id":"foo_non_blank",
        "foo_s":"bar",
        "_version_":1501816583564034048}]
  }}

$ curl 
'http://localhost:8983/solr/techproducts/select?q=-foo_s:*&wt=json&indent=true&omitHeader=true'
{
  "response":{"numFound":1,"start":0,"docs":[
      {
        "id":"foo_missing",
        "_version_":1501816591383265280}]
  }}


-Hoss
http://www.lucidworks.com/


optimal shard assignment with low shard key cardinality using compositeId to enable shard splitting

2015-05-21 Thread Matteo Grolla
Hi
I'd like some feedback on how I plan to solve the following sharding problem.


I have a collection that will eventually become big

Average document size is 1.5kb
Every year 30 Million documents will be indexed

Data come from different document producers (a person, the owner of his documents),
and queries are almost always performed by a document producer, who can only
query his own documents. So sharding by document producer seems a good choice.

there are 3 types of doc producer
type A, 
cardinality 105 (there are 105 producers of this type)
produce 17M docs/year (the aggregated production af all type A producers)
type B
cardinality ~10k
produce 4M docs/year
type C
cardinality ~10M
produce 9M docs/year

I'm thinking about using compositeId ( solrDocId = producerId!docId ) to send
all docs of the same producer to the same shard. When a shard becomes too large
I can use shard splitting.

Problems:
- documents from type A producers could be unevenly distributed among shards,
because hashing doesn't work well with small numbers (105); see the Appendix.

As a solution I could do this when a new typeA producer (producerA1) arrives:

1) client app: generate a producer code
2) client app: simulate murmurhashing and shard assignment
3) client app: check that the shard assignment is optimal (the producer code is
assigned to the shard with the fewest type A producers); otherwise go to 1) and
try another code

When I add documents or perform searches for producerA1, I use its producer
code in the compositeId or in the _route_ parameter, respectively.
What do you think?
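Concretely, that would mean indexing ids like A1!doc42 (where A1 is the generated
producer code) and restricting queries to that producer's shard(s) with the
_route_ parameter, e.g.:

    q=*:*&_route_=A1!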


---Appendix: murmurhash shard assignment 
simulation---

import mmh3

# keep the high 16 bits of the 32-bit murmur3 hash of each producer key (Python 2)
hashes = [mmh3.hash(str(i)) >> 16 for i in xrange(105)]

num_shards = 16
shards = [0] * num_shards

for hash in hashes:
    idx = hash % num_shards
    shards[idx] += 1

print shards
print sum(shards)

-

result: [4, 10, 6, 7, 8, 6, 7, 8, 11, 1, 8, 5, 6, 5, 5, 8]

so with 16 shards and 105 shard keys I can have
shards with as few as 1 key and
shards with as many as 11 keys



Re: Price Range Faceting Based on Date Constraints

2015-05-21 Thread alexw
Hi Alex,

Thanks for the link to the presentation. I am going through the slides and
trying to figure out the time-sensitive search it talks about and how it
relates to the problem I am facing. It looks like it tries to solve the
problem of sku availability based on date, while in my case, all skus are
available, but the prices are time-sensitive, and faceting logic needs to
pick the right price for each sku when counting.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Price-Range-Faceting-Based-on-Date-Constraints-tp4206817p4206856.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Price Range Faceting Based on Date Constraints

2015-05-21 Thread Alexandre Rafalovitch
Did you look at Gilt's presentation from a while ago:
http://www.slideshare.net/trenaman/personalized-search-on-the-largest-flash-sale-site-in-america

Slides 33 on might be most relevant.

Regards,
   Alex.

Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
http://www.solr-start.com/


On 21 May 2015 at 22:58, alexw aw...@crossview.com wrote:
 Hi,

 I have an unique requirement to facet on product prices based on date
 constraints, for which I have been thinking for a solution for a couple of
 days now, but to no avail. The details are as follows:

 1. Each product can have multiple prices, each price has a start-date and an
 end-date.
 2. At search time, we need to facet on price ranges ($0 - $5, $5-$20,
 $20-$50...)
 3. When faceting, a date is first determined. It can be either the current
 system date or a future date (call it date X)
 4. For each product, the price to be used for faceting has to meet the
  following condition: start-date < date X, and date X < end-date, in other
 words, date X has to fall within start-date and end-date.
 5. My Solr version: 3.5

 Hopefully I explained the requirement clearly. I have tried single price
 field with multivalue and each price value has startdate and enddate
 appended. I also tried one field per price with the field name containing
 both startdate and enddate. Neither approach seems to work. Can someone
 please shed some light as to how the index should be designed and what the
 facet query should look like?

 Thanks in advance for your help!



 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Price-Range-Faceting-Based-on-Date-Constraints-tp4206817.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Re: Reindex of document leaves old fields behind

2015-05-21 Thread tuxedomoon
 If it is implicit then
 you may have indexed the new document to a different shard, which means
 that it is now in your index more than once, and which one gets returned
 may not be predictable.

If a document with uniqueKey 1234 is assigned to a shard by SolrCloud implicit
routing, won't a reindex of 1234 be assigned to the same shard?
If not, you'd have dups all over the cluster.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Reindex-of-document-leaves-old-fields-behind-tp4206710p4206849.html
Sent from the Solr - User mailing list archive at Nabble.com.


Clarification on Collections API for 5.x

2015-05-21 Thread Jim . Musil
Hi,

In the guide for moving from Solr 4.x to 5.x, it states the following:

Solr 5.0 only supports creating and removing SolrCloud collections through the 
Collections API
(https://cwiki.apache.org/confluence/display/solr/Collections+API), unlike 
previous versions. While not using the collections API may still work in 5.0, 
it is unsupported, not recommended, and the behavior will change in a 5.x 
release.
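For reference, creating a collection through that API is a single HTTP call along
these lines (collection and config names here are placeholders):

    curl "http://localhost:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=3&replicationFactor=2&collection.configName=myconfig"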

Currently, we launch several solr nodes with identical cores defined using the 
new Core Discovery process. These nodes are also connected to a zookeeper 
ensemble. Part of the core definition is to set the configSet to use. This 
configSet is uploaded to zookeeper separately. This effectively creates a 
Collection.

Is this method no longer supported in 5.x?

Thanks!
Jim Musil



Re: Reindex of document leaves old fields behind

2015-05-21 Thread tuxedomoon
 let's see the code.

simplified code and some comments

1.  solrUrl points at leader 1 of 3 leaders, each with a replica  
2.  createSolrDoc takes a full Mongo doc and returns a valid
SolrInputDocument 
3.  I have done dumps of the returned solrDoc and verified it does not have
the unwanted fields

SolrServer solrServer = new HttpSolrServer(solrUrl);   
SolrInputDocument solrDoc = solrDocFactory.createSolrDoc(mongoDoc,
dbName);
UpdateResponse uresponse  = solrServer.add(solrDoc);


 issue a query on some of the unique ids in question
SolrCloud is returning only 1 document per uniqueKey


 Did you push your schema up to Zookeeper and reload 
 (or restart) your collection before re-indexing things? 
no.  the config was pushed up to Zookeeper only once a few months ago.  The
documents in question were updated in Mongo and given an updated
create_date.  Based on this new create_date my SolrJ client detects and
reindexes them.

 are you sure the documents are actually getting indexed and that the
 update 
 is succeeding?
yes, I see a new value in the timestamp field each time I reindex  




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Reindex-of-document-leaves-old-fields-behind-tp4206710p4206841.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Price Range Faceting Based on Date Constraints

2015-05-21 Thread alexw
Thanks Alessandro. I am implementing this in the Hybris framework. It is not
easy to create nested documents during indexing using the Hybris Solr
indexer. So I am trying to avoid additional documents and cores if at all
possible.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Price-Range-Faceting-Based-on-Date-Constraints-tp4206817p4206854.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Price Range Faceting Based on Date Constraints

2015-05-21 Thread Alessandro Benedetti
Hi Alex,
this is not a simple problem.
In your domain we can consider a Product as a document with a list of
nested Price documents.
Ideally we would model the Product as the parent and the prices as children.
Each Price will be defined by:


   - start_date
   - end_date
   - price
   - productId

We can define 2 collections this way and play with Joins and faceting.
Take a look here :

http://lucene.472066.n3.nabble.com/How-do-I-get-faceting-to-work-with-Solr-JOINs-td4147785.html#a4148838

If redundancy of data is not a problem for you, you can proceed with a
simple approach where you add redundant documents.
Each document will have the start_date, end_date and price as single-valued
fields.
In the redundant scenario, the approach to follow is quite easy:
- always filter the docs by date and then facet.
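A sketch of such a request, assuming single-valued price, start_date and end_date
fields and a reference date of 2015-06-01:

    q=*:*
    &fq=start_date:[* TO 2015-06-01T00:00:00Z]
    &fq=end_date:[2015-06-01T00:00:00Z TO *]
    &facet=true
    &facet.query=price:[0 TO 5]
    &facet.query=price:[5 TO 20]
    &facet.query=price:[20 TO 50]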

Cheers

2015-05-21 13:58 GMT+01:00 alexw aw...@crossview.com:

 Hi,

 I have an unique requirement to facet on product prices based on date
 constraints, for which I have been thinking for a solution for a couple of
 days now, but to no avail. The details are as follows:

 1. Each product can have multiple prices, each price has a start-date and
 an
 end-date.
 2. At search time, we need to facet on price ranges ($0 - $5, $5-$20,
 $20-$50...)
 3. When faceting, a date is first determined. It can be either the current
 system date or a future date (call it date X)
 4. For each product, the price to be used for faceting has to meet the
 following condition: start-date  date X, and date X  end-date, in other
 words, date X has to fall within start-date and end-date.
 5. My Solr version: 3.5

 Hopefully I explained the requirement clearly. I have tried single price
 field with multivalue and each price value has startdate and enddate
 appended. I also tried one field per price with the field name containing
 both startdate and enddate. Neither approach seems to work. Can someone
 please shed some light as to how the index should be designed and what the
 facet query should look like?

 Thanks in advance for your help!



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Price-Range-Faceting-Based-on-Date-Constraints-tp4206817.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?

William Blake - Songs of Experience -1794 England


Re: Logic on Term Frequency Calculation : Bug or Functionality

2015-05-21 Thread Ahmet Arslan
Hi Ariya,

DefaultSimilarity does not use the raw term frequency; instead it uses the
square root of the raw term frequency.
If you want to observe raw term frequency information in the explain section, I
suggest you play with
org.apache.lucene.search.similarities.SimilarityBase and its sub-classes.
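A minimal sketch of such an override, assuming Lucene/Solr 4.x and a made-up
class name (it would be registered in schema.xml via <similarity class="..."/>):

    import org.apache.lucene.search.similarities.DefaultSimilarity;

    public class RawTfSimilarity extends DefaultSimilarity {
      @Override
      public float tf(float freq) {
        return freq; // raw term frequency instead of the default sqrt(freq)
      }
    }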

ahmet




On Thursday, May 21, 2015 3:59 PM, ariya bala ariya...@gmail.com wrote:
Hi,

I am puzzled on the Term Frequency Behaviour of the DefaultSimilarity
implementation
I have suppressed the IDF by setting to 1.
TF-IDF would inturn reflect the same value as in Term Frequency

Below are the inferences:
Red coloured are expected to give a hit count(Term Frequency) of 2 but was
one.
*Is it bug or is it how the behaviour is?*

Search Query: AAA BBB
Parsed Query: PhraseQuery(Contents:"aaa bbb"~5000)

[Table lost in the plain-text archive: eight test documents (AAA BBB, BBB AAA,
AAA AAA BBB, AAA BBB AAA, BBB AAA AAA, AAA BBB BBB, BBB AAA BBB, BBB BBB AAA)
with the slop settings used and the term frequencies observed at slop 0 and slop 2.]

*Am I missing something?!*


Cheers
*Ariya *


Re: Indexing gets significantly slower after every batch commit

2015-05-21 Thread Erick Erickson
bq: Which is logical as index growth and time needed to put something
to it is log(n)

Not really. Solr indexes to segments, each segment is a fully
consistent mini index.
When a segment gets flushed to disk, a new one is started. Of course
there'll be a
_little bit_ of added overhead, but it shouldn't be all that noticeable.

Furthermore, they're append only. In the past, when I've indexed the
Wiki example,
my indexing speed actually goes faster.

So on the surface this sounds very strange to me. Are you seeing
anything at all in the
Solr logs that's suspicious?

Best,
Erick

On Thu, May 21, 2015 at 12:22 PM, Sergey Shvets ser...@bintime.com wrote:
 Hi Angel

 We also noticed that kind of performance degrade in our workloads.

 Which is logical as index growth and time needed to put something to it is
 log(n)



On Thursday, 21 May 2015, Angel Todorov wrote:

 hi Shawn,

 Thanks a bunch for your feedback. I've played with the heap size, but I
 don't see any improvement. Even if i index, say , a million docs, and the
 throughput is about 300 docs per sec, and then I shut down solr completely
 - after I start indexing again, the throughput is dropping below 300.

 I should probably experiment with sharding those documents to multiple SOLR
 cores - that should help, I guess. I am talking about something like this:


 https://cwiki.apache.org/confluence/display/solr/Shards+and+Indexing+Data+in+SolrCloud

 Thanks,
 Angel


  On 5/21/2015, Shawn Heisey apa...@elyograg.org wrote:

  On 5/21/2015 2:07 AM, Angel Todorov wrote:
   I'm crawling a file system folder and indexing 10 million docs, and I
 am
   adding them in batches of 5000, committing every 50 000 docs. The
  problem I
   am facing is that after each commit, the documents per sec that are
  indexed
   gets less and less.
  
   If I do not commit at all, I can index those docs very quickly, and
 then
  I
   commit once at the end, but once i start indexing docs _after_ that
 (for
   example new files get added to the folder), indexing is also slowing
  down a
   lot.
  
   Is it normal that the SOLR indexing speed depends on the number of
   documents that are _already_ indexed? I think it shouldn't matter if i
   start from scratch or I index a document in a core that already has a
   couple of million docs. Looks like SOLR is either doing something in a
   linear fashion, or there is some magic config parameter that I am not
  aware
   of.
  
   I've read all perf docs, and I've tried changing mergeFactor,
   autowarmCounts, and the buffer sizes - to no avail.
  
   I am using SOLR 5.1
 
  Have you changed the heap size?  If you use the bin/solr script to start
  it and don't change the heap size with the -m option or another method,
  Solr 5.1 runs with a default size of 512MB, which is *very* small.
 
  I bet you are running into problems with frequent and then ultimately
  constant garbage collection, as Java attempts to free up enough memory
  to allow the program to continue running.  If that is what is happening,
  then eventually you will see an OutOfMemoryError exception.  The
  solution is to increase the heap size.  I would probably start with at
  least 4G for 10 million docs.
 
  Thanks,
  Shawn
 
 



Re: Reindex of document leaves old fields behind

2015-05-21 Thread tuxedomoon
I'm posting the fields from one of my problem document, based on this comment
I found from Shawn on Grokbase.  

 If you are trying to use a Map object as the value of a field, that is
 probably why it is interpreting your add request as an atomic update.
 If this is the case, and you're doing it because you have a multivalued
 field, you can use a List object rather than a Map.

This is just a solrDoc.toString() with linebreaks where commas were.  Maybe
some of these are being seen as map fields by SOLR.
=
SolrInputDocument[

mynamespaces_s_mv=[drama],

changedates_s_mv=[Tue May 19 17:21:26 EDT 2015, Thu Dec 30 19:00:00 EST
],

networks_t_mv=[{ abcitem-id : 288578fd-6596-47bc-af95-80daecd1f24a ,
abccontentType : Standard:SocialHandle , SocialNetwork : { $uuid :
73553c4c-4919-4ba9-b16c-fb340f3e4c31} , Handle : in my
imaginationseries}],

links_s_mv=[ { $uuid : 4d8eb47c-ce2d-4e7f-a567-d8d6692fed4e} , { $uuid
: 9fd75c26-35f2-4f48-b55a-6e82089cc3ba} , { $uuid :
150e43ed-9ebe-41b4-86cc-bdf4885a50fe} , { $uuid :
e20b0040-561f-4c34-9dd3-df85250b5a5b} , { $uuid :
0cff75d0-4f32-46c9-9092-60eec2dc847a} , { $uuid :
73553c4c-4919-4ba9-b16c-fb340f3e4c31}],

ratings_t_mv=[{ abcitem-id : 56058649-579a-4160-9439-e59448eb3dff ,
abccontentType : Standard:TVPG , Rating : { $uuid :
150e43ed-9ebe-41b4-86cc-bdf4885a50fe}}],

title_ci_t=in my imagination,

urlkey_s=in-my imagination,

title_cs_t=In My Imagination,

dp2_1_s_mv=[ { _id : { $uuid : 4d8eb47c-ce2d-4e7f-a567-d8d6692fed4e} ,
_rules : [ { _startDate : { $date : 2015-03-23T14:58:00.000Z} ,
_endDate : { $date : -12-31T00:00:00.000Z} , _r : { $uuid :
47b6b31d-d690-437a-9bab-6eeb7be3c8a4} , _p : { $uuid :
d478874f-8fc7-4b3d-97f3-f7e63222d633} , _o : { $uuid :
983b6ae9-7882-4af8-bb2f-cff342be99b3} , _a :  null }]}],

seriestype_s=e20b0040-561f-4c34-9dd3-df85250b5a5b,

shortid_s=x5jqqf, i

shorttitle_t=In My Imagination,

uuid_s=90a1fbbf-ddf8-47a7-9f00-55f05e7dc297,

status_s=DEFAULT,

updatedby_s=maceirar,

description_t=sometext,

review_s_mv=[{ abcpublished : { $date : 2015-05-19T21:21:30.930Z} ,
abcpublishedBy : jelly , abctargetEnvironment :
entertainment-staging , abcrequestId : { $uuid :
56769138-4a03-4ed6-8b29-8030d0941b08} , abcsourceEnvironment : fishing
, abcstate : true}, { abcpublished : { $date :
2015-05-19T21:21:31.731Z} , abcpublishedBy : jelly ,
abctargetEnvironment : myshow-live , abcrequestId : { $uuid :
56769138-4a03-4ed6-8b29-8030d0941b08} , abcsourceEnvironment :
myshow-staging , abcstate : true}],

sorttitle_t=In My Imagination,

images_s_mv=[ { $uuid : 9fd75c26-35f2-4f48-b55a-6e82089cc3ba} , {
$uuid : 0cff75d0-4f32-46c9-9092-60eec2dc847a}],

title_ci_s=in my imagination,

firmuuids_s_mv=[ { $uuid : 4d8eb47c-ce2d-4e7f-a567-d8d6692fed4e}],

id=mongo-v2.abcnservices.com-fishing-90a1fbbf-ddf8-47a7-9f00-55f05e7dc297,

timestamp=Thu May 21 17:29:58 EDT 2015

]




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Reindex-of-document-leaves-old-fields-behind-tp4206710p4206963.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Price Range Faceting Based on Date Constraints

2015-05-21 Thread alexw
Thanks David. Unfortunately we are on Solr 3.5, so I am not sure whether RPT
is available. If not, is there a way to patch 3.5 to make it work?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Price-Range-Faceting-Based-on-Date-Constraints-tp4206817p4207003.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Price Range Faceting Based on Date Constraints

2015-05-21 Thread david.w.smi...@gmail.com
Indeed: https://github.com/dsmiley/SOLR-2155

On Thu, May 21, 2015 at 8:59 PM alexw aw...@crossview.com wrote:

 Thanks David. Unfortunately we are on Solr 3.5, so I am not sure whether
 RPT
 is available. If not, is there a way to patch 3.5 to make it work?



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Price-Range-Faceting-Based-on-Date-Constraints-tp4206817p4207003.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: SolrCloud with local configs

2015-05-21 Thread Shawn Heisey
On 5/21/2015 7:24 PM, Steven Bower wrote:
 Is it possible to run in cloud mode with zookeeper managing
 collections/state/etc.. but to read all config files (solrconfig, schema,
 etc..) from local disk?
 
 Obviously this implies that you'd have to keep them in sync..
 
 My thought here is of running Solr in a docker container, but instead of
 having to manage schema changes/etc via zk I can just build the config into
 the container.. and then just produce a new docker image with a solr
 version and the new config and just do rolling restarts of the containers..

As far as I am aware, this is not possible.  As I think about it, I'm
not convinced that it's a good idea.  If you're going to be using
zookeeper for ANY purpose, the config should be centralized in zookeeper.

The ZK chroot (or new ZK ensemble, if you choose to go that route) will
be dedicated to that specific cluster.  It won't be shared with any
other cluster.  Any automation you've got that fires up a new cluster
can simply upload the cluster-specific config into the new ZK chroot as
it builds the container(s) for the cluster.  Teardown automation can
delete the chroot.
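For example, the upload step can be a single zkcli call (paths, chroot and config
name here are placeholders):

    server/scripts/cloud-scripts/zkcli.sh -zkhost zk1:2181,zk2:2181/mycluster \
      -cmd upconfig -confdir /path/to/conf -confname myconfig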

The idea is probably worth an issue in jira.  I won't veto the
implementation, but as I said above, I'm not yet convinced that it's a
good idea -- ZK is already in use for the clusterstate, using it for the
config completely eliminates the need for config synchronization.  Do
you have a larger compelling argument?

Thanks,
Shawn



Re: Index Sizes

2015-05-21 Thread Shawn Heisey
On 1/7/2014 7:48 AM, Steven Bower wrote:
 I was looking at the code for getIndexSize() on the ReplicationHandler to
 get at the size of the index on disk. From what I can tell, because this
 does directory.listAll() to get all the files in the directory, the size on
 disk includes not only what is searchable at the moment but potentially
 also files that are being created by background merges/etc.. I am wondering
 if there is an API that would give me the size of the currently
 searchable index files (doubt this exists, but maybe)..
 
 If not what is the most appropriate way to get a list of the segments/files
 that are currently in use by the active searcher such that I could then ask
 the directory implementation for the size of all those files?
 
 For a more complete picture of what I'm trying to accomplish, I am looking
 at building a quota/monitoring component that will trigger when index size
 on disk gets above a certain size. I don't want to trigger if index is
 doing a merge and ephemerally uses disk for that process. If anyone has any
 suggestions/recommendations here too I'd be interested..

Dredging up a VERY old thread here.  As I was replying to your most
recent query, I was looking through my email archive for your previous
messages and this one caught my eye, especially because it never got a
reply.  It must have escaped my notice last year.

This is a very good idea.  I imagine that the active searcher object
directly or indirectly knows exactly which files are in use for that
searcher, so I think it should be relatively easy for it to retrieve a
list, and the index size code should be able to return both the active
index size as well as the total directory size.

I've been putting a little bit of work in to get the index size code
moved out of the replication handler so that it is available even if
replication is completely disabled, but my free time has been limited.
I don't recall the issue number(s) for that work.

Thanks,
Shawn



Re: Index optimize runs in background.

2015-05-21 Thread Modassar Ather
Hi

Any insight on this question would be really helpful.

Thanks,
Modassar

On Thu, May 21, 2015 at 5:51 PM, Modassar Ather modather1...@gmail.com
wrote:

 Hi,

 I am using Solr-5.1.0. I have an indexer class which invokes
 cloudSolrClient.optimize(true, true, 1). My indexer exits after the
 invocation of optimize and the optimization keeps on running in the
 background.
 Kindly let me know if it is per design and how can I make my indexer to
 wait until the optimization is over. Is there a configuration/parameter I
 need to set for the same.

 Please note that the same indexer with cloudSolrServer.optimize(true,
 true, 1) on Solr-4.10 used to wait till the optimize was over before
 exiting.

 Thanks,
 Modassar




SolrCloud with local configs

2015-05-21 Thread Steven Bower
Is it possible to run in cloud mode with zookeeper managing
collections/state/etc.. but to read all config files (solrconfig, schema,
etc..) from local disk?

Obviously this implies that you'd have to keep them in sync..

My thought here is of running Solr in a docker container, but instead of
having to manage schema changes/etc via zk I can just build the config into
the container.. and then just produce a new docker image with a solr
version and the new config and just do rolling restarts of the containers..

Thanks,

Steve


Re: Search for numbers

2015-05-21 Thread david.w.smi...@gmail.com
Hi Holger,

It’s not apparent to me why you are using the spatial field to index a
number.  Why not simply a “tfloat” or whatever numeric field?  Then you
could use {!frange} with a function to get the difference and filter it to
be in the range you want.

RE query parsing (problem #1): you should write a custom query parser…
perhaps by forking ExtendedDisMaxQParser to meet your needs.  But I think
you’ll have something cleaner / more maintainable if you write one from
scratch while looking at that QParser for tips/inspiration; not porting the
features you don’t want.

RE problem #2: I’m a little unclear on what you want to do, but it’s likely
you can express it with {!frange} on a number field (not spatial) with the
right functions.  If you can’t), you could write either a custom function
(AKA ValueSource) or if needed a frange like thing for your custom needs.
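For instance, "within 1% of 1.23" could be expressed as a filter along these
lines (a sketch, assuming a plain numeric field named feature_nr; 0.0123 is 1%
of 1.23):

    fq={!frange u=0.0123}abs(sub(feature_nr,1.23))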

~ David
http://www.linkedin.com/in/davidwsmiley


On Thu, May 21, 2015 at 3:22 AM Holger Rieß holger.ri...@werkzeug-eylert.de
wrote:

 Hi,

 I try to search numbers with a certain deviation. My parser is
 ExtendedDisMax.
 A possible search expression could be 'twist drill 1.23 mm'. It will not
 match any documents, because the document contains the keywords 'twist
 drill', '1.2' and 'mm'.

 In order to reach my goal, I've indexed all numbers as points with the
 solr.SpatialRecursivePrefixTreeFieldType.
  For example '1.2' as <field name="feature_nr">1.2 0.0</field>.
 A search with 'drill mm' and a filter query 'fq={!geofilt pt=0,1.23
 sfield=feature_nr d=5}' delivers the expected results.

 Now I have two problems:
 1. How can I get ExtendedDisMax, to 'replace' the value 1.2 with the
 '{!geofilt}' function?
    My first attempts were

 - Build a field type in schema.xml and replace the field content with a
 regular expression
  '... replacement="_query_:&quot;{!geofilt pt=0,$1 sfield=feature_nr d=5}&quot;"'.
 The idea was to use a nested query. But edismax searches
 'feature_nr:_query_:{!geofilt pt=0,$1 sfield=feature_nr d=5}'.
 No documents are found.

 - Program a new parser that analyzes the query terms, finds all numbers
 and does the geospatial stuff. Added this parser in the 'appends' section
 of the 'requestHandler' definition. But I can get this parser only to
 filter my results, not to extend them.

 2. I want to calculate the distance (d) of the '{!geofilt}' function
 relative to the value, for example 1%.

 Could there be a simple solution?

 Thank you in advance.
 Holger



Applying gzip compression in Solr 5.1

2015-05-21 Thread Zheng Lin Edwin Yeo
Hi,

I'm trying to apply gzip compression in Solr 5.1. I understand that Running
Solr on Tomcat is no longer supported from Solr 5.0, so I've tried to
implement it in Solr.

I've downloaded jetty-servlets-9.3.0.RC0.jar and placed it in my
webapp\WEB-INF folder, and have added the following in
webapp\WEB-INF\web.xml

  <filter>
    <filter-name>GzipFilter</filter-name>
    <filter-class>org.eclipse.jetty.servlets.GzipFilter</filter-class>
    <init-param>
      <param-name>methods</param-name>
      <param-value>GET,POST</param-value>
      <param-name>mimeTypes</param-name>
      <param-value>text/html;charset=UTF-8,text/plain,text/xml,text/json,text/javascript,text/css,text/plain;charset=UTF-8,application/xhtml+xml,application/javascript,image/svg+xml,application/json,application/xml;charset=UTF-8</param-value>
    </init-param>
  </filter>
  <filter-mapping>
    <filter-name>GzipFilter</filter-name>
    <url-pattern>/*</url-pattern>
  </filter-mapping>


However, when I start Solr and check in the browser, there's no gzip
compression. Is there anything I have configured wrongly or might have
missed? I'm also running zookeeper-3.4.6.
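
For reference, a minimal client-side check of whether the response actually
comes back compressed could look like this (URL and core name are just
placeholders):

import java.net.HttpURLConnection;
import java.net.URL;

public class GzipCheck {
  public static void main(String[] args) throws Exception {
    URL url = new URL("http://localhost:8983/solr/collection1/select?q=*:*");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestProperty("Accept-Encoding", "gzip");
    // if the GzipFilter is active, this should print "gzip"
    System.out.println("Content-Encoding: " + conn.getHeaderField("Content-Encoding"));
  }
}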


Regards,
Edwin


Re: Java upgrade for solr in master-slave configuration

2015-05-21 Thread Kamal Kishore Aggarwal
Hi,

Has anybody tried upgrading Java on the master first, prior to the slave upgrade? Please
suggest.




On Tue, May 19, 2015 at 6:50 PM, Shawn Heisey apa...@elyograg.org wrote:

 On 5/19/2015 12:21 AM, Kamal Kishore Aggarwal wrote:
  I am currently working with Java-1.7, Solr-4.8.1 with tomcat 7. The solr
  configuration has slave  master architecture. I am looking forward to
  upgrade Java from 1.7 to 1.8 version in order to take advantage of memory
  optimization done in latest version.
 
  So, I am confused if I should upgrade java first on master server and
 then
  on slave server or the other way round. What should be the ideal steps,
 so
  that existing solr index and other things should not get corrupted.
 Please
  suggest.

 I am not aware of any changes in index format resulting from changing
 your Java version.  It should not matter which machines you upgrade first.

 Thanks,
 Shawn




Confused about whether Real-time Gets must be sent to leader?

2015-05-21 Thread Timothy Potter
I'm seeing that RTG requests get routed to any active replica of the
shard hosting the doc requested by /get ... I was thinking only the
leader should handle that request since there's a brief window of time
where the latest update may not be on the replica (albeit usually very
brief) and the latest update is definitely on the leader. Am I
overthinking this since we've always maintained that Solr is
eventually consistent or ???

Cheers,
Tim


Re: solr uima and opennlp

2015-05-21 Thread Tommaso Teofili
Hi Andreea,

2015-05-21 18:12 GMT+02:00 hossmaa andreea.hossm...@gmail.com:

 Hi everyone

 I'm trying to plug in a new UIMA annotator into solr. What is necessary for
 this? Is it enough to build a Jar similar to the ones from the
 uima-addons
 package?


yes, exactly. Actually you just need a jar containing the Annotator class
(and dependencies) that you reference from within the
UIMAUpdateRequestProcessor.


 More specifically, are the uima-addons Jars identical to the ones
 found in solr's contrib folder?


they are the 2.3.1 versions of those jars.

Regards,
Tommaso



 Thanks!
 Andreea



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/solr-uima-and-opennlp-tp4206873.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Indexing gets significantly slower after every batch commit

2015-05-21 Thread Sergey Shvets
Hi Angel

We also noticed that kind of performance degradation in our workloads.

Which is logical: as the index grows, the time needed to put something into it is
log(n).



On Thursday, 21 May 2015, Angel Todorov wrote:

 hi Shawn,

 Thanks a bunch for your feedback. I've played with the heap size, but I
 don't see any improvement. Even if I index, say, a million docs, and the
 throughput is about 300 docs per sec, and then I shut down solr completely
 - after I start indexing again, the throughput is dropping below 300.

 I should probably experiment with sharding those documents to multiple SOLR
 cores - that should help, I guess. I am talking about something like this:


 https://cwiki.apache.org/confluence/display/solr/Shards+and+Indexing+Data+in+SolrCloud

 Thanks,
 Angel


 On Thu, May 21, 2015 at 11:36 AM, Shawn Heisey apa...@elyograg.org
 javascript:; wrote:

  On 5/21/2015 2:07 AM, Angel Todorov wrote:
   I'm crawling a file system folder and indexing 10 million docs, and I
 am
   adding them in batches of 5000, committing every 50 000 docs. The
  problem I
   am facing is that after each commit, the documents per sec that are
  indexed
   gets less and less.
  
   If I do not commit at all, I can index those docs very quickly, and
 then
  I
   commit once at the end, but once i start indexing docs _after_ that
 (for
   example new files get added to the folder), indexing is also slowing
  down a
   lot.
  
   Is it normal that the SOLR indexing speed depends on the number of
   documents that are _already_ indexed? I think it shouldn't matter if i
   start from scratch or I index a document in a core that already has a
   couple of million docs. Looks like SOLR is either doing something in a
   linear fashion, or there is some magic config parameter that I am not
  aware
   of.
  
   I've read all perf docs, and I've tried changing mergeFactor,
   autowarmCounts, and the buffer sizes - to no avail.
  
   I am using SOLR 5.1
 
  Have you changed the heap size?  If you use the bin/solr script to start
  it and don't change the heap size with the -m option or another method,
  Solr 5.1 runs with a default size of 512MB, which is *very* small.
 
  I bet you are running into problems with frequent and then ultimately
  constant garbage collection, as Java attempts to free up enough memory
  to allow the program to continue running.  If that is what is happening,
  then eventually you will see an OutOfMemoryError exception.  The
  solution is to increase the heap size.  I would probably start with at
  least 4G for 10 million docs.
 
  Thanks,
  Shawn
 
 



Re: Solr suggester

2015-05-21 Thread jon kerling
Hi Erick,
I have read your blog and it is really helpful. I'm thinking about upgrading to
Solr 5.1, but it won't solve all my problems with this issue since, as you said, each
build will have to read all docs and analyze their fields. The only advantage
is that I can skip the default suggest build on startup.
Thank you for your reply. 
Jon Kerling.
   


 On Thursday, May 21, 2015 6:38 PM, Erick Erickson 
erickerick...@gmail.com wrote:
   

 Frankly, the suggester is rather broken in Solr 4.x with large
indexes. Building the suggester index (or FST) requires that _all_ the
docs get read, the stored fields analyzed and added to the suggester.
Unfortunately, this happens _every_ time you start Solr and can take
many minutes whether or not you have buildOnStartup set to false, see:
https://issues.apache.org/jira/browse/SOLR-6845.

See: http://lucidworks.com/blog/solr-suggester/

See inline.

On Thu, May 21, 2015 at 6:12 AM, jon kerling
jonkerl...@yahoo.com.invalid wrote:
 Hi,

 I'm using solr 4.10 and I'm trying to add autosuggest ability to my 
 application.
 I'm currently using this kind of configuration:

  <searchComponent name="suggest" class="solr.SuggestComponent">
    <lst name="suggester">
      <str name="name">mySuggester</str>
      <str name="lookupImpl">FuzzyLookupFactory</str>
      <str name="storeDir">suggester_fuzzy_dir</str>
      <str name="dictionaryImpl">DocumentDictionaryFactory</str>
      <str name="field">field2</str>
      <str name="weightField">weightField</str>
      <str name="suggestAnalyzerFieldType">text_general</str>
    </lst>
  </searchComponent>

  <requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
    <lst name="defaults">
      <str name="suggest">true</str>
      <str name="suggest.count">10</str>
      <str name="suggest.dictionary">mySuggester</str>
    </lst>
    <arr name="components">
      <str>suggest</str>
    </arr>
  </requestHandler>

 I wanted to know how the suggester Index/file is being rebuilt.
 Is it supposed to have all the terms of the desired field in the suggester?
Yes.
 if not, is it related to this kind of lookup implementation?
 If I use another lookup implementation which also suggests infix terms of
 fields,
 doesn't it have to hold all terms of the field?
Yes.

 When I call suggest.build, does it build the suggester index/file from scratch,
 or does it just do something like delta indexing of suggestions?
Builds from scratch

 Thank You,
 Jon


  

Re: Reindex of document leaves old fields behind

2015-05-21 Thread tuxedomoon
I'm doing all my indexing to leader 1 and have not specified any router
configuration.  But there is an equal distribution of 240M docs across 5
shards.  I think I've been stating I have 3 shards in these posts, I have 5,
sorry.

How do I know what kind of routing I am using?  




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Reindex-of-document-leaves-old-fields-behind-tp4206710p4206869.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: optimal shard assignment with low shard key cardinality using compositeId to enable shard splitting

2015-05-21 Thread Erick Erickson
I question your base assumption:

bq: So shard by document producer seems a good choice

 Because what this _also_ does is force all of the work for a query
onto one node and all indexing for a particular producer ditto. And
will cause you to manually monitor your shards to see if some of them
grow out of proportion to others. And so on.

I think it would be much less hassle to just let Solr distribute the
docs as it may based on the uniqueKey and forget about it. Unless you
want, say, to do joins etc. There will, of course, be some overhead
that you pay here, but unless you can measure it and it's a pain I
wouldn't add the complexity you're talking about, especially at the
volumes you're talking about.

Best,
Erick

On Thu, May 21, 2015 at 3:20 AM, Matteo Grolla matteo.gro...@gmail.com wrote:
 Hi
  I'd like some feedback on how I plan to solve the following sharding problem


 I have a collection that will eventually become big

 Average document size is 1.5kb
 Every year 30 Million documents will be indexed

 Data come from different document producers (a person, owner of his
 documents), and queries are almost always performed by a document producer, who
 can only query his own documents. So sharding by document producer seems a good
 choice.

 there are 3 types of doc producer
 type A,
 cardinality 105 (there are 105 producers of this type)
 produce 17M docs/year (the aggregated production of all type A producers)
 type B
 cardinality ~10k
 produce 4M docs/year
 type C
 cardinality ~10M
 produce 9M docs/year

 I'm thinking about
 use compositeId ( solrDocId = producerId!docId ) to send all docs of the same 
 producer to the same shards. When a shard becomes too large I can use shard 
 splitting.

 problems
 - documents from type A producers could be oddly distributed among shards,
 because hashing doesn't spread well over a small number of keys (105); see the Appendix

 As a solution I could do this when a new typeA producer (producerA1) arrives:

 1) client app: generate a producer code
 2) client app: simulate murmurhashing and shard assignment
 3) client app: check that the shard assignment is optimal (the producer code is assigned
 to the shard with the fewest type A producers); otherwise go to 1) and try with
 another code

 when I add documents or perform searches for producerA1 I use its producer
 code in the compositeId or in the _route_ parameter, respectively
 What do you think?
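
 A minimal SolrJ sketch of what I have in mind (producer code, document id,
 collection name and ZooKeeper address are only placeholders):

 import org.apache.solr.client.solrj.SolrQuery;
 import org.apache.solr.client.solrj.impl.CloudSolrClient;
 import org.apache.solr.common.SolrInputDocument;

 public class ProducerRouting {
     public static void main(String[] args) throws Exception {
         CloudSolrClient client = new CloudSolrClient("zkhost:2181");
         client.setDefaultCollection("docs");

         // index: compositeId = producerId!docId, so all docs of producerA1 hash to the same shard
         SolrInputDocument doc = new SolrInputDocument();
         doc.addField("id", "producerA1!doc-42");
         doc.addField("text", "some document body");
         client.add(doc);
         client.commit();

         // query: restrict the search to producerA1's shard via the _route_ parameter
         SolrQuery q = new SolrQuery("some terms");
         q.set("_route_", "producerA1!");
         System.out.println(client.query(q).getResults().getNumFound());
         client.close();
     }
 }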


 ---Appendix: murmurhash shard assignment 
 simulation---

 import mmh3

 hashes = [mmh3.hash(str(i)) >> 16 for i in xrange(105)]

 num_shards = 16
 shards = [0]*num_shards

 for hash in hashes:
     idx = hash % num_shards
     shards[idx] += 1

 print shards
 print sum(shards)

 -

 result: [4, 10, 6, 7, 8, 6, 7, 8, 11, 1, 8, 5, 6, 5, 5, 8]

 so with 16 shards and 105 shard keys I can have
 shards with 1 key
 shards with 11 keys



Re: Reindex of document leaves old fields behind

2015-05-21 Thread Shawn Heisey
On 5/21/2015 9:02 AM, tuxedomoon wrote:
 If it is implicit then
 you may have indexed the new document to a different shard, which means
 that it is now in your index more than once, and which one gets returned
 may not be predictable.
 
 If a document with uniqueKey 1234 is assigned to a shard by SolrCloud
 implicit routing, won't a reindex of 1234 be assigned to the same shard?
 If not you'd have dups all over the cluster.

The implicit router basically means manual routing.  Whatever shard
actually receives the request will be the one that indexes it.

If you want documents automatically routed according to their hash, you
need the compositeId router.

Thanks,
Shawn



Re: Is it possible to do term Search for the filtered result set

2015-05-21 Thread Erick Erickson
Have you tried

fq=type:A

Best,
Erick

On Thu, May 21, 2015 at 5:49 AM, Danesh Kuruppu dknkuru...@gmail.com wrote:
 Hi all,

 Is it possible to do a term search over a filtered result set? We can do a
 term search over all documents. Can we do the term search only over the
 specified filtered result set?

 Let's say we have,

 Doc1 -- type: A
  tags: T1 T2

 Doc2 -- type: A
  tags: T1 T3

 Doc3 -- type: B
  tags: T1 T4 T5

 Can we do a term search for tags only in type:A documents, so that it gives
 the results as
 T1 - 02
 T2 - 01
 T3 - 01

 Is this possible? If so can you please share documents on this.
 Thanks
 Danesh


Re: Solr suggester

2015-05-21 Thread Erick Erickson
Frankly, the suggester is rather broken in Solr 4.x with large
indexes. Building the suggester index (or FST) requires that _all_ the
docs get read, the stored fields analyzed and added to the suggester.
Unfortunately, this happens _every_ time you start Solr and can take
many minutes whether or not you have buildOnStartup set to false, see:
https://issues.apache.org/jira/browse/SOLR-6845.

See: http://lucidworks.com/blog/solr-suggester/

See inline.

On Thu, May 21, 2015 at 6:12 AM, jon kerling
jonkerl...@yahoo.com.invalid wrote:
 Hi,

 I'm using solr 4.10 and I'm trying to add autosuggest ability to my 
 application.
 I'm currently using this kind of configuration:

  <searchComponent name="suggest" class="solr.SuggestComponent">
    <lst name="suggester">
      <str name="name">mySuggester</str>
      <str name="lookupImpl">FuzzyLookupFactory</str>
      <str name="storeDir">suggester_fuzzy_dir</str>
      <str name="dictionaryImpl">DocumentDictionaryFactory</str>
      <str name="field">field2</str>
      <str name="weightField">weightField</str>
      <str name="suggestAnalyzerFieldType">text_general</str>
    </lst>
  </searchComponent>

  <requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
    <lst name="defaults">
      <str name="suggest">true</str>
      <str name="suggest.count">10</str>
      <str name="suggest.dictionary">mySuggester</str>
    </lst>
    <arr name="components">
      <str>suggest</str>
    </arr>
  </requestHandler>

 I wanted to know how the suggester Index/file is being rebuilt.
 Is it supposed to have all the terms of the desired field in the suggester?
Yes.
 if not, is it related to this kind of lookup implementation?
 If I use another lookup implementation which also suggests infix terms of
 fields,
 doesn't it have to hold all terms of the field?
Yes.

 When I call suggest.build, does it build the suggester index/file from scratch,
 or does it just do something like delta indexing of suggestions?
Builds from scratch

 Thank You,
 Jon


Re: Is it possible to do term Search for the filtered result set

2015-05-21 Thread Upayavira
and then facet on the tags field.

facet=on&facet.field=tags
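
Put together with SolrJ, that could look roughly like this (the client setup
and core name are just assumptions):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.FacetField;

public class TagCountsForTypeA {
  public static void main(String[] args) throws Exception {
    HttpSolrClient solr = new HttpSolrClient("http://localhost:8983/solr/mycore");
    SolrQuery q = new SolrQuery("*:*");
    q.addFilterQuery("type:A");   // restrict the result set to type A docs
    q.setFacet(true);
    q.addFacetField("tags");      // term counts are computed only over the filtered set
    FacetField tags = solr.query(q).getFacetField("tags");
    for (FacetField.Count c : tags.getValues()) {
      System.out.println(c.getName() + " - " + c.getCount());
    }
    solr.close();
  }
}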

Upayavira

On Thu, May 21, 2015, at 04:34 PM, Erick Erickson wrote:
 Have you tried
 
 fq=type:A
 
 Best,
 Erick
 
 On Thu, May 21, 2015 at 5:49 AM, Danesh Kuruppu dknkuru...@gmail.com
 wrote:
  Hi all,
 
   Is it possible to do a term search over a filtered result set? We can do a
   term search over all documents. Can we do the term search only over the
   specified filtered result set?
 
   Let's say we have,
 
  Doc1 -- type: A
   tags: T1 T2
 
  Doc2 -- type: A
   tags: T1 T3
 
  Doc3 -- type: B
   tags: T1 T4 T5
 
   Can we do a term search for tags only in type:A documents, so that it gives
   the results as
  T1 - 02
  T2 - 01
  T3 - 01
 
  Is this possible? If so can you please share documents on this.
  Thanks
  Danesh


AW: Price Range Faceting Based on Date Constraints

2015-05-21 Thread Holger Rieß
Give geospatial search a chance. Use the 'SpatialRecursivePrefixTreeFieldType' 
field type, set 'geo' to false.
The date is located on the X-axis, prices on the Y axis.
For every price you get a horizontal line between start and end date. Index a 
rectangle with height 0.001 (1 cent) and width 'end date - start date'.

Find all prices that are valid on a given day or in a given date range with the 
'geofilt' function.

The field type could look like (not tested):

<fieldType name="price_date_range"
    class="solr.SpatialRecursivePrefixTreeFieldType"
    geo="false" distErrPct="0.025" maxDistErr="0.09" units="degrees"
    worldBounds="1 0 366 1" />

Faceting can possibly be done with a facet query for each of your price ranges.
For example day 20, price range 0-5$, rectangle: <field name="pdr">20.0 0.0
21.0 5.0</field>.
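
A rough SolrJ sketch of that faceting idea (not tested; the field name 'pdr'
and the bucket bounds are only examples):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class PriceRangeFacets {
  public static void main(String[] args) throws Exception {
    HttpSolrClient solr = new HttpSolrClient("http://localhost:8983/solr/products");
    SolrQuery q = new SolrQuery("*:*");
    q.setFacet(true);
    // one facet.query per price bucket: a rectangle over day 20..21 (X) and the price range (Y)
    q.addFacetQuery("{!field f=pdr}Intersects(ENVELOPE(20, 21, 5, 0))");    // 0-5$
    q.addFacetQuery("{!field f=pdr}Intersects(ENVELOPE(20, 21, 10, 5))");   // 5-10$
    QueryResponse rsp = solr.query(q);
    System.out.println(rsp.getFacetQuery());   // counts per price bucket
    solr.close();
  }
}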

Regards Holger



solr uima and opennlp

2015-05-21 Thread hossmaa
Hi everyone 

I'm trying to plug in a new UIMA annotator into solr. What is necessary for
this? Is it enough to build a Jar similar to the ones from the uima-addons
package? More specifically, are the uima-addons Jars identical to the ones
found in solr's contrib folder? 

Thanks! 
Andreea



--
View this message in context: 
http://lucene.472066.n3.nabble.com/solr-uima-and-opennlp-tp4206873.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Indexing gets significantly slower after every batch commit

2015-05-21 Thread Angel Todorov
hi Shawn,

Thanks a bunch for your feedback. I've played with the heap size, but I
don't see any improvement. Even if I index, say, a million docs, and the
throughput is about 300 docs per sec, and then I shut down solr completely
- after I start indexing again, the throughput is dropping below 300.

I should probably experiment with sharding those documents to multiple SOLR
cores - that should help, I guess. I am talking about something like this:

https://cwiki.apache.org/confluence/display/solr/Shards+and+Indexing+Data+in+SolrCloud

Thanks,
Angel


On Thu, May 21, 2015 at 11:36 AM, Shawn Heisey apa...@elyograg.org wrote:

 On 5/21/2015 2:07 AM, Angel Todorov wrote:
  I'm crawling a file system folder and indexing 10 million docs, and I am
  adding them in batches of 5000, committing every 50 000 docs. The
 problem I
  am facing is that after each commit, the documents per sec that are
 indexed
  gets less and less.
 
  If I do not commit at all, I can index those docs very quickly, and then
 I
  commit once at the end, but once i start indexing docs _after_ that (for
  example new files get added to the folder), indexing is also slowing
 down a
  lot.
 
  Is it normal that the SOLR indexing speed depends on the number of
  documents that are _already_ indexed? I think it shouldn't matter if i
  start from scratch or I index a document in a core that already has a
  couple of million docs. Looks like SOLR is either doing something in a
  linear fashion, or there is some magic config parameter that I am not
 aware
  of.
 
  I've read all perf docs, and I've tried changing mergeFactor,
  autowarmCounts, and the buffer sizes - to no avail.
 
  I am using SOLR 5.1

 Have you changed the heap size?  If you use the bin/solr script to start
 it and don't change the heap size with the -m option or another method,
 Solr 5.1 runs with a default size of 512MB, which is *very* small.

 I bet you are running into problems with frequent and then ultimately
 constant garbage collection, as Java attempts to free up enough memory
 to allow the program to continue running.  If that is what is happening,
 then eventually you will see an OutOfMemoryError exception.  The
 solution is to increase the heap size.  I would probably start with at
 least 4G for 10 million docs.

 Thanks,
 Shawn




Re: Reindex of document leaves old fields behind

2015-05-21 Thread Shawn Heisey
On 5/21/2015 9:54 AM, tuxedomoon wrote:
 I'm doing all my indexing to leader 1 and have not specified any router
 configuration.  But there is an equal distribution of 240M docs across 5
 shards.  I think I've been stating I have 3 shards in these posts, I have 5,
 sorry.
 
 How do I know what kind of routing I am using?  

If all your indexing is going to the same place and the docs are
distributed evenly, then quite possibly your router is compositeId.

To see for sure, go to the admin UI, click on Cloud, then Tree.  Click
the little arrow next to collections, then click on the collection
name.  In the far right pane, there will be a small snippet of JSON
below the other attributes, defining the configName and router.

Thanks,
Shawn



Re: Price Range Faceting Based on Date Constraints

2015-05-21 Thread Alessandro Benedetti
Just thinking a little bit about it, I should investigate the
SpatialRecursivePrefixTreeFieldType more.

Is each value of that field a Point?
Actually each of our values must be the rectangle,
because the time frame and the price together form a single value (not only the
duration of the price, 'end date - start date').
Could you give an example of the indexing as well?

Cheers

2015-05-21 17:28 GMT+01:00 Alessandro Benedetti benedetti.ale...@gmail.com
:

 The geo-spatial idea is brilliant !
 What do you think about translating the date into ms?
 Alex, you should try that approach, it can work !

 Cheers

 2015-05-21 16:49 GMT+01:00 Holger Rieß holger.ri...@werkzeug-eylert.de:

 Give geospatial search a chance. Use the
 'SpatialRecursivePrefixTreeFieldType' field type, set 'geo' to false.
 The date is located on the X-axis, prices on the Y axis.
 For every price you get a horizontal line between start and end date.
 Index a rectangle with height 0.001 (1 cent) and width 'end date - start
 date'.

 Find all prices that are valid on a given day or in a given date range
 with the 'geofilt' function.

 The field type could look like (not tested):

 <fieldType name="price_date_range"
     class="solr.SpatialRecursivePrefixTreeFieldType"
     geo="false" distErrPct="0.025" maxDistErr="0.09"
     units="degrees"
     worldBounds="1 0 366 1" />

 Faceting can possibly be done with a facet query for each of your price
 ranges.
 For example day 20, price range 0-5$, rectangle: <field name="pdr">20.0
 0.0 21.0 5.0</field>.

 Regards Holger




 --
 --

 Benedetti Alessandro
 Visiting card : http://about.me/alessandro_benedetti

 Tyger, tyger burning bright
 In the forests of the night,
 What immortal hand or eye
 Could frame thy fearful symmetry?

 William Blake - Songs of Experience -1794 England




-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?

William Blake - Songs of Experience -1794 England


Re: Price Range Faceting Based on Date Constraints

2015-05-21 Thread Alessandro Benedetti
The geo-spatial idea is brilliant !
What do you think about translating the date into ms?
Alex, you should try that approach, it can work !

Cheers

2015-05-21 16:49 GMT+01:00 Holger Rieß holger.ri...@werkzeug-eylert.de:

 Give geospatial search a chance. Use the
 'SpatialRecursivePrefixTreeFieldType' field type, set 'geo' to false.
 The date is located on the X-axis, prices on the Y axis.
 For every price you get a horizontal line between start and end date.
 Index a rectangle with height 0.001 (1 cent) and width 'end date - start
 date'.

 Find all prices that are valid on a given day or in a given date range
 with the 'geofilt' function.

 The field type could look like (not tested):

 <fieldType name="price_date_range"
     class="solr.SpatialRecursivePrefixTreeFieldType"
     geo="false" distErrPct="0.025" maxDistErr="0.09"
     units="degrees"
     worldBounds="1 0 366 1" />

 Faceting can possibly be done with a facet query for each of your price
 ranges.
 For example day 20, price range 0-5$, rectangle: <field name="pdr">20.0
 0.0 21.0 5.0</field>.

 Regards Holger




-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?

William Blake - Songs of Experience -1794 England


Re: Reindex of document leaves old fields behind

2015-05-21 Thread tuxedomoon
OK, it is compositeId.

I've just used post.sh to index a test doc with 3 fields to leader 1 of my
SolrCloud.  I then reindexed it with 1 field removed and the query on it
shows 2 fields.   I repeated this a few times and always get the correct
field count from Solr.  

I'm now wondering if SolrJ is somehow involved in performing an atomic
update rather than a replacement. I will try the above test via SolrJ.
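
Specifically, I want to compare a full replacement with an atomic update,
roughly along these lines (field names and core URL are just examples):

import java.util.Collections;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class ReplaceVsAtomic {
  public static void main(String[] args) throws Exception {
    HttpSolrClient solr = new HttpSolrClient("http://localhost:8983/solr/mycore");

    // full replacement: fields not listed here should disappear from doc 1234
    SolrInputDocument replacement = new SolrInputDocument();
    replacement.addField("id", "1234");
    replacement.addField("title", "new title");
    solr.add(replacement);

    // atomic update: the "set" modifier merges into the existing doc, so old stored fields are kept
    SolrInputDocument atomic = new SolrInputDocument();
    atomic.addField("id", "1234");
    atomic.addField("title", Collections.singletonMap("set", "another title"));
    solr.add(atomic);

    solr.commit();
    solr.close();
  }
}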



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Reindex-of-document-leaves-old-fields-behind-tp4206710p4206886.html
Sent from the Solr - User mailing list archive at Nabble.com.