Re: Upgrade SOLR version - facets perfomance regression

2017-01-26 Thread billnbell
Are you using docValues? Try that, it might help.
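Something like this in schema.xml is what I mean (the field name here is just a placeholder, and you need to reindex after adding docValues):

<field name="category_facet" type="string" indexed="true" stored="false" docValues="true" />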

Bill Bell
Sent from mobile


> On Jan 26, 2017, at 10:38 AM, Bhawna Asnani  wrote:
> 
> Hi,
> I am experiencing a similar issue. We have tried method uif but that didn't
> help much. There is still some performance degradation.
> Perhaps it's caused by some underlying changes in the Lucene version it uses.
> 
> Will switching to JSON facet API help in this case? We have 5 nodes/single
> shard in our production setup.
> 
> On Tue, Jan 24, 2017 at 4:34 AM, alessandro.benedetti > wrote:
> 
>> Hi Solr,
>> I admit the issue you mentioned has not been transparently solved, and
>> indeed you would need to explicitly use the method=uif to get 4.10.1
>> behavior.
>> 
>> This is valid if you were using fc/fcs approaches with high-cardinality
>> fields.
>> 
>> In case your facet method is enum (Term Enumeration), the issue has been
>> transparently solved (
>> https://issues.apache.org/jira/browse/SOLR-9176 )
>> 
>> Cheers
>> 
>> 
>> 
>> --
>> View this message in context: http://lucene.472066.n3.
>> nabble.com/Upgrade-SOLR-version-facets-perfomance-
>> regression-tp4315027p4315512.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>> 


Re: Joining Across Collections

2017-01-18 Thread billnbell
Great question 
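For reference, a cross-collection join query usually looks roughly like this (collection and field names are placeholders, and the joined collection normally has to live on the same node):

q={!join from=id to=id fromIndex=collection2}Field2:somevalue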

Bill Bell
Sent from mobile


> On Jan 18, 2017, at 1:32 AM, nabil Kouici  wrote:
> 
> Hi All,
> I'm using the join across collections feature to do an inner join between 2 
> collections. It works fine.
> Is it possible to use this feature to compare fields from different 
> collections? For example:
> Collection1 has Field1, Collection2 has Field2;
> search documents from Collection1 where Field1 != Field2.
> In SQL, this would translate to:
> Select A.* From Collection1 A inner join Collection2 B on A.id = B.id
> Where A.Field1 <> B.Field2
> 
> Thank you.
> Regards, NKI.


Re: AW: FacetField-Result on String-Field contains value with count 0?

2017-01-13 Thread billnbell
Set facet.mincount to 1.
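For example, on the request (field name taken from your mail):

facet=true&facet.field=m_mediaType_s&facet.mincount=1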

Bill Bell
Sent from mobile


> On Jan 13, 2017, at 7:19 AM, Sebastian Riemer  wrote:
> 
> Pardon me, 
> the second search should have been this: 
> http://localhost:8983/solr/wemi/select?fq=m_mediaType_s:%221%22=on=*:*=0=0=json
>  
> (or in other words, give me all documents having value "1" for field 
> "m_mediaType_s")
> 
> Since this search gives zero results, why is it included in the facet.fields 
> result-count list?
> 
> 
> 
> Hi,
> 
> Please help me understand: 
> http://localhost:8983/solr/wemi/select?facet.field=m_mediaType_s=on=on=*:*=json
>  returns:
> 
> "facet_counts":{
>"facet_queries":{},
>"facet_fields":{
>  "m_mediaType_s":[
>"2",25561,
>"3",19027,
>"10",1966,
>"11",1705,
>"12",1067,
>"4",1056,
>"5",291,
>"8",68,
>"13",2,
>"6",2,
>"7",1,
>"9",1,
>"1",0]},
>"facet_ranges":{},
>"facet_intervals":{},
>"facet_heatmaps":{}}}
> 
> http://localhost:8983/solr/wemi/select?fq=m_mediaType_s:%222%22=on=*:*=0=0=json
> 
> 
> ->  "response":{"numFound":25561,"start":0,"docs":[]
> 
> http://localhost:8983/solr/wemi/select?fq=m_mediaType_s:%220%22=on=*:*=0=0=json
> 
> 
> ->  "response":{"numFound":0,"start":0,"docs":[]
> 
> So why does the search for facet.field even contain the value "1", if it does 
> not exist?
> 
> And why does it e.g. not contain 
> "SomeReallyCrazyOtherValueWhichLikeValue"1"DoesNotExistButLetsIncludeItInTheFacetFieldsResultListAnywaysWithCountZero"
>  : 0
> 
> Best regards,
> Sebastian
> 
> Additional info, field m_mediaType_s is a string;
>  stored="true" />
> 
> 


Re: SolrCloud and LVM

2017-01-09 Thread billnbell
Yeah, we normally take the index size on disk in GB and then double it for memory...

For example, we have 28GB on disk and we see great performance at 64GB of RAM.

If you can do that you will probably get good results. Remember not to give 
Java much memory. We set it at 12GB. We call it "starving Java", and it reduces 
garbage collection pauses to small increments.
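With the 5.x/6.x scripts, a couple of ways to pin the heap at that size (the 12g value is just what we use, adjust for your install):

SOLR_HEAP="12g"          # in bin/solr.in.sh
bin/solr start -m 12g    # or on the command line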


Bill Bell
Sent from mobile


> On Jan 9, 2017, at 7:56 AM, Chris Ulicny  wrote:
> 
> That's good to hear. I didn't think there would be any reason that using
> lvm would impact solr's performance but wanted to see if there was anything
> I've missed.
> 
> As far as other performance goes, we use pcie and sata solid state drives
> since the indexes are mostly too large to cache entirely in memory, and we
> haven't had any performance problems so far. So I'm not expecting that to
> change too much when moving the cloud architecture.
> 
> Thanks again.
> 
> 
>> On Thu, Jan 5, 2017 at 7:55 PM Shawn Heisey  wrote:
>> 
>>> On 1/5/2017 3:12 PM, Chris Ulicny wrote:
>>> Is there any known significant performance impact of running solrcloud
>> with
>>> lvm on linux?
>>> 
>>> While migrating to solrcloud we don't have the storage capacity for our
>>> expected final size, so we are planning on setting up the solrcloud
>>> instances on a logical volume that we can grow when hardware becomes
>>> available.
>> 
>> Nothing specific.  Whatever the general performance impacts for LVM are
>> is what Solr would encounter when it reads and writes data to/from the
>> disk.
>> 
>> If your system has enough memory for good performance, then disk reads
>> will be rare, so the performance of the storage volume wouldn't matter
>> much.  If you don't have enough memory, then the disk performance would
>> matter ...although Solr's performance at that point would probably be
>> bad enough that you'd be looking for ways to improve it.
>> 
>> Here's some information:
>> 
>> https://wiki.apache.org/solr/SolrPerformanceProblems
>> 
>> Exactly how much memory is enough depends on enough factors that there's
>> no good general advice.  The only thing we can say in general is to
>> recommend the ideal setup -- where you have enough spare memory that
>> your OS can cache the ENTIRE index.  The ideal setup is usually not
>> required for good performance.
>> 
>> Thanks,
>> Shawn
>> 
>> 


Available

2017-01-09 Thread billnbell
I am available for consulting projects if your project needs help.

Been doing Solr work for 6 years...

Bill Bell
Sent from mobile



Re: Question about Lucene FieldCache

2017-01-09 Thread billnbell
Try disabling it; performance may get better.
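If you want to size it down explicitly, the entry goes in the <query> section of solrconfig.xml, roughly like this (the values are just an example):

<fieldValueCache class="solr.FastLRUCache" size="128" autowarmCount="0" showItems="32" />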

Bill Bell
Sent from mobile


> On Jan 9, 2017, at 6:41 AM, Yago Riveiro  wrote:
> 
> The documentation says that the only caches configurable are:
> 
> - filterCache
> - queryResultCache
> - documentCache
> - user defined caches
> 
> There is no entry for fieldValueCache, and in my case all of the caches listed in the 
> documentation are disabled ...
> 
> --
> 
> /Yago Riveiro
> 
>> On 9 Jan 2017 13:20 +, Mikhail Khludnev , wrote:
>>> On Mon, Jan 9, 2017 at 2:17 PM, Yago Riveiro  wrote:
>>> 
>>> Thanks for re reply Mikhail,
>>> 
>>> Do you know if the 1 value is configurable?
>> 
>> yes. in solrconfig.xml
>> https://cwiki.apache.org/confluence/display/solr/Query+Settings+in+SolrConfig#QuerySettingsinSolrConfig-Caches
>> IIRC you can't fully disable it by setting size to 0.
>> 
>> 
>>> My insert rate is so high
>>> (5000 docs/s) that the cache is quite useless.
>>> 
>>> In the case of the Lucene field cache, it's possible "clean" it in some
>>> way?
>>> 
>>> Even it would be possible, the first sorting query or so loads it back.
>> 
>>> Some cache is eating my memory heap.
>>> 
>> Probably you need a dedicated master which won't load the FieldCache.
>> 
>> 
>>> 
>>> 
>>> 
>>> -
>>> Best regards
>>> 
>>> /Yago
>>> --
>>> View this message in context: http://lucene.472066.n3.
>>> nabble.com/Question-about-Lucene-FieldCache-tp4313062p4313069.html
>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>> 
>> 
>> 
>> 
>> --
>> Sincerely yours
>> Mikhail Khludnev


Re: Help needed in breaking large index file into smaller ones

2017-01-09 Thread billnbell
Can you set the segment count higher in the Solr config and skip optimizing? You 
will get smaller files after a new index is created.

Can you reindex ?
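Roughly, in the <indexConfig> section of solrconfig.xml you can cap the merged segment size (the exact element name varies by Solr version, and the value here is just an example):

<mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
  <double name="maxMergedSegmentMB">4096.0</double>
</mergePolicy>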

Bill Bell
Sent from mobile


> On Jan 9, 2017, at 7:15 AM, Narsimha Reddy CHALLA  
> wrote:
> 
> No, it does not work by splitting. First of all, Lucene index files are not
> text files. There is a segment_NN file which refers to the index files in a
> commit. So, when we split a large index file into smaller ones, the
> corresponding segment_NN file also needs to be updated with the new index files,
> OR a new segment_NN file should be created, probably.
> 
> Can someone who is familiar with lucene index files please help us in this
> regard?
> 
> Thanks
> NRC
> 
> On Mon, Jan 9, 2017 at 7:38 PM, Manan Sheth 
> wrote:
> 
>> Does this really work for Lucene index files?
>> 
>> Thanks,
>> Manan Sheth
>> 
>> From: Moenieb Davids 
>> Sent: Monday, January 9, 2017 7:36 PM
>> To: solr-user@lucene.apache.org
>> Subject: RE: Help needed in breaking large index file into smaller ones
>> 
>> Hi,
>> 
>> Try split on linux or unix
>> 
>> split -l 100 originalfile.csv
>> this will split the file into pieces of 100 lines each
>> 
>> see other options for how to split like size
>> 
>> 
>> -Original Message-
>> From: Narsimha Reddy CHALLA [mailto:chnredd...@gmail.com]
>> Sent: 09 January 2017 12:12 PM
>> To: solr-user@lucene.apache.org
>> Subject: Help needed in breaking large index file into smaller ones
>> 
>> Hi All,
>> 
>>  My solr server has a few large index files (say ~10G). I am looking
>> for some help on breaking them into smaller ones (each < 4G) to satisfy
>> my application requirements. Are there any such tools available?
>> 
>> Appreciate your help.
>> 
>> Thanks
>> NRC
>> 
>> 


Re: Solr 6 Default Core URL

2016-12-13 Thread billnbell
Yes, with Nginx in front of it.
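A minimal sketch of the Nginx side (core name, handler path and port are placeholders):

location = /search {
    proxy_pass http://localhost:8983/solr/mycore/search;
}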

Bill Bell
Sent from mobile


> On Dec 13, 2016, at 2:54 PM, Max Bridgewater  
> wrote:
> 
> I have one Solr core on my solr 6 instance and I can query it with:
> 
> http://localhost:8983/solr/mycore/search?q=*:*
> 
> Is there a way to configure solr 6 so that I can simply query it with this
> simple URL?
> 
> http://localhost:8983/search?q=*:*
> 
> 
> Thanks.
> Max,


Re: Memory leak in Solr

2016-12-03 Thread billnbell
What tool is that? I would like to run those stats on my Solr instance.
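Those look like the per-handler statistics Solr itself exposes; something roughly like this should return them (core name is a placeholder, and the category name may differ by version):

curl 'http://localhost:8983/solr/mycore/admin/mbeans?stats=true&cat=QUERYHANDLER&wt=json'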

Bill Bell
Sent from mobile


> On Dec 2, 2016, at 4:49 PM, Shawn Heisey  wrote:
> 
>> On 12/2/2016 12:01 PM, S G wrote:
>> This post shows some stats on Solr which indicate that there might be a
>> memory leak in there.
>> 
>> http://stackoverflow.com/questions/40939166/is-this-a-memory-leak-in-solr
>> 
>> Can someone please help to debug this?
>> It might be a very good step in making Solr stable if we can fix this.
> 
> +1 to what Walter said.
> 
> I replied earlier on the stackoverflow question.
> 
> FYI -- your 95th percentile request time of about 16 milliseconds is NOT
> something that I would characterize as "very high."  I would *love* to
> have statistics that good.
> 
> Even your 99th percentile request time is not much more than a full
> second.  If a search takes a couple of seconds, most users will not
> really care, and some might not even notice.  It's when a large
> percentage of queries start taking several seconds that complaints start
> coming in.  On your system, 99 percent of your queries are completing in
> 1.3 seconds or less, and 95 percent of them are less than 17
> milliseconds.  That sounds quite good to me.
> 
> In my experience, the time it takes for the browser to receive the
> search result page and render it is a significant part of the total time
> to see results, and often dwarfs the time spent getting info from Solr.
> 
> Here's some numbers from Solr in my organization:
> 
> requests:   4102054
> errors: 364894
> timeouts:   49
> totalTime:  799446287.45041
> avgRequestsPerSecond:   1.2375565828793849
> 5minRateReqsPerSecond:  0.8444329508327961
> 15minRateReqsPerSecond: 0.8631197328073346
> avgTimePerRequest:  194.88926460997587
> medianRequestTime:  20.8566605
> 75thPcRequestTime:  85.5132884999
> 95thPcRequestTime:  2202.27746654
> 99thPcRequestTime:  5280.375381280002
> 999thPcRequestTime: 6866.020122961001
> 
> The numbers above come from a distributed index that contains 167
> million documents and takes up about 200GB of disk space across two
> machines.
> 
> requests:   192683
> errors: 124
> timeouts:   0
> totalTime:  199380421.985073
> avgRequestsPerSecond:   0.04722771354554
> 5minRateReqsPerSecond:  0.00800545427600684
> 15minRateReqsPerSecond: 0.017521222412364163
> avgTimePerRequest:  1034.7587591280653
> medianRequestTime:  541.591858
> 75thPcRequestTime:  1683.83246125
> 95thPcRequestTime:  5644.542019949997
> 99thPcRequestTime:  9445.592394760004
> 999thPcRequestTime: 14602.166640771007
> 
> These numbers are from an index with about 394 million documents, taking
> up nearly 500GB of disk space.  This index is also distributed on
> multiple machines.
> 
> Are you experiencing any problems other than what you perceive as slow
> queries?  I asked some other questions on stackoverflow.  In particular,
> I'd like to know the total memory on the server, the total number of
> documents (maxDoc and numDoc) you're handling with this server, as well
> as the total index size.  What do your queries look like?  What version
> and vendor of Java are you using?  Can you share your config/schema?
> 
> A memory leak is very unlikely, unless your Java or your operating
> system is broken.  I can't say for sure that it's not happening, but
> it's just not something we see around here.
> 
> Here's what I have collected on performance issues in Solr.  This page
> does mostly concern itself with memory, though it touches briefly on
> other topics:
> 
> https://wiki.apache.org/solr/SolrPerformanceProblems
> 
> Thanks,
> Shawn
> 


Re: How to implement a custom boost function

2016-10-04 Thread billnbell
You can paginate and sort as long as it is the same each time. Sort can be a 
function value too, i.e. sort=geodist() asc...

bq can also boost based on a field name 
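For example, with (e)dismax, something roughly like this (the IDs and weights are just placeholders):

defType=edismax&bq=studentId:875141^5 studentId:873071^4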



Bill Bell
Sent from mobile


> On Oct 3, 2016, at 11:28 PM, Lucas Cotta  wrote:
> 
> Hi Walter, unfortunately I use pagination so that would not be possible..
> 
> Thanks
> 
> 2016-10-04 0:51 GMT-03:00 Walter Underwood :
> 
>> How about sorting them after you get them back from Solr?
>> 
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/  (my blog)
>> 
>> 
>>> On Oct 3, 2016, at 6:45 PM, Lucas Cotta  wrote:
>>> 
>>> I actually could also use a custom similarity class that always returns
>> 1.0
>>> then I could use small boost factors such as ^1, ^2, ^3, etc.
>>> 
>>> But I want to do this only in some specific queries (that may contain
>> other
>>> fields besides studentId)
>>> 
>>> How could I do this, use the custom similarity class only for some
>> queries?
>>> Is it possible?
>>> 
>>> Thanks!
>>> 
>>> 2016-10-03 19:49 GMT-03:00 Lucas Cotta :
>>> 
 Hello,
 
 I'm new in Solr (4.7.2) and I was given the following requirement:
 
 Given a query such as:
 
 studentId:(875141 OR 873071 OR 875198 OR 108142 OR 918841 OR 870688 OR
 107920 OR 870637 OR 870636 OR 870635 OR 918792 OR 107721 OR 875078 OR
 875166 OR 875151 OR 918829 OR 918808)
 
 I want the results to be ordered by the same order the elements were
 informed in the query. This would be similar to MySQL's ORDER BY
 FIELD(id, 3,2,5,7,8,1).
 
 I have tried to use term boosting
 > Boosting_Ranking_Terms>
 in the query but that only works when I use big factors like this:
>> 875078^10
 OR 875166^1 OR 875151^1000 OR 918829^100 OR 918808^10
 
 But that would cause the query to be too big in case I have 200 ids for
 instance.
 
 So it seems I need to implement a custom FunctionQuery.
 I'm a little lost on how to do that. Could someone please give me an
>> idea?
 Which classes should my custom class extend from? Where should I place
>> this
 class? Should I add it to the Solr project itself and regenerate the JAR?
 
 Thanks
 
>> 
>> 


Re: Miserable Experience Using Solr. Again.

2016-09-12 Thread billnbell
Interested for sure

Bill Bell
Sent from mobile


> On Sep 12, 2016, at 4:05 PM, John Bickerstaff  
> wrote:
> 
> For what it's worth - I found enough frustration upgrading that I decided
> to "upgrade by replacement"
> 
> Now, I suppose if you've got a huge dataset to re-index that could be a
> problem, but just in case an option like that helps you, I'll suggest this.
> 
> 1. Install 6.x on a new machine using the "install for production"
> instructions
> 2. Use the configs from one of the sample projects to create an
> appropriately-named collection
> 3. Use the ability to "include" your configs into the other configs (they
> live in separate files)
>  I can provide more help here if you're interested
> 4. Re-index all your data into the new version of SOLR...
> 
> I have rough, but useable docs on this if you are interested in attempting
> this approach.
> 
> On Mon, Sep 12, 2016 at 3:48 PM, Aaron Greenspan <
> aaron.greens...@plainsite.org> wrote:
> 
>> Hi,
>> 
>> I have been on this list for some time because I know that any time I try
>> to do anything related to Solr I’m going to have to spend hours on it,
>> wondering why everything has to be so awful, and I just want somewhere to
>> provide feedback with the dim hope that the product might improve one day.
>> (So far, for my purposes, it hasn’t.) Sure enough, I still absolutely hate
>> using Solr, and I have more feedback.
>> 
>> I started with a confusing error on the web console, which I still can’t
>> figure out how to password protect without going through an insanely
>> process involving "ZooKeeper," which I don’t know anything about, or have,
>> to the best of my knowledge:
>> 
>> Problem accessing /solr/. Reason:
>> 
>>Forbidden
>> 
>> According to logs, this apparently meant that a MySQL query had failed due
>> to a field name change. Since I would have to change my XML configuration
>> files, I decided to use the opportunity to upgrade from Solr 5.1.4 to
>> 6.2.0. It broke everything.
>> 
>> First I was getting errors about "Unsupported major.minor version 52.0",
>> so I needed to install the Linux x64 JRE 1.8.0, which I managed on CentOS 6
>> with...
>> 
>> yum install openjdk-1.8.0
>> 
>> ...going to Oracle’s web site, downloading the latest JRE 1.8 build, and
>> then running...
>> 
>> yum localinstall jre-8u101-linux-x64.rpm
>> 
>> So far so good. But I didn’t have JAVA_HOME set properly apparently, so I
>> needed to do the not-exactly-intuitive…
>> 
>> export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.101-3.b13.
>> el6_8.x86_64/jre/
>> 
>> As usual, I manually moved over my mysql-connector-java-5.1.38-bin.jar
>> file from the dist/ folder in the old version to the new one. Then after
>> stopping the old process (with kill -9, since there seems to be no graceful
>> way to shut down Solr—README.txt doesn’t mention bin/solr stop) I moved
>> over my two core folders from the old server/solr/ folder. I tried to start
>> it up with bin/solr start, and watched the errors roll in.
>> 
>> There was some kind of problem with StopFilterFactory and the text_general
>> field type. Thanks to Stack Overflow I was able to determine that the
>> apparent problem was that there was a parameter, previously fine, which was
>> no longer fine. So I removed all instances of 
>> enablePositionIncrements="true".
>> That helped, but then I ran into a broader error: "Plugin Initializing
>> failure for [schema.xml] fieldType". It didn’t say which field type. Buried
>> in the logs I found a reference in the Java stack trace—which *disappears*
>> (and distorts the viewing window horribly) after a few seconds when you try
>> to view it in the web log UI—to the string "units="degrees"". Sure enough,
>> this string appeared in my schema.xml for a class called "solr.
>> SpatialRecursivePrefixTreeFieldType" that I’m pretty sure I never use. I
>> removed that parameter, and moved on to the next set of errors.
>> 
>> Apparently there is some aspect of the Thai text field type that Solr
>> 6.2.0 doesn’t like. So I disabled it. I don’t use Thai text.
>> 
>> Now Solr was complaining about "Error loading class
>> 'solr.admin.AdminHandlers'". So I found the reference to
>> solr.admin.AdminHandlers in solrconfig.xml for each of my cores and
>> commented it out. Only then did Solr work again.
>> 
>> This was not a smooth process. It took about two hours. The user interface
>> is still as buggy as an early alpha of most products, the errors are
>> difficult to understand when they don’t actually specify what’s wrong (and
>> they almost never do), and there should have been an automatic process to
>> highlight and fix problems in old (pre-6) configuration files. Never mind
>> the fact that the XML-based configuration process is an antiquated
>> nightmare when the rest of the world has long since moved onto databases.
>> 
>> Maybe this will help someone else out there.
>> 
>> Aaron
>> 
>> PlainSite | http://www.plainsite.org


Re: Cache problem

2016-04-11 Thread billnbell
You probably do need to optimize to get rid of the deleted docs...

That is a lot of deleted docs
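Something roughly like this does it (core name is a placeholder); it rewrites the whole index, so run it off-peak:

curl 'http://localhost:8983/solr/mycore/update?optimize=true&maxSegments=1'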

Bill Bell
Sent from mobile


> On Apr 11, 2016, at 7:39 AM, Bastien Latard - MDPI AG 
>  wrote:
> 
> Dear Solr experts :),
> 
> I read this very interesting post, 'Understanding and tuning your Solr caches'!
> This is the only good document that I was able to find after searching for 1 
> day!
> 
> I was using Solr for 2 years without knowing in detail what it was 
> caching... (because I did not need to understand it before).
> I had to take a look since I needed to restart my Tomcat regularly in order 
> to improve performance...
> 
> But I now have 2 questions: 
> 1) How can I know how much RAM my Solr is really using (especially for 
> caching)?
> 2) Could you have a quick look at the following images and tell me if I'm 
> doing something wrong?
> 
> Note: my index contains 66 million articles with several text fields 
> stored.
> 
> 
> My Solr contains several cores (all together ~80GB), but almost only 
> the one below is used.
> 
> I have the feeling that a lot of data is always stored in RAM...and getting 
> bigger and bigger all the time...
> 
> (after restart)
> $ sudo tail -f /var/log/tomcat7/catalina.out | grep GC
> 
> [...] after a few minutes
> 
> 
> Here are some images that can show you some stats about my Solr 
> performance...
> 
> Kind regards,
> Bastien Latard
> 
> 


Re: Soft commit does not affecting query performance

2016-04-11 Thread billnbell
Why do you think it would ?
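For reference, the interval you are testing is usually the autoSoftCommit maxTime in solrconfig.xml, roughly like this (1 second shown):

<autoSoftCommit>
  <maxTime>1000</maxTime>
</autoSoftCommit>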

Bill Bell
Sent from mobile


> On Apr 11, 2016, at 7:48 AM, Bhaumik Joshi  wrote:
> 
> Hi All,
> 
> We are doing query performance tests with different soft commit intervals. In 
> the tests with a 1 sec soft commit interval and a 1 min soft commit interval 
> we didn't notice any improvement in query timings.
> 
> 
> 
> We did test with SolrMeter (Standalone java tool for stress tests with Solr) 
> for 1sec soft commit and 1min soft commit.
> 
> Index stats of test solr cloud: 0.7 million documents and 1 GB index size.
> 
> Solr cloud has 2 shard and each shard has one replica.
> 
> 
> 
> Please find below detailed test readings: (all timings are in milliseconds)
> 
> 
> Soft commit - 1sec
> Queries/sec  Updates/sec  Total Queries  Total Q time  Avg Q Time  Total Client time  Avg Client time
> 1            5            100            44340         443         48834               488
> 5            5            101            128914        1276        143239              1418
> 10           5            104            295325        2839        330931              3182
> 25           5            102            675319        6620        793874              7783
> 
> Soft commit - 1min
> Queries/sec  Updates/sec  Total Queries  Total Q time  Avg Q Time  Total Client time  Avg Client time
> 1            5            100            44292         442         48569               485
> 5            5            105            131389        1251        147174              1401
> 10           5            102            299518        2936        337748              3311
> 25           5            108            742639        6876        865222              8011
> 
> Theory suggests that the soft commit interval affects query performance, but in my case it 
> doesn't. Can you shed some light on this?
> Also let me know if I am missing something here.
> 
> Regards,
> Bhaumik Joshi


Re: Sorting question

2016-04-01 Thread billnbell
Put the value into 2 separate fields and index it. Then sorting in Solr by the 2 
fields is one way.
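A rough sketch of that idea with a dynamic field per list (names and the int type are placeholders, and the position has to be reindexed whenever a list changes):

<dynamicField name="listpos_*" type="int" indexed="true" stored="false" />

and then, for list 1:

q=status:sellable AND list:1&sort=listpos_1 asc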

Bill Bell
Sent from mobile


> On Apr 1, 2016, at 11:15 AM, John Bickerstaff  
> wrote:
> 
> Just to be clear - I don't mean who requests the list (application or user)
> I mean what "rule" determines the ordering of the list?
> 
> Or, is there even a rule of any kind?
> 
> In other words, does a user arbitrarily decide that documentA, documentF,
> and documentW should be on a list of their own?  For reasons known only to
> the user?
> 
> Or - does the ordering of the list depend on some piece of data?  (like a
> date, or a manufacturer, or a price range or any other piece of "hard" data)
> 
> ===
> 
> To give an example from what I'm working on right now --
> 
> My subject matter experts have given me a rule that says:
> 
> *Documents of  content_type "bar" should come higher in the results than
> documents of content_type "foo".*
> 
> Pseudocode: If (content_type == bar) then put this doc highest in the
> results.  If (content_type == foo) put those docs after the "bar"
> content_type docs.
> 
> 
> This is an example of the ordering being tied to a specific piece of data
> which I can manipulate in a "sub query"  (that's probably the wrong term...)
> 
> 
> This isn't exactly what you're doing, but it's close -- IF you have rules
> you can express clearly in this way...
> 
> ---
> 
> Also, I'm confused a little by your statement that SOLR does the filtering
> and pagination, thus you can't sort the documents after Solr returns them...
> 
> My mental model is that you ask Solr for all the documents that match a
> certain criteria.  Solr returns that "set" of documents and then for your
> list, you sort those document titles or ID's according to some rule --
> possibly in the javascript on the web page...  But perhaps I'm not
> understanding your situation well enough...
> 
> Oh - are you perhaps saying that your ONLY criteria for getting these
> documents is the list number?  That would make sense, although there may
> still be room for sorting based on some kind of logic / data point outside
> of SOlR.  You could get all the documents associated to list #4, and then
> sort them based on some hard data point they all contain.  At the very
> least, your listpos "array" becomes simpler...
> 
> What does your query currently look like?
> 
>> On Fri, Apr 1, 2016 at 10:51 AM, Tamás Barta  wrote:
>> 
>> Some of the lists are created by users and some are generated by
>> applications, it doesn't matter.
>> 
>> It would be fine to solve it in Solr because Solr does the work of
>> filtering and pagination. If sorting were done outside than I would have to
>> read every document from Solr to sort them. It is not an option, I have to
>> query onle one page.
>> 
>> I don't understand how to solve it using subqueries.
>> On Apr 1, 2016, at 18:42, "John Bickerstaff" wrote:
>> 
>>> Specifically, what drives the position in the list?  Is it arbitrary or
>> is
>>> it driven by some piece of data?
>>> 
>>> If data-driven - code could do the sorting based on that data...
>> separate
>>> from SOLR...
>>> 
>>> Alternatively, if the data point exists in SOLR, a "sub-query" might be
>>> used to get the right sort order on the items returned by the "main"
>>> search...  Possibly without having to resort to the clunky-feeling
>> listpos
>>> multivalued field...
>>> 
 On Fri, Apr 1, 2016 at 10:32 AM, Tamás Barta 
>>> wrote:
>>> 
 For example I have to display sellable products which are in list X in
>>> the
 correct order.
 
 If I add a "status" and "list" (multivalued) fields to every document
 (products), then I can execute a query: status:sellable AND list:X,
>>> where X
 is the ID of the list. The list field contains IDs of the list in which
>>> the
 product is in.
 
 The problem is that I can't sort the result. A product has different
>>> index
 for every list.
 
 Is it clear now?
 
 Earlier I added a "listpos" field with multivalue content, for example:
 
 1:23
 2:4
 
 Which means that this product is in position 23 in list 1 and it is in
 position 4 in list 2. After that I created a custom comparator which
>>> parses
 field values to get index for the specified list and sorts by that
>> index.
 
 But I didn't like that solution much. I wish there were a better
 solution. In SolrJ, unfortunately, I can't find an API to set a custom
 comparator like I did in Lucene. So I don't know how to solve this
>>> problem
 in Solr.
 
 Thanks,
 Tamás
> On Apr 1, 2016, at 17:25, "Alessandro Benedetti" <
> abenede...@apache.org> wrote:
 
> I think this is a classic XY Problem , you are trying to solve X with
>>> Y ,
> and you are asking us about Y .
> Could you describe to us what your X problem is? 

Re: Solrcloud for Java 1.6

2016-01-07 Thread billnbell
Run it on 2 separate boxes

Bill Bell
Sent from mobile


> On Jan 7, 2016, at 3:11 PM, Aswath Srinivasan (TMS) 
>  wrote:
> 
> Hi fellow developers,
> 
> I have a situation where the search front-end application is using Java 1.6. 
> Upgrading the Java version is out of the question.
> 
> Planning to use a SolrCloud 5.x version for the search implementation. The 
> show-stopper here is that SolrJ for SolrCloud needs at least Java 1.7.
> 
> What is the best that can be done to use the latest version of SolrCloud and SolrJ 
> for a portal that runs on Java 1.6?
> 
> I was thinking that in SolrJ, instead of using ZooKeeper (which also acts as the 
> load balancer), I could specify the exact replicas' http://solr-cloud-HOST:PORT 
> pairs using some kind of round-robin with an external load balancer.
> 
> Any suggestion is highly appreciated.
> 
> Aswath NS
> 


Re: Issue with if() statement

2015-12-31 Thread billnbell
Solr 5.3.1

Bill Bell
Sent from mobile


> On Dec 31, 2015, at 4:50 PM, William Bell  wrote:
> 
> We are getting weird results with if(exists(a),b,c). We are getting b+c!!
> 
> http://localhost:8983/solr/providersearch/select?q=*:*=json=state:%22CO%22=state:%22NY%22=if(exists(query($state1)),{!lucene%20v=$state1},{!lucene%20v=$state})
> 
> I am getting NY and CO!
> 
> I only want $state1, which is NY.
> 
> Any other ways to craft this?
> 
> 
> -- 
> Bill Bell
> billnb...@gmail.com
> cell 720-256-8076


Re: fl=value equals?

2015-11-12 Thread billnbell
fl=$b tells me it works. Or I can do a sort=$b asc

The idea is to calculate a score but only include geo if it is not a national 
search. We want to send a parameter into the QT which allows us to omit 
geo from national searches.


Bill Bell
Sent from mobile

> On Nov 11, 2015, at 1:15 AM, Upayavira  wrote:
> 
> I concur with Jan - what does b= do? 
> 
> Also asking, how did you identify that it worked?
> 
> Upayavira
> 
>> On Wed, Nov 11, 2015, at 02:58 AM, William Bell wrote:
>> I was able to get it to work kinda with a map().
>> 
>> http://localhost:8983/solr/select?q=*:*=1=
>> 
>> map($radius,1,1,0,geodist())
>> 
>> Where 1= National
>> 
>> Do you have an example of a SearchComponent? It would be pretty easy to
>> copy map() and develop an equals() right?
>> 
>> if(equals($radius, 'national'), 0, geodist())
>> 
>> This would probably be useful for everyone.
>> 
>> On Tue, Nov 10, 2015 at 4:05 PM, Jan Høydahl 
>> wrote:
>> 
>>> Where is your “b” parameter used? I think that instead of trying to set a
>>> new “b” http param (which solr will not evaluate as a function), you should
>>> instead try to insert your function or switch qParser directly where the
>>> “b” param is used, e.g. in a bq or similar.
>>> 
>>> A bit heavy weight, but you could of course write a custom SearchComponent
>>> to construct your “b” parameter...
>>> 
>>> --
>>> Jan Høydahl, search solution architect
>>> Cominvent AS - www.cominvent.com
>>> 
 On Nov 10, 2015, at 23:52, William Bell wrote:
 
 We are trying to look at a value, and change another value based on that.
 
 For example, for national search we want to pass in radius=national, and
 then set another variable equal to 0, else set the other variable = to
 geodist() calculation.
 
 We tried {!switch} but this only appears to work on fq/q. There is no
 function for constants for equals
>>> http://localhost:8983/solr/select?q=*:*=national=if(equals($radius,'national'),0,geodist())
 
 This does not work:
 
 http://localhost:8983/solr/select?q=*:*=national={!switch
 case.national=0 default=geodist() v=$radius}
 
 Ideas?
 
 
 
 --
 Bill Bell
 billnb...@gmail.com
 cell 720-256-8076
>> 
>> 
>> -- 
>> Bill Bell
>> billnb...@gmail.com
>> cell 720-256-8076


Re: Different ports for search and upload request

2015-09-24 Thread billnbell
Scary stuff 

If you did that, you'd better reload the core.

Bill Bell
Sent from mobile


> On Sep 24, 2015, at 5:05 PM, Siddhartha Singh Sandhu  
> wrote:
> 
> Thank you so much.
> 
> Safe to ignore the following (not a query):
> 
> *Never did this.* But how about this crazy idea:
> 
> Take an Amazon EFS and share it between 2 EC2 instances. Use one EC2 endpoint to update
> the index on EFS while the other reads from it. This way each EC2 can use
> its own compute and not share its resources amongst Solr threads.
> 
> Regards,
> Sid.
> 
>> On Thu, Sep 24, 2015 at 5:17 PM, Shawn Heisey  wrote:
>> 
>>> On 9/24/2015 2:01 PM, Siddhartha Singh Sandhu wrote:
>>> I wanted to know if we can configure different ports as end points for
>>> uploading and searching API. Also, if someone could point me in the right
>>> direction.
>> 
>> From our perspective, no.
>> 
>> I have no idea whether it is possible at all ... it might be something
>> that a servlet container expert could figure out, or it might require
>> code changes to Solr itself.
>> 
>> You probably need another mailing list specifically for the container.
>> For virtually all 5.x installs, the container is Jetty.  In earlier
>> versions, it could be any container.
>> 
>> Another possibility would be putting an intelligent proxy in front of
>> Solr and having it only accept certain handler paths on certain ports,
>> then forward them to the common port on the Solr server.
>> 
>> If you did manage to do this, it would require custom client code.  None
>> of the Solr clients for programming languages have a facility for
>> separate ports.
>> 
>> Thanks,
>> Shawn
>> 
>> 


Re: faceting is unusable slow since upgrade to 5.3.0

2015-09-24 Thread billnbell
Can we add it back with a parameter at least ?
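For what it's worth, later 5.x releases did expose it again as a per-request parameter, roughly (field name taken from the example below; check your version's docs before relying on it):

facet=true&facet.field=author_facet&facet.method=uif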

Bill Bell
Sent from mobile


> On Sep 24, 2015, at 8:58 AM, Yonik Seeley  wrote:
> 
>> On Mon, Sep 21, 2015 at 8:09 AM, Uwe Reh  wrote:
>> our bibliographic index (~20M entries) runs fine with Solr 4.10.3
>> With Solr 5.3 faceted searching is constantly incredibly slow (~ 20 seconds)
> [...]
>> 
>> The 'fieldValueCache' seems to be unused (no inserts nor lookups) in Solr
>> 5.3. In Solr 4.10 the 'fieldValueCache' is in heavy use with a
>> cumulative_hitratio of 1.
> 
> 
> Indeed.  Use of the fieldValueCache (UnInvertedField) was secretly
> removed as part of LUCENE-5666, causing these performance regressions.
> 
> This code had been evolved over years to be very fast for specific use
> cases.  No one facet algorithm is going to be optimal for everyone, so
> it's important we have multiple.  But use of the UnInvertedField was
> removed without any notification or discussion whatsoever (and
> obviously no benchmarking), and was only discovered later by Solr devs
> in SOLR-7190 that it was essentially dead code.
> 
> 
> When I brought back my "JSON Facet API" work to Solr (which was based
> on 4.10.x) it came with a heavily modified version of UnInvertedField
> that is available via the JSON Facet API.  It might currently work
> better for your usecase.
> 
> On your normal (non-docValues) index, you can try something like the
> following to see what the performance would be:
> 
> $ curl http://yxz/solr/hebis/query -d 'q=darwin&
> json.facet={
>  authors : { type:terms, field:author_facet, limit:30 },
>  material_access : { type:terms, field:material_access, limit:30 },
>  material_brief : { type:terms, field:material_brief, limit:30 },
>  rvk : { type:terms, field:rvk_facet, limit:30 },
>  lang : { type:terms, field:language, limit:30 },
>  dept : { type:terms, field:department_3, limit:30 }
> }'
> 
> There were other changes in LUCENE-5666 that will probably slow down
> faceting on the single valued fields as well (so this may still be a
> little slower than 4.10.x), but hopefully it would be more
> competitive.
> 
> -Yonik


Re: Dismax and StandardTokenizer: OR queries despite mm=100%

2015-09-23 Thread billnbell
Use fq 
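For example, one way to read that: put the exact code into a filter query on the untokenized field, roughly like this (assuming productCode is a string-type field, as the parsed query suggests):

q=CC-WAV-001&defType=dismax&qf=textbody productCode&mm=100%&fq={!field f=productCode}CC-WAV-001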

Bill Bell
Sent from mobile


> On Sep 23, 2015, at 1:00 PM, Andreas Hubold  
> wrote:
> 
> Hi,
> 
> we're using Solr 4.10.4 and the dismax query parser to search across multiple 
> fields. One of the fields is configured with a StandardTokenizer (type 
> "text_general"). I set mm=100% to only get hits that match all terms.
> 
> This does not seem to work for queries that are split into multiple tokens. 
> For example a query for "CC-WAV-001" (tokenized to "cc", "wav", "001") 
> returns documents that only have "cc" in it. I need a result with documents 
> that contain all tokens - as returned by the /select handler.
> 
> Is there a way to force AND semantics for such dismax queries? I also tried 
> to set q.op=AND but it did not help.
> 
> The query is parsed as:
> 
> (+DisjunctionMaxQuery(((textbody:cc textbody:wav textbody:001) | 
> productCode:CC-WAV-001)~0.1) DisjunctionMaxQuery((textbody:"cc wav 
> 001")~0.1))/no_coord
> 
> Thanks in advance!
> 
> Regards,
> Andreas


Easier way to do this?

2013-04-11 Thread Billnbell
I would love for the SOLR spatial 4 to support pt so that I can run # of
results around a central point easily like in 3.6. How can I pass parameters
to a Circle() ? I would love to send PT to this query since the pt is the
same across multiple areas

For example:

http://localhost:8983/solr/core/select?rows=0&q=*:*&facet=true&facet.query={!
key=.5}store_geohash:%22Intersects(Circle(26.012156,-80.311943%20d=.0072369))%22&facet.query={!
key=1}store_geohash:%22Intersects(Circle(26.012156,-80.311943%20d=.01447))%22&facet.query={!
key=5}store_geohash:%22Intersects(Circle(26.012156,-80.311943%20d=.0723))%22&facet.query={!
key=10}store_geohash:%22Intersects(Circle(26.012156,-80.311943%20d=.1447))%22&facet.query={!
key=25}store_geohash:%22Intersects(Circle(26.012156,-80.311943%20d=.361846))%22&facet.query={!
key=50}store_geohash:%22Intersects(Circle(26.012156,-80.311943%20d=.72369))%22&facet.query={!
key=100}store_geohash:%22Intersects(Circle(26.012156,-80.311943%20d=1.447))%22



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Easier-way-to-do-this-tp4055474.html
Sent from the Solr - User mailing list archive at Nabble.com.


Support old syntax including geodist

2013-04-11 Thread Billnbell
Since Spatial Lucene 4 does not seem to support geodist(), even sending
d, pt, fq={!geofilt} does not help me - I need to sort. So I end up having to
set up the sortsq. Any other ideas on how to support the old syntax with the
new spatial? Can I create a transform or something?

Convert

http://localhost:8983/solr/providersearch/select?rows=20&q=*:*&fq={!geofilt}&pt=26.012156,-80.311943&d=50&sfield=store_geohash&sort=geodist()
asc

To
 

http://localhost:8983/solr/providersearch/select?rows=20&q=*:*&fq={!%20v=$geoq}&sortsq={!%20score=distance%20v=$geoq}&geoq=store_geohash:%22Intersects(Circle(26.012156,-80.311943%20d=.72369))%22&fl=store_lat_lon,distance:mul(query($sortsq),69.09)&sort=query($sortsq)%20asc
 




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Support-old-syntax-including-geodist-tp4055476.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: recommended SSD

2012-09-01 Thread Billnbell
SSDs do not always improve the performance of Solr. We tried several SSDs and we
saw an improvement of only a few percent in query QTime, etc...

It is definitely not always a slam dunk



--
View this message in context: 
http://lucene.472066.n3.nabble.com/recommended-SSD-tp4002975p4004898.html
Sent from the Solr - User mailing list archive at Nabble.com.

