Re: SolrJ getHighlighting() does not return results in order

2017-03-22 Thread Bryan Bende
Hello,

I believe getHighlighting() returns a Map<String, Map<String, List<String>>>,
keyed by document id.

Generally Maps are not expected to iterate in order unless you know
the underlying implementation of the Map, for example LinkedHashMap
will iterate in the insertion order and HashMap will not.

You should be able to take the doc id from one of the results in the
document list and then call getHighlighting().get(docId) to get the
Map<String, List<String>> of field name to highlight snippets for the given
document.
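
For example, a rough sketch (assuming the response is a QueryResponse and the
unique key field is named "id"):

Map<String, Map<String, List<String>>> highlighting = queryResponse.getHighlighting();
for (SolrDocument doc : queryResponse.getResults()) {
    String docId = (String) doc.getFieldValue("id");
    Map<String, List<String>> fieldSnippets = highlighting.get(docId);
    // fieldSnippets maps field name -> highlight snippets for this document
}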

Hope that helps.

-Bryan


On Wed, Mar 22, 2017 at 8:54 AM, leoperezpulido
 wrote:
> Hi,
>
> Implementing highlighting with *SolrJ* does not return results in the proper
> order while I "page" through results. This does not seem to be a problem with
> the RESTful API.
>
> // ...
> query.setQuery("text");
> /*
> The problem is when I set start to get different "pages",
> the results returned by getHighlighting() are disordered.
> */
> query.setStart(0);
> query.setSort("score", SolrQuery.ORDER.desc);
> query.setIncludeScore(true);
>
> query.setHighlight(true);
> query.addHighlightField("content");
> // ...
>
> Take the example of a simple index with a field named content and field's
> values like:
> Document 1
> Document 2
> Document 3
> etc.
>
> With the results returned by SolrDocumentList and with the RESTful API, I
> can paginate in the normal way, and the results remain ordered. This is not
> the case when I get results from getHighlighting().
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/SolrJ-getHighlighting-does-not-return-results-in-order-tp4326218.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr6.3.0 SolrJ API for Basic Authentication

2017-02-16 Thread Bryan Bende
Hello,

The QueryRequest was just an example, it will work with any request
that extends SolrRequest.

How are you indexing your documents?

I am going to assume you are doing something like this:

SolrClient client = ...
client.add(solrInputDocument);

Behind the scenes this will do something like the following:

UpdateRequest req = new UpdateRequest();
req.add(doc);
req.setCommitWithin(commitWithinMs);
req.process(client, collection);

So you can do that yourself and first set the basic auth credentials
on the UpdateRequest, which extends SolrRequest.
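
For example, roughly (a sketch, assuming the same client, collection, and
document variables as above):

UpdateRequest req = new UpdateRequest();
req.setBasicAuthCredentials(username, password);
req.add(solrInputDocument);
UpdateResponse response = req.process(client, collection);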

Thanks,

Bryan

On Thu, Feb 16, 2017 at 5:45 AM, vrindavda  wrote:
> Hi Bryan,
>
> Thanks for your quick response.
>
> I am trying to ingest data into SolrCloud, hence I will not have any Solr
> query. Is it the right approach to use QueryRequest to index data? Do I
> need to put in a dummy SolrQuery instead?
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Solr6-3-0-SolrJ-API-for-Basic-Authentication-tp4320238p4320675.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr6.3.0 SolrJ API for Basic Authentication

2017-02-14 Thread Bryan Bende
Hello,

The exception you are getting looks more like you can't connect to the
IP address from where your SolrJ code is running, but I'm not sure.

For the basic credentials, rather than trying to do something with the
http client, you can provide them on the request like this:

QueryRequest req = new QueryRequest(solrQuery);
req.setBasicAuthCredentials(username, password);

The setBasicAuthCredentials method is part of SolrRequest, so any
request that extends it should allow the credentials to be set.
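
For example, executing the request would look roughly like this (the client
variable and collection name here are just placeholders):

QueryResponse response = req.process(solrClient, "gettingstarted");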

-Bryan


On Tue, Feb 14, 2017 at 6:27 AM, vrindavda  wrote:
> Hello ,
>
> I am trying to connect SolrCloud using SolrJ API using following code :
>
>   String zkHostString = "localhost:9983";
>   String USER = "solr";
>   String PASSWORD = "SolrRocks";
>
>
>   CredentialsProvider credentialsProvider = new
> BasicCredentialsProvider();
>   credentialsProvider.setCredentials(AuthScope.ANY, new
> UsernamePasswordCredentials(USER, PASSWORD));
>   CloseableHttpClient httpClient =
> HttpClientBuilder.create().setDefaultCredentialsProvider(credentialsProvider).build();
>
>   CloudSolrClient solr = new
> CloudSolrClient.Builder().withZkHost(zkHostString).withHttpClient(httpClient).build();
>   ((CloudSolrClient)solr).setDefaultCollection("gettingstarted");
>
>
>
>
> But getting Error As :
>
>
> Exception in thread "main"
> org.apache.solr.client.solrj.impl.CloudSolrClient$RouteException:
> IOException occured when talking to server at:
> http://192.168.0.104:8983/solr/gettingstarted_shard2_replica1
> at
> org.apache.solr.client.solrj.impl.CloudSolrClient.directUpdate(CloudSolrClient.java:767)
> at
> org.apache.solr.client.solrj.impl.CloudSolrClient.sendRequest(CloudSolrClient.java:1173)
> at
> org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:1062)
> at
> org.apache.solr.client.solrj.impl.CloudSolrClient.request(CloudSolrClient.java:1004)
> at 
> org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:149)
> at org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:173)
> at org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:190)
> at com.app.graphiti.TextParser.main(TextParser.java:92)
> Caused by: org.apache.solr.client.solrj.SolrServerException: IOException
> occured when talking to server at:
> http://192.168.0.104:8983/solr/gettingstarted_shard2_replica1
> at
> org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:607)
> at
> org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:262)
> at
> org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:251)
> at
> org.apache.solr.client.solrj.impl.LBHttpSolrClient.doRequest(LBHttpSolrClient.java:435)
> at
> org.apache.solr.client.solrj.impl.LBHttpSolrClient.request(LBHttpSolrClient.java:387)
> at
> org.apache.solr.client.solrj.impl.CloudSolrClient.lambda$directUpdate$0(CloudSolrClient.java:742)
> at java.util.concurrent.FutureTask.run(Unknown Source)
> at
> org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
> at java.lang.Thread.run(Unknown Source)
> Caused by: org.apache.http.client.ClientProtocolException
> at
> org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:186)
> at
> org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
> at
> org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:55)
> at
> org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:498)
> ... 10 more
> Caused by: org.apache.http.client.NonRepeatableRequestException: Cannot
> retry request with a non-repeatable request entity.
> at
> org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:225)
> at
> org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:184)
> at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:88)
> at
> org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110)
> at
> org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:184)
> ... 13 more
> 16:55:40.289 [main-SendThread(0:0:0:0:0:0:0:1:9983)] DEBUG
> org.apache.zookeeper.ClientCnxn - Got ping response for sessionid:
> 0x15a3bc76e1f000e after 1ms
> 16:55:43.624 [main-SendThread(0:0:0:0:0:0:0:1:9983)] DEBUG
> org.apache.zookeeper.ClientCnxn - Got ping response for sessionid:
> 0x15a3bc76e1f000e after 1ms
> 16:55:46.958 [main-SendThread(0:0:0:0:0:0:0:1:9983)] DEBUG
> org.apache.zookeeper.ClientCnxn - Got ping response for 

Re: creating collection using collection API with SSL enabled SolrCloud

2017-02-09 Thread Bryan Bende
You should be able to start your Solr instances with "-h <hostname>".
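
For example (the hostname is just an illustration):

bin/solr start -cloud -h solr1.example.com -p 8983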

On Thu, Feb 9, 2017 at 12:09 PM, Xie, Sean  wrote:
> Thank you Hrishikesh,
>
> The cluster property solved the issue.
>
> Now we need to figure out a way to give the instance a host name to solve the
> SSL error about the IP not matching the SSL name.
>
> Sean
>
>
>
> On 2/9/17, 11:35 AM, "Hrishikesh Gadre"  wrote:
>
> Hi Sean,
>
> Have you configured the "urlScheme" cluster property (i.e. 
> urlScheme=https)
> ?
> 
> https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-CLUSTERPROP:ClusterProperties
>
> Thanks
> Hrishikesh
>
>
>
> On Thu, Feb 9, 2017 at 8:23 AM, Xie, Sean  wrote:
>
> > Hi All,
> >
> > When trying to create the collection using the API when there are a few
> > replicas, I’m getting error because the call seems to trying to use HTTP
> > for the replicas.
> >
> > https://IP_1:8983/solr/admin/collections?action=CREATE&
> > name=My_COLLECTION&numShards=1&replicationFactor=1&
> > collection.configName=my_collection_conf
> >
> > Here is the error:
> >
> > org.apache.solr.client.solrj.SolrServerException:IOException occured 
> when
> > talking to server at: http://IP_2:8983/solr
> >
> >
> > Is there something need to be configured for that?
> >
> > Thanks
> > Sean
> >
> > Confidentiality Notice::  This email, including attachments, may include
> > non-public, proprietary, confidential or legally privileged information.
> > If you are not an intended recipient or an authorized agent of an 
> intended
> > recipient, you are hereby notified that any dissemination, distribution 
> or
> > copying of the information contained in or transmitted with this e-mail 
> is
> > unauthorized and strictly prohibited.  If you have received this email 
> in
> > error, please notify the sender by replying to this message and 
> permanently
> > delete this e-mail, its attachments, and any copies of it immediately.  
> You
> > should not retain, copy or use this e-mail or any attachment for any
> > purpose, nor disclose all or any part of the contents to any other 
> person.
> > Thank you.
> >
>
>


Re: Solr 6.x : howto create a core in (embedded) CoreConatiner

2017-01-04 Thread Bryan Bende
I had success doing something like this, which I found in some of the Solr
tests...

SolrResourceLoader loader = new SolrResourceLoader(solrHomeDir.toPath());
Path configSetPath = Paths.get(configSetHome).toAbsolutePath();

final NodeConfig config = new NodeConfig.NodeConfigBuilder("embeddedSolrServerNode", loader)
    .setConfigSetBaseDirectory(configSetPath.toString())
    .build();

EmbeddedSolrServer embeddedSolrServer = new EmbeddedSolrServer(config,
coreName);
CoreAdminRequest.Create createRequest = new CoreAdminRequest.Create();
createRequest.setCoreName(coreName);
createRequest.setConfigSet(coreName);
embeddedSolrServer.request(createRequest);

The setup was to have a config set located at src/test/resources/configsets
so configSetHome was src/test/resources/configsets, the coreName was the
name of a configset in that directory, and solrHome was a path to
target/solr.

https://github.com/bbende/embeddedsolrserver-example/blob/master/src/test/java/org/apache/solr/EmbeddedSolrServerFactory.java
https://github.com/bbende/embeddedsolrserver-example/blob/master/src/test/java/org/apache/solr/TestEmbeddedSolrServerFactory.java

On Fri, Dec 30, 2016 at 3:27 AM, Clemens Wyss DEV 
wrote:

> I am still using 5.4.1 and have the following code to create a new core:
> ...
> Properties coreProperties = new Properties();
> coreProperties.setProperty( CoreDescriptor.CORE_CONFIGSET, configsetToUse
> );
> CoreDescriptor coreDescriptor = new CoreDescriptor( container, coreName,
> coreFolder, coreProperties );
> coreContainer.create( coreDescriptor );
> coreContainer.getCoresLocator().create( coreContainer, coreDescriptor );
> ...
>
> What is the equivalent Java snippet in Solr 6.x (latest greatest)?
>
> Thx & a successful 2017!
> Clemens
>


Re: solrj Https problem

2016-10-31 Thread Bryan Bende
A possible problem might be that your certificate was generated for
"localhost", which is why it works when you go to https://localhost:8985/solr
in your browser. When SolrJ gets the cluster information from ZooKeeper,
the hostnames of the Solr nodes might be IP addresses, which won't
work when the SSL/TLS negotiation happens.

If this is the problem you will want to specify the hostname for Solr to
use when starting each node by passing "-h localhost".

-Bryan

On Mon, Oct 31, 2016 at 1:05 PM, sandeep mukherjee <
wiredcit...@yahoo.com.invalid> wrote:

> I followed the steps to make Solr SSL enabled. I'm able to hit Solr at:
> https://localhost:8985/solr/problem/select?indent=on&q=*:*&wt=json
> For accessing it through the Solr client I created it as follows:
>
> System.setProperty("javax.net.ssl.keyStore", "/path/to/solr/server/etc/solr-ssl.keystore.jks");
> System.setProperty("javax.net.ssl.keyStorePassword", "secret");
> System.setProperty("javax.net.ssl.trustStore", "/path/to/solr/server/etc/solr-ssl.keystore.jks");
> System.setProperty("javax.net.ssl.trustStorePassword", "secret");
> return new CloudSolrClient.Builder().withZkHost(solrConfig.getConnectString()).build();
>
> The path to the keystore and truststore is correct. However I still get the
> following error:
>
> Caused by: javax.net.ssl.SSLHandshakeException:
> sun.security.validator.ValidatorException: PKIX path building failed:
> sun.security.provider.certpath.SunCertPathBuilderException: unable to
> find valid certification path to requested target
> at sun.security.ssl.Alerts.getSSLException(Alerts.java:192) ~[na:1.8.0_45]
> at sun.security.ssl.SSLSocketImpl.fatal(SSLSocketImpl.java:1937)
> ~[na:1.8.0_45]
> at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:302) ~[na:1.8.0_45]
> at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:296) ~[na:1.8.0_45]
> at 
> sun.security.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1478)
> ~[na:1.8.0_45]
> at sun.security.ssl.ClientHandshaker.processMessage(ClientHandshaker.java:212)
> ~[na:1.8.0_45]
> at sun.security.ssl.Handshaker.processLoop(Handshaker.java:979)
> ~[na:1.8.0_45]
> at sun.security.ssl.Handshaker.process_record(Handshaker.java:914)
> ~[na:1.8.0_45]
> at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:1050)
> ~[na:1.8.0_45]
> at 
> sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1363)
> ~[na:1.8.0_45]
> at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1391)
> ~[na:1.8.0_45]
> at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1375)
> ~[na:1.8.0_45]
> at 
> org.apache.http.conn.ssl.SSLSocketFactory.connectSocket(SSLSocketFactory.java:543)
> ~[httpclient-4.5.1.jar:4.5.1]
> at 
> org.apache.http.conn.ssl.SSLSocketFactory.connectSocket(SSLSocketFactory.java:409)
> ~[httpclient-4.5.1.jar:4.5.1]
> at org.apache.http.impl.conn.DefaultClientConnectionOperato
> r.openConnection(DefaultClientConnectionOperator.java:177)
> ~[httpclient-4.5.1.jar:4.5.1]
> at org.apache.http.impl.conn.ManagedClientConnectionImpl.open(
> ManagedClientConnectionImpl.java:304) ~[httpclient-4.5.1.jar:4.5.1]
> at org.apache.http.impl.client.DefaultRequestDirector.tryConnect(
> DefaultRequestDirector.java:611) ~[httpclient-4.5.1.jar:4.5.1]
> at org.apache.http.impl.client.DefaultRequestDirector.execute(
> DefaultRequestDirector.java:446) ~[httpclient-4.5.1.jar:4.5.1]
> at 
> org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:882)
> ~[httpclient-4.5.1.jar:4.5.1]
> at 
> org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
> ~[httpclient-4.5.1.jar:4.5.1]
> at 
> org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:107)
> ~[httpclient-4.5.1.jar:4.5.1]
> at 
> org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:55)
> ~[httpclient-4.5.1.jar:4.5.1]
> at org.apache.solr.client.solrj.impl.HttpSolrClient.
> executeMethod(HttpSolrClient.java:495) ~[solr-solrj-6.1.0.jar:6.1.0
> 4726c5b2d2efa9ba160b608d46a977d0a6b83f94 - jpountz - 2016-06-13 09:46:59]
> ... 26 common frames omitted
> Caused by: sun.security.validator.ValidatorException: PKIX path building
> failed: sun.security.provider.certpath.SunCertPathBuilderException:
> unable to find valid certification path to requested target
> at sun.security.validator.PKIXValidator.doBuild(PKIXValidator.java:387)
> ~[na:1.8.0_45]
> at sun.security.validator.PKIXValidator.engineValidate(PKIXValidator.java:292)
> ~[na:1.8.0_45]
> at sun.security.validator.Validator.validate(Validator.java:260)
> ~[na:1.8.0_45]
> at 
> sun.security.ssl.X509TrustManagerImpl.validate(X509TrustManagerImpl.java:324)
> ~[na:1.8.0_45]
> at 
> sun.security.ssl.X509TrustManagerImpl.checkTrusted(X509TrustManagerImpl.java:229)
> ~[na:1.8.0_45]
> at 
> sun.security.ssl.X509TrustManagerImpl.checkServerTrusted(X509TrustManagerImpl.java:124)
> ~[na:1.8.0_45]
> at 
> sun.security.ssl.ClientHands

Re: Solr Hit Highlighting

2016-10-26 Thread Bryan Bende
Hello,

I think part of the problem is the mis-match between what you are
highlighting on and what you are searching on.

Your query has no field specified, so it must be searching a default field,
which looks like it would be _text_, since the copyField was set up to
copy everything to that field.

So you are searching against _text_ and then highlighting on content. These
two fields are also different types, one is text_en_splitting and one is
text_general; I suspect that could cause a difference in finding results vs.
highlighting them.

Some things I would try...
- See what happens if your query is content:(What is lactose intolerance?)
and hl.fl=content, so that you are searching on what you are highlighting
on (see the example after this list)
- See what happens if you made content and _text_ the same type of field
(either both text_en_splitting or both text_general)
- You could make _text_ a stored field and set hl.fl=* or hl.fl=_text_,
and that should get you highlighting results from _text_ and allow you to
still use unfielded queries... normally this adds a lot of size to your
index if you are copying lots of fields to _text_, but you said it is only
content so maybe it's fine
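
For example, the first suggestion as a raw query (a sketch using the field
names from this thread):

q=content:(What is lactose intolerance?)&hl=true&hl.fl=content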

-Bryan


On Mon, Oct 24, 2016 at 11:51 PM, Al Hudson 
wrote:

> Hello All,
>
> I’m new to the world of Solr and hoping someone on this list can help me
> with hit highlighting in Solr.
>
> I am trying to set up hit highlighting in Solr and have been seeing some
> strange issues.
>
> My core.xml file has a single tag   which houses all
> the text in a document.
>
> Using the Solr web interface I submit the following query : What is milk?
> – I get back many answers and in addition, just by selecting the hl box and
> entering ‘content’ in the hl.fl box I get hit highlighted portions of text.
>
> However things stop working when I change the query to : What is lactose
> intolerance? I still get valid results but the highlighting section is full
> of empty arrays.
>
> I’ve tried different combinations of commenting out the copyField, making
> content multivalued, but to be honest I’m trying things and hoping some
> configuration will work.
>
> required="false" multiValued="false" />
> 
>  docValues="false" />
>  multiValued="true"/>
>
> 
> 
>
>  multiValued="false" />
>
>  stored="true" multiValued="false" />
>
> Can someone help?
>
> Thank you,
> Al
>
>
> Sent from Mail for
> Windows 10
>
>


Re: EmbeddedSolrServer and Core dataDir in Solr 6.x

2016-10-03 Thread Bryan Bende
After some more debugging, I think putting the dataDir in the
Map<String, String> of properties is actually working, but I'm still running
into a couple of issues with the setup...

I created an example project that demonstrates the scenario:
https://github.com/bbende/solrcore-datdir-test/blob/master/src/test/java/org/apache/solr/EmbeddedSolrServerFactory.java

When calling coreContainer.create(String coreName, Path instancePath,
Map<String, String> parameters)...

If instancePath is relative, then the core is loaded with no errors, but it
ends up writing a core.properties relative to Solr home, so it writes:
src/test/resources/solr/src/test/resources/exampleCollection/core.properties

If instancePath is absolute, then it fails to start up because there is
already a core.properties at
/full/path/to/src/test/resources/exampleCollection, the exception is thrown
from:

at
org.apache.solr.core.CorePropertiesLocator.create(CorePropertiesLocator.java:66)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:778)


Since everything from src/test/resources is already being put under
target/test-classes as part of the build, I'm thinking a better approach
would be to reference those paths for the Solr home and instancePath.

If I remove the core.properties from src/test/resources/exampleCollection,
then it can write a new one to target/test-classes/exampleCollection, and
will even put the dataDir there by default.



On Mon, Oct 3, 2016 at 7:00 PM, Bryan Bende  wrote:

> Yea I'll try to put something together and report back.
>
> On Mon, Oct 3, 2016 at 6:54 PM, Alan Woodward  wrote:
>
>> Ah, I see what you mean.  Putting the dataDir property into the Map
>> certainly ought to work - can you write a test case that shows what’s
>> happening?
>>
>> Alan Woodward
>> www.flax.co.uk
>>
>>
>> > On 3 Oct 2016, at 23:50, Bryan Bende  wrote:
>> >
>> > Alan,
>> >
>> > Thanks for the response. I will double-check, but I believe that is
>> going
>> > to put the data directory for the core under coreHome/coreName.
>> >
>> > What I am trying to setup (and did a poor job of explaining) is
>> something
>> > like the following...
>> >
>> > - Solr home in src/test/resources/solr
>> > - Core home in src/test/resources/myCore
>> > - dataDir for the myCore in target/myCore (or something not in the
>> source
>> > tree).
>> >
>> > This way the unit tests can use the Solr home and core config that is
>> under
>> > version control, but the data from testing would be written somewhere
>> not
>> > under version control.
>> >
>> > in 5.x I was specifying the dataDir through the properties object... I
>> > would calculate the path to the target dir in Java code relative to the
>> > class file, and then pass that as dataDir to the following:
>> >
>> > Properties props = new Properties();
>> > props.setProperty("dataDir", dataDir + "/" + coreName);
>> >
>> > In 6.x it seems like Properties has been replaced with the
>> > Map ? and I tried putting dataDir in there, but didn't
>> seem
>> > to do anything.
>> >
>> > For now I have just been using RAMDirectoryFactory so that no data ever
>> > gets written to disk.
>> >
>> > I'll keep trying different things, but if you have any thoughts let me
>> know.
>> >
>> > Thanks,
>> >
>> > Bryan
>> >
>> >
>> > On Mon, Oct 3, 2016 at 2:07 PM, Alan Woodward  wrote:
>> >
>> >> This should work:
>> >>
>> >> SolrCore solrCore
>> >>= coreContainer.create(coreName, Paths.get(coreHome).resolve(co
>> reName),
>> >> Collections.emptyMap());
>> >>
>> >>
>> >> Alan Woodward
>> >> www.flax.co.uk
>> >>
>> >>
>> >>> On 3 Oct 2016, at 18:41, Bryan Bende  wrote:
>> >>>
>> >>> Curious if anyone knows how to create an EmbeddedSolrServer in Solr
>> 6.x,
>> >>> with a core where the dataDir is located somewhere outside of where
>> the
>> >>> config is located.
>> >>>
>> >>> I'd like to do this without system properties, and all through Java
>> code.
>> >>>
>> >>> In Solr 5.x I was able to do this with the following code:
>> >>>
>> >>> CoreContainer coreContainer = new CoreContainer(solrHome);
>> >>> coreContainer.load();
>> >>>
>> >>> Properties props = new Properties();
>> >>> props.setProperty("dataDir", dataDir + "/" + coreName);
>> >>>
>> >>> CoreDescriptor descriptor = new CoreDescriptor(coreContainer,
>> coreName,
>> >>> new File(coreHome, coreName).getAbsolutePath(), props);
>> >>>
>> >>> SolrCore solrCore = coreContainer.create(descriptor);
>> >>> new EmbeddedSolrServer(coreContainer, coreName);
>> >>>
>> >>>
>> >>> The CoreContainer API changed a bit in 6.x and you can no longer pass
>> in
>> >> a
>> >>> descriptor. I've tried a couple of things with the current API, but
>> >> haven't
>> >>> been able to get it working.
>> >>>
>> >>> Any ideas are appreciated.
>> >>>
>> >>> Thanks,
>> >>>
>> >>> Bryan
>> >>
>> >>
>>
>>
>


Re: EmbeddedSolrServer and Core dataDir in Solr 6.x

2016-10-03 Thread Bryan Bende
Yea I'll try to put something together and report back.

On Mon, Oct 3, 2016 at 6:54 PM, Alan Woodward  wrote:

> Ah, I see what you mean.  Putting the dataDir property into the Map
> certainly ought to work - can you write a test case that shows what’s
> happening?
>
> Alan Woodward
> www.flax.co.uk
>
>
> > On 3 Oct 2016, at 23:50, Bryan Bende  wrote:
> >
> > Alan,
> >
> > Thanks for the response. I will double-check, but I believe that is going
> > to put the data directory for the core under coreHome/coreName.
> >
> > What I am trying to setup (and did a poor job of explaining) is something
> > like the following...
> >
> > - Solr home in src/test/resources/solr
> > - Core home in src/test/resources/myCore
> > - dataDir for the myCore in target/myCore (or something not in the source
> > tree).
> >
> > This way the unit tests can use the Solr home and core config that is
> under
> > version control, but the data from testing would be written somewhere not
> > under version control.
> >
> > in 5.x I was specifying the dataDir through the properties object... I
> > would calculate the path to the target dir in Java code relative to the
> > class file, and then pass that as dataDir to the following:
> >
> > Properties props = new Properties();
> > props.setProperty("dataDir", dataDir + "/" + coreName);
> >
> > In 6.x it seems like Properties has been replaced with the
> > Map ? and I tried putting dataDir in there, but didn't
> seem
> > to do anything.
> >
> > For now I have just been using RAMDirectoryFactory so that no data ever
> > gets written to disk.
> >
> > I'll keep trying different things, but if you have any thoughts let me
> know.
> >
> > Thanks,
> >
> > Bryan
> >
> >
> > On Mon, Oct 3, 2016 at 2:07 PM, Alan Woodward  wrote:
> >
> >> This should work:
> >>
> >> SolrCore solrCore
> >>= coreContainer.create(coreName, Paths.get(coreHome).resolve(
> coreName),
> >> Collections.emptyMap());
> >>
> >>
> >> Alan Woodward
> >> www.flax.co.uk
> >>
> >>
> >>> On 3 Oct 2016, at 18:41, Bryan Bende  wrote:
> >>>
> >>> Curious if anyone knows how to create an EmbeddedSolrServer in Solr
> 6.x,
> >>> with a core where the dataDir is located somewhere outside of where the
> >>> config is located.
> >>>
> >>> I'd like to do this without system properties, and all through Java
> code.
> >>>
> >>> In Solr 5.x I was able to do this with the following code:
> >>>
> >>> CoreContainer coreContainer = new CoreContainer(solrHome);
> >>> coreContainer.load();
> >>>
> >>> Properties props = new Properties();
> >>> props.setProperty("dataDir", dataDir + "/" + coreName);
> >>>
> >>> CoreDescriptor descriptor = new CoreDescriptor(coreContainer, coreName,
> >>> new File(coreHome, coreName).getAbsolutePath(), props);
> >>>
> >>> SolrCore solrCore = coreContainer.create(descriptor);
> >>> new EmbeddedSolrServer(coreContainer, coreName);
> >>>
> >>>
> >>> The CoreContainer API changed a bit in 6.x and you can no longer pass
> in
> >> a
> >>> descriptor. I've tried a couple of things with the current API, but
> >> haven't
> >>> been able to get it working.
> >>>
> >>> Any ideas are appreciated.
> >>>
> >>> Thanks,
> >>>
> >>> Bryan
> >>
> >>
>
>


Re: EmbeddedSolrServer and Core dataDir in Solr 6.x

2016-10-03 Thread Bryan Bende
Alan,

Thanks for the response. I will double-check, but I believe that is going
to put the data directory for the core under coreHome/coreName.

What I am trying to setup (and did a poor job of explaining) is something
like the following...

- Solr home in src/test/resources/solr
- Core home in src/test/resources/myCore
- dataDir for the myCore in target/myCore (or something not in the source
tree).

This way the unit tests can use the Solr home and core config that is under
version control, but the data from testing would be written somewhere not
under version control.

in 5.x I was specifying the dataDir through the properties object... I
would calculate the path to the target dir in Java code relative to the
class file, and then pass that as dataDir to the following:

Properties props = new Properties();
props.setProperty("dataDir", dataDir + "/" + coreName);

In 6.x it seems like Properties has been replaced with a
Map<String, String>, and I tried putting dataDir in there, but it didn't
seem to do anything.
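
For reference, a rough sketch of the 6.x call I was attempting (the paths and
names here are illustrative):

Map<String, String> props = new HashMap<>();
props.put("dataDir", dataDir + "/" + coreName);
SolrCore solrCore = coreContainer.create(coreName,
    Paths.get(coreHome).resolve(coreName), props);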

For now I have just been using RAMDirectoryFactory so that no data ever
gets written to disk.

I'll keep trying different things, but if you have any thoughts let me know.

Thanks,

Bryan


On Mon, Oct 3, 2016 at 2:07 PM, Alan Woodward  wrote:

> This should work:
>
> SolrCore solrCore
> = coreContainer.create(coreName, 
> Paths.get(coreHome).resolve(coreName),
> Collections.emptyMap());
>
>
> Alan Woodward
> www.flax.co.uk
>
>
> > On 3 Oct 2016, at 18:41, Bryan Bende  wrote:
> >
> > Curious if anyone knows how to create an EmbeddedSolrServer in Solr 6.x,
> > with a core where the dataDir is located somewhere outside of where the
> > config is located.
> >
> > I'd like to do this without system properties, and all through Java code.
> >
> > In Solr 5.x I was able to do this with the following code:
> >
> > CoreContainer coreContainer = new CoreContainer(solrHome);
> > coreContainer.load();
> >
> > Properties props = new Properties();
> > props.setProperty("dataDir", dataDir + "/" + coreName);
> >
> > CoreDescriptor descriptor = new CoreDescriptor(coreContainer, coreName,
> > new File(coreHome, coreName).getAbsolutePath(), props);
> >
> > SolrCore solrCore = coreContainer.create(descriptor);
> > new EmbeddedSolrServer(coreContainer, coreName);
> >
> >
> > The CoreContainer API changed a bit in 6.x and you can no longer pass in
> a
> > descriptor. I've tried a couple of things with the current API, but
> haven't
> > been able to get it working.
> >
> > Any ideas are appreciated.
> >
> > Thanks,
> >
> > Bryan
>
>


EmbeddedSolrServer and Core dataDir in Solr 6.x

2016-10-03 Thread Bryan Bende
Curious if anyone knows how to create an EmbeddedSolrServer in Solr 6.x,
with a core where the dataDir is located somewhere outside of where the
config is located.

I'd like to do this without system properties, and all through Java code.

In Solr 5.x I was able to do this with the following code:

CoreContainer coreContainer = new CoreContainer(solrHome);
coreContainer.load();

Properties props = new Properties();
props.setProperty("dataDir", dataDir + "/" + coreName);

CoreDescriptor descriptor = new CoreDescriptor(coreContainer, coreName,
new File(coreHome, coreName).getAbsolutePath(), props);

SolrCore solrCore = coreContainer.create(descriptor);
new EmbeddedSolrServer(coreContainer, coreName);


The CoreContainer API changed a bit in 6.x and you can no longer pass in a
descriptor. I've tried a couple of things with the current API, but haven't
been able to get it working.

Any ideas are appreciated.

Thanks,

Bryan


Re: Ignoring the Document Cache per query

2015-05-29 Thread Bryan Bende
Thanks Erick. I realize this really makes no sense, but I was looking to
work around a problem. Here is the scenario...

Using Solr 5.1 we have a service that utilizes the new mlt query parser to
get recommendations. So we start up the application,
ask for recommendations for a document, and everything works.

Another feature is to "dislike" a document, and once it is "disliked" it
shouldn't show up as a recommended document. It
does this by looking up the disliked documents for a user and adding a
filter query to the recommendation call which excludes
the disliked documents.
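
Roughly, the request looks like this (a sketch; the field name and document
ids here are made up):

q={!mlt qf=content mintf=1 mindf=1}docId123&fq=-id:(docId456 OR docId789)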

So now we dislike a document that was in the original list of
recommendations above, then ask for the recommendations again,
and now we get nothing back. If we restart Solr, or reload the collection,
then we can get it to work, but as soon as we dislike another
document we get back into a weird state.

Through trial and error I narrowed down that if we set the documentCache
size to 0, then this problem doesn't happen. Since we can't
really figure out why this is happening in Solr, we were hoping there was
some way to not use the document cache on the call where
we use the mlt query parser.

On Thu, May 28, 2015 at 5:44 PM, Erick Erickson 
wrote:

> First, there isn't that I know of. But why would you want to do this?
>
> On the face of it, it makes no sense to ignore the doc cache. One of its
> purposes is to hold the document (read off disk) for successive
> search components _in the same query_. Otherwise, each component
> might have to do a disk seek.
>
> So I must be missing why you want to do this.
>
> Best,
> Erick
>
> On Thu, May 28, 2015 at 1:23 PM, Bryan Bende  wrote:
> > Is there a way to ignore the document cache on a per-query basis?
> >
> > It looks like there's {!cache=false} for preventing the filter cache from
> > being used for a given query, looking for the same thing for the document
> > cache.
> >
> > Thanks,
> >
> > Bryan
>


Ignoring the Document Cache per query

2015-05-28 Thread Bryan Bende
Is there a way to ignore the document cache on a per-query basis?

It looks like there's {!cache=false} for preventing the filter cache from
being used for a given query; I'm looking for the same thing for the document
cache.
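
For reference, the filter cache version looks something like this (the field
and value are just an example):

fq={!cache=false}category:books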

Thanks,

Bryan


SolrJ Exceptions

2015-04-16 Thread Bryan Bende
I'm trying to identify the difference between an exception when Solr is in
a bad state/down vs. when it is up but an invalid request was made (maybe
some bad data sent in).

The JavaDoc for SolrRequest process() says:


*@throws SolrServerException if there is an error on the Solr server@throws
IOException if there is a communication error*

So I expected IOException when Solr was down, but it looks like it actually
throws a SolrServerException which has a cause of an IOException.

I'm also not sure how SolrException fits into all of this...

Is anyone familiar with when to generally expect these types of exceptions?

I'm interested in both cloud and stand-alone scenarios, and using Solr 5.0
or 5.1.

Thanks,

Bryan


Recommendations based on MoreLikeThis & user likes/dislikes

2015-03-04 Thread Bryan Bende
Does anyone have experience tracking documents that a user "liked" /
"disliked" and then incorporating that into a MoreLikeThis query?

The idea would be to exclude any document a user disliked from ever
returning as a similar document, and to boost any document a user liked so
it shows up higher in the similar documents.

The biggest question seems to be how and where to store the
likes/dislikes...

- A multi-valued field on each document that stores the usernames of who
liked it (same for dislikes). This will cause a lot of updates to this
document, and the document can be fairly large with a lot of stored fields.

- A nested child document that stores the user, whether it was a like or
dislike, and any other necessary information, and then somehow using
block-join. If I'm correct this still requires updating the whole block
when a new child document is inserted.

- A separate solr document that stores the same information as the nested
child would, but would have to be joined together at query time. Not sure
of the performance impact here.

- Store the likes/dislikes completely outside Solr and somehow pass this
information in to a query, not sure if this is feasible if there are
thousands of likes and dislikes for a single user.

Any thoughts or best practices for implementing something like this would
be appreciated.

Thanks,

Bryan


Indexing Custom JSON with SolrJ

2015-02-08 Thread Bryan Bende
Does SolrJ have anything that allows you to change the update handler and
add something besides a SolrInputDocument?

I'm trying to figure out how to add JSON documents using the custom JSON
update handler (http://lucidworks.com/blog/indexing-custom-json-data/), but
doing it through SolrJ in order to leverage CloudSolrServer.

Thanks,

Bryan


Re: Optimize during indexing

2014-11-21 Thread Bryan Bende
When I've run an optimize with Solr 4.8.1 (by clicking optimize from the
collection overview in the admin ui) it goes replica by replica, so it is
never doing more than one shard or replica at the same time.

It also significantly slows down operations that hit the replica being
optimized. I've seen clients hanging for minutes waiting on an add document
call to return.

On Fri, Nov 21, 2014 at 2:17 PM, Erick Erickson 
wrote:

> bq: if I can optimize one shard at a time
>
> Not sure. Try putting &distrib=false on the URL, but I don't know
> for sure whether that'd work or not. If this works at all, it'll work
> on  one _replica_ at a time, not shard.
>
> Bu why would you want to? Each optimization is local and runs
> in the background anyway. Or are you running an older master/slave
> setup? In which case I guess you might want to throttle replication,
> which you can do by enabling/disabling replication with the core admin
> API.
>
> Best,
> Erick
>
> On Fri, Nov 21, 2014 at 8:53 AM, Yago Riveiro 
> wrote:
> > It’s the "Deleted Docs” metric in the statistic core.
> >
> >
> >
> >
> > I now that eventually the merges will expunge this deletes but I will
> run out of space soon and I want to know the _real_ space that I have.
> >
> >
> >
> >
> > Actually I have space enough (about 3.5x the size of the index) to do
> the optimize.
> >
> >
> >
> >
> > Other question that I have is if I can optimize one shard at a time
> instead of do an optimize over the full collection (this give me more
> control about space used, I have more than one shard of the same collection
> in each node of the cluster).
> >
> >
> > —
> > /Yago Riveiro
> >
> > On Fri, Nov 21, 2014 at 4:29 PM, Erick Erickson  >
> > wrote:
> >
> >> Yes, should be no problem.
> >> Although this should be happening automatically, the percentage
> >> of documents in a segment weighs quite heavily when the decision
> >> is made to merge segments in the background.
> >> You say you have "millions of deletes". Is this the difference between
> >> numDocs and maxDoc on the admin page for the core in question?
> >> Or is it just that you've issued millions of updates (or deletes)?
> Because
> >> if the latter, I'd advise monitoring the numDocs/maxDoc pair to see
> >> if the problem goes away on its own.
> >> bq: ...and need free space
> >> This is a red flag. If you're talking about disk space, before you get
> the
> >> free space forceMerge will copy the _entire_ index so you'll need at
> >> least 2x the current index size.
> >> Best,
> >> Erick
> >> On Fri, Nov 21, 2014 at 6:40 AM, yriveiro 
> wrote:
> >>> Hi,
> >>>
> >>> It´s possible perform an optimize operation and continuing indexing
> over a
> >>> collection?
> >>>
> >>> I need to force expunge deletes from the index I have millions os
> deletes
> >>> and need free space.
> >>>
> >>>
> >>>
> >>> -
> >>> Best regards
> >>> --
> >>> View this message in context:
> http://lucene.472066.n3.nabble.com/Optimize-during-indexing-tp4170261.html
> >>> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Expunging Deletes

2014-09-29 Thread Bryan Bende
You can try lowering the mergeFactor in solrconfig.xml to cause more merges
to happen during normal indexing, which should result in more deleted
documents being removed from the index, but there is a trade-off

http://wiki.apache.org/solr/SolrPerformanceFactors#mergeFactor
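
In solrconfig.xml that would be something along these lines (the value here
is just an example):

<indexConfig>
  <mergeFactor>5</mergeFactor>
</indexConfig>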

On Mon, Sep 29, 2014 at 2:14 PM, Eric Katherman  wrote:

> Thanks for replying!  Is there anything I could be doing to help prevent
> the 14GB collection with 700k deleted docs before it tries removing them
> and at that point running out of memory?  Maybe just scheduled off-peak
> optimize calls with expungeDeletes?  Or is there some other config option I
> could be using to help manage that a little better?
>
> Thanks!
> Eric
>
>
> On Sep 29, 2014, at 9:35 AM, Shalin Shekhar Mangar 
> wrote:
>
> > Yes, expungeDeletes=true will remove all deleted docs from the disk but
> it
> > also requires merging all segments that have any deleted docs which, in
> > your case, could mean a re-write of the entire index. So it'd be an
> > expensive operation. Usually deletes are removed in the normal course of
> > indexing as segments are merged together.
> >
> > On Sat, Sep 27, 2014 at 8:42 PM, Eric Katherman 
> wrote:
> >
> >> I'm running into memory issues and wondering if I should be using
> >> expungeDeletes on commits.  The server in question at the moment has
> 450k
> >> documents in the collection and represents 15GB on disk.  There are also
> >> 700k+ "Deleted Docs" and I'm guessing that is part of the disk space
> >> consumption but I am not having any luck getting that cleared out.  I
> >> noticed the expungeDeletes=false in some of the log output related to
> >> commit but didn't try setting it to true yet. Will this clear those
> deleted
> >> documents and recover that space?  Or should something else already be
> >> managing that but maybe isn't configured correctly?
> >>
> >> Our data is user specific data, each customer has their own database
> >> structure so it varies with each user.  They also add/remove data fairly
> >> frequently in many cases.  To compare another collection of the same
> data
> >> type, there are 1M documents and about 120k deleted docs but disk space
> is
> >> only 6.3GB.
> >>
> >> Hoping someone can share some advice about how to manage this.
> >>
> >> Thanks,
> >> Eric
> >>
> >
> >
> >
> > --
> > Regards,
> > Shalin Shekhar Mangar.
>
>


Re: Solr Exceptions -- "immense terms"

2014-09-15 Thread Bryan Bende
I ran into this problem as well when upgrading to Solr 4.8.1...

We had a somewhat large binary field that was "indexed=false stored=true",
but because of the copyField copying "*" to "text" it would hit the immense
term issue.

In our case we didn't need this field to be indexed (parts of it were
already indexed in other fields) so we worked around it by breaking out
individual copyField directives for only the fields we needed.
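
Roughly like this in schema.xml (the field names here are placeholders):

<!-- instead of: <copyField source="*" dest="text"/> -->
<copyField source="title" dest="text"/>
<copyField source="body" dest="text"/>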

On Mon, Sep 15, 2014 at 1:52 PM, Chris Hostetter 
wrote:

>
> : SCHEMA:
> :  : required="true"/>
> :
> : LOGS:
> : Caused by: java.lang.IllegalArgumentException: Document contains at least
> : one immense term in field="content" (whose UTF8 encoding is longer than
> the
>
> I don't think you are using the schema.xml you think you are ... that
> exception is *very* specific to the *INDEXED* terms.  It has nothing to do
> with the stored value.
>
>
> This change in behavior (from silently ignoring massive terms, to
> propogating an error) was explicitly noted in the upgrade steps for 4.8...
>
>
> https://lucene.apache.org/solr/4_8_0/changes/Changes.html#v4.8.0.upgrading_from_solr_4.7
>
> In previous versions of Solr, terms that exceeded Lucene's MAX_TERM_LENGTH
> were silently ignored when indexing documents. Beginning with Solr 4.8, an
> error will be generated when attempting to index a document with a term
> that is too large. If you wish to continue to have large terms ignored,
> use "solr.LengthFilterFactory" in all of your Analyzers. See LUCENE-5472
> for more details.
>
>
> -Hoss
> http://www.lucidworks.com/
>


Re: solr query gives different numFound upon refreshing

2014-08-27 Thread Bryan Bende
Theoretically this shouldn't happen, but is it possible that the two
replicas for a given shard are not fully in sync?

Say shard1 replica1 is missing a document that is in shard1 replica2... if
you run a query that would hit on that document and run it a bunch of
times, sometimes replica 1 will handle the request and sometimes replica 2
will handle it, and it would change your number of results if one of them
is missing a document. You could write a program that compares each
replica's documents by querying them with distrib=false.
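
For example, something like this against each core directly (host and core
names are illustrative):

http://host1:8983/solr/collection1_shard1_replica1/select?q=*:*&fl=id&sort=id+asc&rows=1000&start=0&distrib=false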

If there was a replica out of sync, I would think it would detect that on a
restart when comparing itself against the leader for that shard, but I'm
not sure.


On Wed, Aug 27, 2014 at 11:37 AM, Joshi, Shital  wrote:

> Hi,
>
> We have SolrCloud cluster (5 shards and 2 replicas) on 10 boxes. We have
> three collections. We recently upgraded from 4.4.0 from 4.8. We have ~850
> mil documents.
>
> We are facing an issue where refreshing a Solr query may give different
> results (number of documents returned). This issue is seen in all three
> collections.
>
> We found that Solr admin would report Solr instance states as not
> “current”.  Is it indicative of the above issue?
>
> We checked logs and found various errors/warnings, but they don’t seem to
> be indicative of the above issue (or if they are – it’s not yet
> clear/obvious or maybe indirectly related). The error message is like this:
> 8/27/2014 2:01:24 AMERROR   SolrCmdDistributor
> org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Error
> opening new searcher. exceeded limit of maxWarmingSearchers=2, try again
> later.
>
> This is our autocommit setting.
>
> 
>15000
>10
>false
>  
>  
>30
> 
> The searcher takes less than 1.5 minutes and the soft commit setting is
> set for every 5 minutes. So there is no way to end up with more than two
> searchers.
>
> The searcher registeredAt time and openedAt time are sometimes 12-13 hours
> old and we end up bouncing the cloud.
>
> Any help to solve this issue is appreciated.
>
>
>
>
>
>
>
>
>


Re: Incorrect group.ngroups value

2014-08-22 Thread Bryan Bende
Turns out there are in fact documents for the same group in different
shards which must be causing this problem. It looks like we have a slight
flaw in how we were trying to use the composite id routing.

Thanks for putting me down the right path.


On Fri, Aug 22, 2014 at 11:14 AM, Andrew Shumway 
wrote:

> The Co-location section of this document
> http://searchhub.org/2013/06/13/solr-cloud-document-routing/ might be of
> interest to you.  It mentions the need for using Solr Cloud routing to
> group documents in the same core so that grouping can work properly.
>
> --Andrew Shumway
>
>
> -Original Message-
> From: Bryan Bende [mailto:bbe...@gmail.com]
> Sent: Friday, August 22, 2014 9:01 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Incorrect group.ngroups value
>
> Thanks Jim.
>
> We've been using the composite id approach where we put group value as the
> leading portion of the id (i.e. groupValue!documentid), so I was expecting
> all of the documents for a given group to be in the same shard, but at
> least this gives me something to look into. I'm still suspicious of
> something changing between 4.6.1 and 4.8.1, because we've had the grouping
> implemented this way for a while, and only on the exact day we upgraded did
> someone bring this problem forward. I will keep investigating, thanks.
>
>
> On Fri, Aug 22, 2014 at 9:18 AM, jim ferenczi 
> wrote:
>
> > Hi Bryan,
> > This is a known limitations of the grouping.
> > https://wiki.apache.org/solr/FieldCollapsing#RequestParameters
> >
> > group.ngroups:
> >
> >
> > *WARNING: If this parameter is set to true on a sharded environment,
> > all the documents that belong to the same group have to be located in
> > the same shard, otherwise the count will be incorrect. If you are
> > using SolrCloud <https://wiki.apache.org/solr/SolrCloud>, consider
> > using "custom hashing"*
> >
> > Cheers,
> > Jim
> >
> >
> >
> > 2014-08-21 21:44 GMT+02:00 Bryan Bende :
> >
> > > Is there any known issue with using group.ngroups in a distributed
> > > Solr using version 4.8.1 ?
> > >
> > > I recently upgraded a cluster from 4.6.1 to 4.8.1, and I'm noticing
> > several
> > > queries where ngroups will be more than the actual groups returned
> > > in the response. For example, ngroups will say 5, but then there
> > > will be 3
> > groups
> > > in the response. It is not happening on all queries, only some.
> > >
> >
>


Re: Incorrect group.ngroups value

2014-08-22 Thread Bryan Bende
Thanks Jim.

We've been using the composite id approach where we put group value as the
leading portion of the id (i.e. groupValue!documentid), so I was expecting
all of the documents for a given group to be in the same shard, but at
least this gives me something to look into. I'm still suspicious of
something changing between 4.6.1 and 4.8.1, because we've had the grouping
implemented this way for a while, and only on the exact day we upgraded did
someone bring this problem forward. I will keep investigating, thanks.


On Fri, Aug 22, 2014 at 9:18 AM, jim ferenczi 
wrote:

> Hi Bryan,
> This is a known limitations of the grouping.
> https://wiki.apache.org/solr/FieldCollapsing#RequestParameters
>
> group.ngroups:
>
>
> *WARNING: If this parameter is set to true on a sharded environment, all
> the documents that belong to the same group have to be located in the same
> shard, otherwise the count will be incorrect. If you are using SolrCloud
> <https://wiki.apache.org/solr/SolrCloud>, consider using "custom hashing"*
>
> Cheers,
> Jim
>
>
>
> 2014-08-21 21:44 GMT+02:00 Bryan Bende :
>
> > Is there any known issue with using group.ngroups in a distributed Solr
> > using version 4.8.1 ?
> >
> > I recently upgraded a cluster from 4.6.1 to 4.8.1, and I'm noticing
> several
> > queries where ngroups will be more than the actual groups returned in the
> > response. For example, ngroups will say 5, but then there will be 3
> groups
> > in the response. It is not happening on all queries, only some.
> >
>


Incorrect group.ngroups value

2014-08-21 Thread Bryan Bende
Is there any known issue with using group.ngroups in a distributed Solr
using version 4.8.1 ?

I recently upgraded a cluster from 4.6.1 to 4.8.1, and I'm noticing several
queries where ngroups will be more than the actual groups returned in the
response. For example, ngroups will say 5, but then there will be 3 groups
in the response. It is not happening on all queries, only some.


ComplexPhraseQuery and Date ranges

2014-08-14 Thread Bryan Bende
Does anyone know if it is possible to get date ranges working with the
ComplexPhraseQueryParser?

I'm using Solr 4.8.1 and seeing the same behavior described in this post:

http://stackoverflow.com/questions/19402268/solr-4-2-1-and-solr-1604-complexphrase-and-date-range-queries-do-not-work-toge

I don't know much about the query parsing code in Solr/Lucene, but from
looking at ComplexPhraseQueryParser... it overrides getRangeQuery(...)  but
then calls suprer.getRangeQuery(..) which calls the getRangeQuery() from
QueryParserBase that looks like it has logic regarding dates, but it
doesn't seem to pick up any results when the query has a date range in it.


Re: Issue paging when sorting on a Date field

2014-05-20 Thread Bryan Bende
This is using solr.TrieDateField; it is the field type "date" from the
example schema in Solr 4.6.1:


After further testing I was only able to reproduce this in a sharded &
replicated environment (numShards=3, replicationFactor=2) and I think I
have narrowed down the issue, and at this point it may be expected
behavior...

I took a query like q=create_date:[2014-05-19T00:00:00Z TO
2014-05-19T23:59:59Z]&sort=create_date DESC&start=0&rows=1 which should
get all the documents for yesterday sorted by create date, and then added
distrib=false and ran it against shard1_replica1 and shard1_replica2. Then
I diff'd the files and it showed 5 occurrences where two consecutive rows
in one replica were reversed in the other replica, and in all 5 cases the
flip-flopped rows had the exact same create_date value, which happened
to only go down to the minute.

As an example:

shard1_replica1:
...
docX, 2014-05-19T20:15:00Z
docY, 2014-05-19T20:15:00Z
...

shard1_replica2:
...
docY, 2014-05-19T20:15:00Z
docX, 2014-05-19T20:15:00Z
...

So I think when I was paging through the results, if the query for page N
was handled by replica1 and page N+1 handled by replica2, and the page
boundary happened to be where the reversed rows were, this would produce
the behavior I was seeing where the last row from the previous page was
also the first row from the next page.

I guess the obvious solution is to ensure the date field is always more
granular than minutes, or add another field to the sort order to
consistently break ties.
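
For example (assuming the unique key field is named id):

sort=create_date desc, id desc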


On Mon, May 19, 2014 at 4:19 PM, Chris Hostetter
wrote:

>
> : Using Solr 4.6.1 and in my schema I have a date field storing the time a
> : document was added to Solr.
>
> what *exactly* does your schema look like?  are you using "solr.DateField"
> or "solr.TrieDateField" ? what field options do you have specified?
>
> : I have a utility program which:
> : - queries for all of the documents in the previous day sorted by create
> date
> : - pages through the results keeping track of the unique document ids
> : - compare the total number of unique doc ids to the numFound to see if it
> : they match
>
> what *exactly* do your queries look like?  show us some examples please
> (URL & results).  Are you using distributed searching across multiple
> nodes, or a single node?  do you have concurrent updates going on during
> your test?
>
> : It is not consistent between tests, the number of occurrences changes and
> : the locations of the occurrences can change as well. The larger the
> result
> : set, and smaller the page size, the more frequent the occurrences are.
>
> if you bring up a test instance of Solr using your current configs, can
> you reproduce (even occasionally) with some synthetic data you can share
> with us?  If so please provide your full configs & sample data (ie: create
> a Jira & attach all the neccessary files i na ZIP)
>
>
> -Hoss
> http://www.lucidworks.com/
>


Issue paging when sorting on a Date field

2014-05-19 Thread Bryan Bende
Using Solr 4.6.1 and in my schema I have a date field storing the time a
document was added to Solr.

I have a utility program which:
- queries for all of the documents in the previous day sorted by create date
- pages through the results keeping track of the unique document ids
- compare the total number of unique doc ids to the numFound to see if it
they match

I've noticed that if I use a page size larger than the number of documents
for the given day (aka get everything in one query), then everything works
as expected (results sorted correctly, unique doc ids size == numFound).

However, when I use a smaller page size, say 10 rows per page, I randomly
see cases where the last document of a page will be duplicated as the first
document of the next page, even though the "start" and "rows" parameters
increased correctly. So I might see something like numFound=100 but unique
doc ids is 97, and then I see three occurrences where the last doc id on a
page was also the first on the next page.

It is not consistent between tests, the number of occurrences changes and
the locations of the occurrences can change as well. The larger the result
set, and smaller the page size, the more frequent the occurrences are.

The only thing I have noticed is that if I change the sorting of the
initial query to use a non-date field, then this doesn't happen anymore.

Are there any known issues/limitations sorting/paging on a date field?

The only mention I can find is this thread:
http://mail-archives.apache.org/mod_mbox/lucene-solr-user/200909.mbox/%3c57912a0644b6ab4381816de07cb1c38d02a00...@s2na1excluster.na1.ad.group%3E


expungeDeletes vs optimize

2014-02-05 Thread Bryan Bende
Does calling commit with expungeDeletes=true result in a full rewrite of
the index like an optimize does? Or does it only merge away the documents
that were "deleted" by commit?

Every two weeks or so we run a process to rebuild our index from the
original documents resulting in a large amount of deleted docs still on
disk, and basically doubling the amount of disk space used by the index. We
are trying determine if it is best to just run an optimize at the end of
this process, or if there is a better solution. This is with solr 4.3.
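
For context, the commit in question would be issued with something like this
(the collection name is just a placeholder):

curl 'http://localhost:8983/solr/collection1/update?commit=true&expungeDeletes=true'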


autoCommit and autoSoftCommit

2013-08-26 Thread Bryan Bende
I'm running Solr 4.3 with:


<autoCommit>
  <maxTime>6</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>

<autoSoftCommit>
  <maxTime>5000</maxTime>
</autoSoftCommit>

When I start Solr and send in a couple of hundred documents, I am able to
retrieve documents after 5 seconds using SolrJ. However, from the Solr
admin console if I query for *:* it will show that there are docs in the
numFound attribute, but none of the results have the stored fields present.

As a test I also tried modifying the autoCommit to add maxDocs like this:

<autoCommit>
  <maxDocs>100</maxDocs>
  <maxTime>6</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>

It seems like with this configuration something different happens... if I
send in 150 docs then the first 100 will show up correctly through Solr
admin, but the last 50 that didn't hit the maxDocs threshold still don't
show the stored fields.

Is it expected that maxDocs and maxTime do something different when
commiting ?

If using autoCommit with openSearcher=false and autoSoftCommit, does the
client ever have to send a hard commit with openSearcher=true ?

- Bryan


Re: Field Query After Collapse.Field?

2013-06-28 Thread Bryan Bende
Can you just use two queries to achieve the desired results ?

Query1 to get all actions where !entry_read:1 for some range of rows (your
page size)
Query2 to get all the entries with an entry_id in the results of Query1

The second query would be very direct and only query for a set of entries
equal to your page size.
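
A rough sketch of the two-query approach (the core names, field names, and
page size are placeholders for illustration):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrDocumentList;

public class TwoQueryLookup {
  public static void main(String[] args) throws Exception {
    SolrServer actions = new HttpSolrServer("http://localhost:8983/solr/actions");
    SolrServer entries = new HttpSolrServer("http://localhost:8983/solr/entries");
    int pageSize = 20;

    // Query1: one page of actions that have not been marked read
    SolrQuery q1 = new SolrQuery("*:* -entry_read:1");
    q1.setRows(pageSize);
    SolrDocumentList actionDocs = actions.query(q1).getResults();
    if (actionDocs.isEmpty()) {
      return; // nothing on this page
    }

    // Query2: fetch just the entries whose ids came back from Query1
    StringBuilder ids = new StringBuilder();
    for (SolrDocument doc : actionDocs) {
      if (ids.length() > 0) {
        ids.append(" OR ");
      }
      ids.append('"').append(doc.getFieldValue("entry_id")).append('"');
    }
    SolrQuery q2 = new SolrQuery("entry_id:(" + ids + ")");
    q2.setRows(pageSize);
    SolrDocumentList entryDocs = entries.query(q2).getResults();

    System.out.println("entries found: " + entryDocs.getNumFound());
  }
}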


On Fri, Jun 28, 2013 at 12:51 PM, Erick Erickson wrote:

> Well, now I'm really puzzled. The link you referenced was from when
> grouping/field collapsing was under development. I did a quick look
> through the entire 4x code base for "collapse" and there's no place
> I saw that looks like it accepts that parameter. Of course I may have
> just missed it.
>
> What version of Solr are you using? Have you done anything special
> to it? Can you cut/paste your response, or at least the relevant bits that
> show the effects of specifying collapse.field?
>
> Best
> Erick
>
>
> On Fri, Jun 28, 2013 at 12:19 PM, slevytam wrote:
>
> > Hi Erick,
> >
> > I actually did mean collapse.field, as per:
> >
> >
> http://blog.trifork.com/2009/10/20/result-grouping-field-collapsing-with-solr/
> >
> > On high level I am trying to avoid the use of a join between a list of
> > entries and a list of actions that users have performed on a entry (since
> > it's not supported by distributed search).
> >
> > So I have a list of entries
> > ie. entry_id, entry_content, etc
> >
> > And a list of actions users have performed on the entry
> > ie. entry_id, entry_read, entry_starred
> >
> > I'm trying to combine these for pagination purposes.  By doing a search
> for
> > entry_id across the two cores (indexes) and then doing a collapse.field,
> I
> > am able to get this nice list of results.  However, I cannot figure out a
> > way to then filter that list since q and fq happen before the collapse.
> >
> > Thanks,
> >
> > Shalom
> >
> >
> >
> > --
> > View this message in context:
> >
> http://lucene.472066.n3.nabble.com/Field-Query-After-Collapse-Field-tp4073691p4073928.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
> >
>


Re: Result Grouping

2013-06-26 Thread Bryan Bende
The field I am grouping on is a single-valued string.

It looks like in non-distributed mode, if I use group=true, sort, group.sort,
and group.limit=1, it will...

- group the results
- sort within each group
- limit down to 1 result per group
- apply the sort between groups using the single result of each group

When I run with numShards >= 1...

- group the results
- apply the sort between groups using the document from each group based on
the sort, for example if sort=popularity desc then it uses the highest
popularity from each group
- sort within the group
- limit down to 1 result per group

I was trying to confirm if this was the expected behavior, or if there is
something I could do to get the first behavior in a distributed configuration.

I posted this a few days ago describing the scenario in more detail if
you are interested...
http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201306.mbox/%3CCALo_M18WVoLKvepJMu0wXk_x2H8cv3UaX9RQYtEh4-mksQHLBA%40mail.gmail.com%3E


> What type of field are you grouping on? What happens when you distribute
> it? I.e. what specifically goes wrong?

> Upayavira

On Tue, Jun 25, 2013, at 09:12 PM, Bryan Bende wrote:
> I was reading this documentation on Result Grouping...
> http://docs.lucidworks.com/display/solr/Result+Grouping
>
> which says...
>
> sort - sortspec - Specifies how Solr sorts the groups relative to each
> other. For example, sort=popularity desc will cause the groups to be
> sorted
> according to the highest popularity document in each group. The default
> value is score desc.
>
> group.sort - sort.spec - Specifies how Solr sorts documents within a
> single
> group. The default value is score desc.
>
> Is it possible to use these parameters such that group.sort would first
> sort with in each group, and then the overall sort would be applied
> according to the first element of each sorted group ?
>
> For example, using the scenario above where it has "sort=popularity
> desc",
> could you also have "group.sort=date asc" resulting in the the most
> recent
> document of each group being sorted by decreasing popularity ?
>
> It seems to work the way I described when running a single node Solr 4.3
> instance, but in a 2 shard configuration it appears to work differently.
>
> -Bryan


Result Grouping

2013-06-25 Thread Bryan Bende
I was reading this documentation on Result Grouping...
http://docs.lucidworks.com/display/solr/Result+Grouping

which says...

sort - sortspec - Specifies how Solr sorts the groups relative to each
other. For example, sort=popularity desc will cause the groups to be sorted
according to the highest popularity document in each group. The default
value is score desc.

group.sort - sort.spec - Specifies how Solr sorts documents within a single
group. The default value is score desc.

Is it possible to use these parameters such that group.sort would first
sort within each group, and then the overall sort would be applied
according to the first element of each sorted group ?

For example, using the scenario above where it has "sort=popularity desc",
could you also have "group.sort=date asc" resulting in the most recent
document of each group being sorted by decreasing popularity ?
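
Concretely, that combination would be set up something like this (a SolrJ
sketch; the grouping field name and core URL are placeholders, and the sort
fields come from the example above):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class GroupSortExample {
  public static void main(String[] args) throws Exception {
    SolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");

    SolrQuery query = new SolrQuery("*:*");
    query.set("group", true);
    query.set("group.field", "someGroupField");
    query.set("group.limit", 1);
    query.set("group.sort", "date asc");                // sort docs within each group
    query.setSort("popularity", SolrQuery.ORDER.desc);  // sort the groups relative to each other

    QueryResponse rsp = solr.query(query);
    System.out.println(rsp.getGroupResponse().getValues());
  }
}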

It seems to work the way I described when running a single node Solr 4.3
instance, but in a 2 shard configuration it appears to work differently.

-Bryan


Grouping and Sorting with shards

2013-06-21 Thread Bryan Bende
I'm wondering what the expected behavior is for the following scenario...

We receive the same document in multiple formats and we handle this by
grouping, sorting the group by date received, and limiting the group to 1,
resulting in getting the most recent version of a document.

Here is an example, the id field is something like "identifier!date_format"

doc {
 id: doc1!20130618_formatX
 docId: doc1
 dateReceived: 20130620
}

doc {
 id: doc1!20130621_formatY
 docId: doc1
 dateReceived: 20130621
}

doc {
 id: doc2!20130619_formatX
 docId: doc2
 dateReceived: 20130619
}


So in this case we would want to group on docId so all the doc1 docs were
together and all doc2 docs together, sort within the groups on
dateReceived descending and limit the groups to 1 to get the most recent
doc in the group, then sort the whole result set on dateReceived ascending.

So we expect to get:
doc2!20130619_formatX
doc1!20130621_formatY
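
In parameter terms, the request described above is roughly the following
(a sketch; the main query and core URL are placeholders):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class LatestVersionQuery {
  public static void main(String[] args) throws Exception {
    SolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");

    SolrQuery query = new SolrQuery("*:*");
    query.set("group", true);
    query.set("group.field", "docId");
    query.set("group.sort", "dateReceived desc");        // newest version first within a group
    query.set("group.limit", 1);                         // keep only the newest version
    query.setSort("dateReceived", SolrQuery.ORDER.asc);  // overall sort across groups

    System.out.println(solr.query(query).getGroupResponse().getValues());
  }
}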

In a regular single node Solr instance, running Solr 4.3, everything I
described above works perfectly fine. When running on a sharded
configuration with two nodes, the results are different. It will still do
the grouping, sorting within groups, and limiting as expected, but the
overall sort on dateReceived is not the same.

The results end up being:
doc1!20130621_formatY
doc2!20130619_formatX

It seems like this is because the doc1 group has another document with
dateReceived of 0618 which is somehow being used for the overall sort, and
then the group.sort and group.limit are being applied after this ???

I realize there could be limitations of grouping and sorting in a sharded
setup, but I wanted to know if this is correct behavior, or if there is
something I am doing wrong.

Any help would be appreciated.

Thanks,

Bryan