Re: Solr SSL setup with bought certificate

2017-05-09 Thread Sebastjanas
I don't know why this script worked on CentOS, but it did.

Also, I managed to make it work by uncommenting the following two lines:

SOLR_SSL_TRUST_STORE=/opt/solr/server/etc/solr-ssl.keystore.jks
SOLR_SSL_TRUST_STORE_PASSWORD=[password]

I don't know where I read that these need to be uncommented only for a
self-signed certificate; now it looks like they have to be uncommented
in both cases.
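
For reference, the relevant block of /etc/default/solr.in.sh as it works now
(a sketch: absolute paths shown for both stores, since relative paths seem to
resolve against the server directory; the password is a placeholder):

  SOLR_SSL_ENABLED=true
  SOLR_SSL_KEY_STORE=/opt/solr/server/etc/solr-ssl.keystore.jks
  SOLR_SSL_KEY_STORE_PASSWORD=[password]
  SOLR_SSL_TRUST_STORE=/opt/solr/server/etc/solr-ssl.keystore.jks
  SOLR_SSL_TRUST_STORE_PASSWORD=[password]
  SOLR_SSL_NEED_CLIENT_AUTH=false
  SOLR_SSL_WANT_CLIENT_AUTH=false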


On Tue, May 9, 2017 at 4:40 PM, Steve Rowe  wrote:

> Hi,
>
> AFAICT the Solr 5.5.4 install_solr_service.sh doesn’t support CentOS
> (support was added in 6.3: SOLR-9475).  How did you make it work?
>
> I’m guessing there are permissions problems in your installation
> directory, such that the account being used to start Solr doesn’t have
> execute and/or read permission somewhere under /opt/solr-5.5.4/.
>
> The install script sets up permissions like this:
>
> -
> chown -R root: "$SOLR_INSTALL_DIR"
> find "$SOLR_INSTALL_DIR" -type d -print0 | xargs -0 chmod 0755
> find "$SOLR_INSTALL_DIR" -type f -print0 | xargs -0 chmod 0644
> chmod -R 0755 "$SOLR_INSTALL_DIR/bin"
> -
>
> --
> Steve
> www.lucidworks.com
>
> > On May 9, 2017, at 4:17 AM, Sebastjanas  wrote:
> >
> > Hello,
> >
> > I installed 5.5.4 on Centos to /opt/solr. Also I installed init script
> > using install_solr_service.sh. I've imported bought certificate to
> keystore
> > and now trying to start it up with SSL, using following settings in
> > /etc/default/solr.in.sh:
> >
> > SOLR_SSL_ENABLED=true
> > SOLR_SSL_KEY_STORE=etc/solr-ssl.keystore.jks
> > SOLR_SSL_KEY_STORE_PASSWORD=[password]
> > #SOLR_SSL_TRUST_STORE=etc/keystore.jks
> > #SOLR_SSL_TRUST_STORE_PASSWORD=[password]
> > SOLR_SSL_NEED_CLIENT_AUTH=false
> > SOLR_SSL_WANT_CLIENT_AUTH=false
> >
> > But it doesn't start with following error:
> >
> >1629 WARN  (main) [   ] o.e.j.u.c.AbstractLifeCycle FAILED
> > SslContextFactory@564fabc8(etc/solr-ssl.keystore.jks,):
> > java.io.FileNotFoundException: /opt/solr-5.5.4/server (Is a directory)
> > java.io.FileNotFoundException: /opt/solr-5.5.4/server (Is a directory)
> >at java.io.FileInputStream.open0(Native Method)
> >at java.io.FileInputStream.open(FileInputStream.java:195)
> >at java.io.FileInputStream.<init>(FileInputStream.java:138)
> >at org.eclipse.jetty.util.resource.FileResource.getInputStream(FileResource.java:290)
> >at org.eclipse.jetty.util.security.CertificateUtils.getKeyStore(CertificateUtils.java:43)
> >at org.eclipse.jetty.util.ssl.SslContextFactory.loadTrustStore(SslContextFactory.java:884)
> >at org.eclipse.jetty.util.ssl.SslContextFactory.doStart(SslContextFactory.java:274)
> >at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
> >at org.eclipse.jetty.util.component.ContainerLifeCycle.start(ContainerLifeCycle.java:132)
> >at org.eclipse.jetty.util.component.ContainerLifeCycle.doStart(ContainerLifeCycle.java:114)
> >at org.eclipse.jetty.server.SslConnectionFactory.doStart(SslConnectionFactory.java:64)
> >at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
> >at org.eclipse.jetty.util.component.ContainerLifeCycle.start(ContainerLifeCycle.java:132)
> >at org.eclipse.jetty.util.component.ContainerLifeCycle.doStart(ContainerLifeCycle.java:114)
> >at org.eclipse.jetty.server.AbstractConnector.doStart(AbstractConnector.java:256)
> >at org.eclipse.jetty.server.AbstractNetworkConnector.doStart(AbstractNetworkConnector.java:81)
> >at org.eclipse.jetty.server.ServerConnector.doStart(ServerConnector.java:236)
> >at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
> >at org.eclipse.jetty.server.Server.doStart(Server.java:366)
> >at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
> >at org.eclipse.jetty.xml.XmlConfiguration$1.run(XmlConfiguration.java:1255)
> >at java.security.AccessController.doPrivileged(Native Method)
> >at org.eclipse.jetty.xml.XmlConfiguration.main(XmlConfiguration.java:1174)
> >at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> >at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> >at java.lang.reflect.Method.invoke(Method.java:498)
> >at org.eclipse.jetty.start.Main.invokeMain(Main.java:321)
> >at org.eclipse.jetty.start.Main.start(Main.java:817)
> >at org.eclipse.jetty.start.Main.main(Main.java:112)
> > 1631 INFO  (coreLoadExecutor-6-thread-1) [   ] o.a.s.c.SolrConfig
> > Adding specified lib dirs to ClassLoader
> > 1634 WARN  (main) [   ] o.e.j.u.c.AbstractLifeCycle FAILED
> > 

Status on updating log4j to log4j2

2017-05-09 Thread Antelmo Aguilar
Hi,

I noticed that you guys are working on upgrading log4j to log4j2:
https://issues.apache.org/jira/browse/SOLR-7887

I was wondering if there is any priority on doing this, since it has been
several months since the last comment.  It would be nice, since it seems
log4j2 makes it easier to make the logs GELF-compliant.

Thank you!


Re: CDCR Alias support?

2017-05-09 Thread Webster Homer
Still no answer to this. I've been investigating using the collections API
for backup and restore. If CDCR supports collection aliases, this would make
things much smoother, as we would restore to a new collection and then
switch the alias to reference the new collection.
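
The switch itself would just be a CREATEALIAS call; a sketch, with
placeholder names:

  curl 'http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=products&collections=products_restored'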

On Tue, Jan 10, 2017 at 10:53 AM, Webster Homer 
wrote:

> Looking at the cdcr API and documentation, I wondered if the source and
> target collection names could be aliases. This is not discussed in the cdcr
> documentation; when I have time I was going to test this, but if someone
> knows for certain it might save some time.
>
>
>

-- 


This message and any attachment are confidential and may be privileged or 
otherwise protected from disclosure. If you are not the intended recipient, 
you must not copy this message or attachment or disclose the contents to 
any other person. If you have received this transmission in error, please 
notify the sender immediately and delete the message and any attachment 
from your system. Merck KGaA, Darmstadt, Germany and any of its 
subsidiaries do not accept liability for any omissions or errors in this 
message which may arise as a result of E-Mail-transmission or for damages 
resulting from any unauthorized changes of the content of this message and 
any attachment thereto. Merck KGaA, Darmstadt, Germany and any of its 
subsidiaries do not guarantee that this message is free of viruses and does 
not accept liability for any damages caused by any virus transmitted 
therewith.

Click http://www.emdgroup.com/disclaimer to access the German, French, 
Spanish and Portuguese versions of this disclaimer.


Re: Search substring in field

2017-05-09 Thread jnobre
Hello,

Thanks for your response.

I understand the concept, but I do not know which one to use in my case, nor
exactly what the difference between the analyzers is.

1 - At the moment I search for
source:*"hello world"* or url =
http://:8983/solr/AWP10/select?indent=on&q=source:*%22hello%20world%22*&wt=json

For example, one line of the answer:
   "source":
["http://www.gravatar.com/avatar/ad516503a11cd5ca435acc9bb6523536?s=32"]

The expression does not appear, and even then the line is returned.

2 - My idea was to identify a URL in the middle of a string with a regex,
as I would do in Java, but I do not know what the syntax is for entering a
regex in the search.

3 - I can use the multiplication function, but I do not know the search
syntax to evaluate its return.








cursormark pagination inconsistency

2017-05-09 Thread moscovig
Hi!

We are running SolrJ 6.2.0 against server 6.2.1,
trying to fetch 100 records at a time with nextCursorMark,
while sorting on: score desc, key asc.

The collection is made of 2 shards, with 3 replicas each.

We get inconsistent results when not specifying a specific replica for each
shard: sometimes the 3rd, and sometimes the 10th, fetch will contain results
that we expected to see in the 15th batch. Something goes wrong with the
score sorting.

When we specify a replica for each shard to query from, with
shards=solr1:8983/solr/tweets_shard1_replica2/,solr26:8983/solr/tweets_shard2_replica3
it works as expected.

It seems as if the cursor doesn't keep the sort between different replicas
of each shard.
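
For reference, a minimal sketch of our paging loop (host, collection and
query are placeholders; the real client is SolrJ, shown here as curl for
brevity):

  CURSOR='*'
  while true; do
    RESP=$(curl -sG 'http://solr1:8983/solr/tweets/select' \
      --data-urlencode 'q=*:*' \
      --data-urlencode 'sort=score desc,key asc' \
      --data-urlencode 'rows=100' \
      --data-urlencode 'wt=json' \
      --data-urlencode "cursorMark=$CURSOR")
    # ... process $RESP ...
    NEXT=$(echo "$RESP" | jq -r '.nextCursorMark')
    # paging is finished when the cursor stops advancing
    [ "$NEXT" = "$CURSOR" ] && break
    CURSOR=$NEXT
  done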

Thank you for the help!










[ANNOUNCEMENT] Luke 6.5.0 released

2017-05-09 Thread Tomoko Uchida
Download the release zip here:
https://github.com/DmitryKey/luke/releases/tag/luke-6.5.0

Also, tested with Lucene 6.5.1.

This release includes #86 and other changes; see the release page above for
the full list.


Thanks to respective contributors!

-- 
Tomoko Uchida


Could not initialize class JdbcSynonymFilterFactory

2017-05-09 Thread sajjad karimi
http://stackoverflow.com/questions/43857712/could-not-initialize-class-jdbcsynonymfilterfactory
:

I'm new to Solr. I want to add a field type with JdbcSynonymFilter and
JdbcStopFilter to the Solr schema. I added my data source the same way as the
instructions in this link: [Loading stopwords from Postgresql to Solr6][1]

Then I configured managed-schema with the code below:

<fieldtype name="..." class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.PatternTokenizerFactory" pattern="[\s]+" />
    <filter class="com.s24.search.solr.analysis.jdbc.JdbcSynonymFilterFactory"
       sql="SELECT concat(term, '=>', use) as line FROM thesaurus;"
       dataSource="jdbc/dsTest" ignoreCase="false" expand="true" />
    <filter class="com.s24.search.solr.analysis.jdbc.JdbcStopFilterFactory"
       sql="SELECT stopword FROM stopwords"
       dataSource="jdbc/dsTest"/>
  </analyzer>
</fieldtype>

I added solr-jdbc to the dist folder, and the PostgreSQL driver, beanutils and
dbutils to the contrib/jdbc/lib folder. Then, I included the libs in the
solrconfig.xml of data_driven_schema_configs:

<lib dir="${solr.install.dir:../../../..}/contrib/jdbc/lib" regex=".*\.jar" />
<lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-jdbc-\d.*\.jar" />

I encountered the following error when I was trying to start SolrCloud.

> "Could not initialize class
com.s24.search.solr.analysis.jdbc.JdbcSynonymFilterFactory,trace=java.lang.NoClassDefFoundError:
Could not initialize class
com.s24.search.solr.analysis.jdbc.JdbcSynonymFilterFactory"


  [1]:
http://stackoverflow.com/questions/43724758/loading-stopwords-from-postgresql-to-solr6?noredirect=1#comment74559858_43724758


Re: distribution of leader and replica in SolrCloud

2017-05-09 Thread Erick Erickson
Bernd:

You rarely have to worry about who the leader is unless and until you
get many 100s of shards. The extra work a leader does is usually
minimal and spending time trying to control where the leaders live is
usually time wasted. Leaders will shift from replica to replica
anyway. Say your leader for shard1 is on instance1, shard1_replica1.
Then you shut instance1 down. The leader will shift to some other
replica in the shard, say shard1_replica4.

If you insist you can use the collections API BALANCESHARDUNIQUE and
REBALANCELEADERS. The former assigns a "preferredLeader" role to one
replica for each shard and the latter tries to make those replicas the
real leader. If you really want to go all-out you can use
ADDREPLICAPROP to make the replica of your choice the preferredLeader.
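
If you do go that route, the two calls are simply (collection name is a
placeholder):

  curl 'http://localhost:8983/solr/admin/collections?action=BALANCESHARDUNIQUE&collection=mycoll&property=preferredLeader'
  curl 'http://localhost:8983/solr/admin/collections?action=REBALANCELEADERS&collection=mycoll'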

But this is generally a waste of time and energy. Those abilities were
added for a case where 100s of leaders wound up being in the same JVM
and the performance impact was noticeable. And even if you do assign
the preferredLeader role, that is just a hint, not a requirement. The
collection will tend to have the specified replicas be the leaders,
but only "tend".

Best,
Erick

On Tue, May 9, 2017 at 5:35 AM, Shawn Heisey  wrote:
> On 5/9/2017 1:44 AM, Bernd Fehling wrote:
>> From my point of view it is a good solution to have 5 virtual 64GB
>> servers on 5 different huge physical machines and start 2 instances on
>> each virtual server.
>
> If the total amount of memory in the virtual machine is 64GB, then I
> would run one Solr node on it with a heap size between 8 and 16GB.  The
> rest of the memory in the virtual machine would then be available to
> cache whatever index data exists.  That caching is extremely important
> for good performance.
>
> If the *heap* size is what would be 64GB (and you actually do need that
> much heap), then it *does* make sense to split that into two instances,
> each with a 31GB heap.  I would argue that it's better to have those two
> instances on separate machines.
>
> Assuming that you have a bare metal server with 256GB of RAM, you would
> *not* want to divide that up into five virtual machines each with 64GB.
> The physical host would not have enough memory for all five virtual
> machines.  It would have the option of using its disk space as extra
> memory, but as soon as you start swapping memory to disk, performance of
> ANY software becomes unacceptable.  Solr in particular requires actual
> real memory.  Oversubscribing memory on VMs might work for some
> workloads, but it won't work for Solr.
>
> If all your virtual machines are running on the same physical host, then
> you have no redundancy.  Modern servers have redundant power supplies,
> redundant hard drives, and other kinds of fault tolerance.  Even so,
> there are many components in a server that have no redundancy, like the
> motherboard, or the backplane.  If one of those components were to die,
> all of the virtual machines would go down.
>
> Thanks,
> Shawn
>


Re: Solr Query Limits

2017-05-09 Thread Alexandre Rafalovitch
I am not aware of any limits in Solr itself. However, if you are using
a GET request to do the query, you may be running into browser
limitations regarding URL length.

It may be useful to know that Solr can accept the query parameters in
the POST body as well.
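
For example, the same query sent as a POST (host and collection are
placeholders):

  curl 'http://localhost:8983/solr/mycollection/select' \
    --data-urlencode 'q=field:(value1 OR value2 OR value3)' \
    --data-urlencode 'rows=10'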

Regards,
   Alex.

http://www.solr-start.com/ - Resources for Solr users, new and experienced


On 9 May 2017 at 10:19, Adnan Shaikh  wrote:
> Hello Team,
>
> I have a query pertaining to how many values we are able to pass in a Solr
> query.
>
> Can we please find out if:
>
> 1. There is a limit to the number of characters that we can pass in a
> Solr query field?
> 2. Is there a limit to how many values we can pass for the one key?
>
> Thanks,
> Mohammad Adnan Shaikh


Re: SOLR [child] - ChildDocTransformerFactory very slow in my usecase...

2017-05-09 Thread Erick Erickson
Pulling stored fields out involves reading the file from disk,
decompressing it, and then adding it to the output packet. The QTime
doesn't include assembling (and transmitting, of course) the packet.
I'd guess you'll see pretty heavy I/O usage during the remainder.

Not sure what I'd recommend, here...

Best,
Erick

On Tue, May 9, 2017 at 7:24 AM, Mihai Bucica  wrote:
> I have around 60 million documents in my XYZ SOLR 6.5 core (20 GB total
> size).
>
> 6 million of them are root documents, and each of them has on average
> ten nested documents (_childDocuments_ stuff, representing the named
> entities from the root document).
>
> I need to get, for a list of 1 to 300 root document ids, both the root
> documents and their nested documents.
>
> I've used ChildDocTransformerFactory for this, but although the 'qtime' for
> the query is always small (maximum 50 ms), it takes between 5-10 seconds for
> the response to be completed.
>
> The response size from Solr is between 100 KB and 2 MB, and it's not a
> matter of network latency, since only the first run of the query is slow;
> after that it is almost instant.
>
> I guess I might have the wrong architecture for my application, or Solr is
> not good for my use case?


SOLR [child] - ChildDocTransformerFactory very slow in my usecase...

2017-05-09 Thread Mihai Bucica
I have around 60 million documents in my XYZ SOLR 6.5 core (20 GB total
size).

6 million of them are root documents, and each of them has on average
ten nested documents (_childDocuments_ stuff, representing the named
entities from the root document).

I need to get, for a list of 1 to 300 root document ids, both the root
documents and their nested documents.
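
The request looks roughly like this (is_root is a placeholder for however
root documents are marked in my schema):

  q=id:(id1 OR id2 OR id3)&fl=*,[child parentFilter=is_root:true limit=20]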

I've used ChildDocTransformerFactory for this, but although the 'qtime' for
the query is always small (maximum 50 ms), it takes between 5-10 seconds for
the response to be completed.

The response size from Solr is between 100 KB and 2 MB, and it's not a
matter of network latency, since only the first run of the query is slow;
after that it is almost instant.

I guess I might have the wrong architecture for my application, or Solr is
not good for my use case?


Solr Query Limits

2017-05-09 Thread Adnan Shaikh
Hello Team,

I have a query pertaining to how many values we are able to pass in a Solr query.

Can we please find out if:

1. There is a limit to the number of characters that we can pass in a
Solr query field?
2. Is there a limit to how many values we can pass for the one key?

Thanks,
Mohammad Adnan Shaikh


Re: Solr SSL setup with bought certificate

2017-05-09 Thread Steve Rowe
Hi,

AFAICT the Solr 5.5.4 install_solr_service.sh doesn’t support CentOS (support
was added in 6.3: SOLR-9475).  How did you make it work?

I’m guessing there are permissions problems in your installation directory, 
such that the account being used to start Solr doesn’t have execute and/or read 
permission somewhere under /opt/solr-5.5.4/.

The install script sets up permissions like this:

-
chown -R root: "$SOLR_INSTALL_DIR"
find "$SOLR_INSTALL_DIR" -type d -print0 | xargs -0 chmod 0755
find "$SOLR_INSTALL_DIR" -type f -print0 | xargs -0 chmod 0644
chmod -R 0755 "$SOLR_INSTALL_DIR/bin"
-

--
Steve
www.lucidworks.com

> On May 9, 2017, at 4:17 AM, Sebastjanas  wrote:
> 
> Hello,
> 
> I installed 5.5.4 on Centos to /opt/solr. Also I installed init script
> using install_solr_service.sh. I've imported bought certificate to keystore
> and now trying to start it up with SSL, using following settings in
> /etc/default/solr.in.sh:
> 
> SOLR_SSL_ENABLED=true
> SOLR_SSL_KEY_STORE=etc/solr-ssl.keystore.jks
> SOLR_SSL_KEY_STORE_PASSWORD=[password]
> #SOLR_SSL_TRUST_STORE=etc/keystore.jks
> #SOLR_SSL_TRUST_STORE_PASSWORD=[password]
> SOLR_SSL_NEED_CLIENT_AUTH=false
> SOLR_SSL_WANT_CLIENT_AUTH=false
> 
> But it doesn't start with following error:
> 
>1629 WARN  (main) [   ] o.e.j.u.c.AbstractLifeCycle FAILED
> SslContextFactory@564fabc8(etc/solr-ssl.keystore.jks,):
> java.io.FileNotFoundException: /opt/solr-5.5.4/server (Is a directory)
> java.io.FileNotFoundException: /opt/solr-5.5.4/server (Is a directory)
>at java.io.FileInputStream.open0(Native Method)
>at java.io.FileInputStream.open(FileInputStream.java:195)
>at java.io.FileInputStream.<init>(FileInputStream.java:138)
>at 
> org.eclipse.jetty.util.resource.FileResource.getInputStream(FileResource.java:290)
>at 
> org.eclipse.jetty.util.security.CertificateUtils.getKeyStore(CertificateUtils.java:43)
>at 
> org.eclipse.jetty.util.ssl.SslContextFactory.loadTrustStore(SslContextFactory.java:884)
>at 
> org.eclipse.jetty.util.ssl.SslContextFactory.doStart(SslContextFactory.java:274)
>at 
> org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
>at 
> org.eclipse.jetty.util.component.ContainerLifeCycle.start(ContainerLifeCycle.java:132)
>at 
> org.eclipse.jetty.util.component.ContainerLifeCycle.doStart(ContainerLifeCycle.java:114)
>at 
> org.eclipse.jetty.server.SslConnectionFactory.doStart(SslConnectionFactory.java:64)
>at 
> org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
>at 
> org.eclipse.jetty.util.component.ContainerLifeCycle.start(ContainerLifeCycle.java:132)
>at 
> org.eclipse.jetty.util.component.ContainerLifeCycle.doStart(ContainerLifeCycle.java:114)
>at 
> org.eclipse.jetty.server.AbstractConnector.doStart(AbstractConnector.java:256)
>at 
> org.eclipse.jetty.server.AbstractNetworkConnector.doStart(AbstractNetworkConnector.java:81)
>at 
> org.eclipse.jetty.server.ServerConnector.doStart(ServerConnector.java:236)
>at 
> org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
>at org.eclipse.jetty.server.Server.doStart(Server.java:366)
>at 
> org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
>at 
> org.eclipse.jetty.xml.XmlConfiguration$1.run(XmlConfiguration.java:1255)
>at java.security.AccessController.doPrivileged(Native Method)
>at 
> org.eclipse.jetty.xml.XmlConfiguration.main(XmlConfiguration.java:1174)
>at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>at java.lang.reflect.Method.invoke(Method.java:498)
>at org.eclipse.jetty.start.Main.invokeMain(Main.java:321)
>at org.eclipse.jetty.start.Main.start(Main.java:817)
>at org.eclipse.jetty.start.Main.main(Main.java:112)
> 1631 INFO  (coreLoadExecutor-6-thread-1) [   ] o.a.s.c.SolrConfig
> Adding specified lib dirs to ClassLoader
> 1634 WARN  (main) [   ] o.e.j.u.c.AbstractLifeCycle FAILED
> SslConnectionFactory@74fe5c40{SSL-http/1.1}:
> java.io.FileNotFoundException: /opt/solr-5.5.4/server (Is a directory)
> java.io.FileNotFoundException: /opt/solr-5.5.4/server (Is a directory)
>at java.io.FileInputStream.open0(Native Method)
>at java.io.FileInputStream.open(FileInputStream.java:195)
>at java.io.FileInputStream.<init>(FileInputStream.java:138)
>at 
> org.eclipse.jetty.util.resource.FileResource.getInputStream(FileResource.java:290)
>at 
> org.eclipse.jetty.util.security.CertificateUtils.getKeyStore(CertificateUtils.java:43)
>at 
> 

Re: distribution of leader and replica in SolrCloud

2017-05-09 Thread Shawn Heisey
On 5/9/2017 1:44 AM, Bernd Fehling wrote:
> From my point of view it is a good solution to have 5 virtual 64GB
> servers on 5 different huge physical machines and start 2 instances on
> each virtual server. 

If the total amount of memory in the virtual machine is 64GB, then I
would run one Solr node on it with a heap size between 8 and 16GB.  The
rest of the memory in the virtual machine would then be available to
cache whatever index data exists.  That caching is extremely important
for good performance.

If the *heap* size is what would be 64GB (and you actually do need that
much heap), then it *does* make sense to split that into two instances,
each with a 31GB heap.  I would argue that it's better to have those two
instances on separate machines.

Assuming that you have a bare metal server with 256GB of RAM, you would
*not* want to divide that up into five virtual machines each with 64GB. 
The physical host would not have enough memory for all five virtual
machines.  It would have the option of using its disk space as extra
memory, but as soon as you start swapping memory to disk, performance of
ANY software becomes unacceptable.  Solr in particular requires actual
real memory.  Oversubscribing memory on VMs might work for some
workloads, but it won't work for Solr.

If all your virtual machines are running on the same physical host, then
you have no redundancy.  Modern servers have redundant power supplies,
redundant hard drives, and other kinds of fault tolerance.  Even so,
there are many components in a server that have no redundancy, like the
motherboard, or the backplane.  If one of those components were to die,
all of the virtual machines would go down.

Thanks,
Shawn



Re: Add new Solr Node to existing Solr setup

2017-05-09 Thread Shawn Heisey
On 5/9/2017 6:01 AM, Venkateswarlu Bommineni wrote:
> But I don't see the replication factor increased in Solr; it's still
> showing as 2 after adding the third replica.

The replicationFactor parameter is ONLY used at collection creation.  It
has zero purpose after that ... unless you have indexes in HDFS and Solr
is explicitly aware of this through the use of HDFSDirectoryFactory.

Adding a replica does NOT change replicationFactor.  SolrCloud will
correctly manage the new replica, but that number in the collection
definition doesn't change.  It is set at creation time.  It can be
manually changed later, but unless you're using HDFS, there's never any
reason to change it.
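
If you ever did need to change it (e.g. for the HDFS case), the
MODIFYCOLLECTION action can update the stored value; a sketch, with a
placeholder collection name:

  curl 'http://localhost:8983/solr/admin/collections?action=MODIFYCOLLECTION&collection=mycoll&replicationFactor=3'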

Thanks,
Shawn



Re: Could not initialize class JdbcSynonymFilterFactory

2017-05-09 Thread Shawn Heisey
On 5/9/2017 6:06 AM, sajjad karimi wrote:
> http://stackoverflow.com/questions/43857712/could-not-initialize-class-jdbcsynonymfilterfactory
>
> I'm new to solr, I want to add a field type with JdbcSynonymFilter and
> JdbcStopFilter to solr schema. I added my data source same as instruction
> in this link: [Loading stopwords from Postgresql to Solr6][1]

You asked this same question an hour ago.  An hour is not enough time
for a mailing list to respond.  You should wait several days for an answer.

This is third party software.  Your best bet is to ask the authors of
that software how to get it working.  If they believe that the problems
are not related to their software, then you can bring the information
they give you back to this list.

Thanks,
Shawn



Could not initialize class JdbcSynonymFilterFactory

2017-05-09 Thread sajjad karimi
http://stackoverflow.com/questions/43857712/could-not-initialize-class-jdbcsynonymfilterfactory
:

I'm new to Solr. I want to add a field type with JdbcSynonymFilter and
JdbcStopFilter to the Solr schema. I added my data source the same way as the
instructions in this link: [Loading stopwords from Postgresql to Solr6][1]

Then I configured managed-schema with the code below:

<fieldtype name="..." class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.PatternTokenizerFactory" pattern="[\s]+" />
    <filter class="com.s24.search.solr.analysis.jdbc.JdbcSynonymFilterFactory"
       sql="SELECT concat(term, '=>', use) as line FROM thesaurus;"
       dataSource="jdbc/dsTest" ignoreCase="false" expand="true" />
    <filter class="com.s24.search.solr.analysis.jdbc.JdbcStopFilterFactory"
       sql="SELECT stopword FROM stopwords"
       dataSource="jdbc/dsTest"/>
  </analyzer>
</fieldtype>

I added solr-jdbc to the dist folder, and the PostgreSQL driver, beanutils and
dbutils to the contrib/jdbc/lib folder. Then, I included the libs in the
solrconfig.xml of data_driven_schema_configs:

<lib dir="${solr.install.dir:../../../..}/contrib/jdbc/lib" regex=".*\.jar" />
<lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-jdbc-\d.*\.jar" />

I encountered the following error when I was trying to start SolrCloud.

> "Could not initialize class
com.s24.search.solr.analysis.jdbc.JdbcSynonymFilterFactory,trace=java.lang.NoClassDefFoundError:
Could not initialize class
com.s24.search.solr.analysis.jdbc.JdbcSynonymFilterFactory"


  [1]:
http://stackoverflow.com/questions/43724758/loading-stopwords-from-postgresql-to-solr6?noredirect=1#comment74559858_43724758


Re: Solr licensing for commercial product.

2017-05-09 Thread Shawn Heisey
On 5/9/2017 5:50 AM, vrindavda wrote:
> One more question. I found below snippet in license file. Do I need to
> mention my product owner details in highlighted section?
>
>   APPENDIX: How to apply the Apache License to your work.

Those requirements only come into play if you want YOUR software
licensed under the Apache license.

You said you wanted to use Solr in a commercial product.  Most
commercial software will NOT have an open source license.

I strongly recommend that you get your company's lawyer involved when
you are working out how your own software will be licensed.

Thanks,
Shawn



Re: Add new Solr Node to existing Solr setup

2017-05-09 Thread Venkateswarlu Bommineni
Cool..
Thanks, Shawn.
It worked.

But I don't see the replication factor increased in Solr; it's still showing
as 2 after adding the third replica.


Thanks,
Venkat.


On Tue, May 9, 2017 at 5:17 PM, Shawn Heisey  wrote:

> On 5/9/2017 5:31 AM, Venkateswarlu Bommineni wrote:
> > As you mentioned in para 2, I have created a new node and started it using
> > the below command, but I could not get any option to name the node,
> >
> > as the name of the node is required for adding a replica to the existing
> > SolrCloud.
> >
> > Could you please help me where to find the name of the node?
>
> If you look at the example in the reference guide for ADDREPLICA, you'll
> see that the node parameter on that example is
> "|node=192.167.1.2:8983_solr|".
>
> Most of this information can be seen by clicking on the "Cloud" tab and
> looking at the Graph.  Each node there will have a name or IP address
> and a port, which will be the first part of the information you need for
> the node parameter on ADDREPLICA.
>
> The "_solr" part of the example's node parameter is related to the
> context path on that node, and "_solr" is nearly guaranteed to be the
> correct value, because official support for changing the context path
> disappeared in 6.0 with the new UI.
>
> Thanks,
> Shawn
>
>


Re: Solr licensing for commercial product.

2017-05-09 Thread vrindavda
Thanks Shawn,

One more question. I found the below snippet in the license file. Do I need
to mention my product owner details in the highlighted section?


  APPENDIX: How to apply the Apache License to your work.

  To apply the Apache License to your work, attach the following
  boilerplate notice, with the fields enclosed by brackets "[]"
  replaced with your own identifying information. (Don't include
  the brackets!)  The text should be enclosed in the appropriate
  comment syntax for the file format. We also recommend that a
  file or class name and description of purpose be included on the
  same "printed page" as the copyright notice for easier
  identification within third-party archives.

*   Copyright [] [name of copyright owner]*

   Licensed under the Apache License, Version 2.0 (the "License");
   you may not use this file except in compliance with the License.
   You may obtain a copy of the License at

   http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.






Re: Add new Solr Node to existing Solr setup

2017-05-09 Thread Shawn Heisey
On 5/9/2017 5:31 AM, Venkateswarlu Bommineni wrote:
> As you mentioned in para 2, I have created a new node and started it using
> the below command, but I could not get any option to name the node,
>
> as the name of the node is required for adding a replica to the existing SolrCloud.
>
> Could you please help me where to find the name of the node?

If you look at the example in the reference guide for ADDREPLICA, you'll
see that the node parameter on that example is
"|node=192.167.1.2:8983_solr|".

Most of this information can be seen by clicking on the "Cloud" tab and
looking at the Graph.  Each node there will have a name or IP address
and a port, which will be the first part of the information you need for
the node parameter on ADDREPLICA.

The "_solr" part of the example's node parameter is related to the
context path on that node, and "_solr" is nearly guaranteed to be the
correct value, because official support for changing the context path
disappeared in 6.0 with the new UI.
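
Putting it all together, an ADDREPLICA call would look like this (collection
and shard names are placeholders):

  curl 'http://localhost:8983/solr/admin/collections?action=ADDREPLICA&collection=mycoll&shard=shard1&node=192.167.1.2:8983_solr'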

Thanks,
Shawn



RE: Solr 6.4. Can't index MS Visio vsdx files

2017-05-09 Thread Allison, Timothy B.
Probably better to ask on the Tika list.  We'll push the release asap after 
PDFBox 2.0.6 is out.  Andreas plans to cut the release candidate for PDFBox 
this Friday.  Tika will probably have an RC by Monday 5/15, with the release 
happening later in the week...That's if there are no surprises...[2]

You can get a recent build if you'd like to test [1].

Best,

  Tim

[1] https://builds.apache.org/view/Tika/job/Tika-trunk/
[2] If you are curious, for the comparison reports btwn PDFBox 2.0.5 and 
2.0.6-SNAPSHOT on ~500k pdfs, see: 
http://162.242.228.174/reports/reports_pdfbox_2_0_6.tar.gz
 
-Original Message-
From: Gytis Mikuciunas [mailto:gyt...@gmail.com] 
Sent: Tuesday, May 9, 2017 7:17 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr 6.4. Can't index MS Visio vsdx files

Are there any news regarding Tika 1.15? Maybe it's already ready for download 
somewhere

G.

On Wed, Apr 12, 2017 at 6:57 PM, Allison, Timothy B. 
wrote:

> The release candidate for POI was just cut...unfortunately, I think 
> after Nick Burch fixed the 'PolylineTo' issue...thank you, btw, for opening 
> that!
>
> That'll be done within a week unless there are surprises.  Once that's 
> out, I have to update a few things, but I'd think we'd have a 
> candidate for Tika a week later, then a week for release.
>
> You can get nightly builds here: https://builds.apache.org/
>
> Please ask on the POI or Tika users lists for how to get the 
> latest/latest running, and thank you, again, for opening the issue on POI's 
> Bugzilla.
>
> Best,
>
>Tim
>
> -Original Message-
> From: Gytis Mikuciunas [mailto:gyt...@gmail.com]
> Sent: Wednesday, April 12, 2017 1:00 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr 6.4. Can't index MS Visio vsdx files
>
> When will 1.15 be released? Maybe you have some beta version and I
> could test it :)
>
> SAX sounds interesting, and from the info that I found on Google it could
> solve my issues.
>
> On Tue, Apr 11, 2017 at 10:48 PM, Allison, Timothy B. 
> 
> wrote:
>
> > It depends.  We've been trying to make parsers more, erm, flexible, 
> > but there are some problems from which we cannot recover.
> >
> > Tl;dr there isn't a short answer.  :(
> >
> > My sense is that DIH/ExtractingDocumentHandler is intended to get 
> > people up and running with Solr easily but it is not really a great 
> > idea for production.  See Erick's gem:
> > https://lucidworks.com/2012/02/14/indexing-with-solrj/
> >
> > As for the Tika portion... at the very least, Tika _shouldn't_ cause 
> > the ingesting process to crash.  At most, it should fail at the file 
> > level and not cause greater havoc.  In practice, if you're 
> > processing millions of files from the wild, you'll run into bad 
> > behavior and need to defend against permanent hangs, oom, memory leaks.
> >
> > Also, at the least, if there's an exception with an embedded file, 
> > Tika should catch it and keep going with the rest of the file.  If 
> > this doesn't happen let us know!  We are aware that some types of 
> > embedded file stream problems were causing parse failures on the 
> > entire file, and we now catch those in Tika 1.15-SNAPSHOT and don't 
> > let them percolate up through the parent file (they're reported in 
> > the
> metadata though).
> >
> > Specifically for your stack traces:
> >
> > For your initial problem with the missing class exceptions -- I 
> > thought we used to catch those in docx and log them.  I haven't been 
> > able to track this down, though.  I can look more if you have a need.
> >
> > For "Caused by: org.apache.poi.POIXMLException: Invalid 'Row_Type'
> > name 'PolylineTo' ", this problem might go away if we implemented a 
> > pure SAX parser for vsdx.  We just did this for docx and pptx 
> > (coming in 1.15) and these are more robust to variation because they 
> > aren't requiring a match with the ooxml schema.  I haven't looked 
> > much at vsdx, but that _might_ help.
> >
> > For "TODO Support v5 Pointers", this isn't supported and would 
> > require contributions.  However, I agree that POI shouldn't throw a 
> > Runtime exception.  Perhaps open an issue in POI, or maybe we should 
> > catch this special example at the Tika level?
> >
> > For "Caused by: java.lang.ArrayIndexOutOfBoundsException:", the POI 
> > team _might_ be able to modify the parser to ignore a stream if 
> > there's an exception, but that's often a sign that something needs 
> > to be fixed with the parser.  In short, the solution will come from POI.
> >
> > Best,
> >
> >  Tim
> >
> > -Original Message-
> > From: Gytis Mikuciunas [mailto:gyt...@gmail.com]
> > Sent: Tuesday, April 11, 2017 1:56 PM
> > To: solr-user@lucene.apache.org
> > Subject: RE: Solr 6.4. Can't index MS Visio vsdx files
> >
> > Thanks for your responses.
> > Are there any possibilities to ignore parsing errors and continue
> > indexing?
> > because now solr/tika stops parsing the whole document if it finds any
> > exception

Re: SOLR as nosql database store

2017-05-09 Thread Shawn Heisey
On 5/9/2017 12:58 AM, Bharath Kumar wrote:
> Thanks Hrishikesh and Dave. We use SOLR cloud with 2 extra replicas, will 
> that not serve as backup when something goes wrong? Also we use latest solr 6 
> and from the documentation of solr, the indexing performance has been good. 
> The reason is that we are using MySQL as the primary data store and the 
> performance might not be optimal if we write data at a very rapid rate. 
> Already we index almost half the fields that are in MySQL in solr.

A replica is protection against data loss in the event of hardware
failure, but there are classes of problems that it cannot protect against.

Although Solr (Lucene) does try *really* hard to never lose data that it
hasn't been asked to delete, it is not designed to be a database.  It's
a search engine.  Solr doesn't offer the same kinds of guarantees about
the data it contains that software like MySQL does.

I personally don't recommend trying to use Solr as a primary data store,
but if that's what you really want to do, then I would suggest that you
have two complete Solr installs, with multiple replicas on both.  One of
them will be used for searching and have a configuration you're already
familiar with, the other will be purely for data storage -- only certain
fields like the uniqueKey will be indexed, but every other field will be
stored only.

Running with two separate Solr installs will allow you to optimize one
for searching and the other for data storage.  The searching install
will be able to rebuild itself from the data storage install when that
is required.  If better performance is needed for the rebuild, you have
the option of writing a multi-threaded or multi-process program that
reads from one and writes to the other.

Thanks,
Shawn



Re: Could not initialize class JdbcSynonymFilterFactory

2017-05-09 Thread Amrit Sarkar
Just gathering more information on this Solr-JDBC;

Is it an open-source plugin provided at https://github.com/shopping24/ and
not part of the actual *lucene-solr* project?

https://github.com/shopping24/solr-jdbc-synonyms


Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2

On Tue, May 9, 2017 at 4:30 PM, sajjad karimi  wrote:

> http://stackoverflow.com/questions/43857712/could-not-initialize-class-jdbcsynonymfilterfactory
> :
>
>
> I'm new to Solr. I want to add a field type with JdbcSynonymFilter and
> JdbcStopFilter to the Solr schema. I added my data source the same way as
> the instructions in this link: [Loading stopwords from Postgresql to Solr6][1]
>
> Then I configured managed-schema with the code below:
>
> <fieldtype name="..." class="solr.TextField">
>   <analyzer>
>     <tokenizer class="solr.PatternTokenizerFactory" pattern="[\s]+" />
>     <filter class="com.s24.search.solr.analysis.jdbc.JdbcSynonymFilterFactory"
>        sql="SELECT concat(term, '=>', use) as line FROM thesaurus;"
>        dataSource="jdbc/dsTest" ignoreCase="false" expand="true" />
>     <filter class="com.s24.search.solr.analysis.jdbc.JdbcStopFilterFactory"
>        sql="SELECT stopword FROM stopwords"
>        dataSource="jdbc/dsTest"/>
>   </analyzer>
> </fieldtype>
>
> I added solr-jdbc to the dist folder, and the PostgreSQL driver, beanutils
> and dbutils to the contrib/jdbc/lib folder. Then, I included the libs in the
> solrconfig.xml of data_driven_schema_configs:
>
> <lib dir="${solr.install.dir:../../../..}/contrib/jdbc/lib" regex=".*\.jar" />
> <lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-jdbc-\d.*\.jar" />
>
> I encountered the following error when I was trying to start SolrCloud.
>
> > "Could not initialize class
> com.s24.search.solr.analysis.jdbc.JdbcSynonymFilterFactory,
> trace=java.lang.NoClassDefFoundError:
> Could not initialize class
> com.s24.search.solr.analysis.jdbc.JdbcSynonymFilterFactory"
>
>
>   [1]:
> http://stackoverflow.com/questions/43724758/loading-stopwords-from-postgresql-to-solr6?noredirect=1#comment74559858_43724758
>


Re: Add new Solr Node to existing Solr setup

2017-05-09 Thread Venkateswarlu Bommineni
Hi Shawn,

As you mentioned in para 2, I have created a new node and started it using
the below command, but I could not get any option to name the node,

as the name of the node is required for adding a replica to the existing SolrCloud.

Could you please help me where to find the name of the node?

Thanks,
Venkat.

On Tue, May 2, 2017 at 6:02 PM, Shawn Heisey  wrote:

> On 5/2/2017 4:24 AM, Venkateswarlu Bommineni wrote:
> > We have Solr setup with below configuration.
> >
> > 1) 1 collection with one shard
> > 2)  4 Solr Nodes
> > 2)  and replication factor 4 with one replication to each Solr Node.
> >
> > As of now, it's working fine. But going forward the size may grow, and
> > we would need to add a new node.
> >
> > Could you guys please suggest any idea?
>
> I'm assuming SolrCloud, because you said "collection" and "replication
> factor" which are SolrCloud concepts.
>
> As soon as you start the new node pointing at your zookeeper ensemble,
> it will be part of the cluster and will accept requests for any
> collection in the cluster.  No index data will end up on the new node
> until you take action with the Collections API, though.
>
> One way to put data on the new node is the ADDREPLICA action.  Another
> is to create a brand new collection with the shard and replication
> characteristics you want, and use the new collection instead of the old
> one, or create an alias to use whatever name you like.  You can use
> SPLITSHARD and then ADDREPLICA/DELETEREPLICA to put *some* of the data
> from an existing collection on the new node.
>
> https://cwiki.apache.org/confluence/display/solr/Collections+API
>
> I think the way I would proceed is to create a brand new collection set
> up with the correct number of shards and replicas to use the new node,
> populate that collection, delete the old collection, and set up a
> collection alias so that the new collection can be accessed with the old
> collection's name.
>
> Thanks,
> Shawn
>
>


Re: Solr licensing for commercial product.

2017-05-09 Thread Shawn Heisey
On 5/9/2017 12:28 AM, vrindavda wrote:
> Please let me know what all things do I need to consider for licensing
> before shipping solr with commercial product.
>
> How will Solr know that what client is using it.

There are essentially no restrictions for using Solr in a commercial
product.  Solr is licensed under the Apache 2.0 license, which is one of
the most permissive open source licenses around.

http://www.apache.org/foundation/license-faq.html

Solr will NOT know what client is using it, unless you provide that
information in requests from the client and modify Solr to recognize
that information.  As shipped from Apache, Solr has absolutely no
restrictions on which clients are allowed access.  That kind of
restriction would go against open source ideals.  You can enable certain
kinds of access restrictions, like authentication, as long as the client
you're using supports what you enable.

Thanks,
Shawn



Re: Solr 6.4. Can't index MS Visio vsdx files

2017-05-09 Thread Gytis Mikuciunas
Are there any news regarding Tika 1.15? Maybe it's already ready for
download somewhere

G.

On Wed, Apr 12, 2017 at 6:57 PM, Allison, Timothy B. 
wrote:

> The release candidate for POI was just cut...unfortunately, I think after
> Nick Burch fixed the 'PolylineTo' issue...thank you, btw, for opening that!
>
> That'll be done within a week unless there are surprises.  Once that's
> out, I have to update a few things, but I'd think we'd have a candidate for
> Tika a week later, then a week for release.
>
> You can get nightly builds here: https://builds.apache.org/
>
> Please ask on the POI or Tika users lists for how to get the latest/latest
> running, and thank you, again, for opening the issue on POI's Bugzilla.
>
> Best,
>
>Tim
>
> -Original Message-
> From: Gytis Mikuciunas [mailto:gyt...@gmail.com]
> Sent: Wednesday, April 12, 2017 1:00 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr 6.4. Can't index MS Visio vsdx files
>
> When will 1.15 be released? Maybe you have some beta version and I could
> test it :)
>
> SAX sounds interesting, and from the info that I found on Google it could
> solve my issues.
>
> On Tue, Apr 11, 2017 at 10:48 PM, Allison, Timothy B. 
> wrote:
>
> > It depends.  We've been trying to make parsers more, erm, flexible,
> > but there are some problems from which we cannot recover.
> >
> > Tl;dr there isn't a short answer.  :(
> >
> > My sense is that DIH/ExtractingDocumentHandler is intended to get
> > people up and running with Solr easily but it is not really a great
> > idea for production.  See Erick's gem:
> > https://lucidworks.com/2012/02/14/indexing-with-solrj/
> >
> > As for the Tika portion... at the very least, Tika _shouldn't_ cause
> > the ingesting process to crash.  At most, it should fail at the file
> > level and not cause greater havoc.  In practice, if you're processing
> > millions of files from the wild, you'll run into bad behavior and need
> > to defend against permanent hangs, oom, memory leaks.
> >
> > Also, at the least, if there's an exception with an embedded file,
> > Tika should catch it and keep going with the rest of the file.  If
> > this doesn't happen let us know!  We are aware that some types of
> > embedded file stream problems were causing parse failures on the
> > entire file, and we now catch those in Tika 1.15-SNAPSHOT and don't
> > let them percolate up through the parent file (they're reported in the
> metadata though).
> >
> > Specifically for your stack traces:
> >
> > For your initial problem with the missing class exceptions -- I
> > thought we used to catch those in docx and log them.  I haven't been
> > able to track this down, though.  I can look more if you have a need.
> >
> > For "Caused by: org.apache.poi.POIXMLException: Invalid 'Row_Type'
> > name 'PolylineTo' ", this problem might go away if we implemented a
> > pure SAX parser for vsdx.  We just did this for docx and pptx (coming
> > in 1.15) and these are more robust to variation because they aren't
> > requiring a match with the ooxml schema.  I haven't looked much at
> > vsdx, but that _might_ help.
> >
> > For "TODO Support v5 Pointers", this isn't supported and would require
> > contributions.  However, I agree that POI shouldn't throw a Runtime
> > exception.  Perhaps open an issue in POI, or maybe we should catch
> > this special example at the Tika level?
> >
> > For "Caused by: java.lang.ArrayIndexOutOfBoundsException:", the POI
> > team _might_ be able to modify the parser to ignore a stream if
> > there's an exception, but that's often a sign that something needs to
> > be fixed with the parser.  In short, the solution will come from POI.
> >
> > Best,
> >
> >  Tim
> >
> > -Original Message-
> > From: Gytis Mikuciunas [mailto:gyt...@gmail.com]
> > Sent: Tuesday, April 11, 2017 1:56 PM
> > To: solr-user@lucene.apache.org
> > Subject: RE: Solr 6.4. Can't index MS Visio vsdx files
> >
> > Thanks for your responses.
> > Are there any possibilities to ignore parsing errors and continue
> > indexing?
> > because now solr/tika stops parsing whole document if it finds any
> > exception
> >
> > On Apr 11, 2017 19:51, "Allison, Timothy B."  wrote:
> >
> > > You might want to drop a note to the dev or user's list on Apache POI.
> > >
> > > I'm not extremely familiar with the vsd(x) portion of our code base.
> > >
> > > The first item ("PolylineTo") may be caused by a mismatch btwn your
> > > doc and the ooxml spec.
> > >
> > > The second item appears to be an unsupported feature.
> > >
> > > The third item may be an area for improvement within our
> > > codebase...I can't tell just from the stacktrace.
> > >
> > > You'll probably get more helpful answers over on POI.  Sorry, I
> > > can't help with this...
> > >
> > > Best,
> > >
> > >Tim
> > >
> > > P.S.
> > > >  3.1. ooxml-schemas-1.3.jar instead of poi-ooxml-schemas-3.15.jar
> > >
> > > You shouldn't 

Re: SOLR as nosql database store

2017-05-09 Thread Rick Leir
The NoSQL DB can be Mongo, Couch or something else. Choose a document DB by
preference. You can add to these faster than MySQL (I think; test for sure).
These DBs can have replicas easily. Choose one of them and use a simple
script to index into Solr, e.g. something as small as the curl call below.
Cheers -- Rick
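
A sketch of such a script's core, with host, collection and fields as
placeholders:

  curl 'http://localhost:8983/solr/mycollection/update?commit=true' \
    -H 'Content-Type: application/json' \
    -d '[{"id":"1","title_t":"hello"},{"id":"2","title_t":"world"}]'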

On May 9, 2017 2:58:21 AM EDT, Bharath Kumar  wrote:
>Thanks Hrishikesh and Dave. We use SOLR cloud with 2 extra replicas,
>will
>that not serve as backup when something goes wrong? Also we use latest
>solr
>6 and from the documentation of solr, the indexing performance has been
>good. The reason is that we are using MySQL as the primary data store
>and
>the performance might not be optimal if we write data at a very rapid
>rate.
>Already we index almost half the fields that are in MySQL in solr.
>
>On Mon, May 8, 2017 at 9:24 PM, Dave 
>wrote:
>
>> You will want to have both solr and a sql/nosql data storage option.
>They
>> serve different purposes
>>
>>
>> > On May 8, 2017, at 10:43 PM, bharath.mvkumar
>
>> wrote:
>> >
>> > Hi All,
>> >
>> > We have a use case where we have mysql database which stores
>documents
>> and
>> > also some of the fields in the document is also indexed in solr.
>> > We plan to move all those documents to solr by making solr as the
>nosql
>> > datastore for storing those documents. The reason we plan to do
>this is
>> > because we have to support cross center data replication for both
>mysql
>> and
>> > solr and we are in a way duplicating the same data.The number of
>writes
>> we
>> > do per second is around 10,000. Also currently we have only one
>shard
>> and we
>> > have around 70 million records and we plan to support close to 1
>billion
>> > records and also perform sharding.
>> >
>> > Using solr as the nosql database is a good choice or should we look
>at
>> > Cassandra for our use case?
>> >
>> > Thanks,
>> > Bharath Kumar
>> >
>> >
>> >
>> > --
>> > View this message in context: http://lucene.472066.n3.
>> nabble.com/SOLR-as-nosql-database-store-tp4334119.html
>> > Sent from the Solr - User mailing list archive at Nabble.com.
>>
>
>
>
>-- 
>Thanks & Regards,
>Bharath MV Kumar
>
>"Life is short, enjoy every moment of it"

-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

Re: Search inside grouping list

2017-05-09 Thread Emir Arnautovic
Can you try reproducing this issue on a fresh Solr, and if you manage to,
can you please share the documents and steps to reproduce it?


Which version of Solr do you run and do you have any custom plugins 
running on it?


Emir


On 09.05.2017 13:01, donjose wrote:

Yes. I am getting the same result for both q and fq.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Search-inside-grouping-list-tp4333488p4334206.html
Sent from the Solr - User mailing list archive at Nabble.com.


--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/



Could not initialize class JdbcSynonymFilterFactory

2017-05-09 Thread sajjad karimi
http://stackoverflow.com/questions/43857712/could-not-initialize-class-jdbcsynonymfilterfactory
:


I'm new to Solr. I want to add a field type with JdbcSynonymFilter and
JdbcStopFilter to the Solr schema. I added my data source the same way as the
instructions in this link: [Loading stopwords from Postgresql to Solr6][1]

Then I configured managed-schema with the code below:

<fieldtype name="..." class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.PatternTokenizerFactory" pattern="[\s]+" />
    <filter class="com.s24.search.solr.analysis.jdbc.JdbcSynonymFilterFactory"
       sql="SELECT concat(term, '=>', use) as line FROM thesaurus;"
       dataSource="jdbc/dsTest" ignoreCase="false" expand="true" />
    <filter class="com.s24.search.solr.analysis.jdbc.JdbcStopFilterFactory"
       sql="SELECT stopword FROM stopwords"
       dataSource="jdbc/dsTest"/>
  </analyzer>
</fieldtype>

I added solr-jdbc to the dist folder, and the PostgreSQL driver, beanutils and
dbutils to the contrib/jdbc/lib folder. Then, I included the libs in the
solrconfig.xml of data_driven_schema_configs:

<lib dir="${solr.install.dir:../../../..}/contrib/jdbc/lib" regex=".*\.jar" />
<lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-jdbc-\d.*\.jar" />

I encountered the following error when I was trying to start SolrCloud.

> "Could not initialize class
com.s24.search.solr.analysis.jdbc.JdbcSynonymFilterFactory,trace=java.lang.NoClassDefFoundError:
Could not initialize class
com.s24.search.solr.analysis.jdbc.JdbcSynonymFilterFactory"


  [1]:
http://stackoverflow.com/questions/43724758/loading-stopwords-from-postgresql-to-solr6?noredirect=1#comment74559858_43724758


Re: Solr licensing for commercial product.

2017-05-09 Thread Rick Leir
This was discussed here just a few weeks ago. See the archives. If I were at
my desk I would send you a link. Cheers -- Rick

On May 9, 2017 2:28:06 AM EDT, vrindavda  wrote:
>Hello,
>
>Please let me know what all things do I need to consider for licensing
>before shipping solr with commercial product.
>
>How will Solr know that what client is using it.
>
>Thank you,
>Vrinda Davda 
>
>
>
>--
>View this message in context:
>http://lucene.472066.n3.nabble.com/Solr-licensing-for-commercial-product-tp4334146.html
>Sent from the Solr - User mailing list archive at Nabble.com.

-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

Re: Search inside grouping list

2017-05-09 Thread donjose
Yes. I am getting the same result for both q and fq.





Re: Search inside grouping list

2017-05-09 Thread Emir Arnautovic

Do you get the same result if you use q instead of fq?


On 09.05.2017 07:38, donjose wrote:

Hi Emir,

Grouping by default is part of the configuration


  true
  assetid
  true


Don.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Search-inside-grouping-list-tp4333488p4334136.html
Sent from the Solr - User mailing list archive at Nabble.com.


--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/



Re: distribution of leader and replica in SolrCloud

2017-05-09 Thread Bernd Fehling
Hi Erick,

just went through
https://cwiki.apache.org/confluence/display/solr/Rule-based+Replica+Placement

I might be wrong but I didn't see anything to identify the "leader".
To solve my problem with a rule:
--> "do not create the replica on the same host where his leader exists"

Maybe something like "rule=replica:*,host:!leader" or anything similar.
But the role "leader" is not available.
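
For comparison, a rule that does work is keeping the replicas of each shard
on separate hosts, e.g.:

  rule=shard:*,replica:<2,host:*

but that still says nothing about which replica becomes the leader.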

Any other rule to make this possible?

I must admit it is very flexible but also very complicated.

Regards,
Bernd

Am 08.05.2017 um 17:47 schrieb Erick Erickson:
> Also, you can specify custom placement rules, see:
> https://cwiki.apache.org/confluence/display/solr/Rule-based+Replica+Placement
> 
> But Shawn's statement is the nub of what you're seeing, by default
> multiple JVMs on the same physical machine are considered separate
> Solr instances.
> 
> Also note that if you want to, you can specify a nodeSet when you
> create the nodes, and in particular the special value EMPTY. That'll
> create a collection with no replicas and you can ADDREPLICA to
> precisely place each one if you require that level of control.
> 
> Best,
> Erick
> 
> On Mon, May 8, 2017 at 7:44 AM, Shawn Heisey  wrote:
>> On 5/8/2017 5:38 AM, Bernd Fehling wrote:
>>> boss -- shard1 - server2:7574
>>>| |-- server2:8983 (leader)
>>
>> The reason that this happened is because you've got two nodes running on
>> every server.  From SolrCloud's perspective, there are ten distinct
>> nodes, not five.
>>
>> SolrCloud doesn't notice the fact that different nodes are running on
>> the same server(s).  If your reaction to hearing this is that it
>> *should* notice, you're probably right, but in a typical use case, each
>> server should only be running one Solr instance, so this would never happen.
>>
>> There is only one instance where I can think of where I would recommend
>> running multiple instances per server, and that is when the required
>> heap size for a single instance would be VERY large.  Running two
>> instances with smaller heaps can yield better performance.
>>
>> See this issue:
>>
>> https://issues.apache.org/jira/browse/SOLR-6027
>>
>> Thanks,
>> Shawn
>>


Solr SSL setup with bought certificate

2017-05-09 Thread Sebastjanas
Hello,

I installed 5.5.4 on Centos to /opt/solr. Also I installed init script
using install_solr_service.sh. I've imported bought certificate to keystore
and now trying to start it up with SSL, using following settings in
/etc/default/solr.in.sh:

SOLR_SSL_ENABLED=true
SOLR_SSL_KEY_STORE=etc/solr-ssl.keystore.jks
SOLR_SSL_KEY_STORE_PASSWORD=[password]
#SOLR_SSL_TRUST_STORE=etc/keystore.jks
#SOLR_SSL_TRUST_STORE_PASSWORD=[password]
SOLR_SSL_NEED_CLIENT_AUTH=false
SOLR_SSL_WANT_CLIENT_AUTH=false

But it doesn't start with following error:

1629 WARN  (main) [   ] o.e.j.u.c.AbstractLifeCycle FAILED
SslContextFactory@564fabc8(etc/solr-ssl.keystore.jks,):
java.io.FileNotFoundException: /opt/solr-5.5.4/server (Is a directory)
java.io.FileNotFoundException: /opt/solr-5.5.4/server (Is a directory)
at java.io.FileInputStream.open0(Native Method)
at java.io.FileInputStream.open(FileInputStream.java:195)
at java.io.FileInputStream.<init>(FileInputStream.java:138)
at 
org.eclipse.jetty.util.resource.FileResource.getInputStream(FileResource.java:290)
at 
org.eclipse.jetty.util.security.CertificateUtils.getKeyStore(CertificateUtils.java:43)
at 
org.eclipse.jetty.util.ssl.SslContextFactory.loadTrustStore(SslContextFactory.java:884)
at 
org.eclipse.jetty.util.ssl.SslContextFactory.doStart(SslContextFactory.java:274)
at 
org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
at 
org.eclipse.jetty.util.component.ContainerLifeCycle.start(ContainerLifeCycle.java:132)
at 
org.eclipse.jetty.util.component.ContainerLifeCycle.doStart(ContainerLifeCycle.java:114)
at 
org.eclipse.jetty.server.SslConnectionFactory.doStart(SslConnectionFactory.java:64)
at 
org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
at 
org.eclipse.jetty.util.component.ContainerLifeCycle.start(ContainerLifeCycle.java:132)
at 
org.eclipse.jetty.util.component.ContainerLifeCycle.doStart(ContainerLifeCycle.java:114)
at 
org.eclipse.jetty.server.AbstractConnector.doStart(AbstractConnector.java:256)
at 
org.eclipse.jetty.server.AbstractNetworkConnector.doStart(AbstractNetworkConnector.java:81)
at 
org.eclipse.jetty.server.ServerConnector.doStart(ServerConnector.java:236)
at 
org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
at org.eclipse.jetty.server.Server.doStart(Server.java:366)
at 
org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
at 
org.eclipse.jetty.xml.XmlConfiguration$1.run(XmlConfiguration.java:1255)
at java.security.AccessController.doPrivileged(Native Method)
at 
org.eclipse.jetty.xml.XmlConfiguration.main(XmlConfiguration.java:1174)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.eclipse.jetty.start.Main.invokeMain(Main.java:321)
at org.eclipse.jetty.start.Main.start(Main.java:817)
at org.eclipse.jetty.start.Main.main(Main.java:112)
1631 INFO  (coreLoadExecutor-6-thread-1) [   ] o.a.s.c.SolrConfig Adding specified lib dirs to ClassLoader
1634 WARN  (main) [   ] o.e.j.u.c.AbstractLifeCycle FAILED SslConnectionFactory@74fe5c40{SSL-http/1.1}: java.io.FileNotFoundException: /opt/solr-5.5.4/server (Is a directory)
java.io.FileNotFoundException: /opt/solr-5.5.4/server (Is a directory)
at java.io.FileInputStream.open0(Native Method)
at java.io.FileInputStream.open(FileInputStream.java:195)
at java.io.FileInputStream.<init>(FileInputStream.java:138)
at org.eclipse.jetty.util.resource.FileResource.getInputStream(FileResource.java:290)
at org.eclipse.jetty.util.security.CertificateUtils.getKeyStore(CertificateUtils.java:43)
at org.eclipse.jetty.util.ssl.SslContextFactory.loadTrustStore(SslContextFactory.java:884)
at org.eclipse.jetty.util.ssl.SslContextFactory.doStart(SslContextFactory.java:274)
at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
at org.eclipse.jetty.util.component.ContainerLifeCycle.start(ContainerLifeCycle.java:132)
at org.eclipse.jetty.util.component.ContainerLifeCycle.doStart(ContainerLifeCycle.java:114)
at org.eclipse.jetty.server.SslConnectionFactory.doStart(SslConnectionFactory.java:64)
at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
at org.eclipse.jetty.util.component.ContainerLifeCycle.start(ContainerLifeCycle.java:132)
at org.eclipse.jetty.util.component.ContainerLifeCycle.doStart(ContainerLifeCycle.java:114)
at 

Re: Suggester uses lots of 'Page cache' memory

2017-05-09 Thread Damien Kamerman
Memory/cache aside, the fundamental Solr issue is that the Suggester build
operation will read the entire index, even though very few docs have the
relevant fields.

Is there a way to set a 'fq' on the Suggester build?
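
For reference, the build I'm triggering looks roughly like this (the
collection and dictionary names here are illustrative):

curl 'http://localhost:8983/solr/collection1/suggest?suggest=true&suggest.dictionary=mySuggester&suggest.build=true'

The build thread then spends its time iterating stored documents: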

   java.lang.Thread.State: RUNNABLE
at org.apache.lucene.codecs.compressing.LZ4.decompress(LZ4.java:135)
at org.apache.lucene.codecs.compressing.CompressionMode$4.decompress(CompressionMode.java:138)
at org.apache.lucene.codecs.compressing.CompressingStoredFieldsReader$BlockState.document(CompressingStoredFieldsReader.java:560)
at org.apache.lucene.codecs.compressing.CompressingStoredFieldsReader.document(CompressingStoredFieldsReader.java:576)
at org.apache.lucene.codecs.compressing.CompressingStoredFieldsReader.visitDocument(CompressingStoredFieldsReader.java:583)
at org.apache.lucene.index.CodecReader.document(CodecReader.java:88)
at org.apache.lucene.index.FilterLeafReader.document(FilterLeafReader.java:411)
at org.apache.lucene.index.FilterLeafReader.document(FilterLeafReader.java:411)
at org.apache.lucene.index.BaseCompositeReader.document(BaseCompositeReader.java:118)
at org.apache.lucene.index.IndexReader.document(IndexReader.java:381)
at org.apache.lucene.search.suggest.DocumentDictionary$DocumentInputIterator.next(DocumentDictionary.java:165)
at org.apache.lucene.search.suggest.analyzing.AnalyzingInfixSuggester.build(AnalyzingInfixSuggester.java:300)
- locked <0x0004b8f29260> (a java.lang.Object)
at org.apache.lucene.search.suggest.Lookup.build(Lookup.java:190)
at org.apache.solr.spelling.suggest.SolrSuggester.build(SolrSuggester.java:178)
at org.apache.solr.handler.component.SuggestComponent.prepare(SuggestComponent.java:179)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:269)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:166)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:2299)
at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:658)
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:464)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:296)


On 3 May 2017 at 12:47, Damien Kamerman  wrote:

> Thanks Shawn, I'll have to look closer into this.
>
> On 3 May 2017 at 12:10, Shawn Heisey  wrote:
>
>> On 5/2/2017 6:46 PM, Damien Kamerman wrote:
>> > Shalin, yes I think it's a case of the Suggester build hitting the index
>> > all at once. I'm thinking it's hitting all docs, even the ones without
>> > fields relevant to the suggester.
>> >
>> > Shawn, I am using ZFS, though I think it's comparable to other setups.
>> > mmap() should still be faster, while the ZFS ARC cache may prefer more
>> > memory that other OS disk caches.
>> >
>> > So, it sounds like I need enough memory/swap to hold the entire index.
>> > When will the memory be released? On a commit?
>> > https://lucene.apache.org/core/6_5_0/core/org/apache/lucene/store/MMapDirectory.html
>> > talks about a bug on the close().
>>
>> What I'm going to describe below is how things *normally* work on most
>> operating systems (think Linux or Windows) with most filesystems.  If
>> ZFS is different, and it sounds like it might be, then that's something
>> for you to discuss with Oracle.
>>
>> Normally, MMap doesn't *allocate* any memory -- so there's nothing to
>> release later.  It asks the operating system to map the file's contents
>> to a section of virtual memory, and then the program accesses that
>> memory block directly.
>>
>> http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
>>
>> A typical OS takes care of translating accesses to MMap virtual memory
>> into disk accesses, and uses available system memory to cache the data
>> that's read so a subsequent access of the same data is super fast.
>>
>> On most operating systems, memory in the disk cache is always available
>> to programs that request it for an allocation.
>>
>> ZFS uses a completely separate piece of memory for caching -- the ARC
>> cache.  I do not know if the OS is able to release memory from that
>> cache when a program requests it.  My experience with ZFS on Linux  (not
>> with Solr) suggests that the ARC cache holds onto memory a lot tighter
>> than the standard OS disk cache.  ZFS on Solaris might be a different
>> animal, though.
>>
>> I'm finding conflicting information regarding MMap problems on ZFS.
>> Some sources say that memory usage is doubled (data in both the standard
>> page cache and the arc cache), some say that this is not a general
>> problem.  This is probably a question for Oracle to answer.
>>
>> You don't want to count swap space when looking at how much memory you
>> 

Re: distribution of leader and replica in SolrCloud

2017-05-09 Thread Bernd Fehling
I would call your solution more of a workaround, as is any similar solution
of this kind.
The issue SOLR-6027 has now been open for 3 years, and the world has changed.
Instead of racks full of blades with many dedicated bare-metal servers, you
now have huge machines with 256GB RAM and many CPUs. Virtualization has
taken over.
To gain some independence from the physical hardware under these conditions,
you have to spread the shards across several physical machines running
virtual servers.
From my point of view it is a good solution to have 5 virtual 64GB servers
on 5 different huge physical machines and to start 2 Solr instances on each
virtual server. If I split each 64GB virtual server into two 32GB virtual
servers, there would be no gain: we still don't have 10 huge machines (so no
win in resilience), and we would have to administer and monitor 10 virtual
servers instead of 5 (plus the ZooKeeper servers).

It is state of the art that you don't have to care about the individual
servers within the cloud; that is the main point of a cloud. The leader
should always be aware of who the members of its cloud are, how to reach
them (IP address), and how the users of the cloud (collections) are
distributed across it.

It would be great if a solution to SOLR-6027 led to some kind of "automatic
mode" for server distribution, without any special configuration.

Regards,
Bernd


Am 08.05.2017 um 17:47 schrieb Erick Erickson:
> Also, you can specify custom placement rules, see:
> https://cwiki.apache.org/confluence/display/solr/Rule-based+Replica+Placement
> 
> But Shawn's statement is the nub of what you're seeing, by default
> multiple JVMs on the same physical machine are considered separate
> Solr instances.
> 
> Also note that if you want to, you can specify a nodeSet when you
> create the nodes, and in particular the special value EMPTY. That'll
> create a collection with no replicas and you can ADDREPLICA to
> precisely place each one if you require that level of control.
> 
> Best,
> Erick
> 
> On Mon, May 8, 2017 at 7:44 AM, Shawn Heisey  wrote:
>> On 5/8/2017 5:38 AM, Bernd Fehling wrote:
>>> boss -- shard1 - server2:7574
>>>| |-- server2:8983 (leader)
>>
>> The reason that this happened is because you've got two nodes running on
>> every server.  From SolrCloud's perspective, there are ten distinct
>> nodes, not five.
>>
>> SolrCloud doesn't notice the fact that different nodes are running on
>> the same server(s).  If your reaction to hearing this is that it
>> *should* notice, you're probably right, but in a typical use case, each
>> server should only be running one Solr instance, so this would never happen.
>>
>> There is only one case I can think of where I would recommend
>> running multiple instances per server, and that is when the required
>> heap size for a single instance would be VERY large.  Running two
>> instances with smaller heaps can yield better performance.
>>
>> See this issue:
>>
>> https://issues.apache.org/jira/browse/SOLR-6027
>>
>> Thanks,
>> Shawn
>>


Re: SOLR as nosql database store

2017-05-09 Thread Bharath Kumar
Thanks Hrishikesh and Dave. We use SolrCloud with 2 extra replicas; won't
that serve as a backup when something goes wrong? We also use the latest
Solr 6, and judging from the Solr documentation its indexing performance is
good. The reason we want to move is that MySQL is currently the primary
data store, and its performance might not be optimal when we write data at
a very rapid rate. We already index almost half of the MySQL fields in Solr.
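
(For completeness: I know Solr 6 also has a collections BACKUP action,
along the lines of

http://localhost:8983/solr/admin/collections?action=BACKUP&name=nightly&collection=mycollection&location=/backups

with the backup name, collection and location as placeholders, but we were
hoping the extra replicas alone would cover us.)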

On Mon, May 8, 2017 at 9:24 PM, Dave  wrote:

> You will want to have both solr and a sql/nosql data storage option. They
> serve different purposes
>
>
> > On May 8, 2017, at 10:43 PM, bharath.mvkumar 
> wrote:
> >
> > Hi All,
> >
> > We have a use case where we have a MySQL database that stores documents,
> > and some of the fields in those documents are also indexed in Solr.
> > We plan to move all of those documents into Solr, making Solr the NoSQL
> > datastore for them. The reason we plan to do this is that we have to
> > support cross-data-center replication for both MySQL and Solr, so we are
> > in a way duplicating the same data. The number of writes we do per
> > second is around 10,000. Currently we have only one shard with around
> > 70 million records, and we plan to support close to 1 billion records
> > and also to shard the collection.
> >
> > Is Solr a good choice as the NoSQL database, or should we look at
> > Cassandra for our use case?
> >
> > Thanks,
> > Bharath Kumar
> >
> >
> >
> > --
> > View this message in context: http://lucene.472066.n3.nabble.com/SOLR-as-nosql-database-store-tp4334119.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Thanks & Regards,
Bharath MV Kumar

"Life is short, enjoy every moment of it"


Solr licensing for commercial product.

2017-05-09 Thread vrindavda
Hello,

Please let me know what things I need to consider for licensing before
shipping Solr with a commercial product.

How will Solr know which client is using it?

Thank you,
Vrinda Davda 



--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-licensing-for-commercial-product-tp4334146.html
Sent from the Solr - User mailing list archive at Nabble.com.