Re: [ANNOUNCE] YCSB 0.11.0 released

2016-09-22 Thread Alexandre Rafalovitch
Sorry, what is YCSB? The email does not say, the link does not say.

The Solr connection is not mentioned, except to say that this release has
not changed it, and so - whatever YCSB is - this specific update is
not really relevant to announce to the Solr community.

Perhaps you meant to send that to the ElasticSearch list instead? At
least that's mentioned in the release notes :-)

Regards,
   Alex.

Newsletter and resources for Solr beginners and intermediates:
http://www.solr-start.com/


On 23 September 2016 at 06:42, Govind Kamat  wrote:
> On behalf of the development community, I'm pleased to announce the
> release of YCSB version 0.11.0.
>
> Highlights:
> * Support for ArangoDB.  This is a new binding.
> * Update to Apache Geode (incubating) to improve memory footprint.
> * "couchbase" client deprecated in favor of "couchbase2".
> * Capability to specify TTL for Couchbase2.
> * Various Elasticsearch improvements.
> * Kudu binding updated for version 0.9.0.
> * Fix for issue with hdrhistogram+raw.
> * Performance optimizations for BasicDB and RandomByteIterator.
>
> Full release notes, including links to source and convenience binaries:
> https://github.com/brianfrankcooper/YCSB/releases/tag/0.11.0
>
> This release covers changes since the beginning of July.
>
> Govind


Re: Performance Issue when querying Multivalued fields [SOLR 6.1.0]

2016-09-22 Thread Erick Erickson
If you can break these up into tokens somehow, that's clearly best. But from the
patterns you show it's not likely. WordDelimiterFactory won't quite work,
since it wouldn't be able to separate ASEF into the token SEF.

You'll have a _lot_ fewer terms if you don't use edgengram. Try just
using bigrams (i.e. NGramFilterFactory) with both mingram and maxgram set
to 2.

Now you do phrase searches (also automatic) on pairs. So in your example
some of the pairs are:
#o
of
ff
f-

To find off, you search for the _phrase_ "of ff". There'll be some
fiddling here to
make it all work.
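
For illustration only, such an analyzer might look roughly like this (the
field type name is invented, and whether a lowercase filter belongs in the
chain depends on your data; treat it as a sketch, not a drop-in config):

  <fieldType name="text_bigram" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="2"/>
    </analyzer>
  </fieldType>

The search for "off" then becomes a phrase query such as
q=global_Value:"of ff"; exactly how that phrase matches depends on the
positions the filter emits, which is the fiddling mentioned above.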

Best,
Erick

On Thu, Sep 22, 2016 at 11:49 AM, slee  wrote:
> Alex,
>
> You do have a point with EdgeNGramFilterFactory. As requested, I've attached
> a sample screenshotfor your review.
> 
>
> Erick,
>
> Here's my use-case. Assume I have the following term stored in global_Value
> as such:
> - executionvenuetype#*OFF*-FACILITY
> - partyid#B2A*SEF*9AJP5P9OLL1190
>
> Now, I want to retrieve any document matching the term in global_Value that
> contains the keyword: "off" and "sef". With regards to leading wild-card,
> that's intentional. Not a mail issue. These fields typically contains Guid,
> and some financial terms (eg: Bonds, swaps, etc..). If I don't use any
> non-wildcard, then it's an exact match. But my use-case dictates that it
> should retrieve if it's a partial match.
>
> So what's my best bet for analyzer in such cases ?
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Performance-Issue-when-querying-Multivalued-fields-SOLR-6-1-0-tp4297255p4297542.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Heap memory usage is -1 in UI

2016-09-22 Thread Shawn Heisey
On 9/22/2016 4:59 PM, Yago Riveiro wrote:
> The Heap Memory Usage in the UI it's always -1. There is some way to
> get the amount of heap that a core consumes?

In all the versions that I have looked at, up to 6.0, this number is
either entirely too small or -1.

Looking into the code, this info comes from the /admin/luke handler, and
that handler gets it from Lucene.  The -1 appears to come into play when
the reader object is not the expected type, so I'm guessing that past
changes in Lucene require changes in Solr that have not yet happened. 
Even if the code is fixed so the reader object(s) are calculated
correctly, that won't be enough information for a true picture of core
memory usage.
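
(For reference, that handler can be queried directly to see the raw value,
e.g. with an illustrative core name:

  http://localhost:8983/solr/mycore/admin/luke?wt=json

The number the UI displays is taken from that response.)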

In order for this number to be accurate, size information from other
places, such as Lucene caches and Solr caches, must also be included. 
There might also be memory structures involved that I haven't even
thought of.  It is entirely possible that the code to gather all this
information does not yet exist.

In my opinion, the Heap Memory statistic should be removed until a time
when it can be overhauled so that it is as accurate as possible.  Can
you open an issue in Jira?

Thanks,
Shawn



Re: Solr Cloud prevent Ping Request From Forwarding Request

2016-09-22 Thread Shawn Heisey
On 9/22/2016 11:33 AM, jimtronic wrote:
> Boxes 1,2, and 3 have replicas of collections dogs and cats. Box 4 has
> only a replica of dogs. All of these boxes have a healthcheck file on
> them that works with the PingRequestHandler to say whether the box is
> up or not. If I hit Box4/cats/admin/ping, Solr forwards the ping
> request to another box which returns with status OK. Is there anyway
> to stop a box from forwarding a request to another node?

SolrCloud assures that as long as a node is functional, and the
collection is whole *somewhere* in the cloud, requests will work, even
if the node you're talking to has absolutely no data from that collection.

What exactly are you trying to determine with your ping handler?

If you're trying to check the status of the cores on each specific
machine, that's not the way to do it.  Try  sending a request directly
to the core (cats_shard1_replica1, for example) with distrib=false. 
That should remain entirely local.  The core name will typically be
different for every server that contains replicas for that collection,
which can make automation difficult.  You can get a list of cores for a
machine with a call to the CoreAdmin API.
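
For example, something along these lines, where host, port and core name are
only illustrative:

  http://box4:8983/solr/cats_shard1_replica1/select?q=*:*&rows=0&distrib=false
  http://box4:8983/solr/admin/cores?action=STATUS&wt=json

The first request stays on that one core; the second lists the cores present
on the node so you can discover which replica names to check.
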

To check whether the MACHINE is working, independent from any core or
collection, use a request for something global, like the LIST command on
the Collections API.
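
For instance, again with an illustrative host and port:

  http://box4:8983/solr/admin/collections?action=LIST&wt=json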

A request to /solr/<collection>/admin/ping is a check to make sure the
*collection* is working.  It's reasonable in that scenario for SolrCloud
to forward the request to wherever it needs to go, and to load balance
the requests across the cloud -- that's what it is designed to do.  If
the request works, then the machine must be working ...but you can also
be sure that the collection is working, wherever it might live.

If you use "distrib=false" with a URL containing the *collection* name
(instead of the specific core name), the distrib parameter is probably
ignored, because satisfying a request sent to the collection name
requires a distributed lookup in zookeeper data just to learn where the
collection lives.  I do not have a large enough cloud install to check
whether this is true.

Thanks,
Shawn



Re: Very Slow Commits After Solr Index Optimization

2016-09-22 Thread Shawn Heisey
On 9/22/2016 3:27 PM, vsolakhian wrote:
> This is not the cause of the problem though. The disk cache is
> important for queries and overall performance during optimization, but
> once it is done, everything should go back to "normal" (whatever that
> normal is). In our case it is the SOFT COMMIT (that opens a new
> Searcher) that takes 10 times longer AFTER the index was optimized and
> deleted records were removed (and index size went down to 60 GB).

It's difficult to say without hard numbers, and that is complicated by
my very limited understanding of how HDFS gets cached.

"Normal" is achieved only when relevant data is in the disk cache. 
Which will most likely not be the case after an optimize, unless you
have enough caching memory for both the before and after index to fit at
the same time.  Similar performance issues are likely to occur right
after a server reboot.

A soft commit opens a new searcher.  When a new searcher is opened, the
*Solr* caches (which are entirely different from the disk cache) look at
their autowarmCount settings.  Each cache gathers the top N queries
contained in the cache, up to the autowarmCount number, and proceeds to
execute those queries on the index to create a brand new cache for
the new searcher.  The new searcher is not put into place until the
warming is done.  The commit will not finish until the new searcher is
online.

If the info sitting in the OS disk cache when the warming queries happen
is not useful for fast queries, then those queries will be very slow,
which makes the commit take longer.

For better commit times, reduce autowarmCount on your Solr caches.  This
will make it more likely that users will notice slow queries, though.
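
As a rough illustration, the relevant cache definitions in solrconfig.xml
look something like the following, and it is the autowarmCount attribute
that drives the warming described above (sizes and counts here are
placeholders, not recommendations):

  <filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="16"/>
  <queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="16"/>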

Good Solr performance with large indexes requires a LOT of memory.  The
amount required is usually very surprising to admins.

Thanks,
Shawn



Re: How to retrieve parent documents without a nested structure (block-join)

2016-09-22 Thread Alexandre Rafalovitch
Why not a traditional join?
https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-JoinQueryParser
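
A rough sketch of what that could look like here, with purely invented field
names (say the parent's key is stored in doc_key and each child carries it
in parent_ref):

  q=title:title2 OR _query_:"{!join from=parent_ref to=doc_key}title:title2"

The join clause maps matching children onto their parents; the plain
title:title2 clause would still return the matching child documents
themselves, so you would need to filter those out. Treat this only as a
starting point.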

Regards,
   Alex.

Newsletter and resources for Solr beginners and intermediates:
http://www.solr-start.com/


On 23 September 2016 at 00:16, Shamik Bandopadhyay  wrote:
> Hi,
>
>   I have a set of documents indexed which has a pseudo parent-child
> relationship. Each child document has a reference to the parent document
> through an ID. As the documents are not available to the crawler in order,
> I'm not able to index them in a nested structure to support
> block-join.Here's an example of a dataset in index right now.
>
> 
>   1
>   Parent title
>   123
> 
> 
>   2
>   Child title1
>   123
> 
> 
>   3
>   Child title2
>   123
> 
> 
>   4
>   Misc title2
> 
>
> As per my requirement, if I search on "title2", the result should bring
> back the following result, the parent document (id=1) and non-related
> document (id=4).
>
> 
>   1
>   Parent title
>   123
> 
> 
>   4
>   Misc title2
> 
>
> This is similar in lines with Block Join Parent Query Parser where I could
> have fired a query like : q={!parent
> which="content_type:parentDocument"}title:title2
>
> Not sure if the Graph Query Parser can be a relevant solution in this
> regard. The problem I see there is I'm running on 5.5 with 2 shard and n
> number of replicas. The graph query parser seems to be designed for a
> single node/single shard.
>
> This is tad urgent for me as I'm trying to come up with an approach to deal
> with this. Any pointers will be highly appreciated.
>
> Thanks,
> Shamik


Re: Performance Issue when querying Multivalued fields [SOLR 6.1.0]

2016-09-22 Thread Alexandre Rafalovitch
Not fully clear still, but perhaps you need several fields, at least one of
which just contains your SEF and OFF values serving effectively as binary
switches (FQ matches). And then maybe you strip the leading IDs that you
are not matching on.

Remember your Solr data shape does not need to match your original data
shape. Especially with extra fields that you could get through copyField
commands or through UpdateRequestProcessor duplicates. And you don't need
to store those duplicates, just index them for most effective search.
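
For example, a copyField rule of the rough shape below could feed such a
"switch" field (the destination field name and its analyzer are entirely
made up here):

  <copyField source="global_Value" dest="global_Value_flags"/>

with global_Value_flags indexed but not stored, using an analyzer that keeps
only the short markers (OFF, SEF, and so on) you want to filter on.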

And yes, a reversing filter and edge ngram together mean you don't need
wildcard queries.

Regards,
Alex

On 23 Sep 2016 1:49 AM, "slee"  wrote:

> Alex,
>
> You do have a point with EdgeNGramFilterFactory. As requested, I've
> attached
> a sample screenshotfor your review.
> 
>
> Erick,
>
> Here's my use-case. Assume I have the following term stored in global_Value
> as such:
> - executionvenuetype#*OFF*-FACILITY
> - partyid#B2A*SEF*9AJP5P9OLL1190
>
> Now, I want to retrieve any document matching the term in global_Value that
> contains the keyword: "off" and "sef". With regards to leading wild-card,
> that's intentional. Not a mail issue. These fields typically contains Guid,
> and some financial terms (eg: Bonds, swaps, etc..). If I don't use any
> non-wildcard, then it's an exact match. But my use-case dictates that it
> should retrieve if it's a partial match.
>
> So what's my best bet for analyzer in such cases ?
>
>
>
> --
> View this message in context: http://lucene.472066.n3.
> nabble.com/Performance-Issue-when-querying-Multivalued-fields-SOLR-6-1-0-
> tp4297255p4297542.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


[ANNOUNCE] YCSB 0.11.0 released

2016-09-22 Thread Govind Kamat
On behalf of the development community, I'm pleased to announce the
release of YCSB version 0.11.0.

Highlights:
* Support for ArangoDB.  This is a new binding.
* Update to Apache Geode (incubating) to improve memory footprint.
* "couchbase" client deprecated in favor of "couchbase2".
* Capability to specify TTL for Couchbase2.
* Various Elasticsearch improvements.
* Kudu binding updated for version 0.9.0.
* Fix for issue with hdrhistogram+raw.
* Performance optimizations for BasicDB and RandomByteIterator. 

Full release notes, including links to source and convenience binaries:
https://github.com/brianfrankcooper/YCSB/releases/tag/0.11.0

This release covers changes since the beginning of July.

Govind


Re: Heap memory usage is -1 in UI

2016-09-22 Thread Alexandre Rafalovitch
What version of Solr and which Operating System is that on?

Regards,
Alex

On 23 Sep 2016 5:59 AM, "Yago Riveiro"  wrote:

> The Heap Memory Usage in the UI it's always -1.
>
> There is some way to get the amount of heap that a core consumes?
>
>
>
> -
> Best regards
> --
> View this message in context: http://lucene.472066.n3.
> nabble.com/Heap-memory-usage-is-1-in-UI-tp4297601.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


SolrCloud query logs, change from 4.10 to 5.5

2016-09-22 Thread Elaine Cario
We're in the process of upgrading from SolrCloud 4.10 to 5.5, and we
noticed a change in how distributed queries get logged.

In Solr 4.10 we noted that the original node receiving the query logged the
query with a full hit count and elapsed time for the entire query, using
the original request handler (we don't use the default /select handler).
The other nodes logged the queries sent out from the original node, using
the /select request handler.  These entries just included the query stats
from that particular node/shard.

This made it easy when log-diving to differentiate between the stats for
the entire query completion, vs the individual stats for each shard, and we
were also able to detect any unexpected network latencies between the
shards.

But now we are finding in Solr 5.5 that each shard just logs its own stats,
using the original request handler and there's no log entry for the query
as a whole.  This is making some of our existing log analysis difficult
when we try to tie it back to our other application logs.

So, I have 2 questions:

- is there a way to force a log entry for the complete query?
- is there some definitive way to link together all the log entries for a
query across the shards, e.g. some query parameter placed there by Solr?
 (In some cases our applications do add a custom param with a transaction
ID, but it's not consistent and I wonder if Solr is doing something or can
be configured to add something)

Thanks.


Heap memory usage is -1 in UI

2016-09-22 Thread Yago Riveiro
The Heap Memory Usage in the UI it's always -1.

There is some way to get the amount of heap that a core consumes?



-
Best regards
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Heap-memory-usage-is-1-in-UI-tp4297601.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Stream expressions: Break up multivalue field into usable tuples

2016-09-22 Thread Joel Bernstein
You could use the facet() expression which works with multi-value fields.
This emits aggregated tuples useful for recommendations. For example:

facet(baskets,
 q="item:taco",
 buckets="item",
 bucketSorts="count(*) desc",
 bucketSizeLimit="100",
 count(*))

You can feed this to scoreNodes() to score the tuples for a recommendation.
scoreNodes is a graph expression so it expects tuples to be formatted like
a node set. Specifically it looks for the following fields: node, field and
collection, which it uses to retrieve the IDF for each node.

The select() function can turn your facet response into a node set, so
scoreNodes can operate on it:

scoreNodes(
select(facet(baskets,
 q="item:taco",
 buckets="item",
 bucketSorts="count(*) desc",
 bucketSizeLimit=100,
 count(*)),
   item as node,
   count(*),
   replace(collection, null, withValue=baskets),
   replace(field, null, withValue=item)))

There is a ticket open to have scoreNodes operate directly on the facet()
function so you don't have to deal with
the select() function. https://issues.apache.org/jira/browse/SOLR-9537. I'd
like to get to this soon.







Joel Bernstein
http://joelsolr.blogspot.com/

On Thu, Sep 22, 2016 at 5:02 PM, Doug Turnbull <
dturnb...@opensourceconnections.com> wrote:

> I have a field like follows in my search index
>
> {
>"shopper_id": 1234,
>"basket_id": 2512,
>"items_bought": ["eggs", "tacos", "nachos"]
> }
>
> {
>"shopper_id" 1236,
>"basket_id": 2515,
>"items_bought": ["eggs", "tacos", "chicken", "bubble gum"]
> }
>
> I would like to use some of the stream expression capabilities (in this
> case I'm looking at the recsys stuff) but it seems like I need to break up
> my data into tuples like
>
> {
>"shopper_id": 1234,
>"basket_id": 2512,
> "item": "egg"
> },
> {
>"shopper_id": 1234
>"basket_id": 2512,
>"item": "taco"
> }
> {
>"shopper_id": 1234
>"basket_id": 2512,
>"item": "nacho"
> }
> ...
>
> For various other reasons, I'd prefer to keep my original data model with
> Solr doc == one shopper basket.
>
> Now is there a way to take documents above, output from a search tuple
> source and apply a stream mutator to emit baskets with a field broken up
> like above? (do let me know if I'm missing something completely here)
>
> Thanks!
> -Doug
>


Re: Very Slow Commits After Solr Index Optimization

2016-09-22 Thread vsolakhian
Thanks again, Shawn.

You are completely right about the use of disk cache and the special note
regarding the optimize operation in Solr wiki.

This is not the cause of the problem though. The disk cache is important for
queries and overall performance during optimization, but once it is done,
everything should go back to "normal" (whatever that normal is). In our case
it is the SOFT COMMIT (that opens a new Searcher) that takes 10 times longer
AFTER the index was optimized and deleted records were removed (and index
size went down to 60 GB).

Regards,

Victor



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Very-Slow-Commits-After-Solr-Index-Optimization-tp4297022p4297588.html
Sent from the Solr - User mailing list archive at Nabble.com.


Stream expressions: Break up multivalue field into usable tuples

2016-09-22 Thread Doug Turnbull
I have a field like follows in my search index

{
   "shopper_id": 1234,
   "basket_id": 2512,
   "items_bought": ["eggs", "tacos", "nachos"]
}

{
   "shopper_id" 1236,
   "basket_id": 2515,
   "items_bought": ["eggs", "tacos", "chicken", "bubble gum"]
}

I would like to use some of the stream expression capabilities (in this
case I'm looking at the recsys stuff) but it seems like I need to break up
my data into tuples like

{
   "shopper_id": 1234,
   "basket_id": 2512,
   "item": "egg"
},
{
   "shopper_id": 1234,
   "basket_id": 2512,
   "item": "taco"
},
{
   "shopper_id": 1234,
   "basket_id": 2512,
   "item": "nacho"
}
...

For various other reasons, I'd prefer to keep my original data model with
Solr doc == one shopper basket.

Now is there a way to take documents above, output from a search tuple
source and apply a stream mutator to emit baskets with a field broken up
like above? (do let me know if I'm missing something completely here)

Thanks!
-Doug


Re: SolrJ App Engine Client

2016-09-22 Thread Susheel Kumar
As per this doc, sockets are allowed for paid apps. Not sure if this would
make it unrestricted.

https://cloud.google.com/appengine/docs/java/sockets/

On Thu, Sep 22, 2016 at 3:38 PM, Jay Parashar  wrote:

> I sent a similar message earlier but do not see it. Apologize if its
> duplicated.
>
> I am unable to connect to Solr Cloud zkhost (using CloudSolrClient) from a
> SolrJ client running on Google App Engine.
> The error message is "java.nio.channels.SocketChannel is a restricted
> class. Please see the Google  App Engine developer's guide for more
> details."
>
> Is there a workaround? Its required that the client is SolrJ and running
> on App Engine.
>
> Any feedback is much appreciated. Thanks
>


Re: Very Slow Commits After Solr Index Optimization

2016-09-22 Thread Shawn Heisey
On 9/22/2016 1:01 PM, vsolakhian wrote:
> Our index is in HDFS, but we did not change any configuration after we
> deleted 35% of records and optimized.
>
> The relatively slow commit (soft commit and warming up took 1.5 minutes) is
> OK for our use case (adding hundreds of thousands and even millions of
> records and then committing).
>
> The question is why it takes much longer after optimization, when disk
> caches, network and other configuration remained the same and the index is
> smaller?

When you optimize an index down to one segment, you are reading one
entire copy of the index and creating a second copy of the index.  This
is going to greatly affect the data that is in the disk cache.

Presumably you do not have enough caching memory to hold anywhere near
the entire 300GB index.  Memory sizes that large are possible, but not
common.  With HDFS, I think the amount of memory used for caching is
configurable.  I do not know if both HDFS clients and servers can do
caching, or if that's just a server-side option.  With a 300GB index,
150 to 250GB of memory should be available for caching if you want to
have stellar performance.  If you can get the entire 300GB to fit, then
you'd nearly be guaranteed good performance.
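
For reference, on the Solr side the HDFS block cache is sized through the
HdfsDirectoryFactory settings in solrconfig.xml, roughly as below; the
values shown are placeholders, and if I remember right each slab is 128MB
of off-heap memory:

  <directoryFactory name="DirectoryFactory" class="solr.HdfsDirectoryFactory">
    <bool name="solr.hdfs.blockcache.enabled">true</bool>
    <int name="solr.hdfs.blockcache.slab.count">1</int>
    <bool name="solr.hdfs.blockcache.direct.memory.allocation">true</bool>
  </directoryFactory>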

Assuming I'm right about the amount of caching memory available relative
to the index size, when the optimize is finished, chances are very good
that the particular data sitting in the disk cache is completely useless
for queries, so the first few warming and user queries will need to
actually read the *disk*, and put different data in the cache.  When
enough queries have been processed, eventually the disk cache will be
populated with enough relevant data that subsequent queries will be fast.

If there are other programs or Solr indexes competing for the same
caching memory, then the problem might be even worse.

You might want to refrain from optimizing indexes this large, at least
on a frequent basis, and just rely on normal index merging to handle
your deletes.

Optimizing is a special case when it comes to cache memory, and for
that, you need even more than in the general case.  There's a special
note about optimizes here:

https://wiki.apache.org/solr/SolrPerformanceProblems#OS_Disk_Cache

Thanks,
Shawn



RE: SolrJ App Engine Client

2016-09-22 Thread Jay Parashar
I am on java 7. As the GAE states, the SocketChannel is not on Google's white 
list.

Stackoverflow (the 2nd link you sent) suggests to re-invent the class. I will 
see if I come up with anything. 
Thanks John.

-Original Message-
From: John Bickerstaff [mailto:j...@johnbickerstaff.com]
Sent: Thursday, September 22, 2016 2:51 PM
To: solr-user@lucene.apache.org
Subject: [Ext] Re: SolrJ App Engine Client

Two possibilities from a quick search on the error message - both point to GAE 
NOT fully supporting Java 8

http://stackoverflow.com/questions/29528580/how-to-deal-with-app-engine-devserver-exception-due-to-formatstyle-restricted-cl
http://stackoverflow.com/questions/29543131/beancreationexception-throwed-when-trying-to-run-my-project
 


On Thu, Sep 22, 2016 at 1:38 PM, Jay Parashar  wrote:

> I sent a similar message earlier but do not see it. Apologize if its 
> duplicated.
>
> I am unable to connect to Solr Cloud zkhost (using CloudSolrClient) 
> from a SolrJ client running on Google App Engine.
> The error message is "java.nio.channels.SocketChannel is a restricted 
> class. Please see the Google  App Engine developer's guide for more 
> details."
>
> Is there a workaround? Its required that the client is SolrJ and 
> running on App Engine.
>
> Any feedback is much appreciated. Thanks
>


Re: SolrJ App Engine Client

2016-09-22 Thread Jay Parashar
No, it does not.

The error is (instead of SocketChannel) is now

Caused by: java.lang.NoClassDefFoundError: java.net.ProxySelector is a 
restricted class

And it's during an actual query (solrClient.query(query);)


-Original Message-
From: Mikhail Khludnev [mailto:m...@apache.org]
Sent: Thursday, September 22, 2016 2:59 PM
To: solr-user 
Subject: [Ext] Re: SolrJ App Engine Client

Does it work with plain HttpSolrClient?

On Thu, Sep 22, 2016 at 10:50 PM, John Bickerstaff  wrote:

> Two possibilities from a quick search on the error message - both 
> point to GAE NOT fully supporting Java 8
>
> http://stackoverflow.com/questions/29528580/how-to-deal-with-app-engine-devserver-exception-due-to-formatstyle-restricted-cl
> http://stackoverflow.com/questions/29543131/beancreationexception-throwed-when-trying-to-run-my-project
>
>
> On Thu, Sep 22, 2016 at 1:38 PM, Jay Parashar  wrote:
>
> > I sent a similar message earlier but do not see it. Apologize if its 
> > duplicated.
> >
> > I am unable to connect to Solr Cloud zkhost (using CloudSolrClient) 
> > from
> a
> > SolrJ client running on Google App Engine.
> > The error message is "java.nio.channels.SocketChannel is a 
> > restricted class. Please see the Google  App Engine developer's 
> > guide for more details."
> >
> > Is there a workaround? Its required that the client is SolrJ and 
> > running on App Engine.
> >
> > Any feedback is much appreciated. Thanks
> >
>



--
Sincerely yours
Mikhail Khludnev


Re: Solr Cloud prevent Ping Request From Forwarding Request

2016-09-22 Thread jimtronic
It seems like all the parameters in the PingHandler get processed by the
remote server. So, things like shards=localhost or distrib=false take effect
too late.






--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Cloud-prevent-Ping-Request-From-Forwarding-Request-tp4297521p4297565.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SolrJ App Engine Client

2016-09-22 Thread Mikhail Khludnev
Does it work with plain HttpSolrClient?

On Thu, Sep 22, 2016 at 10:50 PM, John Bickerstaff  wrote:

> Two possibilities from a quick search on the error message - both point to
> GAE NOT fully supporting Java 8
>
> http://stackoverflow.com/questions/29528580/how-to-deal-with-app-engine-
> devserver-exception-due-to-formatstyle-restricted-cl
> http://stackoverflow.com/questions/29543131/beancreationexception-throwed-
> when-trying-to-run-my-project
>
>
> On Thu, Sep 22, 2016 at 1:38 PM, Jay Parashar  wrote:
>
> > I sent a similar message earlier but do not see it. Apologize if its
> > duplicated.
> >
> > I am unable to connect to Solr Cloud zkhost (using CloudSolrClient) from
> a
> > SolrJ client running on Google App Engine.
> > The error message is "java.nio.channels.SocketChannel is a restricted
> > class. Please see the Google  App Engine developer's guide for more
> > details."
> >
> > Is there a workaround? Its required that the client is SolrJ and running
> > on App Engine.
> >
> > Any feedback is much appreciated. Thanks
> >
>



-- 
Sincerely yours
Mikhail Khludnev


Re: SolrJ App Engine Client

2016-09-22 Thread John Bickerstaff
Two possibilities from a quick search on the error message - both point to
GAE NOT fully supporting Java 8

http://stackoverflow.com/questions/29528580/how-to-deal-with-app-engine-devserver-exception-due-to-formatstyle-restricted-cl
http://stackoverflow.com/questions/29543131/beancreationexception-throwed-when-trying-to-run-my-project


On Thu, Sep 22, 2016 at 1:38 PM, Jay Parashar  wrote:

> I sent a similar message earlier but do not see it. Apologize if its
> duplicated.
>
> I am unable to connect to Solr Cloud zkhost (using CloudSolrClient) from a
> SolrJ client running on Google App Engine.
> The error message is "java.nio.channels.SocketChannel is a restricted
> class. Please see the Google  App Engine developer's guide for more
> details."
>
> Is there a workaround? Its required that the client is SolrJ and running
> on App Engine.
>
> Any feedback is much appreciated. Thanks
>


SolrJ App Engine Client

2016-09-22 Thread Jay Parashar
I sent a similar message earlier but do not see it. Apologize if its duplicated.

I am unable to connect to Solr Cloud zkhost (using CloudSolrClient) from a 
SolrJ client running on Google App Engine.
The error message is "java.nio.channels.SocketChannel is a restricted class. 
Please see the Google  App Engine developer's guide for more details."

Is there a workaround? Its required that the client is SolrJ and running on App 
Engine.

Any feedback is much appreciated. Thanks


RE: Disabling Zip bomb detection in Tika

2016-09-22 Thread Allison, Timothy B.
Not sure what to do with this one.

The triggering document has a run of ~50  starts and then ~50+  
starts.  So, y, Tika limits nested elements to 100.

Tika's DefaultHtmlMapper only passes through a few handfuls of elements 
(SAFE_ELEMENTS), not including  or . 

Solr's MostlyPassThroughHtmlMapper passes through, well, mostly everything.


-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Thursday, September 22, 2016 12:47 PM
To: solr-user 
Subject: Re: Disabling Zip bomb detection in Tika

So far a Tika JIRA seems like the right thing. Tim is "a well known entity"
in Solr though so I'm sure he'll move it over to Solr if appropriate.

Erick

On Thu, Sep 22, 2016 at 9:43 AM, Rodrigo Rosenfeld Rosas 
 wrote:
> Here it is. Not sure if it's clear enough though:
>
> https://issues.apache.org/jira/browse/TIKA-2091
>
> Or should I have created the ticket in the Solr project instead?
>
>
> Em 22-09-2016 13:32, Rodrigo Rosenfeld Rosas escreveu:
>>
>> This is one of the documents:
>>
>>
>> https://www.sec.gov/Archives/edgar/data/1472033/000119380513001310/e6
>> 11133_f6ef-eutelsat.htm
>>
>> I'll try to create a ticket for this on Jira if I find its location 
>> but feel free to open it yourself if you prefer, just let me know.
>>
>> Em 22-09-2016 12:33, Allison, Timothy B. escreveu:

 I'll try to get a sample HTML yielding to this problem and attach 
 it to Jira.
>>>
>>> Great!  Tika 1.14 is around the corner...if this is an easy fix ... 
>>> :)
>>>
>>> Thank you.
>>>
>>
>>
>


Re: Very Slow Commits After Solr Index Optimization

2016-09-22 Thread vsolakhian
Hi Shawn,

Thank you for response. Everything you said is correct in general.

Our index is in HDFS, but we did not change any configuration after we
deleted 35% of records and optimized.

The relatively slow commit (soft commit and warming up took 1.5 minutes) is
OK for our use case (adding hundreds of thousands and even millions of
records and then committing).

The question is why it takes much longer after optimization, when disk
caches, network and other configuration remained the same and the index is
smaller?

Thanks,

Victor



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Very-Slow-Commits-After-Solr-Index-Optimization-tp4297022p4297548.html
Sent from the Solr - User mailing list archive at Nabble.com.


Merging two seperate Solr Indexs

2016-09-22 Thread Lakshmi
Hi Everyone,

  we are redesigning our site and doing this in phases. We have Solr as our
search engine. Our new site's data set is different from the old one and is
indexed into a new core. Now we need to search across both the new and old
cores to show the results.
1. How do we search across two different Solr cores having different schemas
using a single query?
2. Can we create a new core by merging the indexes from the two different
schemas and use that third core to provide search results?
3. What is the ideal way to handle these kinds of situations?


Thanks,
Lakshmi.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Merging-two-seperate-Solr-Indexs-tp4297547.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr Cloud prevent Ping Request From Forwarding Request

2016-09-22 Thread Erick Erickson
Don't know if it works with ping, but try distrib=false perhaps?

But wouldn't you still have, at best, "no such collection?" or
something?

You may have to read state(s) from Zookeeper and ping each one
directly only if it has a replica for a particular collection.

Best,
Erick

On Thu, Sep 22, 2016 at 10:33 AM, jimtronic  wrote:
> Here's the scenario:
>
> Boxes 1,2, and 3 have replicas of collections dogs and cats. Box 4 has only
> a replica of dogs.
>
> All of these boxes have a healthcheck file on them that works with the
> PingRequestHandler to say whether the box is up or not.
>
> If I hit Box4/cats/admin/ping, Solr forwards the ping request to another box
> which returns with status OK.
>
> Is there anyway to stop a box from forwarding a request to another node?
>
> Thanks!
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Solr-Cloud-prevent-Ping-Request-From-Forwarding-Request-tp4297521.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Performance Issue when querying Multivalued fields [SOLR 6.1.0]

2016-09-22 Thread slee
Alex,

You do have a point with EdgeNGramFilterFactory. As requested, I've attached
a sample screenshot for your review.
 

Erick,

Here's my use-case. Assume I have the following term stored in global_Value
as such:
- executionvenuetype#*OFF*-FACILITY
- partyid#B2A*SEF*9AJP5P9OLL1190

Now, I want to retrieve any document matching the term in global_Value that
contains the keyword: "off" and "sef". With regards to leading wild-card,
that's intentional, not a mail issue. These fields typically contain GUIDs
and some financial terms (e.g. bonds, swaps, etc.). If I don't use any
wildcards, then it's an exact match. But my use-case dictates that it
should also retrieve documents on a partial match.

So what's my best bet for analyzer in such cases ?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Performance-Issue-when-querying-Multivalued-fields-SOLR-6-1-0-tp4297255p4297542.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Best way to generate multivalue fields from streaming API

2016-09-22 Thread Gus Heck
Hi Mike,

Bit late on this, but just saw it...

Using streaming to ingest has occurred to me too, but I think it's not
really right for that except in fairly trivial cases. The very first big
problem you will have in the example you give is that you won't be able to
mark things as already ingested, so you have to read the whole thing every
time. One could eventually add enough features to it, but that would
probably bloat it and change the focus from processing data originating in
Solr to processing data from external sources. At that point I think it's
better for it to be a separate system, set up in a way that can be managed.
Any non-trivial ingestion process using streaming is going to be configured
as a large, deeply nested streaming expression, which I fear would be very
hard to read and maintain. I did a talk a while back that went through a
wishlist for document ingestion... slides here:
https://docs.google.com/presentation/d/17NhL-nfYa-d2Vx_DleXo_JC1SwiBMlfP5Zm4IEiZOYY/pub?start=false=false=5000


I do presently have a case where I use streaming to create summary records
for some data once it's in solr.

-Gus

On Fri, Sep 16, 2016 at 11:52 AM, Joel Bernstein  wrote:

> Unfortunately there currently isn't a way to split a field. But this would
> be nice functionality to add.
>
> The approach would be to an add a split operation that would be used by the
> select() function. It would look like this:
>
> select(jdbc(...), split(fieldA, delim=","), ...)
>
> This would make a good jira issue.
>
>
>
>
>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Fri, Sep 16, 2016 at 11:03 AM, Mike Thomsen 
> wrote:
>
> > Read this article and thought it could be interesting as a way to do
> > ingestion:
> >
> > https://dzone.com/articles/solr-streaming-expressions-
> > for-collection-auto-upd-1
> >
> > Example from the article:
> >
> > daemon(id="12345",
> >
> >  runInterval="6",
> >
> >  update(users,
> >
> >  batchSize=10,
> >
> >  jdbc(connection="jdbc:mysql://localhost/users?user=root=solr",
> > sql="SELECT id, name FROM users", sort="id asc",
> > driver="com.mysql.jdbc.Driver")
> >
> > )
> >
> > What's the best way to handle a multivalue field using this API? Is
> > there a way to tokenize something returned in a database field?
> >
> > Thanks,
> >
> > Mike
> >
>



-- 
http://www.the111shift.com


Solr Cloud prevent Ping Request From Forwarding Request

2016-09-22 Thread jimtronic
Here's the scenario:

Boxes 1,2, and 3 have replicas of collections dogs and cats. Box 4 has only
a replica of dogs.

All of these boxes have a healthcheck file on them that works with the
PingRequestHandler to say whether the box is up or not.

If I hit Box4/cats/admin/ping, Solr forwards the ping request to another box
which returns with status OK.

Is there anyway to stop a box from forwarding a request to another node?

Thanks!



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Cloud-prevent-Ping-Request-From-Forwarding-Request-tp4297521.html
Sent from the Solr - User mailing list archive at Nabble.com.


How to retrieve parent documents without a nested structure (block-join)

2016-09-22 Thread Shamik Bandopadhyay
Hi,

  I have a set of documents indexed which has a pseudo parent-child
relationship. Each child document has a reference to the parent document
through an ID. As the documents are not available to the crawler in order,
I'm not able to index them in a nested structure to support
block-join. Here's an example of a dataset in the index right now.


  1
  Parent title
  123


  2
  Child title1
  123


  3
  Child title2
  123


  4
  Misc title2


As per my requirement, if I search on "title2", the result should bring
back the following result, the parent document (id=1) and non-related
document (id=4).


  1
  Parent title
  123


  4
  Misc title2


This is similar in lines with Block Join Parent Query Parser where I could
have fired a query like : q={!parent
which="content_type:parentDocument"}title:title2

Not sure if the Graph Query Parser can be a relevant solution in this
regard. The problem I see there is I'm running on 5.5 with 2 shard and n
number of replicas. The graph query parser seems to be designed for a
single node/single shard.

This is tad urgent for me as I'm trying to come up with an approach to deal
with this. Any pointers will be highly appreciated.

Thanks,
Shamik


Re: Disabling Zip bomb detection in Tika

2016-09-22 Thread Erick Erickson
So far a Tika JIRA seems like the right thing. Tim is "a well known entity"
in Solr though so I'm sure he'll move it over to Solr if appropriate.

Erick

On Thu, Sep 22, 2016 at 9:43 AM, Rodrigo Rosenfeld Rosas
 wrote:
> Here it is. Not sure if it's clear enough though:
>
> https://issues.apache.org/jira/browse/TIKA-2091
>
> Or should I have created the ticket in the Solr project instead?
>
>
> Em 22-09-2016 13:32, Rodrigo Rosenfeld Rosas escreveu:
>>
>> This is one of the documents:
>>
>>
>> https://www.sec.gov/Archives/edgar/data/1472033/000119380513001310/e611133_f6ef-eutelsat.htm
>>
>> I'll try to create a ticket for this on Jira if I find its location but
>> feel free to open it yourself if you prefer, just let me know.
>>
>> Em 22-09-2016 12:33, Allison, Timothy B. escreveu:

 I'll try to get a sample HTML yielding to this problem and attach it to
 Jira.
>>>
>>> Great!  Tika 1.14 is around the corner...if this is an easy fix ... :)
>>>
>>> Thank you.
>>>
>>
>>
>


Re: Disabling Zip bomb detection in Tika

2016-09-22 Thread Rodrigo Rosenfeld Rosas

Here it is. Not sure if it's clear enough though:

https://issues.apache.org/jira/browse/TIKA-2091

Or should I have created the ticket in the Solr project instead?

Em 22-09-2016 13:32, Rodrigo Rosenfeld Rosas escreveu:

This is one of the documents:

https://www.sec.gov/Archives/edgar/data/1472033/000119380513001310/e611133_f6ef-eutelsat.htm 



I'll try to create a ticket for this on Jira if I find its location 
but feel free to open it yourself if you prefer, just let me know.


Em 22-09-2016 12:33, Allison, Timothy B. escreveu:
I'll try to get a sample HTML yielding to this problem and attach it 
to Jira.

Great!  Tika 1.14 is around the corner...if this is an easy fix ... :)

Thank you.








RE: Disabling Zip bomb detection in Tika

2016-09-22 Thread Allison, Timothy B.
Tika might be overkill for you (no one can hear us, right?).  


One thing that Tika buys you is fairly smart encoding detection for html pages. 
 Looks like Nokogiri does do some kind of encoding detection, but it may only 
read the meta-headers.  I haven't used Nokogiri, but if you're happy with the 
results of that, go for it.


-Original Message-
From: Rodrigo Rosenfeld Rosas [mailto:rr_ro...@yahoo.com.br.INVALID] 
Sent: Thursday, September 22, 2016 12:27 PM
To: solr-user@lucene.apache.org
Subject: Re: Disabling Zip bomb detection in Tika

Great, thanks for the URL, I'll check that.

I was wondering if maybe Tika would be an overkill solution to my specific 
case. We don't index PDF, DOC or anything like that, just plain HTML.

I mean, if everything Tika does is to extract text from HTML, maybe I could get 
the same result using Nokogiri directly in Ruby and send it as plain text to 
Solr? Am I missing something? What would Tika do besides extracting the text 
from the HTML?

Thanks in advance,
Rodrigo.

Em 22-09-2016 12:11, Erick Erickson escreveu:
> Tika was upgraded from 1.7 to 1.13 in Solr 6.2 so this is likely a 
> change in Tika.
>
> You could _try_ downgrading Tika, but that's chancy and I have no 
> guarantee that it'll work.
>
> Or use a SolrJ client to use an older version of Tika and transmit it 
> to Solr, here's an example:
>
> https://lucidworks.com/blog/2012/02/14/indexing-with-solrj/
>
> Best,
> Erick
>
> On Thu, Sep 22, 2016 at 8:01 AM, Rodrigo Rosenfeld Rosas 
>  wrote:
>> I forgot to mention that this problem just happened after I upgraded 
>> to a recent version of Solr and tried to reindex all documents. Some 
>> documents that had previously succeeded now failed with this error.
>>
>> Em 22-09-2016 11:58, Rodrigo Rosenfeld Rosas escreveu:
>>> Hi, thanks. I was talking to @elyograg over freenode#solr and he (or 
>>> she, can't know by the nickname) recommended me to create a Java app 
>>> integrating SolrJ and Tika to perform the indexing. Is this the only 
>>> way to achieve that with Solr? Since I'm not usually a Java 
>>> developer, I'd prefer another kind of solution, but if there isn't, 
>>> I'll have to look at the Java API and examples for SolrJ and Tika to 
>>> achieve that...
>>>
>>> Just wanted to confirm. I'll try to get a sample HTML yielding to 
>>> this problem and attach it to Jira.
>>>
>>> Thanks,
>>> Rodrigo.
>>>
>>> Em 22-09-2016 11:48, Allison, Timothy B. escreveu:
 Y, looks like Nick (gagravarr) has answered on SO -- can't do it in 
 Tika currently.

 -Original Message-
 From: Allison, Timothy B. [mailto:talli...@mitre.org]
 Sent: Thursday, September 22, 2016 10:42 AM
 To: solr-user@lucene.apache.org
 Cc: 'u...@tika.apache.org' 
 Subject: RE: Disabling Zip bomb detection in Tika

 I don't think that's configurable at the moment.

 Tika-colleagues, any recommendations?

 If you're able to share the file on Tika's jira, we'd be happy to 
 take a look.  You shouldn't be getting the zip bomb unless there is 
 a mismatch between opening and closing tags (which could point to a bug in 
 Tika).

 -Original Message-
 From: Rodrigo Rosenfeld Rosas 
 [mailto:rr_ro...@yahoo.com.br.INVALID]
 Sent: Thursday, September 22, 2016 10:06 AM
 To: solr-user@lucene.apache.org
 Subject: Disabling Zip bomb detection in Tika

 Hi, this is my first message in this list.

 Is it possible to disable Zip bomb detection in the Tika handler?

 I've also described the problem here:


 http://stackoverflow.com/questions/39628519/how-to-disable-or-incre
 ase-limit-zip-bomb-detection-in-tika-with-solr-config?noredirect=1#
 comment66575342_39628519

 Basically, I get this error when trying to process some big valid 
 HTML
 documents:

 RSolr::Error::Http - 500 Internal Server Error
 Error:

 {'responseHeader'=>{'status'=>500,'QTime'=>76},'error'=>{'metadata'=>['error-class','org.apache.solr.common.SolrException','root-error-class','org.apache.tika.sax.SecureContentHandler$SecureSAXException'],'msg'=>'org.apache.tika.exception.TikaException:
 Zip bomb detected!','trace'=>'org.apache.solr.common.SolrException:
 org.apache.tika.exception.TikaException: Zip bomb detected!
at

 org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:234)
at

 org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)
at

 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:154)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:2089)
at
 org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:652)
at
 

Re: Disabling Zip bomb detection in Tika

2016-09-22 Thread Rodrigo Rosenfeld Rosas

This is one of the documents:

https://www.sec.gov/Archives/edgar/data/1472033/000119380513001310/e611133_f6ef-eutelsat.htm

I'll try to create a ticket for this on Jira if I find its location but 
feel free to open it yourself if you prefer, just let me know.


Em 22-09-2016 12:33, Allison, Timothy B. escreveu:

I'll try to get a sample HTML yielding to this problem and attach it to Jira.

Great!  Tika 1.14 is around the corner...if this is an easy fix ... :)

Thank you.





Re: Disabling Zip bomb detection in Tika

2016-09-22 Thread Rodrigo Rosenfeld Rosas

Great, thanks for the URL, I'll check that.

I was wondering if maybe Tika would be an overkill solution to my 
specific case. We don't index PDF, DOC or anything like that, just plain 
HTML.


I mean, if everything Tika does is to extract text from HTML, maybe I 
could get the same result using Nokogiri directly in Ruby and send it as 
plain text to Solr? Am I missing something? What would Tika do besides 
extracting the text from the HTML?


Thanks in advance,
Rodrigo.

Em 22-09-2016 12:11, Erick Erickson escreveu:

Tika was upgraded from 1.7 to 1.13 in Solr 6.2 so this is likely a
change in Tika.

You could _try_ downgrading Tika, but that's chancy and I have no guarantee
that it'll work.

Or use a SolrJ client to use an older version of Tika and transmit it
to Solr, here's
an example:

https://lucidworks.com/blog/2012/02/14/indexing-with-solrj/

Best,
Erick

On Thu, Sep 22, 2016 at 8:01 AM, Rodrigo Rosenfeld Rosas
 wrote:

I forgot to mention that this problem just happened after I upgraded to a
recent version of Solr and tried to reindex all documents. Some documents
that had previously succeeded now failed with this error.

Em 22-09-2016 11:58, Rodrigo Rosenfeld Rosas escreveu:

Hi, thanks. I was talking to @elyograg over freenode#solr and he (or she,
can't know by the nickname) recommended me to create a Java app integrating
SolrJ and Tika to perform the indexing. Is this the only way to achieve that
with Solr? Since I'm not usually a Java developer, I'd prefer another kind
of solution, but if there isn't, I'll have to look at the Java API and
examples for SolrJ and Tika to achieve that...

Just wanted to confirm. I'll try to get a sample HTML yielding to this
problem and attach it to Jira.

Thanks,
Rodrigo.

Em 22-09-2016 11:48, Allison, Timothy B. escreveu:

Y, looks like Nick (gagravarr) has answered on SO -- can't do it in Tika
currently.

-Original Message-
From: Allison, Timothy B. [mailto:talli...@mitre.org]
Sent: Thursday, September 22, 2016 10:42 AM
To: solr-user@lucene.apache.org
Cc: 'u...@tika.apache.org' 
Subject: RE: Disabling Zip bomb detection in Tika

I don't think that's configurable at the moment.

Tika-colleagues, any recommendations?

If you're able to share the file on Tika's jira, we'd be happy to take a
look.  You shouldn't be getting the zip bomb unless there is a mismatch
between opening and closing tags (which could point to a bug in Tika).

-Original Message-
From: Rodrigo Rosenfeld Rosas [mailto:rr_ro...@yahoo.com.br.INVALID]
Sent: Thursday, September 22, 2016 10:06 AM
To: solr-user@lucene.apache.org
Subject: Disabling Zip bomb detection in Tika

Hi, this is my first message in this list.

Is it possible to disable Zip bomb detection in the Tika handler?

I've also described the problem here:


http://stackoverflow.com/questions/39628519/how-to-disable-or-increase-limit-zip-bomb-detection-in-tika-with-solr-config?noredirect=1#comment66575342_39628519

Basically, I get this error when trying to process some big valid HTML
documents:

RSolr::Error::Http - 500 Internal Server Error
Error:

{'responseHeader'=>{'status'=>500,'QTime'=>76},'error'=>{'metadata'=>['error-class','org.apache.solr.common.SolrException','root-error-class','org.apache.tika.sax.SecureContentHandler$SecureSAXException'],'msg'=>'org.apache.tika.exception.TikaException:
Zip bomb detected!','trace'=>'org.apache.solr.common.SolrException:
org.apache.tika.exception.TikaException: Zip bomb detected!
   at

org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:234)
   at

org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)
   at

org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:154)
   at org.apache.solr.core.SolrCore.execute(SolrCore.java:2089)
   at
org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:652)
   at
org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:459)
   at

org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:257)
   at

org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:208)

I need to index those documents. Is it possible to disable Zip bomb
detection or to increase the limit using configuration files? I noticed it's
possible to add a tika.config file but I have no idea on how to specify what
I want in such Tika configuration files.

Any help is appreciated!

Thanks in advance,
Rodrigo.








Re: Performance Issue when querying Multivalued fields [SOLR 6.1.0]

2016-09-22 Thread Erick Erickson
I totally missed EdgeNGram. Good catch Alex!

Yeah, that's a killer. My shot in the dark here is that
your analysis chain isn't the best choice to support your use-case and you're
shooting yourself in the foot. So let's back up and talk
about your use-case and maybe re-define your analysis
chain for better performance.

Best,
Erick

On Thu, Sep 22, 2016 at 8:21 AM, Alexandre Rafalovitch
 wrote:
> Well,
>
> I am guessing this is the line that's causing the problem:
>  maxGramSize="50"/>
>
> Run your real sample for that field against your indexing definition
> in Admin UI and see how many tokens you end up with. You may have 50
> tokens, but if each of them generates up to 47 representations..
>
> Regards,
> Alex.
> 
> Newsletter and resources for Solr beginners and intermediates:
> http://www.solr-start.com/
>
>
> On 22 September 2016 at 22:08, slee  wrote:
>> Here's what I have define in my schema:
>> 
>> 
>>   
>>   
>>   
>>   > maxGramSize="50"/>
>> 
>> 
>>   
>>   
>>   
>> 
>>   
>>
>> > required="true" stored="true"/>
>>
>> This is what I send in the query (2 values):
>> q=global_Value:*mas+AND+global_Value:*sef=text=5=2.2=explicit=global_Value
>>
>> In addition, memory is taking way over 90%, given the heap space set at 5g.
>>
>>
>>
>>
>> --
>> View this message in context: 
>> http://lucene.472066.n3.nabble.com/Performance-Issue-when-querying-Multivalued-fields-SOLR-6-1-0-tp4297255p4297474.html
>> Sent from the Solr - User mailing list archive at Nabble.com.


RE: Disabling Zip bomb detection in Tika

2016-09-22 Thread Allison, Timothy B.
> I'll try to get a sample HTML yielding to this problem and attach it to Jira.

Great!  Tika 1.14 is around the corner...if this is an easy fix ... :)

Thank you.



Re: Performance Issue when querying Multivalued fields [SOLR 6.1.0]

2016-09-22 Thread Alexandre Rafalovitch
Well,

I am guessing this is the line that's causing the problem:


Run your real sample for that field against your indexing definition
in Admin UI and see how many tokens you end up with. You may have 50
tokens, but if each of them generates up to 47 representations..

Regards,
Alex.

Newsletter and resources for Solr beginners and intermediates:
http://www.solr-start.com/


On 22 September 2016 at 22:08, slee  wrote:
> Here's what I have define in my schema:
> 
> 
>   
>   
>   
>maxGramSize="50"/>
> 
> 
>   
>   
>   
> 
>   
>
>  required="true" stored="true"/>
>
> This is what I send in the query (2 values):
> q=global_Value:*mas+AND+global_Value:*sef=text=5=2.2=explicit=global_Value
>
> In addition, memory is taking way over 90%, given the heap space set at 5g.
>
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Performance-Issue-when-querying-Multivalued-fields-SOLR-6-1-0-tp4297255p4297474.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Performance Issue when querying Multivalued fields [SOLR 6.1.0]

2016-09-22 Thread Erick Erickson
Wait: Are you really doing leading wildcard queries? If so, that's
likely the root of
the problem. Unless you add ReversedWildcardFilterFactory to your
analysis chain, Lucene has to enumerate your entire set of terms to
find likely candidates,
which takes a lot of resources. What happens if you use similar
trailing wildcards? And
what happens when you use simple non-wildcard queries?

Or is this just bolding that gets translated to asterisks by the mail
formatting?

Finally, what are typical values in this field? I'm really asking if your use of
KeywordTokenizer is the best choice here. It often is, but I've seen
it mis-used so
I thought we should check.

Best,
Erick



On Thu, Sep 22, 2016 at 8:08 AM, slee  wrote:
> Here's what I have define in my schema:
> 
> 
>   
>   
>   
>maxGramSize="50"/>
> 
> 
>   
>   
>   
> 
>   
>
>  required="true" stored="true"/>
>
> This is what I send in the query (2 values):
> q=global_Value:*mas+AND+global_Value:*sef=text=5=2.2=explicit=global_Value
>
> In addition, memory is taking way over 90%, given the heap space set at 5g.
>
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Performance-Issue-when-querying-Multivalued-fields-SOLR-6-1-0-tp4297255p4297474.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Disabling Zip bomb detection in Tika

2016-09-22 Thread Erick Erickson
Tika was upgraded from 1.7 to 1.13 in Solr 6.2 so this is likely a
change in Tika.

You could _try_ downgrading Tika, but that's chancy and I have no guarantee
that it'll work.

Or use a SolrJ client with an older version of Tika and transmit the
extracted documents to Solr; here's an example:

https://lucidworks.com/blog/2012/02/14/indexing-with-solrj/

Best,
Erick
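
A minimal sketch of what that SolrJ + Tika route could look like is below. It
parses a file locally with Tika's AutoDetectParser and pushes the extracted
text to Solr with SolrJ; the core URL, file path and field names (id, title,
content) are hypothetical and would have to match your own setup and schema.
The zip-bomb check in the stack trace above comes from the SecureContentHandler
that Solr's extraction handler wraps around Tika's output, so parsing with your
own ContentHandler like this should not trip it.

import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.sax.BodyContentHandler;

public class TikaSolrIndexer {
  public static void main(String[] args) throws Exception {
    // Hypothetical core URL and input file; adjust to your own setup.
    HttpSolrClient solr = new HttpSolrClient("http://localhost:8983/solr/mycore");
    try (InputStream in = Files.newInputStream(Paths.get("/path/to/document.html"))) {
      // -1 removes BodyContentHandler's default 100k character write limit.
      BodyContentHandler text = new BodyContentHandler(-1);
      Metadata metadata = new Metadata();
      new AutoDetectParser().parse(in, text, metadata);   // extraction happens client-side

      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", "document.html");                // hypothetical unique key
      doc.addField("title", metadata.get("title"));       // whatever Tika extracted
      doc.addField("content", text.toString());           // hypothetical text field
      solr.add(doc);
    }
    solr.commit();
    solr.close();
  }
}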

On Thu, Sep 22, 2016 at 8:01 AM, Rodrigo Rosenfeld Rosas
 wrote:
> I forgot to mention that this problem just happened after I upgraded to a
> recent version of Solr and tried to reindex all documents. Some documents
> that had previously succeeded now failed with this error.
>
> Em 22-09-2016 11:58, Rodrigo Rosenfeld Rosas escreveu:
>>
>> Hi, thanks. I was talking to @elyograg over freenode#solr and he (or she,
>> can't know by the nickname) recommended me to create a Java app integrating
>> SolrJ and Tika to perform the indexing. Is this the only way to achieve that
>> with Solr? Since I'm not usually a Java developer, I'd prefer another kind
>> of solution, but if there isn't, I'll have to look at the Java API and
>> examples for SolrJ and Tika to achieve that...
>>
>> Just wanted to confirm. I'll try to get a sample HTML yielding to this
>> problem and attach it to Jira.
>>
>> Thanks,
>> Rodrigo.
>>
>> Em 22-09-2016 11:48, Allison, Timothy B. escreveu:
>>>
>>> Y, looks like Nick (gagravarr) has answered on SO -- can't do it in Tika
>>> currently.
>>>
>>> -Original Message-
>>> From: Allison, Timothy B. [mailto:talli...@mitre.org]
>>> Sent: Thursday, September 22, 2016 10:42 AM
>>> To: solr-user@lucene.apache.org
>>> Cc: 'u...@tika.apache.org' 
>>> Subject: RE: Disabling Zip bomb detection in Tika
>>>
>>> I don't think that's configurable at the moment.
>>>
>>> Tika-colleagues, any recommendations?
>>>
>>> If you're able to share the file on Tika's jira, we'd be happy to take a
>>> look.  You shouldn't be getting the zip bomb unless there is a mismatch
>>> between opening and closing tags (which could point to a bug in Tika).
>>>
>>> -Original Message-
>>> From: Rodrigo Rosenfeld Rosas [mailto:rr_ro...@yahoo.com.br.INVALID]
>>> Sent: Thursday, September 22, 2016 10:06 AM
>>> To: solr-user@lucene.apache.org
>>> Subject: Disabling Zip bomb detection in Tika
>>>
>>> Hi, this is my first message in this list.
>>>
>>> Is it possible to disable Zip bomb detection in the Tika handler?
>>>
>>> I've also described the problem here:
>>>
>>>
>>> http://stackoverflow.com/questions/39628519/how-to-disable-or-increase-limit-zip-bomb-detection-in-tika-with-solr-config?noredirect=1#comment66575342_39628519
>>>
>>> Basically, I get this error when trying to process some big valid HTML
>>> documents:
>>>
>>> RSolr::Error::Http - 500 Internal Server Error
>>> Error:
>>>
>>> {'responseHeader'=>{'status'=>500,'QTime'=>76},'error'=>{'metadata'=>['error-class','org.apache.solr.common.SolrException','root-error-class','org.apache.tika.sax.SecureContentHandler$SecureSAXException'],'msg'=>'org.apache.tika.exception.TikaException:
>>> Zip bomb detected!','trace'=>'org.apache.solr.common.SolrException:
>>> org.apache.tika.exception.TikaException: Zip bomb detected!
>>>   at
>>>
>>> org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:234)
>>>   at
>>>
>>> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)
>>>   at
>>>
>>> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:154)
>>>   at org.apache.solr.core.SolrCore.execute(SolrCore.java:2089)
>>>   at
>>> org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:652)
>>>   at
>>> org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:459)
>>>   at
>>>
>>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:257)
>>>   at
>>>
>>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:208)
>>>
>>> I need to index those documents. Is it possible to disable Zip bomb
>>> detection or to increase the limit using configuration files? I noticed it's
>>> possible to add a tika.config file but I have no idea on how to specify what
>>> I want in such Tika configuration files.
>>>
>>> Any help is appreciated!
>>>
>>> Thanks in advance,
>>> Rodrigo.
>>
>>
>>
>>
>


Re: Performance Issue when querying Multivalued fields [SOLR 6.1.0]

2016-09-22 Thread slee
Here's what I have define in my schema:


  
  
  
  


  
  
  

  



This is what I send in the query (2 values):
q=global_Value:*mas+AND+global_Value:*sef=text=5=2.2=explicit=global_Value

In addition, memory is taking way over 90%, given the heap space set at 5g.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Performance-Issue-when-querying-Multivalued-fields-SOLR-6-1-0-tp4297255p4297474.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Disabling Zip bomb detection in Tika

2016-09-22 Thread Rodrigo Rosenfeld Rosas
Hi, thanks. I was talking to @elyograg over freenode#solr and he (or 
she, can't know by the nickname) recommended me to create a Java app 
integrating SolrJ and Tika to perform the indexing. Is this the only way 
to achieve that with Solr? Since I'm not usually a Java developer, I'd 
prefer another kind of solution, but if there isn't, I'll have to look 
at the Java API and examples for SolrJ and Tika to achieve that...


Just wanted to confirm. I'll try to get a sample HTML yielding to this 
problem and attach it to Jira.


Thanks,
Rodrigo.

Em 22-09-2016 11:48, Allison, Timothy B. escreveu:

Y, looks like Nick (gagravarr) has answered on SO -- can't do it in Tika 
currently.

-Original Message-
From: Allison, Timothy B. [mailto:talli...@mitre.org]
Sent: Thursday, September 22, 2016 10:42 AM
To: solr-user@lucene.apache.org
Cc: 'u...@tika.apache.org' 
Subject: RE: Disabling Zip bomb detection in Tika

I don't think that's configurable at the moment.

Tika-colleagues, any recommendations?

If you're able to share the file on Tika's jira, we'd be happy to take a look.  
You shouldn't be getting the zip bomb unless there is a mismatch between 
opening and closing tags (which could point to a bug in Tika).

-Original Message-
From: Rodrigo Rosenfeld Rosas [mailto:rr_ro...@yahoo.com.br.INVALID]
Sent: Thursday, September 22, 2016 10:06 AM
To: solr-user@lucene.apache.org
Subject: Disabling Zip bomb detection in Tika

Hi, this is my first message in this list.

Is it possible to disable Zip bomb detection in the Tika handler?

I've also described the problem here:

http://stackoverflow.com/questions/39628519/how-to-disable-or-increase-limit-zip-bomb-detection-in-tika-with-solr-config?noredirect=1#comment66575342_39628519

Basically, I get this error when trying to process some big valid HTML
documents:

RSolr::Error::Http - 500 Internal Server Error
Error:
{'responseHeader'=>{'status'=>500,'QTime'=>76},'error'=>{'metadata'=>['error-class','org.apache.solr.common.SolrException','root-error-class','org.apache.tika.sax.SecureContentHandler$SecureSAXException'],'msg'=>'org.apache.tika.exception.TikaException:
Zip bomb detected!','trace'=>'org.apache.solr.common.SolrException:
org.apache.tika.exception.TikaException: Zip bomb detected!
  at
org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:234)
  at
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)
  at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:154)
  at org.apache.solr.core.SolrCore.execute(SolrCore.java:2089)
  at
org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:652)
  at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:459)
  at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:257)
  at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:208)

I need to index those documents. Is it possible to disable Zip bomb detection 
or to increase the limit using configuration files? I noticed it's possible to 
add a tika.config file but I have no idea on how to specify what I want in such 
Tika configuration files.

Any help is appreciated!

Thanks in advance,
Rodrigo.





Solr on GCE

2016-09-22 Thread Jay Parashar
Hi,

Is it possible to have a SolrJ client running on Google App Engine talk to a
Solr instance hosted on a Compute Engine VM? The Solr version is 6.2.0.

There is also a similar question on Stack Overflow but no answers
http://stackoverflow.com/questions/37390072/httpsolrclient-on-google-app-engine


I am getting the following error

at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1003) 
~[zookeeper-3.4.6.jar:3.4.6-1569965]
[INFO] 09:46:56.419 [main-SendThread(nlxs5139.best-nl0114.slb.com:2181)] INFO  
org.apache.zookeeper.ClientCnxn - Opening socket connection to server 
nlxs5139.best-nl0114.slb.com/199.6.212.77:2181. Will not attempt to 
authenticate using SASL (unknown error)
[INFO] 09:46:56.419 [main-SendThread(nlxs5139.best-nl0114.slb.com:2181)] WARN  
org.apache.zookeeper.ClientCnxn - Session 0x0 for server null, unexpected 
error, closing socket connection and attempting reconnect
[INFO] java.lang.NoClassDefFoundError: java.nio.channels.SocketChannel is a 
restricted class. Please see the Google  App Engine developer's guide for more 
details.
[INFO]  at 
com.google.appengine.tools.development.agent.runtime.Runtime.reject(Runtime.java:52)
 ~[appengine-agentruntime.jar:na]


Thanks
Jay


Re: Disabling Zip bomb detection in Tika

2016-09-22 Thread Rodrigo Rosenfeld Rosas
I forgot to mention that this problem just happened after I upgraded to 
a recent version of Solr and tried to reindex all documents. Some 
documents that had previously succeeded now failed with this error.


Em 22-09-2016 11:58, Rodrigo Rosenfeld Rosas escreveu:
Hi, thanks. I was talking to @elyograg over freenode#solr and he (or 
she, can't know by the nickname) recommended me to create a Java app 
integrating SolrJ and Tika to perform the indexing. Is this the only 
way to achieve that with Solr? Since I'm not usually a Java developer, 
I'd prefer another kind of solution, but if there isn't, I'll have to 
look at the Java API and examples for SolrJ and Tika to achieve that...


Just wanted to confirm. I'll try to get a sample HTML yielding to this 
problem and attach it to Jira.


Thanks,
Rodrigo.

Em 22-09-2016 11:48, Allison, Timothy B. escreveu:
Y, looks like Nick (gagravarr) has answered on SO -- can't do it in 
Tika currently.


-Original Message-
From: Allison, Timothy B. [mailto:talli...@mitre.org]
Sent: Thursday, September 22, 2016 10:42 AM
To: solr-user@lucene.apache.org
Cc: 'u...@tika.apache.org' 
Subject: RE: Disabling Zip bomb detection in Tika

I don't think that's configurable at the moment.

Tika-colleagues, any recommendations?

If you're able to share the file on Tika's jira, we'd be happy to 
take a look.  You shouldn't be getting the zip bomb unless there is a 
mismatch between opening and closing tags (which could point to a bug 
in Tika).


-Original Message-
From: Rodrigo Rosenfeld Rosas [mailto:rr_ro...@yahoo.com.br.INVALID]
Sent: Thursday, September 22, 2016 10:06 AM
To: solr-user@lucene.apache.org
Subject: Disabling Zip bomb detection in Tika

Hi, this is my first message in this list.

Is it possible to disable Zip bomb detection in the Tika handler?

I've also described the problem here:

http://stackoverflow.com/questions/39628519/how-to-disable-or-increase-limit-zip-bomb-detection-in-tika-with-solr-config?noredirect=1#comment66575342_39628519 



Basically, I get this error when trying to process some big valid HTML
documents:

RSolr::Error::Http - 500 Internal Server Error
Error:
{'responseHeader'=>{'status'=>500,'QTime'=>76},'error'=>{'metadata'=>['error-class','org.apache.solr.common.SolrException','root-error-class','org.apache.tika.sax.SecureContentHandler$SecureSAXException'],'msg'=>'org.apache.tika.exception.TikaException: 


Zip bomb detected!','trace'=>'org.apache.solr.common.SolrException:
org.apache.tika.exception.TikaException: Zip bomb detected!
  at
org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:234) 


  at
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68) 


  at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:154) 


  at org.apache.solr.core.SolrCore.execute(SolrCore.java:2089)
  at
org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:652)
  at 
org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:459)

  at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:257) 


  at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:208) 



I need to index those documents. Is it possible to disable Zip bomb 
detection or to increase the limit using configuration files? I 
noticed it's possible to add a tika.config file but I have no idea on 
how to specify what I want in such Tika configuration files.


Any help is appreciated!

Thanks in advance,
Rodrigo.








Re: Removing SOLR fields from schema

2016-09-22 Thread Erick Erickson
Not only will an optimize not help, even re-indexing all
the docs into the current collection will leave the
metadata about the removed fields in the index. For
50 fields that likely won't matter.

As Shawn says, though, re-indexing from scratch
(and I'd use a new collection) is best if at all possible.

Best,
Erick
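
If the collections use the managed schema, the field definitions themselves can
be dropped through the Schema API; a rough SolrJ sketch is below (the ZooKeeper
ensemble, collection and field names are made up, and it assumes a SolrJ version
recent enough to have the schema request classes). As noted above, this only
changes the schema going forward; the data already indexed for those fields
stays in the index until you reindex, ideally into a fresh collection.

import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.request.schema.SchemaRequest;
import org.apache.solr.client.solrj.response.schema.SchemaResponse;

public class DropOldFields {
  public static void main(String[] args) throws Exception {
    // Hypothetical ZooKeeper ensemble and collection/field names.
    try (CloudSolrClient client = new CloudSolrClient("zk1:2181,zk2:2181,zk3:2181")) {
      String collection = "mycollection";
      for (String field : new String[]{"old_field_1", "old_field_2"}) {
        // Drops the field definition from the managed schema of that collection.
        SchemaResponse.UpdateResponse rsp =
            new SchemaRequest.DeleteField(field).process(client, collection);
        System.out.println("deleted " + field + ", status " + rsp.getStatus());
      }
    }
  }
}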

On Thu, Sep 22, 2016 at 6:29 AM, Shawn Heisey  wrote:
> On 9/22/2016 7:17 AM, David Santamauro wrote:
>> Will an optimize remove those fields and corresponding data?
> I am about 99 percent sure that an optimize will have no effect on
> fields removed from the Solr schema.  The schema doesn't exist at the
> Lucene level.  When you do an optimize, the entire operation is handled
> by Lucene, using its forceMerge process.
>
> Thanks,
> Shawn
>


RE: Disabling Zip bomb detection in Tika

2016-09-22 Thread Allison, Timothy B.
Y, looks like Nick (gagravarr) has answered on SO -- can't do it in Tika 
currently.

-Original Message-
From: Allison, Timothy B. [mailto:talli...@mitre.org] 
Sent: Thursday, September 22, 2016 10:42 AM
To: solr-user@lucene.apache.org
Cc: 'u...@tika.apache.org' 
Subject: RE: Disabling Zip bomb detection in Tika

I don't think that's configurable at the moment.  

Tika-colleagues, any recommendations?

If you're able to share the file on Tika's jira, we'd be happy to take a look.  
You shouldn't be getting the zip bomb unless there is a mismatch between 
opening and closing tags (which could point to a bug in Tika).

-Original Message-
From: Rodrigo Rosenfeld Rosas [mailto:rr_ro...@yahoo.com.br.INVALID] 
Sent: Thursday, September 22, 2016 10:06 AM
To: solr-user@lucene.apache.org
Subject: Disabling Zip bomb detection in Tika

Hi, this is my first message in this list.

Is it possible to disable Zip bomb detection in the Tika handler?

I've also described the problem here:

http://stackoverflow.com/questions/39628519/how-to-disable-or-increase-limit-zip-bomb-detection-in-tika-with-solr-config?noredirect=1#comment66575342_39628519

Basically, I get this error when trying to process some big valid HTML
documents:

RSolr::Error::Http - 500 Internal Server Error
Error: 
{'responseHeader'=>{'status'=>500,'QTime'=>76},'error'=>{'metadata'=>['error-class','org.apache.solr.common.SolrException','root-error-class','org.apache.tika.sax.SecureContentHandler$SecureSAXException'],'msg'=>'org.apache.tika.exception.TikaException:
 
Zip bomb detected!','trace'=>'org.apache.solr.common.SolrException: 
org.apache.tika.exception.TikaException: Zip bomb detected!
 at
org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:234)
 at
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)
 at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:154)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:2089)
 at
org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:652)
 at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:459)
 at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:257)
 at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:208)

I need to index those documents. Is it possible to disable Zip bomb detection 
or to increase the limit using configuration files? I noticed it's possible to 
add a tika.config file but I have no idea on how to specify what I want in such 
Tika configuration files.

Any help is appreciated!

Thanks in advance,
Rodrigo.


RE: Disabling Zip bomb detection in Tika

2016-09-22 Thread Allison, Timothy B.
I don't think that's configurable at the moment.  

Tika-colleagues, any recommendations?

If you're able to share the file on Tika's jira, we'd be happy to take a look.  
You shouldn't be getting the zip bomb unless there is a mismatch between 
opening and closing tags (which could point to a bug in Tika).

-Original Message-
From: Rodrigo Rosenfeld Rosas [mailto:rr_ro...@yahoo.com.br.INVALID] 
Sent: Thursday, September 22, 2016 10:06 AM
To: solr-user@lucene.apache.org
Subject: Disabling Zip bomb detection in Tika

Hi, this is my first message in this list.

Is it possible to disable Zip bomb detection in the Tika handler?

I've also described the problem here:

http://stackoverflow.com/questions/39628519/how-to-disable-or-increase-limit-zip-bomb-detection-in-tika-with-solr-config?noredirect=1#comment66575342_39628519

Basically, I get this error when trying to process some big valid HTML
documents:

RSolr::Error::Http - 500 Internal Server Error
Error: 
{'responseHeader'=>{'status'=>500,'QTime'=>76},'error'=>{'metadata'=>['error-class','org.apache.solr.common.SolrException','root-error-class','org.apache.tika.sax.SecureContentHandler$SecureSAXException'],'msg'=>'org.apache.tika.exception.TikaException:
 
Zip bomb detected!','trace'=>'org.apache.solr.common.SolrException: 
org.apache.tika.exception.TikaException: Zip bomb detected!
 at
org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:234)
 at
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)
 at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:154)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:2089)
 at
org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:652)
 at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:459)
 at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:257)
 at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:208)

I need to index those documents. Is it possible to disable Zip bomb detection 
or to increase the limit using configuration files? I noticed it's possible to 
add a tika.config file but I have no idea on how to specify what I want in such 
Tika configuration files.

Any help is appreciated!

Thanks in advance,
Rodrigo.


Re: migration to solr 5.5.2 highlight on ngrams not working

2016-09-22 Thread elisabeth benoit
And as was said in the previous post, we can clearly see in the analysis output
that the end offset values for the edge ngrams are correct in Solr 4.10.1 and
wrong in Solr 5.5.2.


solr 5.5.2

text    raw_bytes          start  end  positionLength  type  position
p       [70]               0      5    1               word  1
pa      [70 61]            0      5    1               word  1
par     [70 61 72]         0      5    1               word  1
pari    [70 61 72 69]      0      5    1               word  1
paris   [70 61 72 69 73]   0      5    1               word


end is always set to 5, which is false


solr 4.10.1

text    raw_bytes          start  end  positionLength  type  position
p       [70]               0      1    1               word  1
pa      [70 61]            0      2    1               word  1
par     [70 61 72]         0      3    1               word  1
pari    [70 61 72 69]      0      4    1               word  1
paris   [70 61 72 69 73]   0      5    1               word
end is set to 1, 2, 3 or 4 depending on edgengrams length


2016-09-22 14:57 GMT+02:00 elisabeth benoit :

>
> Hello
>
> After migrating from solr 4.10.1 to solr 5.5.2, we don't have the same
> behaviour with highlighting on edge ngrams fields.
>
> We're using it for an autocomplete component. With Solr 4.10.1, if request
> is sol, highlighting on solr is <em>sol</em>r
>
> with solr 5.5.2, we have <em>solr</em>.
>
> Same problem as described in http://grokbase.com/t/
> lucene/solr-user/154m4jzv2f/solr-5-hit-highlight-with-
> ngram-edgengram-fields
>
> but nobody answered the post.
>
> Does anyone know how we can fix this?
>
> Best regards,
> Elisabeth
>
> Field definition
>
> 
>   
> 
> 
>  pattern="[\s,;:\-\]"/>
>  splitOnNumerics="0"
> generateWordParts="1"
> generateNumberParts="1"
> catenateWords="0"
> catenateNumbers="0"
> catenateAll="0"
> splitOnCaseChange="1"
> preserveOriginal="1"
> types="wdfftypes.txt"
> />
> 
>  minGramSize="1"/>
>   
>   
> 
> 
>  pattern="[\s,;:\-\]"/>
>  splitOnNumerics="0"
> generateWordParts="1"
> generateNumberParts="0"
> catenateWords="0"
> catenateNumbers="0"
> catenateAll="0"
> splitOnCaseChange="0"
> preserveOriginal="1"
> types="wdfftypes.txt"
> />
> 
>
>   
> 
>


Disabling Zip bomb detection in Tika

2016-09-22 Thread Rodrigo Rosenfeld Rosas

Hi, this is my first message in this list.

Is it possible to disable Zip bomb detection in the Tika handler?

I've also described the problem here:

http://stackoverflow.com/questions/39628519/how-to-disable-or-increase-limit-zip-bomb-detection-in-tika-with-solr-config?noredirect=1#comment66575342_39628519

Basically, I get this error when trying to process some big valid HTML 
documents:


RSolr::Error::Http - 500 Internal Server Error
Error: 
{'responseHeader'=>{'status'=>500,'QTime'=>76},'error'=>{'metadata'=>['error-class','org.apache.solr.common.SolrException','root-error-class','org.apache.tika.sax.SecureContentHandler$SecureSAXException'],'msg'=>'org.apache.tika.exception.TikaException: 
Zip bomb detected!','trace'=>'org.apache.solr.common.SolrException: 
org.apache.tika.exception.TikaException: Zip bomb detected!
at 
org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:234)
at 
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:154)

at org.apache.solr.core.SolrCore.execute(SolrCore.java:2089)
at 
org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:652)

at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:459)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:257)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:208)


I need to index those documents. Is it possible to disable Zip bomb 
detection or to increase the limit using configuration files? I noticed 
it's possible to add a tika.config file but I have no idea on how to 
specify what I want in such Tika configuration files.


Any help is appreciated!

Thanks in advance,
Rodrigo.


Re: Removing SOLR fields from schema

2016-09-22 Thread Shawn Heisey
On 9/22/2016 7:17 AM, David Santamauro wrote:
> Will an optimize remove those fields and corresponding data?
I am about 99 percent sure that an optimize will have no effect on
fields removed from the Solr schema.  The schema doesn't exist at the
Lucene level.  When you do an optimize, the entire operation is handled
by Lucene, using its forceMerge process.

Thanks,
Shawn



Re: Removing SOLR fields from schema

2016-09-22 Thread David Santamauro



On 09/22/2016 08:55 AM, Shawn Heisey wrote:

On 9/21/2016 11:46 PM, Selvam wrote:

We use SOLR 5.x in cloud mode and have huge set of fields. We now want
to remove some 50 fields from Index/schema itself so that indexing &
querying will be faster. Is there a way to do that without losing
existing data on other fields? We don't want to do full re-indexing.


When you remove fields from your schema, you can continue to use Solr
with no problems even without a reindex.  But you won't see any benefit
to your query performance until you DO reindex.  Until the reindex is
done (ideally wiping the index first), all the data from the removed
fields will remain in the index and affect your query speeds.


Will an optimize remove those fields and corresponding data?





Re: slow updates/searches

2016-09-22 Thread Shawn Heisey
On 9/22/2016 5:46 AM, Muhammad Zahid Iqbal wrote:
> Did you find any solution to slow searches? As far as I know jetty
> container default configuration is bit slow for large production
> environment. 

This might be true for the default configuration that comes with a
completely stock jetty downloaded from eclipse.org, but the jetty
configuration that *Solr* ships with is adequate for just about any Solr
installation.  The Solr configuration may require adjustment as the
query load increases, but the jetty configuration usually doesn't.

Thanks,
Shawn



migration to solr 5.5.2 highlight on ngrams not working

2016-09-22 Thread elisabeth benoit
Hello

After migrating from solr 4.10.1 to solr 5.5.2, we don't have the same
behaviour with highlighting on edge ngrams fields.

We're using it for an autocomplete component. With Solr 4.10.1, if request
is sol, highlighting on solr is <em>sol</em>r

with solr 5.5.2, we have <em>solr</em>.

Same problem as described in
http://grokbase.com/t/lucene/solr-user/154m4jzv2f/solr-5-hit-highlight-with-ngram-edgengram-fields

but nobody answered the post.

Does anyone know how we can fix this?

Best regards,
Elisabeth

Field definition


  






  
  






  



Re: Tutorial not working for me

2016-09-22 Thread Pritchett, James
>
>
>
> From your perspective as a new user, did you find it
> anoying/frustrating/confusing that the README.txt in the films example
> required/instructed you to first create a handful of fields using a curl
> command to hit the Schema API before you could index any of the documents?
>
> https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;a=
> blob;f=solr/example/films/README.txt
>
>
No, I didn't find that to be a problem. In fact, in my view that's not a
bug, that's a feature -- at least from my very limited experience, it seems
like that kind of schema setup is probably pretty standard stuff when
building a SOLR core, and so including it in the example teaches you
something useful that you'll need to do pretty much right off the bat. I
don't think that I did it via curl, though ... I must have used the admin
interface, which was just simpler than copying and pasting that
hairy-looking, multiline command into a terminal. If you used the films
example as the basis for a tutorial and wrote it up in pretty HTML, you
could include screenshots, etc. That would make it completely painless.

James


Re: Removing SOLR fields from schema

2016-09-22 Thread Shawn Heisey
On 9/21/2016 11:46 PM, Selvam wrote:
> We use SOLR 5.x in cloud mode and have huge set of fields. We now want
> to remove some 50 fields from Index/schema itself so that indexing &
> querying will be faster. Is there a way to do that without losing
> existing data on other fields? We don't want to do full re-indexing.

When you remove fields from your schema, you can continue to use Solr
with no problems even without a reindex.  But you won't see any benefit
to your query performance until you DO reindex.  Until the reindex is
done (ideally wiping the index first), all the data from the removed
fields will remain in the index and affect your query speeds.

Thanks,
Shawn



Re: slow updates/searches

2016-09-22 Thread Muhammad Zahid Iqbal
Rallavagu,

Did you find any solution to the slow searches? As far as I know, the jetty
container's default configuration is a bit slow for a large production
environment.

On Tue, Sep 20, 2016 at 8:05 AM, Erick Erickson 
wrote:

> If both queries _and_ updates are slow, it's hard to see how upping
> the number of
> threads would help overall. Hmmm, you also reported that the CPUs
> didn't seem to be
> stressed so its worth a try, perhaps there's some kind of blocking going
> on
>
> Best,
> Erick
>
> On Mon, Sep 19, 2016 at 5:33 PM, Rallavagu  wrote:
> > Hi Erick,
> >
> > Would increasing (or adjusting) update threads help as per this JIRA
> ((Allow
> > the number of threads ConcurrentUpdateSolrClient StreamingSolrClients
> > configurable by a system property) here?
> >
> > https://issues.apache.org/jira/browse/SOLR-8500
> >
> > Thanks
> >
> >
> > On 9/19/16 8:30 AM, Erick Erickson wrote:
> >>
> >> Hmmm, not sure, and also not sure what to suggest next. QTimes
> >> measure only the search time, not, say, time waiting for the request to
> >> get
> >> serviced.
> >>
> >> I'm afraid the next suggestion is to throw a profiler at it 'cause
> nothing
> >> jumps
> >> out at me..'
> >>
> >> Best,
> >> Erick
> >>
> >> On Fri, Sep 16, 2016 at 10:23 AM, Rallavagu 
> wrote:
> >>>
> >>> Comments in line...
> >>>
> >>> On 9/16/16 10:15 AM, Erick Erickson wrote:
> 
> 
>  Well, the next thing I'd look at is CPU activity. If you're flooding
> the
>  system
>  with updates there'll be CPU contention.
> >>>
> >>>
> >>>
> >>> Monitoring does not suggest any high CPU but as you can see from vmstat
> >>> output "user" cpu is a bit high during updates that are taking time (34
> >>> user, 65 idle).
> >>>
> 
>  And there are a number of things you can do that make updates in
>  particular
>  much less efficient, from committing very frequently (sometimes
> combined
>  with excessive autowarm parameters) and the like.
> >>>
> >>>
> >>>
> >>> softCommit is set to 10 minutes, autowarm count is set to 0 and commit
> is
> >>> set to 15 sec for NRT.
> >>>
> 
>  There are a series of ideas that might trigger an "aha" moment:
>  https://wiki.apache.org/solr/SolrPerformanceFactors
> >>>
> >>>
> >>>
> >>> Reviewed this document and made few changes accordingly a while ago.
> 
> 
> 
>  But the crude measure is just to look at CPU usage when updates
> happen,
>  or
>  just before. Are you running hot with queries alone then add an update
>  burden?
> >>>
> >>>
> >>>
> >>> Essentially, it is high QTimes for queries got me looking into logs,
> >>> system
> >>> etc and I could correlate updates slowness and searching slowness. Some
> >>> other time QTimes go high is right after softCommit which is expected.
> >>>
> >>> Wondering what causes update threads wait and if it has any impact on
> >>> search
> >>> at all. I had couple of more CPUs added but I still see similar
> behavior.
> >>>
> >>> Thanks.
> >>>
> >>>
> 
>  Best,
>  Erick
> 
>  On Fri, Sep 16, 2016 at 9:19 AM, Rallavagu 
> wrote:
> >
> >
> > Erick,
> >
> > Was monitoring GC activity and couldn't align GC pauses to this
> > behavior.
> > Also, the vmstat shows no swapping or cpu I/O wait. However,
> whenever I
> > see
> > high update response times (corresponding high QTimes for searches)
> > vmstat
> > shows as series of number of "waiting to runnable" processes in "r"
> > column
> > of "procs" section.
> >
> >
> >
> > https://dl.dropboxusercontent.com/u/39813705/Screen%20Shot%
> 202016-09-16%20at%209.05.51%20AM.png
> >
> > procs ---memory-- ---swap--
> > -io -system-- cpu -timestamp-
> >  r  b swpd freeinact   active   si   so
> bi
> > bo
> > in   cs  us  sy  id  wa  st CDT
> >  2  071068 18688496  2526604 2420444000 0
> > 0
> > 1433  462  27   1  73   0   0 2016-09-16 11:02:32
> >  1  071068 18688180  2526600 2420456800 0
> > 0
> > 1388  404  26   1  74   0   0 2016-09-16 11:02:33
> >  1  071068 18687928  2526600 2420456800 0
> > 0
> > 1354  401  25   0  75   0   0 2016-09-16 11:02:34
> >  1  071068 18687800  2526600 2420457200 0
> > 0
> > 1311  397  25   0  74   0   0 2016-09-16 11:02:35
> >  1  071068 18687164  2527116 2420484400 0
> > 0
> > 1770  702  31   1  69   0   0 2016-09-16 11:02:36
> >  1  071068 18686944  2527108 2420490800 0
> > 52
> > 1266  421  26   0  74   0   0 2016-09-16 11:02:37
> > 12  171068 18682676  2528560 242071160   

Re: How to set NOT clause on Date range query in Solr

2016-09-22 Thread Muhammad Zahid Iqbal
Indent your question properly so that someone can understand it.

I am out!

On Tue, Sep 20, 2016 at 12:23 PM, Sandeep Khanzode <
sandeep_khanz...@yahoo.com.invalid> wrote:

> Have been trying to understand this for a while ... How can I specify a NOT
> clause in the following queries?
>
> {!field f=schedule op=Intersects}[2016-08-26T12:30:00Z TO 2016-08-26T18:30:00Z]
> {!field f=schedule op=Contains}[2016-08-26T12:30:00Z TO 2016-08-26T18:30:00Z]
>
> Like, without LocalParams, we can specify
> -DateField:[2016-08-26T12:30:00Z TO 2016-08-26T18:30:00Z] to get an
> equivalent NOT clause. But I need a NOT Contains date range query. I have
> tried a few options but I end up getting parsing errors. Surely there must
> be some obvious way I am missing. SRK


Re: Solr Special Character Search

2016-09-22 Thread Muhammad Zahid Iqbal
Hi,

To handle special characters, you either need to create your own custom
filter factory or replace the filter factory you currently have specified with
a different one, if you are using StandardFilterFactory.



On Tue, Sep 20, 2016 at 5:16 PM, Alexandre Rafalovitch 
wrote:

> What's your field definition? What happens when the text goes through the
> analysis chain as you can test in Admin UI?
>
> Regards,
>Alex
>
> On 20 Sep 2016 6:49 PM, "Cheatham, Kevin" 
> wrote:
>
> > Hello - Has anyone out there had success with anything similar to our
> > issue below and be kind enough to share?
> >
> > We posted several files as text and we're able to search for alphanumeric
> > characters, but not able to search for special characters such as @ or ©
> > through Solrcloud Admin 5.2 UI.
> > We've searched through lots of documentation but haven't had success yet.
> >
> > We also tried posting files not as text but seems we're not able to
> search
> > for any special characters below hexadecimal 20.
> >
> > Any assistance would be greatly appreciated!
> >
> > Thanks!
> >
> > Kevin Cheatham | Office (314) 573-5534 | kevin.cheat...@graybar.com
> > www.graybar.com - Graybar Works to Your Advantage
> >
> >
>


Re: SolrCloud setup

2016-09-22 Thread Customer
Would be great if someone could share a link on how to set up SolrCloud on 3
different machines with ZooKeeper. I've been reading the documentation and
there is nothing suitable for a beginner; it would be best if the Solr
documentation team could add a similar example somewhere in the
documentation. That would be very valuable for everyone who is very new
to this search stuff.



Thanks


On 22/09/16 05:45, Preeti Bhat wrote:

HI,


For starters, the below blog looks good for Windows installation.

http://blog.thedigitalgroup.com/susheelk/2015/08/03/solrcloud-2-nodes-solr-1-node-zk-setup/



Thanks and Regards,
Preeti

-Original Message-
From: John Bickerstaff [mailto:j...@johnbickerstaff.com]
Sent: Thursday, September 22, 2016 9:19 AM
To: solr-user@lucene.apache.org
Subject: Re: SolrCloud setup

I found it to be way less than intuitive when I first started to get going.

I wished for an example or step by step (including zookeeper)

Pulling it all together from the docs wasn't straightforward although I guess 
the info is still there.

I'll send you my rough notes in case they're helpful...



On Wed, Sep 21, 2016 at 9:30 PM, Erick Erickson 
wrote:


Setting up SolrCloud on multiple hosts is exactly the same as a single
host. You just install Solr on all the hosts you care about and start
it up. As long as the hosts can talk to each other via HTTP, it's all
magic.

The "glue" is Zookeeper. All the Solrs are started up with the same ZK
ensemble string. So when you start Solr on host1 it registers itself
with ZK. When you start another Solr on host2 it does the same. But
then ZK sends a message to host1 informing it "there's another Solr
out there" and now the Solr on host1 knows the url of Solr on host2
(and vice versa).

For just getting started, you can just use a single Zookeeper but for
prod situations you'll want 3 or more.

Best,
Erick
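
To make the ensemble-string point concrete from the client side: the same
ZooKeeper address list each Solr node is started with is all a SolrJ client
needs to discover every node in the cluster. A rough sketch, with hypothetical
host names and collection:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class CloudQueryExample {
  public static void main(String[] args) throws Exception {
    // The same ZK ensemble string the Solr nodes were started with (-z ...).
    // Host names and collection are hypothetical.
    try (CloudSolrClient client = new CloudSolrClient("zk1:2181,zk2:2181,zk3:2181")) {
      client.setDefaultCollection("mycollection");
      client.connect();  // pulls live nodes and cluster state from ZooKeeper

      QueryResponse rsp = client.query(new SolrQuery("*:*"));
      System.out.println("numFound: " + rsp.getResults().getNumFound());
    }
  }
}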

On Wed, Sep 21, 2016 at 6:10 PM, S L  wrote:

Can someone point me to a tutorial or blog to setup SolrCloud on
multiple hosts? LucidWorks just have a trivial single host example.
I searched around but only found some blogs for older versions (2014 or 
earlier).

thanks.

NOTICE TO RECIPIENTS: This communication may contain confidential and/or 
privileged information. If you are not the intended recipient (or have received 
this communication in error) please notify the sender and 
it-supp...@shoregrp.com immediately, and destroy this communication. Any 
unauthorized copying, disclosure or distribution of the material in this 
communication is strictly forbidden. Any views or opinions presented in this 
email are solely those of the author and do not necessarily represent those of 
the company. Finally, the recipient should check this email and any attachments 
for the presence of viruses. The company accepts no liability for any damage 
caused by any virus transmitted by this email.






Re: Hackday next month

2016-09-22 Thread Anshum Gupta
Sure, seems like Tuesday works best :) I'll try and make it too.

On Thu, Sep 22, 2016 at 10:02 AM Charlie Hull  wrote:

> On 21/09/2016 19:28, Trey Grainger wrote:
> > I know a bunch of folks who would be likely attend the hackday (including
> > committers) will have some other meetings on Wednesday before the
> > conference, so I think that Tuesday is actually a pretty good time to
> have
> > this.
>
> Wednesday is also Yom Kippur - we weren't sure how many people this
> might affect but figured it would be best to avoid it for the Hackday.
> In any case, the venue is all arranged now and people have signed up, so
> Tuesday it is. There will also be beer & pizza that evening!
>
> Cheers
>
> Charlie
> >
> > My 2 cents,
> >
> > Trey Grainger
> > SVP of Engineering @ Lucidworks
> > Co-author, Solr in Action
> >
> > On Wed, Sep 21, 2016 at 1:20 PM, Anshum Gupta 
> > wrote:
> >
> >> This is good but is there a way to instead do this on Wednesday?
> >> Considering that the conference starts on Thursday, perhaps it makes
> sense
> >>  to do it just a day before ? Not sure about others but it certainly
> would
> >> work much better for me.
> >>
> >> -Anshum
> >>
> >> On Wed, Sep 21, 2016 at 2:18 PM Charlie Hull 
> wrote:
> >>
> >>> Hi all,
> >>>
> >>> If you're coming to Lucene Revolution next month in Boston, we're
> >>> running a Lucene-focused hackday (Lucene, Solr, Elasticsearch)
> >>> kindly hosted by BA Insight. There will be Lucene committers there,
> it's
> >>> free to attend and we also need ideas on what to do! Come and join us.
> >>>
> >>> http://www.meetup.com/New-England-Search-Technologies-
> >> NEST-Group/events/233492535/
> >>>
> >>> Cheers
> >>>
> >>> Charlie
> >>>
> >>> --
> >>> Charlie Hull
> >>> Flax - Open Source Enterprise Search
> >>>
> >>> tel/fax: +44 (0)8700 118334
> >>> mobile:  +44 (0)7767 825828
> >>> web: www.flax.co.uk
> >>>
> >>
> >
>
>
> --
> Charlie Hull
> Flax - Open Source Enterprise Search
>
> tel/fax: +44 (0)8700 118334
> mobile:  +44 (0)7767 825828
> web: www.flax.co.uk
>


Re: Hackday next month

2016-09-22 Thread Charlie Hull

On 21/09/2016 19:28, Trey Grainger wrote:

I know a bunch of folks who would be likely attend the hackday (including
committers) will have some other meetings on Wednesday before the
conference, so I think that Tuesday is actually a pretty good time to have
this.


Wednesday is also Yom Kippur - we weren't sure how many people this 
might affect but figured it would be best to avoid it for the Hackday. 
In any case, the venue is all arranged now and people have signed up, so 
Tuesday it is. There will also be beer & pizza that evening!


Cheers

Charlie


My 2 cents,

Trey Grainger
SVP of Engineering @ Lucidworks
Co-author, Solr in Action

On Wed, Sep 21, 2016 at 1:20 PM, Anshum Gupta 
wrote:


This is good but is there a way to instead do this on Wednesday?
Considering that the conference starts on Thursday, perhaps it makes sense
 to do it just a day before ? Not sure about others but it certainly would
work much better for me.

-Anshum

On Wed, Sep 21, 2016 at 2:18 PM Charlie Hull  wrote:


Hi all,

If you're coming to Lucene Revolution next month in Boston, we're
running a Lucene-focused hackday (Lucene, Solr, Elasticsearch)
kindly hosted by BA Insight. There will be Lucene committers there, it's
free to attend and we also need ideas on what to do! Come and join us.

http://www.meetup.com/New-England-Search-Technologies-

NEST-Group/events/233492535/


Cheers

Charlie

--
Charlie Hull
Flax - Open Source Enterprise Search

tel/fax: +44 (0)8700 118334
mobile:  +44 (0)7767 825828
web: www.flax.co.uk








--
Charlie Hull
Flax - Open Source Enterprise Search

tel/fax: +44 (0)8700 118334
mobile:  +44 (0)7767 825828
web: www.flax.co.uk