Count on Multivalued field using facet

2016-10-06 Thread Aswath Srinivasan (TMS)
Hello,

I have a result set like the one below, returned by the query below. The facet 
count for the LINE field is 1 (1); that is, the value "1" has a bucket count of 1.

However, I need to count the number of occurrences of each value in the LINE 
field. Is there a way to do this?

Expecting something like: LINE 1 (10), 2 (2)

http://localhost:8983/solr/collection/select?facet.field=line&indent=on&fq=id:123456789&facet=on&q=*:*&wt=json

{
  "responseHeader":{
"status":0,
"QTime":25,
"params":{
  "q":"*:*",
  "facet.field":"line",
  "indent":"on",
  "fq":"id:123456789",
  "facet":"on",
  "wt":"json",
  "_":"1475711557126"}},
  "response":{"numFound":1,"start":0,"docs":[
  {
"id":"123456789",
" name":["abc"],
"year":["2016"],
"idno":[6009250200],
"issue":["Paint",
  "zTest",
  "zTest",
  "Paint",
  "zTest",
  "zTest",
  "zTest",
  "Paint",
  "Paint",
  "Paint",
  "Paint",
  "Paint"],
"line":["1",
  "1",
  "1",
  "2",
  "1",
  "1",
  "1",
  "1",
  "2",
  "1",
  "1",
  "1"],
"_version_":1547467907197304832}]
  },
  "facet_counts":{
"facet_queries":{},
"facet_fields":{
  "line":[
"1",1]
"2",1]
},
"facet_ranges":{},
"facet_intervals":{},
"facet_heatmaps":{}}}

Thank you,
Aswath NS



Month facet - possible bucket values are Jan, Feb, Mar,…. Nov, Dec

2016-09-20 Thread Aswath Srinivasan (TMS)
Hello,

How can I build a Month facet from a date field? The facet I'm looking for 
should have a maximum of only 12 buckets. The possible bucket values are Jan, 
Feb, Mar, ..., Nov, Dec.

http://localhost:8983/solr/collection1/select?facet=on&rows=0&indent=on&q=*:*&wt=json&json.facet={months:{type:range,field:cdate,start:"2000-01-01T00:00:00Z",end:NOW,gap:"+1MONTH"}}

This is the query that I have so far but this doesn’t group the facet by Month, 
obviously, because of the gap:"+1MONTH"
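One possible approach, offered only as a hedged sketch: populate a separate month-of-year field at index time (say month_s, holding values like "Jan", "Feb"; the field name and the indexing step are assumptions, not something Solr derives automatically) and then run an ordinary terms facet over it, which naturally tops out at 12 buckets:

http://localhost:8983/solr/collection1/select?q=*:*&rows=0&wt=json&json.facet={months:{type:terms,field:month_s,limit:12}}

A range facet over cdate cannot produce this grouping, because each +1MONTH gap is a distinct calendar month rather than a month-of-year bucket.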

Really appreciate the help.

Aswath NS


SQL Joins in Parallel SQL Interface

2016-09-14 Thread Aswath Srinivasan (TMS)
Hello,

I'm exploring Parallel SQL. I don't see any SQL JOIN features for the Parallel 
SQL interface in the documentation. Is it even possible to do a SQL JOIN in the 
Parallel SQL interface?

I was also looking at streaming expressions, but it looks like facets are not 
possible with them. Not even count(*)-style operations?
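For what it's worth, a rough sketch of what joins and count(*)-style rollups look like as streaming expressions, assuming a Solr version that ships innerJoin and rollup; the collection names (abc, xyz) and fields (id, name, count) are placeholders, and both inputs to innerJoin must be sorted on the join key:

innerJoin(
  search(abc, q="*:*", fl="id,count", sort="id asc"),
  search(xyz, q="*:*", fl="id,name", sort="id asc"),
  on="id")

rollup(
  search(xyz, q="*:*", fl="name", sort="name asc"),
  over="name",
  count(*))

The rollup gives a per-bucket count over the name field, which is roughly the facet-style count(*) asked about above.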

Thank you,
Aswath NS



facet.pivot on a join query, joining two different collections

2016-09-13 Thread Aswath Srinivasan (TMS)
Hello,

We are trying to do a pivot facet on two fields that exist in two different 
collections. We are joining the two collections using a common field. Below is 
the query I have right now, and it doesn't seem to work. Any help would be much 
appreciated.


http://localhost:8983/solr/abc/select?q={!join from=id to=id 
fromIndex=xyz}*&wt=json&indent=true&start=0&rows=1&facet=true&facet.pivot=name,count&shards=localhost:8983/solr/abc,localhost:8983/solr/xyz



1.   id field exists in both collections - abc & xyz

2.   name field exists in xyz collection

3.   count field exists in abc collection


Thank you,
Aswath NS



Collection going to recovery mode - Leader election issue?

2016-08-02 Thread Aswath Srinivasan (TMS)
Hi All,

Solr version 5.3.2
Zookeeper 3.6.2
SolrCloud - 2 shards, 4 replicas, 4 nodes

Above is the setup. Three of the shards (replicas) went into recovery mode with 
the following ERROR in the logs. Has anyone experienced this before? I had to 
restart the Solr server nodes to bring them all back up. It looks like a leader 
election issue.

2016-07-29 06:52:48.610 ERROR (coreZkRegister-1-thread-32-processing-s:shard2 
x:tCollection_shard2_replica4 c:tCollection n:tsolr.prod2.xxx.com:8983_solr 
r:core_node6) [c:tCollection s:shard2 r:core_node6 
x:tCollection_shard2_replica4] o.a.s.c.ZkController Error getting leader from zk
org.apache.solr.common.SolrException: No registered leader was found after 
waiting for 156ms , collection: tCollection slice: shard2
  at 
org.apache.solr.common.cloud.ZkStateReader.getLeaderRetry(ZkStateReader.java:637)
  at 
org.apache.solr.common.cloud.ZkStateReader.getLeaderUrl(ZkStateReader.java:604)
  at org.apache.solr.cloud.ZkController.getLeader(ZkController.java:970)
  at org.apache.solr.cloud.ZkController.register(ZkController.java:907)
  at 
org.apache.solr.cloud.ZkController$RegisterCoreAsync.call(ZkController.java:227)
  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
  at 
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$1.run(ExecutorUtil.java:210)
  at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  at java.lang.Thread.run(Thread.java:745)

2016-07-29 09:17:14.440 WARN  (ShutdownMonitor) [   ] o.a.s.c.RecoveryStrategy 
Stopping recovery for core=tCollection_shard1_replica4 coreNodeName=core_node5
2016-07-29 09:17:14.683 WARN  
(zkCallback-3-thread-380-processing-n:tsolr.prod2.xxx.com:8983_solr) [   ] 
o.a.s.c.c.ZkStateReader ZooKeeper watch triggered, but Solr cannot talk to ZK
2016-07-29 09:17:14.684 WARN  
(zkCallback-3-thread-374-processing-n:tsolr.prod2.xxx.com:8983_solr) [   ] 
o.a.s.c.c.ZkStateReader ZooKeeper watch triggered, but Solr cannot talk to ZK
2016-07-29 09:17:14.684 ERROR 
(zkCallback-3-thread-9-processing-n:tsolr.prod2.xxx.com:8983_solr-EventThread) 
[   ] o.a.z.ClientCnxn Error while calling watcher
java.util.concurrent.RejectedExecutionException: Task 
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$1@7402ec22 
rejected from 
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor@73ee87d4[Shutting
 down, pool size = 9, active threads = 2, queued tasks = 0, completed tasks = 
1585]
  at 
java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2048)
  at 
java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:821)
  at 
java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1372)
  at 
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.execute(ExecutorUtil.java:193)
  at 
java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:110)
  at 
org.apache.solr.common.cloud.SolrZkClient$3.process(SolrZkClient.java:261)
  at 
org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:522)
  at 
org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)

Thank you,
Aswath NS



BinFileDataSource delta import

2016-03-28 Thread Aswath Srinivasan (TMS)
Hi fellow developers,

We are using "BinFileDataSource" datasource in our DIH config file to index 
local file system files. It is able to index the files however, during the next 
cycle of indexing, files that were removed from source file system folder is 
not removed from index. I believe Solr currently has no capability of doing 
this. Can someone please confirm based on your experience?

Also, does delta import work for this datasource? It dosen't seem to work for 
me.




  
  
  
  
  
  
  

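For reference, a minimal sketch of a BinFileDataSource + Tika DIH config of the kind described above; the baseDir, file-name pattern, and destination field names are placeholders, not the actual config from this setup:

<dataConfig>
  <dataSource type="BinFileDataSource"/>
  <document>
    <!-- walk the local folder and hand each file to Tika -->
    <entity name="files" processor="FileListEntityProcessor"
            baseDir="/data/docs" fileName=".*\.(pdf|docx?|html?)"
            recursive="true" rootEntity="false" dataSource="null">
      <entity name="tika" processor="TikaEntityProcessor"
              url="${files.fileAbsolutePath}" format="text">
        <field column="text" name="content"/>
      </entity>
    </entity>
  </document>
</dataConfig>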
Thank you,
Aswath NS



RE: PERFORMANCE WARNING: Overlapping onDeckSearchers=2

2016-03-22 Thread Aswath Srinivasan (TMS)
>> Since you've already reproduced it on a small scale, we'll need your entire 
>> Solr logfile.  The mailing list eats attachments, so you'll need to place it 
>> somewhere and provide a URL.  Sites like gist and dropbox are excellent for 
>> sharing large text content.

Sure. I will try and send it across. However, I don't see anything in them. I 
have FINE-level logs.

>> Do you literally mean 10 records (a number I can count on my fingers)? 
How much data is in each of those DB records?  Which configset did you use when 
creating the index?

Yes, crazy, right? The select query I gave yields only 10 records; the total 
number of records in the table is 200,000. I restricted the query to reproduce 
the issue at small scale. This issue first appeared in my QA environment, where 
at one point we happened to get accidental, closely spaced hard commits from two 
batch jobs. There is no autoCommit set in solrconfig; only the batch jobs send a 
commit. I was never able to recover the collection, so I had to delete the data 
and reindex to fix it. Hence I decided to reproduce the issue at very small 
scale and try to fix it, because deleting the data and reindexing cannot be the 
fix. The DB records are just normal varchars, some 7 columns. I don't think the 
data is the problem.

I cloned 'solr-5.3.2\example\example-DIH\solr\db', added some additional 
fields, and removed unused default fields.

>> You mentioned a 10GB heap and then said the machine has 8GB of RAM.  Is this 
>> correct?  If so, this would be a source of serious performance issues.

Oops, it's a 1 GB heap. That was a typo. The consumed heap is around 300-400 MB.

Thank you,
Aswath NS


-Original Message-
From: Shawn Heisey [mailto:apa...@elyograg.org] 
Sent: Tuesday, March 22, 2016 10:41 AM
To: solr-user@lucene.apache.org
Subject: Re: PERFORMANCE WARNING: Overlapping onDeckSearchers=2

On 3/22/2016 11:32 AM, Aswath Srinivasan (TMS) wrote:
> Thank you Shawn for taking time and responding.
>
> Unfortunately, this is not the case. My heap is not even going past 
> 50% and I have a heap of 10 GB on an instance that I just installed as 
> a standalone version and was only trying out these,
>
> •   Install a standalone solr 5.3.2 in my PC
> •   Indexed some 10 db records
> •   Hit core reload/call commit frequently in quick intervals
> •   Seeing the  o.a.s.c.SolrCore [db] PERFORMANCE WARNING: Overlapping 
> onDeckSearchers=2
> •   Collection crashes
> •   Only way to recover is to stop solr – delete the data folder – start 
> solr – reindex
>
> In any case, if this is a heap-related issue, a solr restart should help, is 
> what I think.

That shouldn't happen.

Since you've already reproduced it on a small scale, we'll need your entire 
Solr logfile.  The mailing list eats attachments, so you'll need to place it 
somewhere and provide a URL.  Sites like gist and dropbox are excellent for 
sharing large text content.

More questions:

Do you literally mean 10 records (a number I can count on my fingers)? 
How much data is in each of those DB records?  Which configset did you use when 
creating the index?

You mentioned a 10GB heap and then said the machine has 8GB of RAM.  Is this 
correct?  If so, this would be a source of serious performance issues.

Thanks,
Shawn



RE: PERFORMANCE WARNING: Overlapping onDeckSearchers=2

2016-03-22 Thread Aswath Srinivasan (TMS)
>> If you're not actually hitting OutOfMemoryError, then my best guess about 
>> what's happening is that you are running >>right at the edge of the 
>> available Java heap memory, so your JVM is constantly running full garbage 
>> collections to free up >>enough memory for normal operation.  In this 
>> situation, Solr is actually still running, but is spending most of its time 
>> >>paused for garbage collection.

Thank you Shawn for taking time and responding.

Unfortunately, this is not the case. My heap is not even going past 50% and I 
have a heap of 10 GB on an instance that I just installed as a standalone 
version and was only trying out these,

•   Install a standalone solr 5.3.2 in my PC
•   Indexed some 10 db records
•   Hit core reload/call commit frequently in quick intervals
•   Seeing the  o.a.s.c.SolrCore [db] PERFORMANCE WARNING: Overlapping 
onDeckSearchers=2
•   Collection crashes
•   Only way to recover is to stop solr – delete the data folder – start 
solr – reindex

In any case, if this is a heap-related issue, a solr restart should help, I 
think.

>>If I'm wrong about what's happening, then we'll need a lot more details about 
>>your server and your Solr setup.

Nothing really. Just a standalone solr 5.3.2 on a windows 7 machine - 64 bit, 8 
GB RAM. I bet anybody could reproduce the problem if they follow my above steps.

Thank you all for spending time on this. I shall post back my findings, if they 
are useful.

Thank you,
Aswath NS
Mobile  +1 424 345 5340
Office+1 310 468 6729

-Original Message-
From: Shawn Heisey [mailto:apa...@elyograg.org]
Sent: Monday, March 21, 2016 6:07 PM
To: solr-user@lucene.apache.org
Subject: Re: PERFORMANCE WARNING: Overlapping onDeckSearchers=2

On 3/21/2016 6:49 PM, Aswath Srinivasan (TMS) wrote:
>>> Thank you for the responses. Collection crashes as in, I'm unable to open 
>>> the core tab in the Solr console. Search is not returning. None of the 
>>> pages open in the Solr admin dashboard.
>>>
>>> I do understand how and why this issue occurs and I'm going to do all it 
>>> takes to avoid it. However, in the event of accidental hard commits close 
>>> to each other that throw this WARN, I'm just trying to figure out a way to 
>>> make my collection return results without having to delete and re-create 
>>> the collection or delete the data folder.
>>>
>>> Again, I know how to avoid this issue but if it still happens then what can 
>>> be done to avoid a complete reindexing.

If you're not actually hitting OutOfMemoryError, then my best guess about 
what's happening is that you are running right at the edge of the available 
Java heap memory, so your JVM is constantly running full garbage collections to 
free up enough memory for normal operation.  In this situation, Solr is 
actually still running, but is spending most of its time paused for garbage 
collection.

https://wiki.apache.org/solr/SolrPerformanceProblems#GC_pause_problems

The first part of the "GC pause problems" section on the above wiki page talks 
about very large heaps, but there is a paragraph just before "Tools and Garbage 
Collection" that talks about heaps that are a little bit too small.

If I'm right about this, you're going to need to increase your java heap size.  
Exactly how to do this will depend on what version of Solr you're running, how 
you installed it, and how you start it.

For 5.x versions using the included scripts, you can use the "-m" option on the 
"bin/solr" command when you start Solr manually, or you can edit the solr.in.sh 
file (usually found in /etc/default or /var/solr) if you used the service 
installer script on a UNIX/Linux platform.  The default heap size in 5.x 
scripts is 512MB, which is VERY small.
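For example, a minimal sketch (the 2g value is only a placeholder; size the heap for your own data):

bin/solr start -m 2g

# or edit solr.in.sh (or solr.in.cmd on Windows); depending on the 5.x release
# the variable is SOLR_HEAP or SOLR_JAVA_MEM:
SOLR_HEAP="2g"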

For earlier versions, there are too many install/start options available.
There were no installation scripts included with Solr itself, so I won't know 
anything about the setup.

If I'm wrong about what's happening, then we'll need a lot more details about 
your server and your Solr setup.

Thanks,
Shawn



RE: PERFORMANCE WARNING: Overlapping onDeckSearchers=2

2016-03-21 Thread Aswath Srinivasan (TMS)
>>The only way that I can imagine any part of Solr *crashing* when this message 
>>happens is if you are also hitting an OutOfMemoryError exception. You've said 
>>that your collection crashes ... but not what actually happens -- what "crash" 
>>means for your situation. I've never heard of a collection crashing.



>>If you're running version 4.0 or later, you actually *do* want autoCommit 
>>configured, with openSearcher set to false.  This configuration will not 
>>change document visibility at all, because it will not open a new searcher.  
>>You need different commits for document visibility.


Thank you for the responses. Collection crashes as in, I'm unable to open the 
core tab in the Solr console. Search is not returning. None of the pages open in 
the Solr admin dashboard.

I do understand how and why this issue occurs and I'm going to do all it takes 
to avoid it. However, in the event of accidental hard commits close to each 
other that throw this WARN, I'm just trying to figure out a way to make my 
collection return results without having to delete and re-create the collection 
or delete the data folder.

Again, I know how to avoid this issue but if it still happens then what can be 
done to avoid a complete reindexing.

Thank you,
Aswath NS

-Original Message-
From: Shawn Heisey [mailto:apa...@elyograg.org]
Sent: Monday, March 21, 2016 4:19 PM
To: solr-user@lucene.apache.org
Subject: Re: PERFORMANCE WARNING: Overlapping onDeckSearchers=2

On 3/21/2016 12:52 PM, Aswath Srinivasan (TMS) wrote:
> Fellow developers,
>
> PERFORMANCE WARNING: Overlapping onDeckSearchers=2
>
> I'm seeing this warning often and whenever I see this, the collection 
> crashes. The only way to overcome this is by deleting the data folder and 
> reindexing.
>
> In my observation, this WARN comes when I hit frequent hard commits or hit 
> reload config. I'm not planning to hit frequent hard commits, however 
> sometimes it happens accidentally. And when it happens the collection crashes 
> without recovering.
>
> Have you faced this issue? Is there a recovery procedure for this WARN?
>
> Also, I don't want to increase maxWarmingSearchers or set autocommit.

This is a lot of the same info that you've gotten from Hoss. I'm just going to 
leave it all here and add a little bit related to the rest of the thread.

Increasing maxWarmingSearchers is almost always the WRONG thing to do.
The reason that you are running into this message is that your commits (those 
that open a new searcher) are taking longer to finish than your commit 
frequency, so you end up warming multiple searchers at the same time. To limit 
memory usage, Solr will keep the number of warming searches from exceeding a 
threshold.

You need to either reduce the frequency of your commits that open a new 
searcher or change your configuration so they complete faster. Here's some info 
about slow commits:

http://wiki.apache.org/solr/SolrPerformanceProblems#Slow_commits

The only way that I can imagine any part of Solr *crashing* when this message 
happens is if you are also hitting an OutOfMemoryError
exception. You've said that your collection crashes ... but not what
actually happens -- what "crash" means for your situation. I've never heard of 
a collection crashing.

If you're running version 4.0 or later, you actually *do* want autoCommit 
configured, with openSearcher set to false. This configuration will not change 
document visibility at all, because it will not open a new searcher. You need 
different commits for document visibility.

This is the updateHandler config that I use which includes autoCommit:



<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxTime>120000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
</updateHandler>




With this config, there will be at least two minutes between automatic hard 
commits. Because these commits will not open a new searcher, they cannot cause 
the message about onDeckSearchers. Commits that do not open a new searcher will 
normally complete VERY quickly. The reason you want this kind of autoCommit 
configuration is to avoid extremely large transaction logs.

See this blog post for more info than you ever wanted about commits:

http://lucidworks.com/blog/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

If you're going to do all your indexing with the dataimport handler, you could 
just let the commit option on the dataimport take care of document visibility.

Thanks,
Shawn


RE: PERFORMANCE WARNING: Overlapping onDeckSearchers=2

2016-03-21 Thread Aswath Srinivasan (TMS)
If you're seeing a crash, then that's a distinct problem from the WARN -- it 
might be related to the warning, but it's not identical -- Solr doesn't always 
(or even normally) crash in the "Overlapping onDeckSearchers" situation.

That is what I hoped for. But I could see nothing else in the log. All I'm 
trying to do is run a full import with the DIH handler, index some 10 records 
from the DB, and check the "commit" check box. Then when I immediately re-run 
the full import OR do a reload config, I start seeing this warning and my 
collection crashes.

I have turned off autoCommit in the solrconfig.

I can try and avoid frequent hard commits but I wanted a solution to overcome 
this WARN if an accidental frequent hard commit happens.

Thank you,
Aswath NS



-Original Message-
From: Chris Hostetter [mailto:hossman_luc...@fucit.org]
Sent: Monday, March 21, 2016 2:26 PM
To: solr-user@lucene.apache.org
Subject: RE: PERFORMANCE WARNING: Overlapping onDeckSearchers=2


: What I'm wondering is, what should one do to fix this issue when it
: happens. Is there a way to recover? after the WARN appears.

It's just a warning that you have a sub-optimal situation from a performance 
standpoint -- either committing too fast, or warming too much.
It's not a failure, and Solr will continue to serve queries and process updates 
-- but meanwhile it's detected that the situation it's in involves wasted 
CPU/RAM.

: In my observation, this WARN comes when I hit frequent hard commits or
: hit re-load config. I'm not planning on to hit frequent hard commits,
: however sometimes accidently it happens. And when it happens the
: collection crashes without a recovery.

If you're seeing a crash, then that's a distinct problem from the WARN -- it 
might be related to the warning, but it's not identical -- Solr doesn't always 
(or even normally) crash in the "Overlapping onDeckSearchers" situation.

So if you are seeing crashes, please give us more details about these
crashes: namely more details about everything you are seeing in your logs (on 
all the nodes, even if only one node is crashing)

https://wiki.apache.org/solr/UsingMailingLists



-Hoss
http://www.lucidworks.com/


RE: PERFORMANCE WARNING: Overlapping onDeckSearchers=2

2016-03-21 Thread Aswath Srinivasan (TMS)
Please note that I'm not looking for ways to avoid this issue. There are a lot 
of internet articles on that topic.

What I'm wondering is, what should one do to fix this issue when it happens? Is 
there a way to recover after the WARN appears?

Thank you,
Aswath NS

-Original Message-
From: Aswath Srinivasan (TMS) [mailto:aswath.sriniva...@toyota.com] 
Sent: Monday, March 21, 2016 11:52 AM
To: solr-user@lucene.apache.org
Subject: PERFORMANCE WARNING: Overlapping onDeckSearchers=2

Fellow developers,

PERFORMANCE WARNING: Overlapping onDeckSearchers=2

I'm seeing this warning often and whenever I see this, the collection crashes. 
The only way to overcome this is by deleting the data folder and reindexing.

In my observation, this WARN comes when I hit frequent hard commits or hit 
reload config. I'm not planning to hit frequent hard commits, however sometimes 
it happens accidentally. And when it happens the collection crashes without 
recovering.

Have you faced this issue? Is there a recovery procedure for this WARN?

Also, I don't want to increase maxWarmingSearchers or set autocommit.

Thank you,
Aswath NS


PERFORMANCE WARNING: Overlapping onDeckSearchers=2

2016-03-21 Thread Aswath Srinivasan (TMS)
Fellow developers,

PERFORMANCE WARNING: Overlapping onDeckSearchers=2

I'm seeing this warning often and whenever I see this, the collection crashes. 
The only way to overcome this is by deleting the data folder and reindexing.

In my observation, this WARN comes when I hit frequent hard commits or hit 
reload config. I'm not planning to hit frequent hard commits, however sometimes 
it happens accidentally. And when it happens the collection crashes without 
recovering.

Have you faced this issue? Is there a recovery procedure for this WARN?

Also, I don't want to increase maxWarmingSearchers or set autocommit.

Thank you,
Aswath NS


Zookeeper upconfig files to upload big config files

2016-02-17 Thread Aswath Srinivasan (TMS)
Hi fellow Solr developers,

I'm trying to upconfig my config files, and my synonyms.txt file is about 2 MB. 
Whenever I try to do this, I get an exception: either a "broken pipe" exception 
or the exception below. Any advice on how to fix it?

If I remove most of the synonym entries and keep my synonyms.txt file short and 
simple, then upconfig works without any problem.

WARN  - 2016-02-17 11:48:55.514; org.apache.zookeeper.ClientCnxn$SendThread; 
Session 0x252e439e59f0017 for server zookeeperhost/zookeeperhost:2181, 
unexpected error, closing socket connection and attempting reconnect
java.io.IOException: Broken pipe
at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
at sun.nio.ch.IOUtil.write(IOUtil.java:65)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:487)
at 
org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:117)
at 
org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:366)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
[abc@dc1-abc ~]$ ^C
[abc@dc1-abc ~]$ /t3/apps/solr-5.3.2/server/scripts/cloud-scripts/zkcli.sh 
-zkhost 
t3solr.test1.abc.com:2181,t3solr.test2.abc.com:2181,t3solr.test3.abc.com:2181 
-cmd upconfig -confname test_config_2 -confdir 
/t3/apps/solr-5.3.2/example/example-DIH/solr/TestEnvCore/conf
Exception in thread "main" java.io.IOException: Error uploading file 
/t3/apps/solr-5.3.2/example/example-DIH/solr/TestEnvCore/conf/synonyms.txt to 
zookeeper path /configs/test_config_2/synonyms.txt
at 
org.apache.solr.common.cloud.ZkConfigManager$1.visitFile(ZkConfigManager.java:68)
at 
org.apache.solr.common.cloud.ZkConfigManager$1.visitFile(ZkConfigManager.java:58)
at java.nio.file.FileTreeWalker.walk(FileTreeWalker.java:135)
at java.nio.file.FileTreeWalker.walk(FileTreeWalker.java:199)
at java.nio.file.FileTreeWalker.walk(FileTreeWalker.java:69)
at java.nio.file.Files.walkFileTree(Files.java:2602)
at java.nio.file.Files.walkFileTree(Files.java:2635)
at 
org.apache.solr.common.cloud.ZkConfigManager.uploadToZK(ZkConfigManager.java:58)
at 
org.apache.solr.common.cloud.ZkConfigManager.uploadConfigDir(ZkConfigManager.java:120)
at org.apache.solr.cloud.ZkCLI.main(ZkCLI.java:220)
Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: 
KeeperErrorCode = ConnectionLoss for /configs/test_config_2/synonyms.txt
at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1270)
at 
org.apache.solr.common.cloud.SolrZkClient$8.execute(SolrZkClient.java:362)
at 
org.apache.solr.common.cloud.SolrZkClient$8.execute(SolrZkClient.java:359)
at 
org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:61)
at 
org.apache.solr.common.cloud.SolrZkClient.setData(SolrZkClient.java:359)
at 
org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:529)
at 
org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:408)
at 
org.apache.solr.common.cloud.ZkConfigManager$1.visitFile(ZkConfigManager.java:66)
... 9 more
[abc@dc1-abc ~]$


Thank you,
Aswath NS



Taking Solr to production

2016-01-22 Thread Aswath Srinivasan (TMS)
If below is the situation,


* 4 Virtual machines with 64 GB RAM - 64bit machines, 512 GB storage 
for each VM

* Totally about 2.5 million documents to  be indexed

* Documents average size is 512 KB - pdfs and htmls

* Expected QPS is 150

* Incremental indexing is once per day at around 50,000 documents per 
day (update & delete combined)

This being said, I was thinking I would take Solr to production with,


* 2 shards, 1 Leader & 3 Replicas

* 2 solr instance per VM

* 3 Zookeepers on the same machines as that of Solr (3 out of 4 VMs 
will have external zookeeper)

* Solr 5.3.1 version

Do you all think this setup will work? Will this serve me 150 QPS?

I know that nobody can give a definite answer and the only way is to do 
performance testing and tweak from there, but there is another proposal to have 
4 shards with 1 leader and 1 replica, which I'm not in favor of. So, posting it 
here, just trying to get some peer opinions!

Thank you,
Aswath NS



RE: Taking Solr to production

2016-01-22 Thread Aswath Srinivasan (TMS)
Thanks guys for all the responses.

True. What I wanted to convey is  2 shards with 4 replicas.

>> use more shards if the query latency is too high.

Shouldn't we go for more replicas if query latency is too high? You go for more 
shards if you have a large number of documents to index, arriving at a frequent 
rate. Do you disagree with my point of view?

There are no facets, but complex queries exist. I was thinking a safe bet is to 
have 2 shards, so I give enough breathing space to the indexing jobs, and 4 
replicas to address the high QPS requirement. Am I thinking correctly?

I cannot thank you enough you guys!!

Thank you,
Aswath NS


-Original Message-
From: Jack Krupansky [mailto:jack.krupan...@gmail.com]
Sent: Friday, January 22, 2016 3:06 PM
To: solr-user@lucene.apache.org
Subject: Re: Taking Solr to production

"1 Leader & 3 Replicas"

SolrCloud does not distinguish leaders from replicas - that's old master-slave 
terminology. The leader is just one of the replicas.

So, are you really talking about 2 shards with 4 replicas each or 2 shards with 
2 replicas each?

Putting multiple replica instances on each machine isn't buying you anything, 
just making it more complicated to manage.

Number of shards is determined by amount of data and whether query latency can 
be achieved - use more shards if the query latency is too high.

2.5 million (2,500,000) documents is rather small, so unless your queries are 
running really slow, it's not clear you even need sharding, but we don't know 
your document and query complexity. Heavy faceting or complex function queries?

Number of replicas is determined by query load - number of simultaneous query 
requests, as well as HA availability requirements.




-- Jack Krupansky

On Fri, Jan 22, 2016 at 5:45 PM, Toke Eskildsen
wrote:

> Aswath Srinivasan (TMS) wrote:
> > * Totally about 2.5 million documents to be indexed
> > * Documents average size is 512 KB - pdfs and htmls
>
> > This being said I was thinking I would take the Solr to production with,
> > * 2 shards, 1 Leader & 3 Replicas
>
> > Do you all think this set up will work? Will this server me 150 QPS?
>
> It certainly helps that you are batch updating. What is missing in
> this estimation is how large the documents are when indexed, as I
> guess the ½MB average is for the raw files? If they are your everyday
> short PDFs with images, meaning not a lot of text, handling 2M+ of
> them is easy. If they are all full-length books, it is another matter.
>
> Your document count is relatively low and if your index data end up
> being not-too-big (let's say 100GB), then you ought to consider having
> just a single shard with 4 replicas: There is a non-trivial overhead
> going from 1 shard to more than one, especially if you are doing faceting.
>
> - Toke Eskildsen
>


RE: Solrcloud for Java 1.6

2016-01-07 Thread Aswath Srinivasan (TMS)
Thanks for the responses guys.

>> i have solrj5 client for 1.7 converted into 1.6

Can you please explain this part in a little more detail?

Thank you,
Aswath NS
Mobile  +1 424 345 5340
Office+1 310 468 6729

-Original Message-
From: Zap Org [mailto:zapor...@gmail.com] 
Sent: Thursday, January 07, 2016 9:44 PM
To: solr-user@lucene.apache.org
Subject: Re: Solrcloud for Java 1.6

i have solrj5 client for 1.7 converted into 1.6 and solr instances running with 
1.7. I am connecting solrj (1.6) with solr instances (1.7) on two different 
machines

On Fri, Jan 8, 2016 at 8:26 AM, <billnb...@gmail.com> wrote:

> Run it on 2 separate boxes
>
> Bill Bell
> Sent from mobile
>
>
> > On Jan 7, 2016, at 3:11 PM, Aswath Srinivasan (TMS) <
> aswath.sriniva...@toyota.com> wrote:
> >
> > Hi fellow developers,
> >
> > I have a situation where the search front-end application is using 
> > java
> 1.6. Upgrading Java version is out of the question.
> >
> > Planning to use Solrcloud 5.x version for the search implementation. 
> > The
> show stopper here is, solrj for solrcloud needs atleast java1.7
> >
> > What best can be done to use the latest version of solrcloud and 
> > solrj
> for a portal that runs on java 1.6?
> >
> > I was thinking, in solrj, instead of using zookeeper (which also 
> > acts as
> the load balancer) I can mention the exact replica's 
> http://solr-cloud-HOST:PORT pairs using some kind of round-robin with 
> some external load balancer.
> >
> > Any suggestion is highly appreciated.
> >
> > Aswath NS
> >
>


RE: Solrcloud for Java 1.6

2016-01-07 Thread Aswath Srinivasan (TMS)
Thanks for the responses guys.

>> i have solrj5 client for 1.7 converted into 1.6

Can you please explain this part in a little more detail?

Thank you,
Aswath NS


-Original Message-
From: Aswath Srinivasan (TMS) [mailto:aswath.sriniva...@toyota.com] 
Sent: Thursday, January 07, 2016 9:51 PM
To: solr-user@lucene.apache.org
Subject: RE: Solrcloud for Java 1.6

Thanks for the responses guys.

>> i have solrj5 client for 1.7 converted into 1.6

Can you please explain this part in a little more detail?

Thank you,
Aswath NS

-Original Message-
From: Zap Org [mailto:zapor...@gmail.com]
Sent: Thursday, January 07, 2016 9:44 PM
To: solr-user@lucene.apache.org
Subject: Re: Solrcloud for Java 1.6

i have solrj5 client for 1.7 converted into 1.6 and solr instances running with 
1.7. I am connecting solrj (1.6) with solr instances (1.7) on two different 
machines

On Fri, Jan 8, 2016 at 8:26 AM, <billnb...@gmail.com> wrote:

> Run it on 2 separate boxes
>
> Bill Bell
> Sent from mobile
>
>
> > On Jan 7, 2016, at 3:11 PM, Aswath Srinivasan (TMS) <
> aswath.sriniva...@toyota.com> wrote:
> >
> > Hi fellow developers,
> >
> > I have a situation where the search front-end application is using 
> > java
> 1.6. Upgrading Java version is out of the question.
> >
> > Planning to use Solrcloud 5.x version for the search implementation. 
> > The
> show stopper here is, solrj for solrcloud needs atleast java1.7
> >
> > What best can be done to use the latest version of solrcloud and 
> > solrj
> for a portal that runs on java 1.6?
> >
> > I was thinking, in solrj, instead of using zookeeper (which also 
> > acts as
> the load balancer) I can mention the exact replica's 
> http://solr-cloud-HOST:PORT pairs using some kind of round-robin with 
> some external load balancer.
> >
> > Any suggestion is highly appreciated.
> >
> > Aswath NS
> >
>


Solrcloud for Java 1.6

2016-01-07 Thread Aswath Srinivasan (TMS)
Hi fellow developers,

I have a situation where the search front-end application is using java 1.6. 
Upgrading Java version is out of the question.

Planning to use SolrCloud 5.x for the search implementation. The show-stopper 
here is that SolrJ for SolrCloud needs at least Java 1.7.

What best can be done to use the latest version of solrcloud and solrj for a 
portal that runs on java 1.6?

I was thinking, in solrj, instead of using zookeeper (which also acts as the 
load balancer) I can mention the exact replica's http://solr-cloud-HOST:PORT 
pairs using some kind of round-robin with some external load balancer.

Any suggestion is highly appreciated.

Aswath NS



SolrCloud page is blank

2015-12-11 Thread Aswath Srinivasan (TMS)
Hi All,

We have set up Solr 5.3.1. Now I realize that in the Solr admin UI, the Cloud 
page is blank. What could be the reason behind this? Following are the 
exceptions that I'm seeing in the logs:

12/11/2015, 9:58:37 AM  WARN  ClientCnxn  Session 0x25111a5595ab885 for server 
null, unexpected error, closing socket connection and attempting reconnect


java.net.ConnectException: Connection refused

 at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)

 at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)

 at 
org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)

 at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)


12/11/2015, 9:58:36 AM  WARN  ClientCnxn  Session 0x25111a5595ab885 for server 
abc01.abc.anbc.com/10.15.12.122:2181, unexpected error, closing socket 
connection and attempting reconnect

java.io.IOException: Unreasonable length = 2703892
 at 
org.apache.jute.BinaryInputArchive.readBuffer(BinaryInputArchive.java:95)
 at 
org.apache.zookeeper.proto.GetDataResponse.deserialize(GetDataResponse.java:54)
 at 
org.apache.zookeeper.ClientCnxn$SendThread.readResponse(ClientCnxn.java:814)
 at 
org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:94)
 at 
org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:366)
 at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)


Thank you,
Aswath NS
Mobile  +1 424 345 5340
Office+1 310 468 6729



Zookeeper connection refused error

2015-12-10 Thread Aswath Srinivasan (TMS)
Hi all,

I tried to create a collection with 3 shards and it got created. Verified the 
same in SOLR Dashboard.

But while creating the collection I'm seeing a "Connection refused" error when 
connecting to ZooKeeper.

Following is the exception trace. Can somebody spot the mistake in my settings?


[something@something solr-5.3.1]$ bin/solr create_collection -c T3Collection1 
-n T3Collection1_Config -d /opt/solr-5.3.1/example/example-DIH/solr/db -shards 
3 -p 8993

Connecting to ZooKeeper at server01:2181,server02:2181,server03:2181/t3solr ...
WARN  - 2015-12-10 14:46:58.476; org.apache.zookeeper.ClientCnxn$SendThread; 
Session 0x0 for server null, unexpected error, closing socket connection and 
attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
at 
org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
Uploading /opt/solr-5.3.1/example/example-DIH/solr/db/conf for config 
T3Collection1_Config to ZooKeeper at 
server01:2181,server02:2181,server03:2181/t3solr

Creating new collection 'T3Collection1' using command:
http://localhost:8993/solr/admin/collections?action=CREATE&name=T3Collection1&numShards=3&replicationFactor=1&maxShardsPerNode=1&collection.configName=T3Collection1_Config

{
  "responseHeader":{
"status":0,
"QTime":2141},
  "success":{"":{
  "responseHeader":{
"status":0,
"QTime":1932},
  "core":"T3Collection1_shard2_replica1"}}}


When I try to do a downconfig I get the following exception,

[something@something solr-5.3.1]$ ./server/scripts/cloud-scripts/zkcli.sh 
-zkhost server01:2181,server02:2181,server03:2181 -cmd downconfig -confname 
T3Collection1_Config -confdir "/opt/solr-5.3.1/downconfigs"
Exception in thread "main" java.io.IOException: Error downloading files from 
zookeeper path /configs/T3Collection1_Config to /opt/solr-5.3.1/downconfigs
at 
org.apache.solr.common.cloud.ZkConfigManager.downloadFromZK(ZkConfigManager.java:107)
at 
org.apache.solr.common.cloud.ZkConfigManager.downloadConfigDir(ZkConfigManager.java:131)
at org.apache.solr.cloud.ZkCLI.main(ZkCLI.java:230)
Caused by: org.apache.zookeeper.KeeperException$NoNodeException: 
KeeperErrorCode = NoNode for /configs/T3Collection1_Config
at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1472)
at 
org.apache.solr.common.cloud.SolrZkClient$6.execute(SolrZkClient.java:328)
at 
org.apache.solr.common.cloud.SolrZkClient$6.execute(SolrZkClient.java:325)
at 
org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:61)
at 
org.apache.solr.common.cloud.SolrZkClient.getChildren(SolrZkClient.java:325)
at 
org.apache.solr.common.cloud.ZkConfigManager.downloadFromZK(ZkConfigManager.java:92)
... 2 more
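One detail worth noting from the two commands above, offered only as a hedged observation: the create_collection run connected to ZooKeeper with the /t3solr chroot, while the downconfig command did not. A downconfig that keeps the same chroot (assuming that is where the config was actually uploaded) would look like:

./server/scripts/cloud-scripts/zkcli.sh -zkhost server01:2181,server02:2181,server03:2181/t3solr -cmd downconfig -confname T3Collection1_Config -confdir /opt/solr-5.3.1/downconfigs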


Thank you,
Aswath NS



Performance testing on SOLR cloud

2015-11-17 Thread Aswath Srinivasan (TMS)
Hi fellow developers,

Please share your experience on how you did performance testing on SOLR. What 
I'm trying to do is set up SOLR cloud on 3 Linux servers with 16 GB RAM each and 
index a total of 2.2 million documents. I have yet to decide how many shards and 
replicas to have (any hint on this is welcome too; this is basically 'only' 
performance testing, so suggest the number of shards and replicas if you can). 
Ultimately, I'm trying to find the QPS that this SOLR cloud setup can handle.

To summarize,

1.   Find the QPS that my solr cloud set up can support

2.   Using 5.3.1 version with external zookeeper

3.   3 Linux servers with 16 GB RAM and index a total of 2.2 million documents

4.   Yet to decide number of shards and replicas

5.   Not using any custom search application (performance testing for SOLR and 
not for Search portal)

Thank you


RE: tikaparser docx file fails with exception

2015-11-05 Thread Aswath Srinivasan (TMS)
Thank you for attempting to answer. I will try it out with SolrJ and standalone 
Java with the Tika parser. I completely understand that a bad document could 
cause this; however, when I opened up the document I couldn't find anything 
suspicious except for some binary images/pictures embedded in the document.
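A minimal sketch of that kind of standalone check, assuming tika-app (or the equivalent Tika jars) is on the classpath; it simply runs Tika's AutoDetectParser on the file given on the command line so the underlying exception is easier to see:

import java.io.FileInputStream;
import java.io.InputStream;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.sax.BodyContentHandler;

public class TikaCheck {
    public static void main(String[] args) throws Exception {
        // run Tika directly on the suspect .docx so the real exception surfaces
        AutoDetectParser parser = new AutoDetectParser();
        BodyContentHandler handler = new BodyContentHandler(-1); // -1 = no write limit
        Metadata metadata = new Metadata();
        InputStream in = new FileInputStream(args[0]);
        try {
            parser.parse(in, handler, metadata);
        } finally {
            in.close();
        }
        System.out.println("Content-Type: " + metadata.get("Content-Type"));
        System.out.println("Extracted " + handler.toString().length() + " characters");
    }
}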


-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Wednesday, November 04, 2015 4:33 PM
To: solr-user <solr-user@lucene.apache.org>
Subject: Re: tikaparser docx file fails with exception

Possibly a corrupt file? Tika does its best, but bad data is...bad data.

You can experiment a bit with using Tika in Java, that might give you a better 
idea of what's really going on, here's a SolrJ example:

https://lucidworks.com/blog/2012/02/14/indexing-with-solrj/

Best,
Erick

On Wed, Nov 4, 2015 at 3:49 PM, Aswath Srinivasan (TMS) 
<aswath.sriniva...@toyota.com> wrote:
>
> Trying to index a document. A docx file. Ending up with the below exception. 
> Not sure why it is erroring out. When I opened the docx I was able to see 
> lots of binary data like embedded pictures etc., Is there a possible solution 
> to this or is it a bug? Only one such file fails. Rest of the files are 
> smoothly indexed.
>
> 2015-11-04 23:16:11.549 INFO  (coreLoadExecutor-6-thread-1) [   x:tika] 
> o.a.s.c.CoreContainer registering core: tika
> 2015-11-04 23:16:11.549 INFO  (searcherExecutor-7-thread-1-processing-x:tika) 
> [   x:tika] o.a.s.c.SolrCore QuerySenderListener sending requests to 
> Searcher@1eb69b2[tika] 
> main{ExitableDirectoryReader(UninvertingDirectoryReader())}
> 2015-11-04 23:16:11.585 INFO  (searcherExecutor-7-thread-1-processing-x:tika) 
> [   x:tika] o.a.s.c.S.Request [tika] webapp=null path=null 
> params={q=static+firstSearcher+warming+in+solrconfig.xml=false=firstSearcher}
>  hits=0 status=0 QTime=34
> 2015-11-04 23:16:11.586 INFO  (searcherExecutor-7-thread-1-processing-x:tika) 
> [   x:tika] o.a.s.c.SolrCore QuerySenderListener done.
> 2015-11-04 23:16:11.586 INFO  (searcherExecutor-7-thread-1-processing-x:tika) 
> [   x:tika] o.a.s.h.c.SpellCheckComponent Loading spell index for 
> spellchecker: default
> 2015-11-04 23:16:11.586 INFO  (searcherExecutor-7-thread-1-processing-x:tika) 
> [   x:tika] o.a.s.h.c.SpellCheckComponent Loading spell index for 
> spellchecker: wordbreak
> 2015-11-04 23:16:11.586 INFO  (searcherExecutor-7-thread-1-processing-x:tika) 
> [   x:tika] o.a.s.h.c.SuggestComponent buildOnStartup: mySuggester
> 2015-11-04 23:16:11.586 INFO  (searcherExecutor-7-thread-1-processing-x:tika) 
> [   x:tika] o.a.s.s.s.SolrSuggester SolrSuggester.build(mySuggester)
> 2015-11-04 23:16:11.605 INFO  (searcherExecutor-7-thread-1-processing-x:tika) 
> [   x:tika] o.a.s.c.SolrCore [tika] Registered new searcher 
> Searcher@1eb69b2[tika] 
> main{ExitableDirectoryReader(UninvertingDirectoryReader())}
> 2015-11-04 23:16:25.923 INFO  (qtp7980742-16) [   x:tika] 
> o.a.s.h.d.DataImporter Loading DIH Configuration: tika-data-config.xml
> 2015-11-04 23:16:25.937 INFO  (qtp7980742-16) [   x:tika] 
> o.a.s.h.d.DataImporter Data Configuration loaded successfully
> 2015-11-04 23:16:25.947 INFO  (qtp7980742-16) [   x:tika] o.a.s.c.S.Request 
> [tika] webapp=/solr path=/dataimport 
> params={debug=false=false=true=true=true=json=full-import=false}
>  status=0 QTime=28
> 2015-11-04 23:16:25.948 INFO  (Thread-17) [   x:tika] o.a.s.h.d.DataImporter 
> Starting Full Import
> 2015-11-04 23:16:25.961 INFO  (Thread-17) [   x:tika] 
> o.a.s.h.d.SimplePropertiesWriter Read dataimport.properties
> 2015-11-04 23:16:25.966 INFO  (qtp7980742-14) [   x:tika] o.a.s.c.S.Request 
> [tika] webapp=/solr path=/dataimport 
> params={indent=true=json=status&_=1446678985952} status=0 QTime=1
> 2015-11-04 23:16:25.998 INFO  (Thread-17) [   x:tika] o.a.s.c.SolrCore [tika] 
> REMOVING ALL DOCUMENTS FROM INDEX
> 2015-11-04 23:16:26.728 ERROR (Thread-17) [   x:tika] 
> o.a.s.h.d.EntityProcessorWrapper Exception in entity : 
> documentImport:org.apache.solr.handler.dataimport.DataImportHandlerException: 
> Unable to read content Processing Document # 1
>
>   at 
> org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndT
> hrow(DataImportHandlerException.java:70)
>
>   at 
> org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEnt
> ityProcessor.java:168)
>
>   at 
> org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(Enti
> tyProcessorWrapper.java:243)
>
>   at 
> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder
> .java:475)
>
>   at 
> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder
> .java:514)
>
>   at 
> org.apache.solr.handler.da

To update or change your SolrCloud configuration files

2015-11-04 Thread Aswath Srinivasan (TMS)
Hi fellow SOLR developers,

https://cwiki.apache.org/confluence/display/solr/Using+ZooKeeper+to+Manage+Configuration+Files

This link says the below:

To update or change your SolrCloud configuration files:

 1. Download the latest configuration files from ZooKeeper, using the source 
control checkout process.

 2. Make your changes. Commit your changed file to source control.

 3. Push the changes back to ZooKeeper.

 4. Reload the collection so that the changes will be in effect.



But I was wondering if there are some examples or articles somewhere which can 
help me do this. For instance, how would one download the latest config files 
from ZooKeeper? Or should I know ZooKeeper inside-out to do this?

Version 5.3.1 is what I'm using

I updated the synonym file in the 
"C:\12345\solrcloud\solr-5.3.1\example\example-DIH\solr\db\conf" folder and I'm 
using the below command

zkcli -zkhost localhost:2181,localhost:2182,localhost:2183 -cmd upconfig 
-confname db_config -confdir 
"C:\12345\solrcloud\solr-5.3.1\example\example-DIH\solr\db\conf"

But the file doesn't seem to change, nor does the query seem to work. I reloaded 
the core, which did not help either. I even tried restarting my SolrCloud.



RE: tikaparser docx file fails with exception

2015-11-04 Thread Aswath Srinivasan (TMS)

Trying to index a document, a docx file, and ending up with the exception below. 
Not sure why it is erroring out. When I opened the docx I was able to see lots 
of binary data like embedded pictures. Is there a possible solution to this, or 
is it a bug? Only one such file fails; the rest of the files are indexed 
smoothly.

2015-11-04 23:16:11.549 INFO  (coreLoadExecutor-6-thread-1) [   x:tika] 
o.a.s.c.CoreContainer registering core: tika
2015-11-04 23:16:11.549 INFO  (searcherExecutor-7-thread-1-processing-x:tika) [ 
  x:tika] o.a.s.c.SolrCore QuerySenderListener sending requests to 
Searcher@1eb69b2[tika] 
main{ExitableDirectoryReader(UninvertingDirectoryReader())}
2015-11-04 23:16:11.585 INFO  (searcherExecutor-7-thread-1-processing-x:tika) [ 
  x:tika] o.a.s.c.S.Request [tika] webapp=null path=null 
params={q=static+firstSearcher+warming+in+solrconfig.xml=false=firstSearcher}
 hits=0 status=0 QTime=34
2015-11-04 23:16:11.586 INFO  (searcherExecutor-7-thread-1-processing-x:tika) [ 
  x:tika] o.a.s.c.SolrCore QuerySenderListener done.
2015-11-04 23:16:11.586 INFO  (searcherExecutor-7-thread-1-processing-x:tika) [ 
  x:tika] o.a.s.h.c.SpellCheckComponent Loading spell index for spellchecker: 
default
2015-11-04 23:16:11.586 INFO  (searcherExecutor-7-thread-1-processing-x:tika) [ 
  x:tika] o.a.s.h.c.SpellCheckComponent Loading spell index for spellchecker: 
wordbreak
2015-11-04 23:16:11.586 INFO  (searcherExecutor-7-thread-1-processing-x:tika) [ 
  x:tika] o.a.s.h.c.SuggestComponent buildOnStartup: mySuggester
2015-11-04 23:16:11.586 INFO  (searcherExecutor-7-thread-1-processing-x:tika) [ 
  x:tika] o.a.s.s.s.SolrSuggester SolrSuggester.build(mySuggester)
2015-11-04 23:16:11.605 INFO  (searcherExecutor-7-thread-1-processing-x:tika) [ 
  x:tika] o.a.s.c.SolrCore [tika] Registered new searcher 
Searcher@1eb69b2[tika] 
main{ExitableDirectoryReader(UninvertingDirectoryReader())}
2015-11-04 23:16:25.923 INFO  (qtp7980742-16) [   x:tika] 
o.a.s.h.d.DataImporter Loading DIH Configuration: tika-data-config.xml
2015-11-04 23:16:25.937 INFO  (qtp7980742-16) [   x:tika] 
o.a.s.h.d.DataImporter Data Configuration loaded successfully
2015-11-04 23:16:25.947 INFO  (qtp7980742-16) [   x:tika] o.a.s.c.S.Request 
[tika] webapp=/solr path=/dataimport 
params={debug=false=false=true=true=true=json=full-import=false}
 status=0 QTime=28
2015-11-04 23:16:25.948 INFO  (Thread-17) [   x:tika] o.a.s.h.d.DataImporter 
Starting Full Import
2015-11-04 23:16:25.961 INFO  (Thread-17) [   x:tika] 
o.a.s.h.d.SimplePropertiesWriter Read dataimport.properties
2015-11-04 23:16:25.966 INFO  (qtp7980742-14) [   x:tika] o.a.s.c.S.Request 
[tika] webapp=/solr path=/dataimport 
params={indent=true=json=status&_=1446678985952} status=0 QTime=1
2015-11-04 23:16:25.998 INFO  (Thread-17) [   x:tika] o.a.s.c.SolrCore [tika] 
REMOVING ALL DOCUMENTS FROM INDEX
2015-11-04 23:16:26.728 ERROR (Thread-17) [   x:tika] 
o.a.s.h.d.EntityProcessorWrapper Exception in entity : 
documentImport:org.apache.solr.handler.dataimport.DataImportHandlerException: 
Unable to read content Processing Document # 1

  at 
org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:70)

  at 
org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:168)

  at 
org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:243)

  at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:475)

  at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:514)

  at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:414)

  at 
org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:329)

  at 
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:232)

  at 
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:416)

  at 
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:480)

  at 
org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:461)

Caused by: org.apache.tika.exception.TikaException: TIKA-198: Illegal 
IOException from 
org.apache.tika.parser.microsoft.ooxml.OOXMLParser@1b3e0a6

  at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:262)

  at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:256)

  at 
org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)

  at 
org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:162)

  ... 9 more

Caused by: java.io.CharConversionException: Characters larger than 4 bytes are 
not supported: byte 0xb7 implies a length of more than 4 bytes

  at 
org.apache.xmlbeans.impl.piccolo.xml.UTF8XMLDecoder.decode(UTF8XMLDecoder.java:162)

  at