Re: exceeded limit of maxWarmingSearchers

2013-09-12 Thread Erick Erickson
I really think this is the wrong approach.

bq: We do a commit on every update, but updates are very infrequent

I doubt this is actually true. You may think it is, but if updates really were
infrequent you just wouldn't get more than 8 warming searchers in the situation
you describe. Fix the _real_ problem here.

Do what Hoss said. Look at your logs and examine the commits
that happen just before you get this message. I pretty much guarantee
that you will find a flurry of them in quick succession. Very quick
succession.

I claim you can get around this problem entirely by

1> setting your autocommit (hard, openSearcher=true) to
10 seconds. Which, btw, is very very low. I'd go with a minute
or more myself, and also configure soft commits if you're
on Solr 4 and really really care about latency.

2> never committing from the client.
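
For reference, a rough sketch of what that looks like in solrconfig.xml; the
60-second interval is only an illustration, pick whatever suits your indexing
rate:

  <autoCommit>
    <maxTime>60000</maxTime>           <!-- hard commit roughly every minute -->
    <openSearcher>true</openSearcher>  <!-- make the commit visible to searches -->
  </autoCommit>
  <!-- optionally, on Solr 4.x, if you really care about latency: -->
  <autoSoftCommit>
    <maxTime>1000</maxTime>
  </autoSoftCommit>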

The reason I'm adamant about this is what you're doing will come
back to bite you in the future if you don't fix the problem now. As you
get more and more documents in the system, your 8 warming
searchers will try to warm 8 versions of the caches you've configured
in solrconfig.xml. If your index is very large, you'll hit OOM errors.
Then you'll have to figure out why all over again.

Really, with 37ms warming, you have a pathological situation that
you need to understand and fix.

Best,
Erick


On Thu, Sep 12, 2013 at 4:18 PM, gfbj  wrote:

> I ended up having to do a mathematical increase of the delay
>
> 
>
> because the indexing eventually would outstrip the static value I set and
> crash the maxWarmingSearchers.
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/exceeded-limit-of-maxWarmingSearchers-tp489803p4089699.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Different Responses for 4.4 and 3.5 solr index

2013-09-12 Thread Kuchekar
Hi,

Any updates on this? Is ranking computation dependent on the 'maxDoc'
value in Solr? Is this happening due to the changing value of 'maxDoc'
after each optimization? As in, in Solr 4.4 the 'maxDoc' value is reset
every time an optimization is run, whereas this is not the case in Solr
3.5.

Looking forward to the reply.

Thanks.
Kuchekar, Nilesh


On Wed, Aug 28, 2013 at 3:32 PM, Michael Sokolov <
msoko...@safaribooksonline.com> wrote:

> We've been seeing changes in our rankings as well.  I don't have a
> definite answer yet, since we're waiting on an index rebuild, but our
> current working theory is that the change to default omitNorms="true" for
> primitive types may have had an effect, possibly due to follow on
> confusion: our developers may have omitted norms from some other fields
> they shouldn't have?
>
> -Mike
>
>
> On 08/26/2013 09:46 AM, Stefan Matheis wrote:
>
>> Did you check the scoring? (use fl=*,score to retrieve it) ..
>> additionally debugQuery=true might provide more information about how the
>> score was calculated.
>>
>> - Stefan
>>
>>
>> On Monday, August 26, 2013 at 12:46 AM, Kuchekar wrote:
>>
>>  Hi,
>>> The response from 4.4 and 3.5 in the current scenario differs in the
>>> sequence in which results are given back to us.
>>>
>>> For example :
>>>
>>> Response from 3.5 solr is : id:A, id:B, id:C, id:D ...
>>> Response from 4.4 solr is : id:C, id:A, id:D, id:B...
>>>
>>> Looking forward to your reply.
>>>
>>> Thanks.
>>> Kuchekar, Nilesh
>>>
>>>
>>> On Sun, Aug 25, 2013 at 11:32 AM, Stefan Matheis
>>> >> (mailto:matheis.stefan@gmail.**com
>>> )>wrote:
>>>
>>>  Kuchekar (hope that's your first name?)

 you didn't tell us .. how they differ? do you get an actual error? or
 does
 the result contain documents you didn't expect? or the other way round,
 that some are missing you'd expect to be there?

 - Stefan


 On Sunday, August 25, 2013 at 4:43 PM, Kuchekar wrote:

  Hi,
>
> We get different response when we query 4.4 and 3.5 solr using same
> query params.
>
> My query param are as following :
>
> facet=true
> &facet.mincount=1
> &facet.limit=25
>
> &qf=content^0.0+p_last_name^500.0+p_first_name^50.0+strong_topic^0.0+first_author_topic^0.0+last_author_topic^0.0+title_topic^0.0

> &wt=javabin
> &version=2
> &rows=10
> &f.affiliation_org.facet.limit=150
> &fl=p_id,p_first_name,p_last_name
> &start=0
> &q=Apple
> &facet.field=affiliation_org
> &fq=table:profile
> &fq=num_content:[*+TO+1500]
> &fq=name:"Apple"
>
> The content in both (solr 4.4 and solr 3.5) are same.
>
> The solrconfig.xml from 3.5 and 4.4 are similarly constructed.
>
> Is there something I am missing that might have been changed in 4.4, which
> might be causing this issue? The "qf" params look the same.
>
> Looking forward to your reply.
>
> Thanks.
> Kuchekar, Nilesh
>
>

>>>
>>>
>>
>>
>


Solr 4.5 spatial search - distance and score

2013-09-12 Thread Weber
I'm trying to get the score by using a custom boost and also get the distance. I
found David's code* to get it using "Intersects", which I want to replace with
{!geofilt} or geodist().

*David's code: https://issues.apache.org/jira/browse/SOLR-4255

He told me geodist() will be available again for this kind of field, which
is a geohash type.

Then, I'd like to know how it can be done today on 4.4 with {!geofilt} and
how it will be done on 4.5 using geodist()
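
For reference, the kind of request I mean, written the way it works for a plain
LatLonType field (the field name "store" and the point are illustrative; spaces
shown unencoded for readability):

  &q=*:*
  &sfield=store&pt=45.15,-93.85
  &fq={!geofilt d=5}
  &fl=*,score,dist:geodist()
  &sort=geodist()+asc

The question is how to get the equivalent, especially the dist:geodist() part,
on the geohash-based field type.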

Thanks in advance.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-4-5-spatial-search-distance-and-score-tp4089706.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Get the commit time of a document in Solr

2013-09-12 Thread Otis Gospodnetic
Solr admin exposes time of last commit. You can use that.
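
(For example, a core-status call such as
http://localhost:8983/solr/admin/cores?action=STATUS should include a
lastModified value for each core's index; host and core names are whatever
yours are.)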

Otis
Solr & ElasticSearch Support
http://sematext.com/
On Sep 12, 2013 3:22 PM, "phanichaitanya"  wrote:

> Apologies again. But here is another try :
>
> I want to make sure that documents that are indexed are committed in, say, an
> hour. I agree that if you pass commitWithin params and the like, they will make
> sure of that based on the time configuration we set. But I want to make sure
> that the document is really committed within whatever time we set using
> commitWithin.
>
> It's a question asking for proof that Solr commits within that time if we
> add the commitWithin parameter to the configuration.
>
> That is about the commitWithin parameter option that you suggested.
>
> Now, is there a way to explicitly get all the documents that are committed
> when a hard commit request is issued? This might not make sense, but we are
> puzzled by that question.
>
>
>
> -
> Phani Chaitanya
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Get-the-commit-time-of-a-document-in-Solr-tp4089624p4089687.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Stop filter changes in Solr >= 4.4

2013-09-12 Thread Christopher Condit
While attempting to upgrade from Solr 4.3.0 to Solr 4.4.0 I ran into
this exception:

 java.lang.IllegalArgumentException: enablePositionIncrements=false is
not supported anymore as of Lucene 4.4 as it can create broken token
streams

which led me to https://issues.apache.org/jira/browse/LUCENE-4963.  I
need to be able to match queries irrespective of intervening stopwords
(which used to work with enablePositionIncrements="true"). For
instance: "foo of the bar" would find documents matching "foo bar",
"foo of bar", and "foo of the bar". With this option deprecated in
4.4.0 I'm not clear on how to maintain the same functionality.
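
For reference, the kind of schema snippet that now triggers that exception once
luceneMatchVersion is 4.4 looks roughly like this (field type and stopword file
names are illustrative):

  <fieldType name="text_stop" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.StopFilterFactory" words="stopwords.txt"
              ignoreCase="true" enablePositionIncrements="false"/>
    </analyzer>
  </fieldType>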

The package javadoc adds:

If the selected analyzer filters the stop words "is" and "the", then
for a document containing the string "blue is the sky", only the
tokens "blue", "sky" are indexed, with position("sky") = 3 +
position("blue"). Now, a phrase query "blue is the sky" would find
that document, because the same analyzer filters the same stop words
from that query. But the phrase query "blue sky" would not find that
document because the position increment between "blue" and "sky" is
only 1.

If this behavior does not fit the application needs, the query parser
needs to be configured to not take position increments into account
when generating phrase queries.

But there's no mention of how to actually configure the query parser
to do this. Does anyone know how to deal with this issue as Solr moves
toward 5.0?

Crossposted from stackoverflow:
http://stackoverflow.com/questions/18668376/solr-4-4-stopfilterfactory-and-enablepositionincrements


"Unable to connect" to "http://localhost:8983/solr/"

2013-09-12 Thread Raheel Hasan
Hi,

I just had this issue come out of nowhere.
Everything was fine until, all of a sudden, the browser can't connect to this
Solr instance.


Here is the solr log:

INFO  - 2013-09-12 20:07:58.142; org.eclipse.jetty.server.Server;
jetty-8.1.8.v20121106
INFO  - 2013-09-12 20:07:58.179;
org.eclipse.jetty.deploy.providers.ScanningAppProvider; Deployment monitor
E:\Projects\G1\A1\trunk\solr_root\solrization\contexts at interval 0
INFO  - 2013-09-12 20:07:58.191;
org.eclipse.jetty.deploy.DeploymentManager; Deployable added:
E:\Projects\G1\A1\trunk\solr_root\solrization\contexts\solr-jetty-context.xml
INFO  - 2013-09-12 20:07:59.159;
org.eclipse.jetty.webapp.StandardDescriptorProcessor; NO JSP Support for
/solr, did not find org.apache.jasper.servlet.JspServlet
INFO  - 2013-09-12 20:07:59.189;
org.eclipse.jetty.server.handler.ContextHandler; started
o.e.j.w.WebAppContext{/solr,file:/E:/Projects/G1/A1/trunk/solr_root/solrization/solr-webapp/webapp/},E:\Projects\G1\A1\trunk\solr_root\solrization/webapps/solr.war
INFO  - 2013-09-12 20:07:59.190;
org.eclipse.jetty.server.handler.ContextHandler; started
o.e.j.w.WebAppContext{/solr,file:/E:/Projects/G1/A1/trunk/solr_root/solrization/solr-webapp/webapp/},E:\Projects\G1\A1\trunk\solr_root\solrization/webapps/solr.war
INFO  - 2013-09-12 20:07:59.206;
org.apache.solr.servlet.SolrDispatchFilter; SolrDispatchFilter.init()
INFO  - 2013-09-12 20:07:59.231; org.apache.solr.core.SolrResourceLoader;
JNDI not configured for solr (NoInitialContextEx)
INFO  - 2013-09-12 20:07:59.231; org.apache.solr.core.SolrResourceLoader;
solr home defaulted to 'solr/' (could not find system property or JNDI)
INFO  - 2013-09-12 20:07:59.241;
org.apache.solr.core.CoreContainer$Initializer; looking for solr config
file: E:\Projects\G1\A1\trunk\solr_root\solrization\solr\solr.xml
INFO  - 2013-09-12 20:07:59.244; org.apache.solr.core.CoreContainer; New
CoreContainer 24012447
INFO  - 2013-09-12 20:07:59.244; org.apache.solr.core.CoreContainer;
Loading CoreContainer using Solr Home: 'solr/'
INFO  - 2013-09-12 20:07:59.245; org.apache.solr.core.SolrResourceLoader;
new SolrResourceLoader for directory: 'solr/'
INFO  - 2013-09-12 20:07:59.483;
org.apache.solr.handler.component.HttpShardHandlerFactory; Setting
socketTimeout to: 0
INFO  - 2013-09-12 20:07:59.484;
org.apache.solr.handler.component.HttpShardHandlerFactory; Setting
urlScheme to: http://
INFO  - 2013-09-12 20:07:59.485;
org.apache.solr.handler.component.HttpShardHandlerFactory; Setting
connTimeout to: 0
INFO  - 2013-09-12 20:07:59.486;
org.apache.solr.handler.component.HttpShardHandlerFactory; Setting
maxConnectionsPerHost to: 20
INFO  - 2013-09-12 20:07:59.487;
org.apache.solr.handler.component.HttpShardHandlerFactory; Setting
corePoolSize to: 0
INFO  - 2013-09-12 20:07:59.488;
org.apache.solr.handler.component.HttpShardHandlerFactory; Setting
maximumPoolSize to: 2147483647
INFO  - 2013-09-12 20:07:59.489;
org.apache.solr.handler.component.HttpShardHandlerFactory; Setting
maxThreadIdleTime to: 5
INFO  - 2013-09-12 20:07:59.490;
org.apache.solr.handler.component.HttpShardHandlerFactory; Setting
sizeOfQueue to: -1
INFO  - 2013-09-12 20:07:59.490;
org.apache.solr.handler.component.HttpShardHandlerFactory; Setting
fairnessPolicy to: false
INFO  - 2013-09-12 20:07:59.498;
org.apache.solr.client.solrj.impl.HttpClientUtil; Creating new http client,
config:maxConnectionsPerHost=20&maxConnections=1&socketTimeout=0&connTimeout=0&retry=false
INFO  - 2013-09-12 20:07:59.671; org.apache.solr.core.CoreContainer;
Registering Log Listener
INFO  - 2013-09-12 20:07:59.689; org.apache.solr.core.CoreContainer;
Creating SolrCore 'A1' using instanceDir: solr\A1
INFO  - 2013-09-12 20:07:59.690; org.apache.solr.core.SolrResourceLoader;
new SolrResourceLoader for directory: 'solr\A1\'
INFO  - 2013-09-12 20:07:59.724; org.apache.solr.core.SolrConfig; Adding
specified lib dirs to ClassLoader
INFO  - 2013-09-12 20:07:59.726; org.apache.solr.core.SolrResourceLoader;
Adding
'file:/E:/Projects/G1/A1/trunk/solr_root/solrization/lib/mysql-connector-java-5.1.25-bin.jar'
to classloader
INFO  - 2013-09-12 20:07:59.727; org.apache.solr.core.SolrResourceLoader;
Adding
'file:/E:/Projects/G1/A1/trunk/solr_root/contrib/dataimporthandler/lib/activation-1.1.jar'
to classloader
INFO  - 2013-09-12 20:07:59.727; org.apache.solr.core.SolrResourceLoader;
Adding
'file:/E:/Projects/G1/A1/trunk/solr_root/contrib/dataimporthandler/lib/mail-1.4.1.jar'
to classloader
INFO  - 2013-09-12 20:07:59.728; org.apache.solr.core.SolrResourceLoader;
Adding
'file:/E:/Projects/G1/A1/trunk/solr_root/dist/solr-dataimporthandler-4.3.0.jar'
to classloader
INFO  - 2013-09-12 20:07:59.729; org.apache.solr.core.SolrResourceLoader;
Adding
'file:/E:/Projects/G1/A1/trunk/solr_root/contrib/analysis-extras/lucene-libs/lucene-analyzers-icu-4.3.0.jar'
to classloader
INFO  - 2013-09-12 20:07:59.729; org.apache.solr.core.SolrResourceLoader;
Adding
'file:/E:/Projects/G1/A1/trunk/solr_root/contrib/analysis-extras/lucene-libs/lu

Re: Regarding improving performance of the solr

2013-09-12 Thread Steve Rowe
Hi Prabu,

It's difficult to tell what's going wrong without the full exception stack 
trace, including what the exception is.

If you can provide the specific input that triggers the exception, that might 
also help.

Steve

On Sep 12, 2013, at 4:14 AM, prabu palanisamy  wrote:

> Hi
> 
> I tried to reindex Solr. I get a regular expression problem. The
> steps I followed are:
> 
> I started the java -jar start.jar
> http://localhost:8983/solr/update?stream.body=<delete><query>*:*</query></delete>
> http://localhost:8983/solr/update?stream.body=<commit/>
> I stopped the solr server
> 
> I changed the indexed and stored attributes to false for some of the fields in
> schema.xml
> 
>  required="true"/>
>  multiValued="true" termVectors="true" termPositions="true"
> termOffsets="true"/>
> 
> 
> 
>  multiValued="true" termVectors="true" termPositions="true"
> termOffsets="true"/>
>  stored="false"/>
>  stored="false"  multiValued="true" compressed="true" termVectors="true"
> termPositions="true" termOffsets="true"/>
>  multiValued="true" termVectors="true" termPositions="true"
> termOffsets="true"/>
>  multiValued="true" termVectors="true" termPositions="true"
> termOffsets="true"/>
>  stored="true"  multiValued="true" termVectors="true" termPositions="true"
> termOffsets="true"/>
> 
> 
> id
> 
> 
> My data-config.xml
> 
>
>
>processor="XPathEntityProcessor"
>stream="true"
>forEach="/mediawiki/page/"
>url="/home/prabu/wikipedia_full_indexed_dump.xml"
> 
> transformer="RegexTransformer,DateFormatTransformer,HTMLStripTransformer"
>> 
> stripHTML="true"/>
> stripHTML="true"/>
> stripHTML="true"/>
> stripHTML="true"/>
> xpath="/mediawiki/page/revision/contributor/username" stripHTML="true"/>
> xpath="/mediawiki/page/revision/contributor/id" stripHTML="true"/>
> stripHTML="true"/>
> stripHTML="true"/>
> stripHTML="true"/>
> stripHTML="true"/>
> xpath="/mediawiki/page/revision/timestamp"
> dateTimeFormat="-MM-dd'T'hh:mm:ss'Z'" />
> replaceWith="true" sourceColName="text"/>
> sourceColName="text" stripHTML="true"/>
> sourceColName="title"/>
>   
>
> 
> 
> I tried http://localhost:8983/solr/dataimport?command=full-import. At around
> 50,000 documents, I get an error related to regular expressions.
> 
> at java.util.regex.Pattern$Loop.match(Pattern.java:4295)
>   at java.util.regex.Pattern$GroupTail.match(Pattern.java:4227)
>   at java.util.regex.Pattern$BranchConn.match(Pattern.java:4078)
>   at java.util.regex.Pattern$CharProperty.match(Pattern.java:3345)
>   at java.util.regex.Pattern$Branch.match(Pattern.java:4114)
>   at java.util.regex.Pattern$GroupHead.match(Pattern.java:4168)
>   at java.util.regex.Pattern$Loop.match(Pattern.java:4295)
>   at java.util.regex.Pattern$GroupTail.match(Pattern.java:4227)
>   at java.util.regex.Pattern$BranchConn.match(Pattern.java:4078)
>   at java.util.regex.Pattern$CharProperty.match(Pattern.java:3345)
>   at java.util.regex.Pattern$Branch.match(Pattern.java:4114)
> 
> I do not know how to proceed. Please help me out.
> 
> Thanks and Regards
> Prabu
> 
> 
> On Wed, Sep 11, 2013 at 11:31 AM, Erick Erickson 
> wrote:
> 
>> Be a little careful when extrapolating from disk to memory.
>> Any fields where you've set stored="true" will put data in
>> segment files with extensions .fdt and .fdx (see the link below).
>> These are the compressed verbatim copy of the data
>> for stored fields and have very little impact on
>> memory required for searching. I've seen indexes where
>> 75% of the data is stored and indexes where 5% of the
>> data is stored.
>> 
>> "Summary of File Extensions" here:
>> 
>> http://lucene.apache.org/core/4_0_0/core/org/apache/lucene/codecs/lucene40/package-summary.html
>> 
>> Best,
>> Erick
>> 
>> 
>> On Wed, Sep 11, 2013 at 2:57 AM, prabu palanisamy >> wrote:
>> 
>>> @Shawn: Correct, I am trying to reduce the index size. I am working on
>>> reindexing Solr with some of the fields as indexed but not stored.
>>> 
>>> @Jean: I tried with  different caches. It did not show much improvement.
>>> 
>>> 
>>> On Fri, Sep 6, 2013 at 3:17 PM, Shawn Heisey  wrote:
>>> 
 On 9/6/2013 2:54 AM, prabu palanisamy wrote:
> I am currently using solr -3.5.0,  indexed  wikipedia dump (50 gb)
>> with
> java 1.6.
> I am searching the solr with text (which is actually twitter tweets)
>> .
> Currently it takes average time of 210 millisecond for each post, out
>>> of
> which 200 millisecond is consumed by solr server (QTime).  I used the
> jconsole monitor tool.
 
 If the size of all your Solr indexes on disk is in the 50GB range of
 your wikipedia dump, then for ideal performance, you'll want to have
 50GB of free memory so the OS can cache your index.  You might be able
 to get by with 25-30GB of free memory, d

Get the commit time of a document in Solr

2013-09-12 Thread phanichaitanya
I'd like to know when a document is committed in Solr vs. the indexed time. 

For indexed time, I can add a field as : .
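
(That is, the usual timestamp pattern, something along the lines of

  <field name="index_time" type="date" indexed="true" stored="true" default="NOW"/>

assuming the stock "date" field type from the example schema; the field name is
just whatever I pick.)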

Say I have 10 million docs indexed and I want to know the actual commit
time of each document, which is what makes it searchable. The problem is to find
the time at which a document becomes searchable, which will be after it is
committed. (I don't want to do any soft commits.)

If there is a way to know this, please let me know; I'd like to dig into
more details based on it.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Get-the-commit-time-of-a-document-in-Solr-tp4089624.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Get the commit time of a document in Solr

2013-09-12 Thread Jack Krupansky

Yes, the document will be searchable after it is committed.

Although you can also do auto commits and commitWithin which do not 
guarantee immediate visibility of index changes, you can do a hard commit 
any time you want to make a document searchable.
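
(For example, a commit can be attached to any update request with something like
http://localhost:8983/solr/update?commit=true, or deferred with a parameter such
as commitWithin=60000 to ask for a commit within 60 seconds.)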


-- Jack Krupansky

-Original Message- 
From: phanichaitanya

Sent: Thursday, September 12, 2013 12:07 PM
To: solr-user@lucene.apache.org
Subject: Get the commit time of a document in Solr

I'd like to know when a document is committed in Solr vs. the indexed time.

For indexed time, I can add a field as : .

If I have say, 10 million docs indexed and I want to know the actual commit
time of the document which makes it searchable. The problem is to just find
the time when a document can be searchable which will be after it is
committed ? (I don't want to do any soft commits).

If there is a way to know this, please let me know so that I'd like to know
more details based on it.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Get-the-commit-time-of-a-document-in-Solr-tp4089624.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: Get the commit time of a document in Solr

2013-09-12 Thread Shawn Heisey
On 9/12/2013 11:04 AM, phanichaitanya wrote:
> So, now I want to know when that document becomes searchable or when it is
> committed. I've the following scenario:
> 
> 1) Indexing starts at say 9:00 AM - with the above additions to the
> schema.xml I'll know the indexed time of each document I send to Solr via
> the update handler. Say 9:01, 9:02 and so on ... lets say I send a document
> for every second between 9 - 9:30 AM and it makes it 30*60 = 1800 docs
> 2) Now at 9:30 AM, I issue a hard commit and now I'll be able to search
> these 1800 documents which is fine.
> 3) Now I want to know that I can search these 1800 documents only at >=9:30
> AM but not < 9:30 AM as I did not do a hard commit before 9:30 AM. 
> 
> In order to know that, is there a way in Solr rather than some application
> keeping track of the documents it sends to Solr between any two commits. The
> reason I'm asking is, if there are say two parallel processes indexing to
> the same index and one process issues a commit - then whatever documents
> process two indexed until that point of time would also be committed right ?
> Now if I keep track of commit times in each process it doesn't reflect the
> true commit times as they are inter-twined.

From what I understand, if you use the default of NOW for a field in
your schema, then all documents indexed in that request will have the
timestamp of the time that indexing started.

Assuming what I understand is the way it actually works, if you want the
time to reflect anything even close to commit time, then you will need
to send very small batches and you will need to commit after every
batch.  If you are indexing very quickly, you'll probably want those
commits to be soft commits.

You'll also want to have an autoCommit set up to do hard commits less
frequently with openSearcher=false, or you'll run into the problem
described at the link below.  There is a good autoCommit example there:

http://wiki.apache.org/solr/SolrPerformanceProblems#Slow_startup
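
(Roughly along these lines, if you don't want to chase the link; openSearcher=false
is the important part, and the five-minute interval is just an example:

  <autoCommit>
    <maxTime>300000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
)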

I've heard (but have not tested) that with the NOW default, large
imports with the dataimporthandler will all have the timestamp of when
the DIH request started, no matter what you do with autoCommit or
autoSoftCommit.

Thanks,
Shawn



Re: Some highlighted snippets aren't being returned

2013-09-12 Thread Eric O'Hanlon
maxAnalyzedChars did it!  I wasn't setting that param, and I'm working with 
some very long documents.  I also made the hl.fl param formatting change that 
you suggested, Aloke.
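
(For anyone who hits the same thing, the relevant parameters now look roughly like
this; the hl.maxAnalyzedChars value is just what happened to cover our very long
documents:

  hl=true&hl.fl=contents,title,original_url&hl.fragsize=600&hl.maxAnalyzedChars=1000000
)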

Thanks again!

- Eric

On Sep 11, 2013, at 3:10 AM, Eric O'Hanlon  wrote:

> Thank you, Aloke and Bryan!  I'll give this a try and I'll report back on 
> what happens!
> 
> - Eric
> 
> On Sep 9, 2013, at 2:32 AM, Aloke Ghoshal  wrote:
> 
>> Hi Eric,
>> 
>> As Bryan suggests, you should look at appropriately setting up the
>> fragSize & maxAnalyzedChars for long documents.
>> 
>> One issue I find with your search request is that in trying to
>> highlight across three separate fields, you have added each of them as
>> a separate request param:
>> hl.fl=contents&hl.fl=title&hl.fl=original_url
>> 
>> The way to do it would be
>> (http://wiki.apache.org/solr/HighlightingParameters#hl.fl) to pass
>> them as values to one comma (or space) separated field:
>> hl.fl=contents,title,original_url
>> 
>> Regards,
>> Aloke
>> 
>> On 9/9/13, Bryan Loofbourrow  wrote:
>>> Eric,
>>> 
>>> Your example document is quite long. Are you setting hl.maxAnalyzedChars?
>>> If you don't, the highlighter you appear to be using will not look past
>>> the first 51,200 characters of the document for snippet candidates.
>>> 
>>> http://wiki.apache.org/solr/HighlightingParameters#hl.maxAnalyzedChars
>>> 
>>> -- Bryan
>>> 
>>> 
 -Original Message-
 From: Eric O'Hanlon [mailto:elo2...@columbia.edu]
 Sent: Sunday, September 08, 2013 2:01 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Some highlighted snippets aren't being returned
 
 Hi again Everyone,
 
 I didn't get any replies to this, so I thought I'd re-send in case
>>> anyone
 missed it and has any thoughts.
 
 Thanks,
 Eric
 
 On Aug 7, 2013, at 1:51 PM, Eric O'Hanlon  wrote:
 
> Hi Everyone,
> 
> I'm facing an issue in which my solr query is returning highlighted
 snippets for some, but not all results.  For reference, I'm searching
 through an index that contains web crawls of human-rights-related
 websites.  I'm running solr as a webapp under Tomcat and I've included
>>> the
 query's solr params from the Tomcat log:
> 
> ...
> webapp=/solr-4.2
> path=/select
> 
 
>>> params={facet=true&sort=score+desc&group.limit=10&spellcheck.q=Unangan&f.m
 
>>> imetype_code.facet.limit=7&hl.simple.pre=&q.alt=*:*&f.organization_t
 
>>> ype__facet.facet.limit=6&f.language__facet.facet.limit=6&hl=true&f.date_of
 
>>> _capture_.facet.limit=6&group.field=original_url&hl.simple.post=>>> 
 &facet.field=domain&facet.field=date_of_capture_&facet.field=mimetype
 
>>> _code&facet.field=geographic_focus__facet&facet.field=organization_based_i
 
>>> n__facet&facet.field=organization_type__facet&facet.field=language__facet&
 
>>> facet.field=creator_name__facet&hl.fragsize=600&f.creator_name__facet.face
 
>>> t.limit=6&facet.mincount=1&qf=text^1&hl.fl=contents&hl.fl=title&hl.fl=orig
 
>>> inal_url&wt=ruby&f.geographic_focus__facet.facet.limit=6&defType=edismax&r
 
>>> ows=10&f.domain.facet.limit=6&q=Unangan&f.organization_based_in__facet.fac
 et.limit=6&q.op=AND&group=true&hl.usePhraseHighlighter=true} hits=8
 status=0 QTime=108
> ...
> 
> For the query above (which can be simplified to say: find all
>>> documents
 that contain the word "unangan" and return facets, highlights, etc.), I
 get five search results.  Only three of these are returning highlighted
 snippets.  Here's the "highlighting" portion of the solr response (note:
 printed in ruby notation because I'm receiving this response in a Rails
 app):
> 
> 
> "highlighting"=>
> 
 
>>> {"20100602195444/http://www.kontras.org/uu_ri_ham/UU%20Nomor%2023%20Tahun%
 202002%20tentang%20Perlindungan%20Anak.pdf"=>
>  {},
> 
 
>>> "20100902203939/http://www.kontras.org/uu_ri_ham/UU%20Nomor%2023%20Tahun%2
 02002%20tentang%20Perlindungan%20Anak.pdf"=>
>  {},
> 
 
>>> "20111202233029/http://www.kontras.org/uu_ri_ham/UU%20Nomor%2023%20Tahun%2
 02002%20tentang%20Perlindungan%20Anak.pdf"=>
>  {},
> "20100618201646/http://www.komnasham.go.id/portal/files/39-99.pdf"=>
>  {"contents"=>
>["...actual snippet is returned here..."]},
> "20100902235358/http://www.komnasham.go.id/portal/files/39-99.pdf"=>
>  {"contents"=>
>["...actual snippet is returned here..."]},
> "20110302213056/http://www.komnasham.go.id/publikasi/doc_download/2-
 uu-no-39-tahun-1999"=>
>  {"contents"=>
>["...actual snippet is returned here..."]},
> 
>>> "20110302213102/http://www.komnasham.go.id/publikasi/doc_view/2-uu-no-
 39-tahun-1999?tmpl=component&format=raw"=>
>  {"contents"=>
>["...actual snippet is returned here..."]},
> 
 
>>> "20120303113654/http://www.iwgia.org/i

Re: SolrCloud 4.x hangs under high update volume

2013-09-12 Thread Tim Vaillancourt
That makes sense, thanks Erick and Mark for your help! :)

I'll see if I can find a place to assist with the testing of SOLR-5232.
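
For anyone following along, the client change we're considering is just stock
SolrJ, roughly like this (ZooKeeper hosts and collection name are made up;
imports from org.apache.solr.client.solrj.impl and org.apache.solr.common,
exception handling omitted):

  CloudSolrServer server = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
  server.setDefaultCollection("collection1");

  SolrInputDocument doc = new SolrInputDocument();
  doc.addField("id", "example-1");
  server.add(doc);  // with the SOLR-4816 patch, routed straight to the correct shard leader
  server.shutdown();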

Cheers,

Tim



On 12 September 2013 11:16, Mark Miller  wrote:

> Right, I don't see SOLR-5232 making 4.5 unfortunately. It could perhaps
> make a 4.5.1 - it does resolve a critical issue - but 4.5 is in motion and
> SOLR-5232 is not quite ready - we need some testing.
>
> - Mark
>
> On Sep 12, 2013, at 2:12 PM, Erick Erickson 
> wrote:
>
> > My take on it is this, assuming I'm reading this right:
> > 1> SOLR-5216 - probably not going anywhere, 5232 will take care of it.
> > 2> SOLR-5232 - expected to fix the underlying issue no matter whether
> > you're using CloudSolrServer from SolrJ or sending lots of updates from
> > lots of clients.
> > 3> SOLR-4816 - use this patch and CloudSolrServer from SolrJ in the
> > meantime.
> >
> > I don't quite know whether SOLR-5232 will make it in to 4.5 or not, it
> > hasn't been committed anywhere yet. The Solr 4.5 release is imminent, RC0
> > is looking like it'll be ready to cut next week so it might not be
> included.
> >
> > Best,
> > Erick
> >
> >
> > On Thu, Sep 12, 2013 at 1:42 PM, Tim Vaillancourt  >wrote:
> >
> >> Lol, at breaking during a demo - always the way it is! :) I agree, we
> are
> >> just tip-toeing around the issue, but waiting for 4.5 is definitely an
> >> option if we "get-by" for now in testing; patched Solr versions seem to
> >> make people uneasy sometimes :).
> >>
> >> Seeing there seems to be some danger to SOLR-5216 (in some ways it
> blows up
> >> worse due to less limitations on thread), I'm guessing only SOLR-5232
> and
> >> SOLR-4816 are making it into 4.5? I feel those 2 in combination will
> make a
> >> world of difference!
> >>
> >> Thanks so much again guys!
> >>
> >> Tim
> >>
> >>
> >>
> >> On 12 September 2013 03:43, Erick Erickson 
> >> wrote:
> >>
> >>> Fewer client threads updating makes sense, and going to 1 core also
> seems
> >>> like it might help. But it's all a crap-shoot unless the underlying
> cause
> >>> gets fixed up. Both would improve things, but you'll still hit the
> >> problem
> >>> sometime, probably when doing a demo for your boss ;).
> >>>
> >>> Adrien has branched the code for SOLR 4.5 in preparation for a release
> >>> candidate tentatively scheduled for next week. You might just start
> >> working
> >>> with that branch if you can rather than apply individual patches...
> >>>
> >>> I suspect there'll be a couple more changes to this code (looks like
> >>> Shikhar already raised an issue for instance) before 4.5 is finally
> >> cut...
> >>>
> >>> FWIW,
> >>> Erick
> >>>
> >>>
> >>>
> >>> On Thu, Sep 12, 2013 at 2:13 AM, Tim Vaillancourt <
> t...@elementspace.com
>  wrote:
> >>>
>  Thanks Erick!
> 
>  Yeah, I think the next step will be CloudSolrServer with the SOLR-4816
>  patch. I think that is a very, very useful patch by the way. SOLR-5232
>  seems promising as well.
> 
>  I see your point on the more-shards idea, this is obviously a
>  global/instance-level lock. If I really had to, I suppose I could run
> >>> more
>  Solr instances to reduce locking then? Currently I have 2 cores per
>  instance and I could go 1-to-1 to simplify things.
> 
>  The good news is we seem to be more stable since changing to a bigger
>  client->solr batch-size and fewer client threads updating.
> 
>  Cheers,
> 
>  Tim
> 
>  On 11/09/13 04:19 AM, Erick Erickson wrote:
> 
> > If you use CloudSolrServer, you need to apply SOLR-4816 or use a
> >> recent
> > copy of the 4x branch. By "recent", I mean like today, it looks like
> >>> Mark
> > applied this early this morning. But several reports indicate that
> >> this
> > will
> > solve your problem.
> >
> > I would expect that increasing the number of shards would make the
> >>> problem
> > worse, not
> > better.
> >
> > There's also SOLR-5232...
> >
> > Best
> > Erick
> >
> >
> > On Tue, Sep 10, 2013 at 5:20 PM, Tim Vaillancourt >>> **com
> >> wrote:
> >
> > Hey guys,
> >>
> >> Based on my understanding of the problem we are encountering, I feel
> >> we've
> >> been able to reduce the likelihood of this issue by making the
> >>> following
> >> changes to our app's usage of SolrCloud:
> >>
> >> 1) We increased our document batch size to 200 from 10 - our app
> >>> batches
> >> updates to reduce HTTP requests/overhead. The theory is increasing
> >> the
> >> batch size reduces the likelihood of this issue happening.
> >> 2) We reduced to 1 application node sending updates to SolrCloud -
> we
> >> write
> >> Solr updates to Redis, and have previously had 4 application nodes
> >> pushing
> >> the updates to Solr (popping off the Redis queue). Reducing the
> >> number
> >>> of
> >> nodes pushing to Solr reduces the concurrency on SolrClou

Re: charset encoding

2013-09-12 Thread Shawn Heisey
On 9/12/2013 11:17 AM, Andreas Owen wrote:
> it was the http-header, as soon as i force a iso-8859-1 header it worked

Glad you found a workaround!

If you are in a situation where you cannot control the header of the
request or modify the content itself to include charset information, or
there's some reason you would rather not take that route, there will be
another way with the next Solr release.

https://issues.apache.org/jira/browse/SOLR-5082

Solr 4.5 will support an "ie" (input encoding) parameter for the update
request so you can inform Solr what charset encoding to expect. The
release process for Solr 4.5 has been started; it usually takes 2-3
weeks to complete.
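
(So an update request could then say something like
http://localhost:8983/solr/update?ie=ISO-8859-1 to tell Solr what the body is
encoded in; untested here, since 4.5 isn't out yet.)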

Thanks,
Shawn



Re: SolrCloud 4.x hangs under high update volume

2013-09-12 Thread Erick Erickson
My take on it is this, assuming I'm reading this right:
1> SOLR-5216 - probably not going anywhere, 5232 will take care of it.
2> SOLR-5232 - expected to fix the underlying issue no matter whether
you're using CloudSolrServer from SolrJ or sending lots of updates from
lots of clients.
3> SOLR-4816 - use this patch and CloudSolrServer from SolrJ in the
meantime.

I don't quite know whether SOLR-5232 will make it in to 4.5 or not, it
hasn't been committed anywhere yet. The Solr 4.5 release is imminent, RC0
is looking like it'll be ready to cut next week so it might not be included.

Best,
Erick


On Thu, Sep 12, 2013 at 1:42 PM, Tim Vaillancourt wrote:

> Lol, at breaking during a demo - always the way it is! :) I agree, we are
> just tip-toeing around the issue, but waiting for 4.5 is definitely an
> option if we "get-by" for now in testing; patched Solr versions seem to
> make people uneasy sometimes :).
>
> Seeing there seems to be some danger to SOLR-5216 (in some ways it blows up
> worse due to less limitations on thread), I'm guessing only SOLR-5232 and
> SOLR-4816 are making it into 4.5? I feel those 2 in combination will make a
> world of difference!
>
> Thanks so much again guys!
>
> Tim
>
>
>
> On 12 September 2013 03:43, Erick Erickson 
> wrote:
>
> > Fewer client threads updating makes sense, and going to 1 core also seems
> > like it might help. But it's all a crap-shoot unless the underlying cause
> > gets fixed up. Both would improve things, but you'll still hit the
> problem
> > sometime, probably when doing a demo for your boss ;).
> >
> > Adrien has branched the code for SOLR 4.5 in preparation for a release
> > candidate tentatively scheduled for next week. You might just start
> working
> > with that branch if you can rather than apply individual patches...
> >
> > I suspect there'll be a couple more changes to this code (looks like
> > Shikhar already raised an issue for instance) before 4.5 is finally
> cut...
> >
> > FWIW,
> > Erick
> >
> >
> >
> > On Thu, Sep 12, 2013 at 2:13 AM, Tim Vaillancourt  > >wrote:
> >
> > > Thanks Erick!
> > >
> > > Yeah, I think the next step will be CloudSolrServer with the SOLR-4816
> > > patch. I think that is a very, very useful patch by the way. SOLR-5232
> > > seems promising as well.
> > >
> > > I see your point on the more-shards idea, this is obviously a
> > > global/instance-level lock. If I really had to, I suppose I could run
> > more
> > > Solr instances to reduce locking then? Currently I have 2 cores per
> > > instance and I could go 1-to-1 to simplify things.
> > >
> > > The good news is we seem to be more stable since changing to a bigger
> > > client->solr batch-size and fewer client threads updating.
> > >
> > > Cheers,
> > >
> > > Tim
> > >
> > > On 11/09/13 04:19 AM, Erick Erickson wrote:
> > >
> > >> If you use CloudSolrServer, you need to apply SOLR-4816 or use a
> recent
> > >> copy of the 4x branch. By "recent", I mean like today, it looks like
> > Mark
> > >> applied this early this morning. But several reports indicate that
> this
> > >> will
> > >> solve your problem.
> > >>
> > >> I would expect that increasing the number of shards would make the
> > problem
> > >> worse, not
> > >> better.
> > >>
> > >> There's also SOLR-5232...
> > >>
> > >> Best
> > >> Erick
> > >>
> > >>
> > >> On Tue, Sep 10, 2013 at 5:20 PM, Tim Vaillancourt > **com
> > >> >wrote:
> > >>
> > >>  Hey guys,
> > >>>
> > >>> Based on my understanding of the problem we are encountering, I feel
> > >>> we've
> > >>> been able to reduce the likelihood of this issue by making the
> > following
> > >>> changes to our app's usage of SolrCloud:
> > >>>
> > >>> 1) We increased our document batch size to 200 from 10 - our app
> > batches
> > >>> updates to reduce HTTP requests/overhead. The theory is increasing
> the
> > >>> batch size reduces the likelihood of this issue happening.
> > >>> 2) We reduced to 1 application node sending updates to SolrCloud - we
> > >>> write
> > >>> Solr updates to Redis, and have previously had 4 application nodes
> > >>> pushing
> > >>> the updates to Solr (popping off the Redis queue). Reducing the
> number
> > of
> > >>> nodes pushing to Solr reduces the concurrency on SolrCloud.
> > >>> 3) Less threads pushing to SolrCloud - due to the increase in batch
> > size,
> > >>> we were able to go down to 5 update threads on the update-pushing-app
> > >>> (from
> > >>> 10 threads).
> > >>>
> > >>> To be clear the above only reduces the likelihood of the issue
> > happening,
> > >>> and DOES NOT actually resolve the issue at hand.
> > >>>
> > >>> If we happen to encounter issues with the above 3 changes, the next
> > steps
> > >>> (I could use some advice on) are:
> > >>>
> > >>> 1) Increase the number of shards (2x) - the theory here is this
> reduces
> > >>> the
> > >>> locking on shards because there are more shards. Am I onto something
> > >>> here,
> > >>> or will this not help at all?
> > >>> 2) Use CloudSolrServer - currently we have 

Re: Get the commit time of a document in Solr

2013-09-12 Thread phanichaitanya
So, now I want to know when that document becomes searchable or when it is
committed. I have the following scenario:

1) Indexing starts at say 9:00 AM - with the above additions to the
schema.xml I'll know the indexed time of each document I send to Solr via
the update handler. Say 9:01, 9:02 and so on ... let's say I send a document
for every second between 9 - 9:30 AM and it makes it 30*60 = 1800 docs
2) Now at 9:30 AM, I issue a hard commit and now I'll be able to search
these 1800 documents which is fine.
3) Now I want to know that I can search these 1800 documents only at >=9:30
AM but not < 9:30 AM as I did not do a hard commit before 9:30 AM. 

In order to know that, is there a way in Solr, rather than having the application
keep track of the documents it sends to Solr between any two commits? The
reason I'm asking is, if there are, say, two parallel processes indexing to
the same index and one process issues a commit, then whatever documents
process two indexed up to that point in time would also be committed, right?
Now, if I keep track of commit times in each process, it doesn't reflect the
true commit times as they are intertwined.



-
Phani Chaitanya
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Get-the-commit-time-of-a-document-in-Solr-tp4089624p4089638.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Get the commit time of a document in Solr

2013-09-12 Thread Shawn Heisey
On 9/12/2013 12:55 PM, phanichaitanya wrote:
> I want to make sure that documents that are indexed are committed in say an
> hour. I agree that if you pass commitWithIn params and the like will make
> sure of that based on the time configurations we set. But, I want to make
> sure that the document is really committed within whatever time we set using
> commitWithIn.
> 
> It's a question asking for proof that Solr commits within that time if we
> add commitWithIn parameter to the configuration.
> 
> That is about commitWithIn parameter option that you suggested.
> 
> Now is there a way to explicitly get all the documents that are committed
> when a hard commit request is issued ? This might not make sense but we are
> pondered with that question.

If these are ongoing requirements that you need to with every commit or
with a large subset of commits, then I don't think there is any way to
do it without writing custom plugins for Solr.

If you are just trying to prove to someone that Solr is doing what you
say it is, then you can do some simple testing:

Send an update request with as many documents as you want to test, and
include commit=true on the request.  If you are planning to use
commitWithin, also include softCommit=true, because commitWithin is a
soft commit.

Time how long it takes for the update request to complete.  That's
approximately how long it will take for a "real" update/commit to
happen.  There will be some extra time for the indexing itself, but
unless the document count is absolutely enormous, it shouldn't matter
too much.

If you want to test just the commit time, then (after making sure
nothing else is sending updates or commits) send the update without any
commit parameters, then send a commit request by itself and time how
long the commit request takes.
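
(Concretely, that might look like posting the batch to
/update?commit=true&softCommit=true and timing the response; for the commit-only
measurement, post the batch with no commit parameters and then time a bare
request to /update?commit=true by itself.)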

With enough RAM for proper OS disk caching, commits should be very fast
even on an index with 10 million documents.  Here is a wiki page that
has a small amount of discussion about slow commits:

http://wiki.apache.org/solr/SolrPerformanceProblems#Slow_commits

Thanks,
Shawn



Re: Get the commit time of a document in Solr

2013-09-12 Thread Raymond Wiker
On Sep 12, 2013, at 20:55 , phanichaitanya  wrote:
> Apologies again. But here is another try :
> 
> I want to make sure that documents that are indexed are committed in say an
> hour. I agree that if you pass commitWithIn params and the like will make
> sure of that based on the time configurations we set. But, I want to make
> sure that the document is really committed within whatever time we set using
> commitWithIn.
> 
> It's a question asking for proof that Solr commits within that time if we
> add commitWithIn parameter to the configuration.
> 
> That is about commitWithIn parameter option that you suggested.
> 
> Now is there a way to explicitly get all the documents that are committed
> when a hard commit request is issued ? This might not make sense but we are
> pondered with that question.
> 

If you have a timestamp field that defaults to NOW, you could do queries for a 
single document (q=*), ranked by descending timestamp. If you're  feeding 
constantly, and run these queries regularly, you should be able to get some 
sort of feel for the latency in the system.
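
(Something along the lines of
.../select?q=*:*&fl=timestamp&sort=timestamp+desc&rows=1, with "timestamp"
standing in for whatever the field is actually called.)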

Re: Get the commit time of a document in Solr

2013-09-12 Thread Jack Krupansky
Sorry, but all you've done is reshuffle your previous statements without
telling us about the actual problem that you are trying to solve!


Repeating myself: You, the application developer can send a hard commit any 
time you want to assure that documents are searchable. Maybe not every 
millisecond, but, say, once a second with a soft commit and once a minute 
for a hard commit, using "commit within" to minimize commits when multiple 
processes are indexing data.


AFAICT, no application should ever have to care when a document is actually 
committed - and you have control with commit, anyway.


You the application developer can "tune" the commit interval to balance 
searchability and overall efficiency. There shouldn't be any problem there, 
given the variety of commit methods that Solr supports, but you have to make 
the choices.


So, what's the problem you are trying to solve? You still haven't 
articulated it.


It sounds as if you are trying to solve a non-problem. But, we can't be sure 
since you haven't articulated what the actual problem (if any) really is.


-- Jack Krupansky

-Original Message- 
From: phanichaitanya

Sent: Thursday, September 12, 2013 1:42 PM
To: solr-user@lucene.apache.org
Subject: Re: Get the commit time of a document in Solr

Hi Jack,

 Sorry, I was not clear earlier. What I'm trying to achieve is :

I want to know when a document is committed (hard commit). There can be a
lot of time lapse (1 hour or more) between the time you indexed that
document vs you issue a commit in my case. Now, I exactly want to know when
a document is committed.

In my previous example all 1800 docs are committed at 9:30 AM and I want to
know that time for those 1800 docs. In other batch it'll be some other time.

The use-case is I've have more than 1 process sending the update requests to
Solr and each of those process has a separate commit step and I want to know
the commit time of the documents that were committed when I gave a commit
request.

I hope I'm clear now - please let me know if I'm not.



-
Phani Chaitanya
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Get-the-commit-time-of-a-document-in-Solr-tp4089624p4089662.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: SolrCloud 4.x hangs under high update volume

2013-09-12 Thread Mark Miller
Right, I don't see SOLR-5232 making 4.5 unfortunately. It could perhaps make a 
4.5.1 - it does resolve a critical issue - but 4.5 is in motion and SOLR-5232 
is not quite ready - we need some testing.

- Mark

On Sep 12, 2013, at 2:12 PM, Erick Erickson  wrote:

> My take on it is this, assuming I'm reading this right:
> 1> SOLR-5216 - probably not going anywhere, 5232 will take care of it.
> 2> SOLR-5232 - expected to fix the underlying issue no matter whether
> you're using CloudSolrServer from SolrJ or sending lots of updates from
> lots of clients.
> 3> SOLR-4816 - use this patch and CloudSolrServer from SolrJ in the
> meantime.
> 
> I don't quite know whether SOLR-5232 will make it in to 4.5 or not, it
> hasn't been committed anywhere yet. The Solr 4.5 release is imminent, RC0
> is looking like it'll be ready to cut next week so it might not be included.
> 
> Best,
> Erick
> 
> 
> On Thu, Sep 12, 2013 at 1:42 PM, Tim Vaillancourt 
> wrote:
> 
>> Lol, at breaking during a demo - always the way it is! :) I agree, we are
>> just tip-toeing around the issue, but waiting for 4.5 is definitely an
>> option if we "get-by" for now in testing; patched Solr versions seem to
>> make people uneasy sometimes :).
>> 
>> Seeing there seems to be some danger to SOLR-5216 (in some ways it blows up
>> worse due to less limitations on thread), I'm guessing only SOLR-5232 and
>> SOLR-4816 are making it into 4.5? I feel those 2 in combination will make a
>> world of difference!
>> 
>> Thanks so much again guys!
>> 
>> Tim
>> 
>> 
>> 
>> On 12 September 2013 03:43, Erick Erickson 
>> wrote:
>> 
>>> Fewer client threads updating makes sense, and going to 1 core also seems
>>> like it might help. But it's all a crap-shoot unless the underlying cause
>>> gets fixed up. Both would improve things, but you'll still hit the
>> problem
>>> sometime, probably when doing a demo for your boss ;).
>>> 
>>> Adrien has branched the code for SOLR 4.5 in preparation for a release
>>> candidate tentatively scheduled for next week. You might just start
>> working
>>> with that branch if you can rather than apply individual patches...
>>> 
>>> I suspect there'll be a couple more changes to this code (looks like
>>> Shikhar already raised an issue for instance) before 4.5 is finally
>> cut...
>>> 
>>> FWIW,
>>> Erick
>>> 
>>> 
>>> 
>>> On Thu, Sep 12, 2013 at 2:13 AM, Tim Vaillancourt >>> wrote:
>>> 
 Thanks Erick!
 
 Yeah, I think the next step will be CloudSolrServer with the SOLR-4816
 patch. I think that is a very, very useful patch by the way. SOLR-5232
 seems promising as well.
 
 I see your point on the more-shards idea, this is obviously a
 global/instance-level lock. If I really had to, I suppose I could run
>>> more
 Solr instances to reduce locking then? Currently I have 2 cores per
 instance and I could go 1-to-1 to simplify things.
 
 The good news is we seem to be more stable since changing to a bigger
 client->solr batch-size and fewer client threads updating.
 
 Cheers,
 
 Tim
 
 On 11/09/13 04:19 AM, Erick Erickson wrote:
 
> If you use CloudSolrServer, you need to apply SOLR-4816 or use a
>> recent
> copy of the 4x branch. By "recent", I mean like today, it looks like
>>> Mark
> applied this early this morning. But several reports indicate that
>> this
> will
> solve your problem.
> 
> I would expect that increasing the number of shards would make the
>>> problem
> worse, not
> better.
> 
> There's also SOLR-5232...
> 
> Best
> Erick
> 
> 
> On Tue, Sep 10, 2013 at 5:20 PM, Tim Vaillancourt>> **com
>> wrote:
> 
> Hey guys,
>> 
>> Based on my understanding of the problem we are encountering, I feel
>> we've
>> been able to reduce the likelihood of this issue by making the
>>> following
>> changes to our app's usage of SolrCloud:
>> 
>> 1) We increased our document batch size to 200 from 10 - our app
>>> batches
>> updates to reduce HTTP requests/overhead. The theory is increasing
>> the
>> batch size reduces the likelihood of this issue happening.
>> 2) We reduced to 1 application node sending updates to SolrCloud - we
>> write
>> Solr updates to Redis, and have previously had 4 application nodes
>> pushing
>> the updates to Solr (popping off the Redis queue). Reducing the
>> number
>>> of
>> nodes pushing to Solr reduces the concurrency on SolrCloud.
>> 3) Less threads pushing to SolrCloud - due to the increase in batch
>>> size,
>> we were able to go down to 5 update threads on the update-pushing-app
>> (from
>> 10 threads).
>> 
>> To be clear the above only reduces the likelihood of the issue
>>> happening,
>> and DOES NOT actually resolve the issue at hand.
>> 
>> If we happen to encounter issues with the above 3 changes, the next
>>> steps
>> (I could use some advi

Re: SolrCloud 4.x hangs under high update volume

2013-09-12 Thread Tim Vaillancourt
Lol, at breaking during a demo - always the way it is! :) I agree, we are
just tip-toeing around the issue, but waiting for 4.5 is definitely an
option if we "get-by" for now in testing; patched Solr versions seem to
make people uneasy sometimes :).

Seeing there seems to be some danger to SOLR-5216 (in some ways it blows up
worse due to less limitations on thread), I'm guessing only SOLR-5232 and
SOLR-4816 are making it into 4.5? I feel those 2 in combination will make a
world of difference!

Thanks so much again guys!

Tim



On 12 September 2013 03:43, Erick Erickson  wrote:

> Fewer client threads updating makes sense, and going to 1 core also seems
> like it might help. But it's all a crap-shoot unless the underlying cause
> gets fixed up. Both would improve things, but you'll still hit the problem
> sometime, probably when doing a demo for your boss ;).
>
> Adrien has branched the code for SOLR 4.5 in preparation for a release
> candidate tentatively scheduled for next week. You might just start working
> with that branch if you can rather than apply individual patches...
>
> I suspect there'll be a couple more changes to this code (looks like
> Shikhar already raised an issue for instance) before 4.5 is finally cut...
>
> FWIW,
> Erick
>
>
>
> On Thu, Sep 12, 2013 at 2:13 AM, Tim Vaillancourt  >wrote:
>
> > Thanks Erick!
> >
> > Yeah, I think the next step will be CloudSolrServer with the SOLR-4816
> > patch. I think that is a very, very useful patch by the way. SOLR-5232
> > seems promising as well.
> >
> > I see your point on the more-shards idea, this is obviously a
> > global/instance-level lock. If I really had to, I suppose I could run
> more
> > Solr instances to reduce locking then? Currently I have 2 cores per
> > instance and I could go 1-to-1 to simplify things.
> >
> > The good news is we seem to be more stable since changing to a bigger
> > client->solr batch-size and fewer client threads updating.
> >
> > Cheers,
> >
> > Tim
> >
> > On 11/09/13 04:19 AM, Erick Erickson wrote:
> >
> >> If you use CloudSolrServer, you need to apply SOLR-4816 or use a recent
> >> copy of the 4x branch. By "recent", I mean like today, it looks like
> Mark
> >> applied this early this morning. But several reports indicate that this
> >> will
> >> solve your problem.
> >>
> >> I would expect that increasing the number of shards would make the
> problem
> >> worse, not
> >> better.
> >>
> >> There's also SOLR-5232...
> >>
> >> Best
> >> Erick
> >>
> >>
> >> On Tue, Sep 10, 2013 at 5:20 PM, Tim Vaillancourt **com
> >> >wrote:
> >>
> >>  Hey guys,
> >>>
> >>> Based on my understanding of the problem we are encountering, I feel
> >>> we've
> >>> been able to reduce the likelihood of this issue by making the
> following
> >>> changes to our app's usage of SolrCloud:
> >>>
> >>> 1) We increased our document batch size to 200 from 10 - our app
> batches
> >>> updates to reduce HTTP requests/overhead. The theory is increasing the
> >>> batch size reduces the likelihood of this issue happening.
> >>> 2) We reduced to 1 application node sending updates to SolrCloud - we
> >>> write
> >>> Solr updates to Redis, and have previously had 4 application nodes
> >>> pushing
> >>> the updates to Solr (popping off the Redis queue). Reducing the number
> of
> >>> nodes pushing to Solr reduces the concurrency on SolrCloud.
> >>> 3) Less threads pushing to SolrCloud - due to the increase in batch
> size,
> >>> we were able to go down to 5 update threads on the update-pushing-app
> >>> (from
> >>> 10 threads).
> >>>
> >>> To be clear the above only reduces the likelihood of the issue
> happening,
> >>> and DOES NOT actually resolve the issue at hand.
> >>>
> >>> If we happen to encounter issues with the above 3 changes, the next
> steps
> >>> (I could use some advice on) are:
> >>>
> >>> 1) Increase the number of shards (2x) - the theory here is this reduces
> >>> the
> >>> locking on shards because there are more shards. Am I onto something
> >>> here,
> >>> or will this not help at all?
> >>> 2) Use CloudSolrServer - currently we have a plain-old least-connection
> >>> HTTP VIP. If we go "direct" to what we need to update, this will reduce
> >>> concurrency in SolrCloud a bit. Thoughts?
> >>>
> >>> Thanks all!
> >>>
> >>> Cheers,
> >>>
> >>> Tim
> >>>
> >>>
> >>> On 6 September 2013 14:47, Tim Vaillancourt t...@elementspace.com>>
> >>>  wrote:
> >>>
> >>>  Enjoy your trip, Mark! Thanks again for the help!
> 
>  Tim
> 
> 
>  On 6 September 2013 14:18, Mark Miller  wrote:
> 
>   Okay, thanks, useful info. Getting on a plane, but ill look more at
> > this
> > soon. That 10k thread spike is good to know - that's no good and
> could
> > easily be part of the problem. We want to keep that from happening.
> >
> > Mark
> >
> > Sent from my iPhone
> >
> > On Sep 6, 2013, at 2:05 PM, Tim Vaillancourt t...@elementspace.com>
> > >
> > wrote:
> >
> >  Hey Mark,
>

Re: Get the commit time of a document in Solr

2013-09-12 Thread phanichaitanya
Hi Jack,

  Sorry, I was not clear earlier. What I'm trying to achieve is :

I want to know when a document is committed (hard commit). In my case there can
be a long lapse (1 hour or more) between the time a document is indexed and the
time a commit is issued. Now, I want to know exactly when a document is
committed.

In my previous example all 1800 docs are committed at 9:30 AM and I want to
know that time for those 1800 docs. In another batch it'll be some other time.

The use case is that I have more than one process sending update requests to
Solr, each of those processes has a separate commit step, and I want to know
the commit time of the documents that were committed when I issued a commit
request.

I hope I'm clear now - please let me know if I'm not. 



-
Phani Chaitanya
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Get-the-commit-time-of-a-document-in-Solr-tp4089624p4089662.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: charset encoding

2013-09-12 Thread Andreas Owen
It was the HTTP header; as soon as I forced an ISO-8859-1 header, it worked.
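
(Concretely, the pages now come back with a header along the lines of
"Content-Type: text/html; charset=ISO-8859-1" instead of a utf-8 charset; how
you force that depends on your web server.)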

On 12. Sep 2013, at 9:44 AM, Andreas Owen wrote:

> could it have something to do with the meta encoding tag being iso-8859-1 while
> the http header says utf-8, so firefox interprets it as utf-8?
> 
> On 12. Sep 2013, at 8:36 AM, Andreas Owen wrote:
> 
>> no jetty, and yes for tomcat i've seen a couple of answers
>> 
>> On 12. Sep 2013, at 3:12 AM, Otis Gospodnetic wrote:
>> 
>>> Using tomcat by any chance? The ML archive has the solution. May be on
>>> Wiki, too.
>>> 
>>> Otis
>>> Solr & ElasticSearch Support
>>> http://sematext.com/
>>> On Sep 11, 2013 8:56 AM, "Andreas Owen"  wrote:
>>> 
 i'm using solr 4.3.1 with tika to index html-pages. the html files are
 iso-8859-1 (ansi) encoded and the meta tag "content-encoding" says so as well. the
 server-http-header says it's utf8 and firefox-webdeveloper agrees.
 
 when i index a page with special chars like ä,ö,ü solr outputs them as
 completely foreign signs, not the normal wrong chars with 1/4 or the flag in
 it. so it seems that it's not simply the normal utf8/iso-8859-1 discrepancy.
 has anyone got an idea what's wrong?
 
 



Re: Get the commit time of a document in Solr

2013-09-12 Thread Jack Krupansky
Slow down, back up, and now tell us what problem (if any!) you are really 
trying to solve. Don't leap to a proposed solution before you clearly state 
the problem to be solved.


First, why do you think there is any problem at all?

Or, what are you really trying to achieve?

-- Jack Krupansky

-Original Message- 
From: phanichaitanya

Sent: Thursday, September 12, 2013 1:04 PM
To: solr-user@lucene.apache.org
Subject: Re: Get the commit time of a document in Solr

So, now I want to know when that document becomes searchable, or when it is
committed. I have the following scenario:

1) Indexing starts at say 9:00 AM - with the above additions to the
schema.xml I'll know the indexed time of each document I send to Solr via
the update handler. Say 9:01, 9:02 and so on ... let's say I send a document
every second between 9:00 and 9:30 AM, which makes 30*60 = 1800 docs.
2) Now at 9:30 AM, I issue a hard commit and now I'll be able to search
these 1800 documents, which is fine.
3) Now I want to know that I can search these 1800 documents only at >= 9:30
AM but not < 9:30 AM, as I did not do a hard commit before 9:30 AM.

In order to know that, is there a way in Solr, rather than having some
application keep track of the documents it sends to Solr between any two
commits? The reason I'm asking is: if there are, say, two parallel processes
indexing to the same index and one process issues a commit, then whatever
documents process two has indexed up to that point in time would also be
committed, right? Now if I keep track of commit times in each process it
doesn't reflect the true commit times, as they are intertwined.
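
One way to get at this (a sketch, not something from this thread): keep the
per-document indexed-at timestamp you already have, and have Solr record the
wall-clock time of every hard commit with a postCommit listener, then correlate
the two afterwards. The listener class and its parameters below are standard;
the shell command and log file name are just examples.

<updateHandler class="solr.DirectUpdateHandler2">
  <!-- Illustrative: append the current time to a log file after every hard commit -->
  <listener event="postCommit" class="solr.RunExecutableListener">
    <str name="exe">sh</str>
    <str name="dir">.</str>
    <bool name="wait">false</bool>
    <arr name="args">
      <str>-c</str>
      <str>date >> commit-times.log</str>
    </arr>
  </listener>
</updateHandler>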



-
Phani Chaitanya
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Get-the-commit-time-of-a-document-in-Solr-tp4089624p4089638.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: Facet counting empty as well.. how to prevent this?

2013-09-12 Thread Raheel Hasan
ok, so I got the idea... I will pull 7 fields instead and remove the empty
one...

But there must be some setting that can be done in the facet configuration to
ignore certain values if we want to.


On Thu, Sep 12, 2013 at 7:44 PM, Shawn Heisey  wrote:

> On 9/12/2013 7:54 AM, Raheel Hasan wrote:
> > I got a small issue here, my facet settings are returning counts for
> empty
> > "". I.e. when no the actual field was empty.
> >
> > Here are the facet settings:
> >
> > count
> > 6
> > 1
> > false
> >
> > and this is the part of the result I dont want:
> > 4
>
> The "facet.missing" parameter has to do with whether or not to display
> counts for documents that have no value at all for that field.
>
> Even though it might seem wrong, the empty string is a valid value, so
> you can't fix this with faceting parameters.  If you don't want that to
> be in your index, then you can add the LengthFilterFactory to your
> analyzer to remove terms with a length less than 1.  You might also
> check to see whether the field definition in your schema has a default
> value set to the empty string.
>
> If you are using DocValues (Solr 4.2 and later), then the indexed terms
> aren't used for facets, and it won't matter what you do to your analysis
> chain.  With DocValues, Solr basically uses a value equivalent to the
> stored value.  To get rid of the empty string with DocValues, you'll
> need to either change your indexing process so it doesn't send empty
> strings, or use a custom UpdateProcessor to change the data before it
> gets indexed.
>
> Thanks,
> Shawn
>
>


-- 
Regards,
Raheel Hasan


Re: Facet counting empty as well.. how to prevent this?

2013-09-12 Thread Shawn Heisey
On 9/12/2013 7:54 AM, Raheel Hasan wrote:
> I got a small issue here, my facet settings are returning counts for empty
> "". I.e. when no the actual field was empty.
> 
> Here are the facet settings:
> 
> count
> 6
> 1
> false
> 
> and this is the part of the result I dont want:
> 4

The "facet.missing" parameter has to do with whether or not to display
counts for documents that have no value at all for that field.

Even though it might seem wrong, the empty string is a valid value, so
you can't fix this with faceting parameters.  If you don't want that to
be in your index, then you can add the LengthFilterFactory to your
analyzer to remove terms with a length less than 1.  You might also
check to see whether the field definition in your schema has a default
value set to the empty string.
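
A minimal sketch of that analyzer change (the type name is made up):

<fieldType name="string_nonempty" class="solr.TextField" sortMissingLast="true">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <!-- drops tokens shorter than 1 character, so "" never becomes a facet value -->
    <filter class="solr.LengthFilterFactory" min="1" max="512"/>
  </analyzer>
</fieldType>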

If you are using DocValues (Solr 4.2 and later), then the indexed terms
aren't used for facets, and it won't matter what you do to your analysis
chain.  With DocValues, Solr basically uses a value equivalent to the
stored value.  To get rid of the empty string with DocValues, you'll
need to either change your indexing process so it doesn't send empty
strings, or use a custom UpdateProcessor to change the data before it
gets indexed.

Thanks,
Shawn



Re: Grouping by field substring?

2013-09-12 Thread Ken Krugler
Hi Jack,

On Sep 11, 2013, at 5:34pm, Jack Krupansky wrote:

> Do a copyField to another field, with a limit of 8 characters, and then use 
> that other field.
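
(For reference, a rough schema.xml sketch of that suggestion, with made-up field
names; maxChars on copyField keeps only the first 8 characters of the source value:)

<field name="id_prefix8" type="string" indexed="true" stored="false"/>
<copyField source="full_id" dest="id_prefix8" maxChars="8"/>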

Thanks - I should have included a few more details in my original question.

The issue is that I've got an index with 200M records, of which about 50M have 
a unique value for this prefix (which is 32 characters long)

So adding another indexed field would be significant, which is why I was hoping 
there was a way to do it via grouping/collapsing at query time.

Or is that just not possible?

Thanks,

-- Ken

> -Original Message- From: Ken Krugler
> Sent: Wednesday, September 11, 2013 8:24 PM
> To: solr-user@lucene.apache.org
> Subject: Grouping by field substring?
> 
> Hi all,
> 
> Assuming I want to use the first N characters of a specific field for 
> grouping results, is such a thing possible out-of-the-box?
> 
> If not, then what would the next best option be? E.g. a custom function query?
> 
> Thanks,
> 
> -- Ken
> 
> --
> Ken Krugler
> +1 530-210-6378
> http://www.scaleunlimited.com
> custom big data solutions & training
> Hadoop, Cascading, Cassandra & Solr
> 
> 
> 
> 
> 

--
Ken Krugler
+1 530-210-6378
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Cassandra & Solr







Solr cloud shard goes down after SocketException in another shard

2013-09-12 Thread neoman
Exception in  shard1 (solr01-prod) primary
<09/12/13
13:56:46:635|http-bio-8080-exec-66|ERROR|apache.solr.servlet.SolrDispatchFilter|null:ClientAbortException:
 
java.net.SocketException: Broken pipe
at
org.apache.catalina.connector.OutputBuffer.realWriteBytes(OutputBuffer.java:406)
at org.apache.tomcat.util.buf.ByteChunk.append(ByteChunk.java:342)
at
org.apache.catalina.connector.OutputBuffer.writeBytes(OutputBuffer.java:431)
at
org.apache.catalina.connector.OutputBuffer.write(OutputBuffer.java:419)
at
org.apache.catalina.connector.CoyoteOutputStream.write(CoyoteOutputStream.java:91)
at
org.apache.solr.common.util.FastOutputStream.flush(FastOutputStream.java:214)
at
org.apache.solr.common.util.FastOutputStream.write(FastOutputStream.java:95)
at
org.apache.solr.common.util.JavaBinCodec.writeStr(JavaBinCodec.java:470)
at
org.apache.solr.common.util.JavaBinCodec.writePrimitive(JavaBinCodec.java:545)
at
org.apache.solr.common.util.JavaBinCodec.writeKnownType(JavaBinCodec.java:232)
at
org.apache.solr.common.util.JavaBinCodec.writeVal(JavaBinCodec.java:149)
at
org.apache.solr.common.util.JavaBinCodec.writeSolrDocument(JavaBinCodec.java:320)
at
org.apache.solr.common.util.JavaBinCodec.writeKnownType(JavaBinCodec.java:257)
at
org.apache.solr.common.util.JavaBinCodec.writeVal(JavaBinCodec.java:149)
at
org.apache.solr.common.util.JavaBinCodec.writeArray(JavaBinCodec.java:427)
at
org.apache.solr.common.util.JavaBinCodec.writeSolrDocumentList(JavaBinCodec.java:356)


Exception in  shard1 (solr08-prod) secondary

<09/12/13
13:56:46:729|http-bio-8080-exec-50|ERROR|apache.solr.core.SolrCore|org.apache.solr.common.SolrException:
ClusterState says we are the leader (http://solr08-prod:8080/solr/aq-core),
but locally we don't think so. Request came from
http://solr03-prod.phneaz:8080/solr/aq-core/
at
org.apache.solr.update.processor.DistributedUpdateProcessor.doDefensiveChecks(DistributedUpdateProcessor.java:381)
at
org.apache.solr.update.processor.DistributedUpdateProcessor.setupRequest(DistributedUpdateProcessor.java:243)
at
org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:428)
at 
org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100)

Our configuration:
Solr 4.4, Tomcat 7, 3 shards
Thanks for your help



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-cloud-shard-goes-down-after-SocketException-in-another-shard-tp4089576.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Solr cloud shard goes down after SocketException in another shard

2013-09-12 Thread Greg Walters
Neoman,

I've got ours set at 45 seconds:

${zkClientTimeout:45000}


-Original Message-
From: neoman [mailto:harira...@gmail.com] 
Sent: Thursday, September 12, 2013 9:33 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr cloud shard goes down after SocketException in another shard

Thanks greg. Currently we have 60 seconds (we reduced it recently). I may have 
to reduce it again. can you please share your timeout value.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-cloud-shard-goes-down-after-SocketException-in-another-shard-tp4089576p4089582.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr cloud shard goes down after SocketException in another shard

2013-09-12 Thread neoman
Thanks Greg. Currently we have 60 seconds (we reduced it recently). I may
have to reduce it again. Can you please share your timeout value?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-cloud-shard-goes-down-after-SocketException-in-another-shard-tp4089576p4089582.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Solr cloud shard goes down after SocketException in another shard

2013-09-12 Thread Greg Walters
Neoman,

Make sure that solr08-prod (or the elected leader at any time) isn't doing a 
stop-the-world garbage collection that takes long enough that the zookeeper 
connection times out. I've seen that in my cluster when I didn't have parallel 
GC enabled and my "zkClientTimeout" in solr.xml was too low.
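
For illustration only (flag choices and values are examples, not a
recommendation): a low-pause collector plus GC logging on the Solr JVM, and a
zkClientTimeout sized comfortably above the longest pause you observe.

# Example JVM options for the Solr instance (illustrative)
JAVA_OPTS="$JAVA_OPTS -XX:+UseConcMarkSweepGC -XX:+UseParNewGC \
  -XX:+CMSParallelRemarkEnabled \
  -verbose:gc -XX:+PrintGCDateStamps -XX:+PrintGCApplicationStoppedTime"

# Example solr.xml setting, sized above the worst observed GC pause
#   <int name="zkClientTimeout">${zkClientTimeout:30000}</int>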

Thanks,
Greg

-Original Message-
From: neoman [mailto:harira...@gmail.com] 
Sent: Thursday, September 12, 2013 9:19 AM
To: solr-user@lucene.apache.org
Subject: Solr cloud shard goes down after SocketException in another shard

Exception in  shard1 (solr01-prod) primary
<09/12/13
13:56:46:635|http-bio-8080-exec-66|ERROR|apache.solr.servlet.SolrDispatchFilter|null:ClientAbortException:
 
java.net.SocketException: Broken pipe
at
org.apache.catalina.connector.OutputBuffer.realWriteBytes(OutputBuffer.java:406)
at org.apache.tomcat.util.buf.ByteChunk.append(ByteChunk.java:342)
at
org.apache.catalina.connector.OutputBuffer.writeBytes(OutputBuffer.java:431)
at
org.apache.catalina.connector.OutputBuffer.write(OutputBuffer.java:419)
at
org.apache.catalina.connector.CoyoteOutputStream.write(CoyoteOutputStream.java:91)
at
org.apache.solr.common.util.FastOutputStream.flush(FastOutputStream.java:214)
at
org.apache.solr.common.util.FastOutputStream.write(FastOutputStream.java:95)
at
org.apache.solr.common.util.JavaBinCodec.writeStr(JavaBinCodec.java:470)
at
org.apache.solr.common.util.JavaBinCodec.writePrimitive(JavaBinCodec.java:545)
at
org.apache.solr.common.util.JavaBinCodec.writeKnownType(JavaBinCodec.java:232)
at
org.apache.solr.common.util.JavaBinCodec.writeVal(JavaBinCodec.java:149)
at
org.apache.solr.common.util.JavaBinCodec.writeSolrDocument(JavaBinCodec.java:320)
at
org.apache.solr.common.util.JavaBinCodec.writeKnownType(JavaBinCodec.java:257)
at
org.apache.solr.common.util.JavaBinCodec.writeVal(JavaBinCodec.java:149)
at
org.apache.solr.common.util.JavaBinCodec.writeArray(JavaBinCodec.java:427)
at
org.apache.solr.common.util.JavaBinCodec.writeSolrDocumentList(JavaBinCodec.java:356)


Exception in  shard1 (solr08-prod) secondary

<09/12/13
13:56:46:729|http-bio-8080-exec-50|ERROR|apache.solr.core.SolrCore|org.apache.solr.common.SolrException:
ClusterState says we are the leader (http://solr08-prod:8080/solr/aq-core),
but locally we don't think so. Request came from 
http://solr03-prod.phneaz:8080/solr/aq-core/
at
org.apache.solr.update.processor.DistributedUpdateProcessor.doDefensiveChecks(DistributedUpdateProcessor.java:381)
at
org.apache.solr.update.processor.DistributedUpdateProcessor.setupRequest(DistributedUpdateProcessor.java:243)
at
org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:428)
at
org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100)

Our configuration:
Solr 4.4, Tomcat 7, 3 shards
Thanks for your help



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-cloud-shard-goes-down-after-SocketException-in-another-shard-tp4089576.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Storing/indexing speed drops quickly

2013-09-12 Thread Shawn Heisey
On 9/12/2013 2:14 AM, Per Steffensen wrote:
>> Starting from an empty collection. Things are fine wrt
>> storing/indexing speed for the first two-three hours (100M docs per
>> hour), then speed goes down dramatically, to an, for us, unacceptable
>> level (max 10M per hour). At the same time as speed goes down, we see
>> that I/O wait increases dramatically. I am not 100% sure, but quick
>> investigation has shown that this is due to almost constant merging.

While constant merging is contributing to the slowdown, I would guess
that your index is simply too big for the amount of RAM that you have.
Let's ignore for a minute that you're distributed and just concentrate
on one machine.

After three hours of indexing, you have nearly 300 million documents.
If you have a replicationFactor of 1, that's still 50 million documents
per machine.  If your replicationFactor is 2, you've got 100 million
documents per machine.  Let's focus on the smaller number for a minute.

50 million documents in an index, even if they are small documents, is
probably going to result in an index size of at least 20GB, and quite
possibly larger.  In order to make Solr function with that many
documents, I would guess that you have a heap that's at least 4GB in size.

With only 8GB on the machine, this doesn't leave much RAM for the OS
disk cache.  If we assume that you have 4GB left for caching, then I
would expect to see problems about the time your per-machine indexes hit
15GB in size.  If you are making it beyond that with a total of 300
million documents, then I am impressed.

Two things are going to happen when you have enough documents:  1) You
are going to fill up your Java heap and Java will need to do frequent
collections to free up enough RAM for normal operation.  When this
problem gets bad enough, the frequent collections will be *full* GCs,
which are REALLY slow.  2) The index will be so big that the OS disk
cache cannot effectively cache it.  I suspect that the latter is more of
the problem, but both might be happening at nearly the same time.

When dealing with an index of this size, you want as much RAM as you can
possibly afford.  I don't think I would try what you are doing without
at least 64GB per machine, and I would probably use at least an 8GB heap
on each one, quite possibly larger.  With a heap that large, extreme GC
tuning becomes a necessity.

To cut down on the amount of merging, I go with a fairly large
mergeFactor, but mergeFactor is basically deprecated for
TieredMergePolicy, there's a new way to configure it now.  Here's the
indexConfig settings that I use on my dev server:


  
<indexConfig>
  <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
    <int name="maxMergeAtOnce">35</int>
    <int name="segmentsPerTier">35</int>
    <int name="maxMergeAtOnceExplicit">105</int>
  </mergePolicy>
  <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
    <int name="maxThreadCount">1</int>
    <int name="maxMergeCount">6</int>
  </mergeScheduler>
  <ramBufferSizeMB>48</ramBufferSizeMB>
  <infoStream file="INFOSTREAM.txt">false</infoStream>
</indexConfig>


Thanks,
Shawn



Re: SolrCloud behave differently on server and local

2013-09-12 Thread cihat güzel
My problem is solved. My server's default Java version was 1.5; I upgraded the
Java version.


2013/9/12 cihat güzel 

> hi all.
> I am trying solr cloud on my server. The server is a virtual machine.
>
> I have followed solr cloude wiki " http://wiki.apache.org/solr/SolrCloud
>  ".
> When I run solr Cloud, It si failed.  But If I try on my local ,it runs
> successfully. Why does solr behave differently on server and local?
>
> My solr.log as follows:
>
> INFO  - 2013-09-12 14:50:13.389;
> org.apache.solr.servlet.SolrDispatchFilter; SolrDispatchFilter.init() done
> ERROR - 2013-09-12 14:50:13.433; org.apache.solr.core.CoreContainer;
> CoreContainer was not shutdown prior to finalize(), indicates a bug --
> POSSIBLE RESOURCE LEAK!!!  instance=1423856966
> INFO  - 2013-09-12 14:50:13.483;
> org.eclipse.jetty.server.AbstractConnector; Started
> SocketConnector@0.0.0.0:8983
> INFO  - 2013-09-12 14:57:01.776; org.eclipse.jetty.server.Server;
> jetty-8.1.10.v20130312
> INFO  - 2013-09-12 14:57:01.838;
> org.eclipse.jetty.deploy.providers.ScanningAppProvider; Deployment monitor
> /opt/Applications/solr-4.4.0/example/contexts at interval 0
> INFO  - 2013-09-12 14:57:01.846;
> org.eclipse.jetty.deploy.DeploymentManager; Deployable added:
> /opt/Applications/solr-4.4.0/example/contexts/solr-jetty-context.xml
> INFO  - 2013-09-12 14:57:02.549;
> org.eclipse.jetty.webapp.StandardDescriptorProcessor; NO JSP Support for
> /solr, did not find org.apache.jasper.servlet.JspServlet
> INFO  - 2013-09-12 14:57:02.656;
> org.apache.solr.servlet.SolrDispatchFilter; SolrDispatchFilter.init()
> INFO  - 2013-09-12 14:57:02.797; org.apache.solr.core.SolrResourceLoader;
> JNDI not configured for solr (NoInitialContextEx)
> INFO  - 2013-09-12 14:57:02.799; org.apache.solr.core.SolrResourceLoader;
> solr home defaulted to 'solr/' (could not find system property or JNDI)
> INFO  - 2013-09-12 14:57:02.801; org.apache.solr.core.SolrResourceLoader;
> new SolrResourceLoader for directory: 'solr/'
> INFO  - 2013-09-12 14:57:02.917; org.apache.solr.core.ConfigSolr; Loading
> container configuration from
> /opt/Applications/solr-4.4.0/example/solr/solr.xml
> ERROR - 2013-09-12 14:57:03.072;
> org.apache.solr.servlet.SolrDispatchFilter; Could not start Solr. Check
> solr/home property and the logs
> ERROR - 2013-09-12 14:57:03.098; org.apache.solr.common.SolrException;
> null:org.apache.solr.common.SolrException: Could not load SOLR configuration
>at org.apache.solr.core.ConfigSolr.fromFile(ConfigSolr.java:65)
>at org.apache.solr.core.ConfigSolr.fromSolrHome(ConfigSolr.java:89)
>at org.apache.solr.core.CoreContainer.(CoreContainer.java:139)
>at org.apache.solr.core.CoreContainer.(CoreContainer.java:129)
>at
> org.apache.solr.servlet.SolrDispatchFilter.createCoreContainer(SolrDispatchFilter.java:139)
>at
> org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:122)
>at org.eclipse.jetty.servlet.FilterHolder.doStart(FilterHolder.java:119)
>at
> org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:64)
>at
> org.eclipse.jetty.servlet.ServletHandler.initialize(ServletHandler.java:719)
>at
> org.eclipse.jetty.servlet.ServletContextHandler.startContext(ServletContextHandler.java:265)
>at
> org.eclipse.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1252)
>at
> org.eclipse.jetty.server.handler.ContextHandler.doStart(ContextHandler.java:710)
>at
> org.eclipse.jetty.webapp.WebAppContext.doStart(WebAppContext.java:494)
>at
> org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:64)
>at
> org.eclipse.jetty.deploy.bindings.StandardStarter.processBinding(StandardStarter.java:39)
>at
> org.eclipse.jetty.deploy.AppLifeCycle.runBindings(AppLifeCycle.java:186)
>at
> org.eclipse.jetty.deploy.DeploymentManager.requestAppGoal(DeploymentManager.java:494)
>at
> org.eclipse.jetty.deploy.DeploymentManager.addApp(DeploymentManager.java:141)
>at
> org.eclipse.jetty.deploy.providers.ScanningAppProvider.fileAdded(ScanningAppProvider.java:145)
>at
> org.eclipse.jetty.deploy.providers.ScanningAppProvider$1.fileAdded(ScanningAppProvider.java:56)
>at org.eclipse.jetty.util.Scanner.reportAddition(Scanner.java:609)
>at org.eclipse.jetty.util.Scanner.reportDifferences(Scanner.java:540)
>at org.eclipse.jetty.util.Scanner.scan(Scanner.java:403)
>at org.eclipse.jetty.util.Scanner.doStart(Scanner.java:337)
>at
> org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:64)
>at
> org.eclipse.jetty.deploy.providers.ScanningAppProvider.doStart(ScanningAppProvider.java:121)
>at
> org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:64)
>at
> org.eclipse.jetty.deploy.DeploymentManager.startAppProvider(DeploymentManager.java:555)
>at
> org.eclipse.jetty.deploy.DeploymentManager.doStart(DeploymentManager.java:230)
>at
> org.eclipse.jetty.util.componen


Facet counting empty as well.. how to prevent this?

2013-09-12 Thread Raheel Hasan
Hi,

I got a small issue here: my facet settings are returning counts for the empty
string "", i.e. when the actual field was empty.

Here are the facet settings:

count
6
1
false

and this is the part of the result I dont want:
4

(that is coming because the query results had 4 rows with no value in the
field whose facet counts are being requested).

Rest all is working just fine

-- 
Regards,
Raheel Hasan


Re: No or limited use of FieldCache

2013-09-12 Thread Per Steffensen

On 9/12/13 3:28 PM, Toke Eskildsen wrote:

> On Thu, 2013-09-12 at 14:48 +0200, Per Steffensen wrote:
>
>> Actually some months back I made PoC of a FieldCache that could expand
>> beyond the heap. Basically imagine a FieldCache with room for
>> "unlimited" data-arrays, that just behind the scenes goes to
>> memory-mapped files when there is no more room on heap.
>
> That sounds a lot like disk-based DocValues.

He he

>> But that solution will also have the "running out of swap space"-problems.
>
> Not really. Memory mapping works like the disk cache: There is no
> requirement that a certain amount of physical memory needs to be
> available, it just takes what it can get. If there are not a lot of
> physical memory, it will require a lot of storage access, but it will
> not over-allocate swap space.

That was also my impression, but during the work I experienced some
problems around swap space. I do not remember exactly what I saw, and
therefore how I concluded that everything in mm-files actually has to
fit in physical mem + swap. I might very well have been wrong in that
conclusion.

> It seems that different setups vary quite a lot in this area and some
> systems are prone to aggressive use of the swap file, which can severely
> harm responsiveness of applications with out-swapped data.
>
> However, this should still not result in any OOM's, as the system can
> always discard some of the memory mapped data if it needs more physical
> memory.

I saw no OOMs.

> - Toke Eskildsen, State and University Library, Denmark





Re: No or limited use of FieldCache

2013-09-12 Thread Toke Eskildsen
On Thu, 2013-09-12 at 14:48 +0200, Per Steffensen wrote:
> Actually some months back I made PoC of a FieldCache that could expand 
> beyond the heap. Basically imagine a FieldCache with room for 
> "unlimited" data-arrays, that just behind the scenes goes to 
> memory-mapped files when there is no more room on heap.

That sounds a lot like disk-based DocValues.

[...]

> But that solution will also have the "running out of swap space"-problems.

Not really. Memory mapping works like the disk cache: There is no
requirement that a certain amount of physical memory needs to be
available, it just takes what it can get. If there are not a lot of
physical memory, it will require a lot of storage access, but it will
not over-allocate swap space.


It seems that different setups vary quite a lot in this area and some
systems are prone to aggressive use of the swap file, which can severely
harm responsiveness of applications with out-swapped data.

However, this should still not result in any OOM's, as the system can
always discard some of the memory mapped data if it needs more physical
memory.

- Toke Eskildsen, State and University Library, Denmark
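
As a concrete illustration of the disk-based DocValues alternative discussed
above (the field name is made up), a schema.xml declaration like this (Solr
4.2+) lets faceting/grouping on the field be served from DocValues instead of
the FieldCache:

<field name="group_key" type="string" indexed="true" stored="false" docValues="true"/>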




Re: Help in resolving the below retrieval issue

2013-09-12 Thread Jack Krupansky
Question mark and asterisk are wildcard characters, so if you want them to 
be treated as punctuation, either enclose the terms in quotes or escape the 
characters.


Wildcard characters suppress the execution of some token filters if they are 
not able to cope with wildcards.
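
If the query is built from SolrJ, a small sketch using the stock escaping helper
(the query string here is just an example):

import org.apache.solr.client.solrj.util.ClientUtils;

public class EscapeExample {
    public static void main(String[] args) {
        String raw = "how are you?";
        // escapeQueryChars backslash-escapes query syntax characters such as
        // '?', '*', '-' and ':' as well as whitespace, so the whole string is
        // treated as literal terms by the query parser.
        String escaped = ClientUtils.escapeQueryChars(raw);
        System.out.println(escaped); // prints: how\ are\ you\?
    }
}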


-- Jack Krupansky

-Original Message- 
From: Prathik Puthran

Sent: Thursday, September 12, 2013 7:01 AM
To: solr-user@lucene.apache.org
Subject: Re: Help in resolving the below retrieval issue

Hi,

I am also seeing this issue when the search query is something like "how
are you?" (Quotes for clarity).
The query parser splits it to the below tokens:
+text:whats +text:your +text:raashee?

However when I remove the "?" from the search query "how are you" I get the
results.
Is "?" a special character? Should it be escaped as well?


On Wed, Sep 11, 2013 at 1:50 AM, Jack Krupansky 
wrote:



Removing stray hyphens (embedded hyphens, like "CD-ROM", are okay) or
escaping them with backslash looks like your best bets. There's no query
parser option to disable the hyphen as an exclusion operator, although an
upgrade to a "modern" Solr should fix the problem.


-- Jack Krupansky

-Original Message- From: Prathik Puthran
Sent: Tuesday, September 10, 2013 4:13 PM
To: solr-user@lucene.apache.org

Subject: Re: Help in resolving the below retrieval issue

I'm using Solr 3.4.


This bug is causing the 2nd term i.e. "kumar" to be treated as an 
exclusion

operator?
Is it possible to configure the query parser to not treat the '-' as
exclusion operator ?
If not the only way is to remove the '-' from the query string?

Thanks,
Prathik


On Tue, Sep 10, 2013 at 10:36 PM, Jack Krupansky 
**wrote:

 What release of Solr are you using?


It appears that the hyphen is being treated as an exclusion operator even
though it is followed by a space. Solr 4.4 doesn't appear to do that, but
maybe earlier releases had a problem.

In any case, be careful with leading hyphen in queries since it does mean
exclude documents that contain the following term.

Or, just escape any leading hyphen with a backslash.

-- Jack Krupansky

-Original Message- From: Prathik Puthran
Sent: Tuesday, September 10, 2013 11:47 AM
To: d...@lucene.apache.org ; solr-user@lucene.apache.org
Subject: Re: Help in resolving the below retrieval issue


Thanks Erick for the response.
I tried to debug the query. Below is the response in the debug node

<str name="rawquerystring">Rahul - kumar</str>
<str name="querystring">Rahul - kumar</str>
<str name="parsedquery">+text:Rahul -text:kumar</str>
<str name="parsedquery_toString">+text:Rahul -text:kumar</str>
<lst name="explain"/>
<str name="QParser">LuceneQParser</str>
<arr name="filter_queries"><str>Rahul - kumar</str></arr>
<arr name="parsed_filter_queries"><str>+text:rahul -text:kumar</str></arr>



Does it mean the query parser has parsed it to tokens "Rahul -" and
"kumar"?
Even if this was the case solr should be able to retrieve the documents
because I have indexed all the documents based on n-grams as well.

Thanks,
Prathik


On Tue, Sep 10, 2013 at 7:09 PM, Erick Erickson *
*wrote:


 Try adding &debug=query to the url. What I think you'll find is that


you're running into
a common issue, the difference between query parsing and analysis.

when you submit anything with whitespace in it, the query parser will
break it up
_before_ it gets to the analysis part, you should see something in the
debug
portion of the query like
field:rahul field:kumar and possibly even field:-

These are searched as separate tokens. By specifying KeywordTokenizer, 
at

index time you'll have exactly one token, rahul-kumar in the index which
will not
match any of the separated tokens

Try escaping the spaces with backslash. You could also try quoting the
input although
that has some phrase implications.

Do you really want this search to fail on just searching "rahul" though?
Perhaps
keywordTokenizer isn't best here, it depends upon your use-case...

Best,
Erick


On Tue, Sep 10, 2013 at 8:10 AM, Prathik Puthran <
prathik.puthra...@gmail.com> wrote:

 Hi,



I am facing the below issue where in Solr is not retrieving the indexed
word for some cases.

This happens whenever the indexed word has string " - " (quotes for
clarity) as substring i.e word prefix followed by a space which is
followed
by '-' again followed by a space and followed by the rest of the word
suffix.
When I search with search query being the exact string Solr returns no
results.

Example:
Indexed word --> "Rahul - kumar"  (quotes for clarity)
If I search with the search query as below Solr gives no results
Search query --> "Rahul - kumar"  (quotes for clarity)

However the below search query returns the results
Search query --> "Rahul kumar"

Can you please let me know what I am doing wrong here and what should I
do to ensure the first query i.e. "Rahul - kumar" returns the documents
indexed using it.

Below are the analyzers I am using:
Index time analyzer components:
1) 
 2) 
 3) 
 4) 
 5) 
 6) 

Query time analyzer components:
 1) 
 2) 
 3) 
 4) 


Can you please let me know how I can fix this?

Thanks,
Prathik














Re: No or limited use of FieldCache

2013-09-12 Thread Per Steffensen

Yes, thanks.

Actually some months back I made PoC of a FieldCache that could expand 
beyond the heap. Basically imagine a FieldCache with room for 
"unlimited" data-arrays, that just behind the scenes goes to 
memory-mapped files when there is no more room on heap. Never finished 
it, and it might be kinda stupid because you actually just go read the 
data from lucene indices and write them to memory-mapped files in order 
to use them. It is better to just use the data in the Lucene indices 
instead. But it had some nice features. But that solution will also have 
the "running out of swap space"-problems.
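
A rough sketch of that idea (file name and size are made up): back a large value
array with a memory-mapped file so it lives outside the Java heap.

import java.io.RandomAccessFile;
import java.nio.LongBuffer;
import java.nio.channels.FileChannel;

public class MappedLongArray {
    public static void main(String[] args) throws Exception {
        long entries = 10000000L; // 10M longs = 80 MB, none of it on the Java heap
        try (RandomAccessFile raf = new RandomAccessFile("fieldcache.bin", "rw");
             FileChannel channel = raf.getChannel()) {
            LongBuffer values = channel
                    .map(FileChannel.MapMode.READ_WRITE, 0, entries * 8)
                    .asLongBuffer();
            values.put(0, 42L);                 // write goes to the mapped region
            System.out.println(values.get(0));  // reads back 42
        }
    }
}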


Regards, Per Steffensen

On 9/12/13 12:48 PM, Erick Erickson wrote:

Per:

One thing I'll be curious about. From my reading of DocValues, it uses
little or no heap. But it _will_ use memory from the OS if I followed
Simon's slides correctly. So I wonder if you'll hit swapping issues...
Which are better than OOMs, certainly...

Thanks,
Erick




Re: DataImportHandler oddity

2013-09-12 Thread Shalin Shekhar Mangar
Thanks. It'd be great if you can update this thread if you ever find a
workaround. We will document it on the DataImportHandlerFaq wiki page.

http://wiki.apache.org/solr/DataImportHandlerFaq

On Thu, Sep 12, 2013 at 4:56 PM, Raymond Wiker  wrote:
> That sounds reasonable. I've done some more digging, and found that the
> database instance in this case is an _OLD_ version of Oracle: 9.2.0.8.0. I
> also tried using the OCI driver (version 12), which refuses to even talk to
> this database.
>
> I have three other databases running on more recent versions of Oracle, and
> all three have worked fine with DataImportHandler.
>
>
> On Thu, Sep 12, 2013 at 9:48 AM, Shalin Shekhar Mangar <
> shalinman...@gmail.com> wrote:
>
>> This is probably a bug with Oracle thin JDBC driver. Google found a
>> similar issue:
>>
>> http://stackoverflow.com/questions/4168494/resultset-getstring-on-varchar2-column-returns-empty-string
>>
>> I don't think this is specific to DataImportHandler.
>>
>>
>> On Thu, Sep 12, 2013 at 12:43 PM, Raymond Wiker  wrote:
>> > Followup: I just tried modifying the select with
>> >
>> > select CAST('APPLICATION' as varchar2(100)) as sourceid, ...
>> >
>> > and that caused the sourceid field to be empty. CASTing to char(100) gave
>> > me the expected value ('APPLICATION', right-padded to 100 characters).
>> >
>> > Meanwhile, google gave me this:
>> http://bugs.caucho.com/view.php?id=4224(via
>> > http://forum.caucho.com/showthread.php?t=27574).
>> >
>> >
>> > On Thu, Sep 12, 2013 at 8:25 AM, Raymond Wiker  wrote:
>> >
>> >> I'm trying to index a view in an Oracle database, and have come across
>> >> some strange behaviour: all the VARCHAR2 fields are being returned as
>> empty
>> >> strings; this also applies to a datetime field converted to a string via
>> >> TO_CHAR, and the url field built by concatenating two constant strings
>> and
>> >> a numeric filed converted via TO_CHAR.
>> >>
>> >> If I cast the fields columns to CHAR(N), I get values back, but this is
>> >> not an acceptable workaround (the maximum length of CHAR(N) is less than
>> >> VARCHAR2(N), and the result is padded to the specified length).
>> >>
>> >> Note that this query works as it should in sqldeveloper, and also in
>> some
>> >> code that uses the .NET sqlclient api.
>> >>
>> >> The query I'm using is
>> >>
>> >> select 'APPLICATION' as sourceid,
>> >>   'http://app.company.com' || '/app/report.aspx?trsid=' ||
>> >> to_char(incident_no) as "URL",
>> >>   incident_no, trans_date, location,
>> >>   responsible_unit, process_eng, product_eng,
>> >>   case_title, case_description,
>> >>   index_lob,
>> >>   investigated, investigated_eng,
>> >>   to_char(modified_date, '-MM-DD"T"HH24:MI:SS"Z"') as modified_date
>> >>   from synx.dw_fast
>> >>   where (investigated <> 3)
>> >>
>> >> while the view is
>> >> INCIDENT_NONUMBER(38)
>> >> TRANS_DATEVARCHAR2(8)
>> >> LOCATIONVARCHAR2(4000)
>> >> RESPONSIBLE_UNITVARCHAR2(4000)
>> >> PROCESS_ENGVARCHAR2(4000)
>> >> PROCESS_NOVARCHAR2(4000)
>> >> PRODUCT_ENGVARCHAR2(4000)
>> >> PRODUCT_NOVARCHAR2(4000)
>> >> CASE_TITLEVARCHAR2(4000)
>> >> CASE_DESCRIPTIONVARCHAR2(4000)
>> >> INDEX_LOBCLOB
>> >> INVESTIGATEDNUMBER(38)
>> >> INVESTIGATED_ENGVARCHAR2(254)
>> >> INVESTIGATED_NOVARCHAR2(254)
>> >> MODIFIED_DATEDATE
>> >>
>> >>
>> >>
>>
>>
>>
>> --
>> Regards,
>> Shalin Shekhar Mangar.
>>



-- 
Regards,
Shalin Shekhar Mangar.


Re: DataImportHandler oddity

2013-09-12 Thread Raymond Wiker
That sounds reasonable. I've done some more digging, and found that the
database instance in this case is an _OLD_ version of Oracle: 9.2.0.8.0. I
also tried using the OCI driver (version 12), which refuses to even talk to
this database.

I have three other databases running on more recent versions of Oracle, and
all three have worked fine with DataImportHandler.


On Thu, Sep 12, 2013 at 9:48 AM, Shalin Shekhar Mangar <
shalinman...@gmail.com> wrote:

> This is probably a bug with Oracle thin JDBC driver. Google found a
> similar issue:
>
> http://stackoverflow.com/questions/4168494/resultset-getstring-on-varchar2-column-returns-empty-string
>
> I don't think this is specific to DataImportHandler.
>
>
> On Thu, Sep 12, 2013 at 12:43 PM, Raymond Wiker  wrote:
> > Followup: I just tried modifying the select with
> >
> > select CAST('APPLICATION' as varchar2(100)) as sourceid, ...
> >
> > and that caused the sourceid field to be empty. CASTing to char(100) gave
> > me the expected value ('APPLICATION', right-padded to 100 characters).
> >
> > Meanwhile, google gave me this:
> http://bugs.caucho.com/view.php?id=4224(via
> > http://forum.caucho.com/showthread.php?t=27574).
> >
> >
> > On Thu, Sep 12, 2013 at 8:25 AM, Raymond Wiker  wrote:
> >
> >> I'm trying to index a view in an Oracle database, and have come across
> >> some strange behaviour: all the VARCHAR2 fields are being returned as
> empty
> >> strings; this also applies to a datetime field converted to a string via
> >> TO_CHAR, and the url field built by concatenating two constant strings
> and
> >> a numeric field converted via TO_CHAR.
> >>
> >> If I cast the fields columns to CHAR(N), I get values back, but this is
> >> not an acceptable workaround (the maximum length of CHAR(N) is less than
> >> VARCHAR2(N), and the result is padded to the specified length).
> >>
> >> Note that this query works as it should in sqldeveloper, and also in
> some
> >> code that uses the .NET sqlclient api.
> >>
> >> The query I'm using is
> >>
> >> select 'APPLICATION' as sourceid,
> >>   'http://app.company.com' || '/app/report.aspx?trsid=' ||
> >> to_char(incident_no) as "URL",
> >>   incident_no, trans_date, location,
> >>   responsible_unit, process_eng, product_eng,
> >>   case_title, case_description,
> >>   index_lob,
> >>   investigated, investigated_eng,
> >>   to_char(modified_date, '-MM-DD"T"HH24:MI:SS"Z"') as modified_date
> >>   from synx.dw_fast
> >>   where (investigated <> 3)
> >>
> >> while the view is
> >> INCIDENT_NONUMBER(38)
> >> TRANS_DATEVARCHAR2(8)
> >> LOCATIONVARCHAR2(4000)
> >> RESPONSIBLE_UNITVARCHAR2(4000)
> >> PROCESS_ENGVARCHAR2(4000)
> >> PROCESS_NOVARCHAR2(4000)
> >> PRODUCT_ENGVARCHAR2(4000)
> >> PRODUCT_NOVARCHAR2(4000)
> >> CASE_TITLEVARCHAR2(4000)
> >> CASE_DESCRIPTIONVARCHAR2(4000)
> >> INDEX_LOBCLOB
> >> INVESTIGATEDNUMBER(38)
> >> INVESTIGATED_ENGVARCHAR2(254)
> >> INVESTIGATED_NOVARCHAR2(254)
> >> MODIFIED_DATEDATE
> >>
> >>
> >>
>
>
>
> --
> Regards,
> Shalin Shekhar Mangar.
>


Re: ReplicationFactor for solrcloud

2013-09-12 Thread Shalin Shekhar Mangar
You must specify maxShardsPerNode=3 for this to happen. maxShardsPerNode
defaults to 1, so only one shard is created per node.
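
For example (collection name is made up), a Collections API CREATE call that
allows a replica of every shard to land on each of the 3 nodes:

http://localhost:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=3&replicationFactor=3&maxShardsPerNode=3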

On Thu, Sep 12, 2013 at 3:19 AM, Aditya Sakhuja
 wrote:
> Hi -
>
> I am trying to set the 3 shards and 3 replicas for my solrcloud deployment
> with 3 servers, specifying the replicationFactor=3 and numShards=3 when
> starting the first node. I see each of the servers allocated to 1 shard
> each. However, I do not see 3 replicas allocated on each node.
>
> I specifically need to have 3 replicas across 3 servers with 3 shards. Do
> we think of any reason to not have this configuration ?
>
> --
> Regards,
> -Aditya Sakhuja



-- 
Regards,
Shalin Shekhar Mangar.


Re: Help in resolving the below retrieval issue

2013-09-12 Thread Prathik Puthran
Hi,

I am also seeing this issue when the search query is something like "how
are you?" (Quotes for clarity).
The query parser splits it to the below tokens:
+text:whats +text:your +text:raashee?

However when I remove the "?" from the search query "how are you" I get the
results.
Is "?" a special character? Should it be escaped as well?


On Wed, Sep 11, 2013 at 1:50 AM, Jack Krupansky wrote:

> Removing stray hyphens (embedded hyphens, like "CD-ROM", are okay) or
> escaping them with backslash looks like your best bets. There's no query
> parser option to disable the hyphen as an exclusion operator, although an
> upgrade to a "modern" Solr should fix the problem.
>
>
> -- Jack Krupansky
>
> -Original Message- From: Prathik Puthran
> Sent: Tuesday, September 10, 2013 4:13 PM
> To: solr-user@lucene.apache.org
>
> Subject: Re: Help in resolving the below retrieval issue
>
> I'm using Solr 3.4.
>
>
> This bug is causing the 2nd term i.e. "kumar" to be treated as an exclusion
> operator?
> Is it possible to configure the query parser to not treat the '-' as
> exclusion operator ?
> If not the only way is to remove the '-' from the query string?
>
> Thanks,
> Prathik
>
>
> On Tue, Sep 10, 2013 at 10:36 PM, Jack Krupansky 
> **wrote:
>
>  What release of Solr are you using?
>>
>> It appears that the hyphen is being treated as an exclusion operator even
>> though it is followed by a space. Solr 4.4 doesn't appear to do that, but
>> maybe earlier releases had a problem.
>>
>> In any case, be careful with leading hyphen in queries since it does mean
>> exclude documents that contain the following term.
>>
>> Or, just escape any leading hyphen with a backslash.
>>
>> -- Jack Krupansky
>>
>> -Original Message- From: Prathik Puthran
>> Sent: Tuesday, September 10, 2013 11:47 AM
>> To: d...@lucene.apache.org ; solr-user@lucene.apache.org
>> Subject: Re: Help in resolving the below retrieval issue
>>
>>
>> Thanks Erick for the response.
>> I tried to debug the query. Below is the response in the debug node
>>
>> <str name="rawquerystring">Rahul - kumar</str>
>> <str name="querystring">Rahul - kumar</str>
>> <str name="parsedquery">+text:Rahul -text:kumar</str>
>> <str name="parsedquery_toString">+text:Rahul -text:kumar</str>
>> <lst name="explain"/>
>> <str name="QParser">LuceneQParser</str>
>> <arr name="filter_queries"><str>Rahul - kumar</str></arr>
>> <arr name="parsed_filter_queries"><str>+text:rahul -text:kumar</str></arr>
>>
>>
>>
>> Does it mean the query parser has parsed it to tokens "Rahul -" and
>> "kumar"?
>> Even if this was the case solr should be able to retrieve the documents
>> because I have indexed all the documents based on n-grams as well.
>>
>> Thanks,
>> Prathik
>>
>>
>> On Tue, Sep 10, 2013 at 7:09 PM, Erick Erickson > >*
>> *wrote:
>>
>>
>>  Try adding &debug=query to the url. What I think you'll find is that
>>
>>> you're running into
>>> a common issue, the difference between query parsing and analysis.
>>>
>>> when you submit anything with whitespace in it, the query parser will
>>> break it up
>>> _before_ it gets to the analysis part, you should see something in the
>>> debug
>>> portion of the query like
>>> field:rahul field:kumar and possibly even field:-
>>>
>>> These are searched as separate tokens. By specifying KeywordTokenizer, at
>>> index time you'll have exactly one token, rahul-kumar in the index which
>>> will not
>>> match any of the separated tokens
>>>
>>> Try escaping the spaces with backslash. You could also try quoting the
>>> input although
>>> that has some phrase implications.
>>>
>>> Do you really want this search to fail on just searching "rahul" though?
>>> Perhaps
>>> keywordTokenizer isn't best here, it depends upon your use-case...
>>>
>>> Best,
>>> Erick
>>>
>>>
>>> On Tue, Sep 10, 2013 at 8:10 AM, Prathik Puthran <
>>> prathik.puthra...@gmail.com> wrote:
>>>
>>>  Hi,
>>>

 I am facing the below issue where in Solr is not retrieving the indexed
 word for some cases.

 This happens whenever the indexed word has string " - " (quotes for
 clarity) as substring i.e word prefix followed by a space which is
 followed
 by '-' again followed by a space and followed by the rest of the word
 suffix.
 When I search with search query being the exact string Solr returns no
 results.

 Example:
 Indexed word --> "Rahul - kumar"  (quotes for clarity)
 If I search with the search query as below Solr gives no results
 Search query --> "Rahul - kumar"  (quotes for clarity)

 However the below search query returns the results
 Search query --> "Rahul kumar"

 Can you please let me know what I am doing wrong here and what should I
 do to ensure the first query i.e. "Rahul - kumar" returns the documents
 indexed using it.

 Below are the analyzers I am using:
 Index time analyzer components:
 1) >>>
 pattern="([^A-Za-z0-9 ])" replacement=""/>
  2) 
  3) 
  4) >>>
 generateWordParts="1"
 preserveOriginal="1"/>
  5) >>>
 maxGramSize="50" side="front"/>
  6) >>>
 maxGramSize="50

Re: No or limited use of FieldCache

2013-09-12 Thread Erick Erickson
Per:

One thing I'll be curious about. From my reading of DocValues, it uses
little or no heap. But it _will_ use memory from the OS if I followed
Simon's slides correctly. So I wonder if you'll hit swapping issues...
Which are better than OOMs, certainly...

Thanks,
Erick


On Thu, Sep 12, 2013 at 2:07 AM, Per Steffensen  wrote:

> Thanks, guys. Now I know a little more about DocValues and realize that
> they will do the job wrt FieldCache.
>
> Regards, Per Steffensen
>
>
> On 9/12/13 3:11 AM, Otis Gospodnetic wrote:
>
>> Per,  check zee Wiki, there is a page describing docvalues. We used them
>> successfully in a solr for analytics scenario.
>>
>> Otis
>> Solr & ElasticSearch Support
>> http://sematext.com/
>> On Sep 11, 2013 9:15 AM, "Michael Sokolov" > com >
>> wrote:
>>
>>  On 09/11/2013 08:40 AM, Per Steffensen wrote:
>>>
>>>  The reason I mention sort is that we in my project, half a year ago,
 have
 dealt with the FieldCache->OOM-problem when doing sort-requests. We
 basically just reject sort-requests unless they hit below X documents -
 in
 case they do we just find them without sorting and sort them ourselves
 afterwards.

 Currently our problem is, that we have to do a group/distinct (in
 SQL-language) query and we have found that we can do what we want to do
 using group (http://wiki.apache.org/solr/FieldCollapsing)
 or facet - either will work for us. Problem is that they both use
 FieldCache and we "know" that using FieldCache will lead to
 OOM-execptions
 with the amount of data each of our Solr-nodes administrate. This time
 we
 have really no option of just "limit" usage as we did with sort.
 Therefore
 we need a group/distinct-functionality that works even on huge
 data-amounts
 (and a algorithm using FieldCache will not)

 I believe setting facet.method=enum will actually make facet not use the
 FieldCache. Is that true? Is it a bad idea?

 I do not know much about DocValues, but I do not believe that you will
 avoid FieldCache by using DocValues? Please elaborate, or point to
 documentation where I will be able to read that I am wrong. Thanks!

>>> There is Simon Willnauer's presentation
>>> http://www.slideshare.net/lucenerevolution/willnauer-simon-doc-values-column-stride-fields-in-lucene
>>>
>>> and this blog post
>>> http://blog.trifork.com/2011/10/27/introducing-lucene-index-doc-values/
>>>
>>> and this one that shows some performance comparisons:
>>> http://searchhub.org/2013/04/02/fun-with-docvalues-in-solr-4-2/
>>>
>>>
>>>
>>>
>>>
>


Re: SolrCloud 4.x hangs under high update volume

2013-09-12 Thread Erick Erickson
Fewer client threads updating makes sense, and going to 1 core also seems
like it might help. But it's all a crap-shoot unless the underlying cause
gets fixed up. Both would improve things, but you'll still hit the problem
sometime, probably when doing a demo for your boss ;).

Adrien has branched the code for SOLR 4.5 in preparation for a release
candidate tentatively scheduled for next week. You might just start working
with that branch if you can rather than apply individual patches...

I suspect there'll be a couple more changes to this code (looks like
Shikhar already raised an issue for instance) before 4.5 is finally cut...

FWIW,
Erick
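
For reference, a minimal SolrJ sketch of the CloudSolrServer approach discussed
in this thread (ZooKeeper hosts and collection name are made up):

import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class CloudUpdateExample {
    public static void main(String[] args) throws Exception {
        // Connects via ZooKeeper and routes each update to the correct shard
        // leader, instead of going through a load-balanced HTTP VIP.
        CloudSolrServer server = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
        server.setDefaultCollection("mycollection");

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "example-1");
        server.add(doc);
        server.commit();
        server.shutdown();
    }
}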



On Thu, Sep 12, 2013 at 2:13 AM, Tim Vaillancourt wrote:

> Thanks Erick!
>
> Yeah, I think the next step will be CloudSolrServer with the SOLR-4816
> patch. I think that is a very, very useful patch by the way. SOLR-5232
> seems promising as well.
>
> I see your point on the more-shards idea, this is obviously a
> global/instance-level lock. If I really had to, I suppose I could run more
> Solr instances to reduce locking then? Currently I have 2 cores per
> instance and I could go 1-to-1 to simplify things.
>
> The good news is we seem to be more stable since changing to a bigger
> client->solr batch-size and fewer client threads updating.
>
> Cheers,
>
> Tim
>
> On 11/09/13 04:19 AM, Erick Erickson wrote:
>
>> If you use CloudSolrServer, you need to apply SOLR-4816 or use a recent
>> copy of the 4x branch. By "recent", I mean like today, it looks like Mark
>> applied this early this morning. But several reports indicate that this
>> will
>> solve your problem.
>>
>> I would expect that increasing the number of shards would make the problem
>> worse, not
>> better.
>>
>> There's also SOLR-5232...
>>
>> Best
>> Erick
>>
>>
>> On Tue, Sep 10, 2013 at 5:20 PM, Tim 
>> Vaillancourt
>> >wrote:
>>
>>  Hey guys,
>>>
>>> Based on my understanding of the problem we are encountering, I feel
>>> we've
>>> been able to reduce the likelihood of this issue by making the following
>>> changes to our app's usage of SolrCloud:
>>>
>>> 1) We increased our document batch size to 200 from 10 - our app batches
>>> updates to reduce HTTP requests/overhead. The theory is increasing the
>>> batch size reduces the likelihood of this issue happening.
>>> 2) We reduced to 1 application node sending updates to SolrCloud - we
>>> write
>>> Solr updates to Redis, and have previously had 4 application nodes
>>> pushing
>>> the updates to Solr (popping off the Redis queue). Reducing the number of
>>> nodes pushing to Solr reduces the concurrency on SolrCloud.
>>> 3) Less threads pushing to SolrCloud - due to the increase in batch size,
>>> we were able to go down to 5 update threads on the update-pushing-app
>>> (from
>>> 10 threads).
>>>
>>> To be clear the above only reduces the likelihood of the issue happening,
>>> and DOES NOT actually resolve the issue at hand.
>>>
>>> If we happen to encounter issues with the above 3 changes, the next steps
>>> (I could use some advice on) are:
>>>
>>> 1) Increase the number of shards (2x) - the theory here is this reduces
>>> the
>>> locking on shards because there are more shards. Am I onto something
>>> here,
>>> or will this not help at all?
>>> 2) Use CloudSolrServer - currently we have a plain-old least-connection
>>> HTTP VIP. If we go "direct" to what we need to update, this will reduce
>>> concurrency in SolrCloud a bit. Thoughts?
>>>
>>> Thanks all!
>>>
>>> Cheers,
>>>
>>> Tim
>>>
>>>
>>> On 6 September 2013 14:47, Tim 
>>> Vaillancourt>
>>>  wrote:
>>>
>>>  Enjoy your trip, Mark! Thanks again for the help!

 Tim


 On 6 September 2013 14:18, Mark Miller  wrote:

  Okay, thanks, useful info. Getting on a plane, but ill look more at
> this
> soon. That 10k thread spike is good to know - that's no good and could
> easily be part of the problem. We want to keep that from happening.
>
> Mark
>
> Sent from my iPhone
>
> On Sep 6, 2013, at 2:05 PM, Tim 
> Vaillancourt
> >
> wrote:
>
>  Hey Mark,
>>
>> The farthest we've made it at the same batch size/volume was 12 hours
>> without this patch, but that isn't consistent. Sometimes we would only
>>
> get
>
>> to 6 hours or less.
>>
>> During the crash I can see an amazing spike in threads to 10k which is
>> essentially our ulimit for the JVM, but I strangely see no
>>
> "OutOfMemory:
>>>
 cannot open native thread errors" that always follow this. Weird!
>>
>> We also notice a spike in CPU around the crash. The instability caused
>>
> some
>
>> shard recovery/replication though, so that CPU may be a symptom of the
>> replication, or is possibly the root cause. The CPU spikes from about
>> 20-30% utilization (system + user) to 60% fairly sharply, so the CPU,
>>
> while
>
>> spiking isn't quite "pinned" (very beefy Dell R720

SolrCloud behave differently on server and local

2013-09-12 Thread cihat güzel
hi all.
I am trying SolrCloud on my server. The server is a virtual machine.

I have followed the SolrCloud wiki " http://wiki.apache.org/solr/SolrCloud ".
When I run SolrCloud, it fails. But if I try it on my local machine, it runs
successfully. Why does Solr behave differently on the server and locally?

My solr.log as follows:

INFO  - 2013-09-12 14:50:13.389;
org.apache.solr.servlet.SolrDispatchFilter; SolrDispatchFilter.init() done
ERROR - 2013-09-12 14:50:13.433; org.apache.solr.core.CoreContainer;
CoreContainer was not shutdown prior to finalize(), indicates a bug --
POSSIBLE RESOURCE LEAK!!!  instance=1423856966
INFO  - 2013-09-12 14:50:13.483;
org.eclipse.jetty.server.AbstractConnector; Started
SocketConnector@0.0.0.0:8983
INFO  - 2013-09-12 14:57:01.776; org.eclipse.jetty.server.Server;
jetty-8.1.10.v20130312
INFO  - 2013-09-12 14:57:01.838;
org.eclipse.jetty.deploy.providers.ScanningAppProvider; Deployment monitor
/opt/Applications/solr-4.4.0/example/contexts at interval 0
INFO  - 2013-09-12 14:57:01.846;
org.eclipse.jetty.deploy.DeploymentManager; Deployable added:
/opt/Applications/solr-4.4.0/example/contexts/solr-jetty-context.xml
INFO  - 2013-09-12 14:57:02.549;
org.eclipse.jetty.webapp.StandardDescriptorProcessor; NO JSP Support for
/solr, did not find org.apache.jasper.servlet.JspServlet
INFO  - 2013-09-12 14:57:02.656;
org.apache.solr.servlet.SolrDispatchFilter; SolrDispatchFilter.init()
INFO  - 2013-09-12 14:57:02.797; org.apache.solr.core.SolrResourceLoader;
JNDI not configured for solr (NoInitialContextEx)
INFO  - 2013-09-12 14:57:02.799; org.apache.solr.core.SolrResourceLoader;
solr home defaulted to 'solr/' (could not find system property or JNDI)
INFO  - 2013-09-12 14:57:02.801; org.apache.solr.core.SolrResourceLoader;
new SolrResourceLoader for directory: 'solr/'
INFO  - 2013-09-12 14:57:02.917; org.apache.solr.core.ConfigSolr; Loading
container configuration from
/opt/Applications/solr-4.4.0/example/solr/solr.xml
ERROR - 2013-09-12 14:57:03.072;
org.apache.solr.servlet.SolrDispatchFilter; Could not start Solr. Check
solr/home property and the logs
ERROR - 2013-09-12 14:57:03.098; org.apache.solr.common.SolrException;
null:org.apache.solr.common.SolrException: Could not load SOLR configuration
   at org.apache.solr.core.ConfigSolr.fromFile(ConfigSolr.java:65)
   at org.apache.solr.core.ConfigSolr.fromSolrHome(ConfigSolr.java:89)
   at org.apache.solr.core.CoreContainer.(CoreContainer.java:139)
   at org.apache.solr.core.CoreContainer.(CoreContainer.java:129)
   at
org.apache.solr.servlet.SolrDispatchFilter.createCoreContainer(SolrDispatchFilter.java:139)
   at
org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:122)
   at org.eclipse.jetty.servlet.FilterHolder.doStart(FilterHolder.java:119)
   at
org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:64)
   at
org.eclipse.jetty.servlet.ServletHandler.initialize(ServletHandler.java:719)
   at
org.eclipse.jetty.servlet.ServletContextHandler.startContext(ServletContextHandler.java:265)
   at
org.eclipse.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1252)
   at
org.eclipse.jetty.server.handler.ContextHandler.doStart(ContextHandler.java:710)
   at org.eclipse.jetty.webapp.WebAppContext.doStart(WebAppContext.java:494)
   at
org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:64)
   at
org.eclipse.jetty.deploy.bindings.StandardStarter.processBinding(StandardStarter.java:39)
   at
org.eclipse.jetty.deploy.AppLifeCycle.runBindings(AppLifeCycle.java:186)
   at
org.eclipse.jetty.deploy.DeploymentManager.requestAppGoal(DeploymentManager.java:494)
   at
org.eclipse.jetty.deploy.DeploymentManager.addApp(DeploymentManager.java:141)
   at
org.eclipse.jetty.deploy.providers.ScanningAppProvider.fileAdded(ScanningAppProvider.java:145)
   at
org.eclipse.jetty.deploy.providers.ScanningAppProvider$1.fileAdded(ScanningAppProvider.java:56)
   at org.eclipse.jetty.util.Scanner.reportAddition(Scanner.java:609)
   at org.eclipse.jetty.util.Scanner.reportDifferences(Scanner.java:540)
   at org.eclipse.jetty.util.Scanner.scan(Scanner.java:403)
   at org.eclipse.jetty.util.Scanner.doStart(Scanner.java:337)
   at
org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:64)
   at
org.eclipse.jetty.deploy.providers.ScanningAppProvider.doStart(ScanningAppProvider.java:121)
   at
org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:64)
   at
org.eclipse.jetty.deploy.DeploymentManager.startAppProvider(DeploymentManager.java:555)
   at
org.eclipse.jetty.deploy.DeploymentManager.doStart(DeploymentManager.java:230)
   at
org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:64)
   at
org.eclipse.jetty.util.component.AggregateLifeCycle.doStart(AggregateLifeCycle.java:81)
   at
org.eclipse.jetty.server.handler.AbstractHandler.doStart(AbstractHandler.java:58)
   at
org.eclipse.jetty.server.handler.HandlerWrapper.doStart(Handl
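
The log above shows the solr home defaulting to 'solr/' and the container configuration being read from /opt/Applications/solr-4.4.0/example/solr/solr.xml before failing with "Could not load SOLR configuration". The first things worth checking on the server are that this solr.xml exists, is readable by the user running Jetty, and is well-formed XML. It can also help to take the working directory out of the equation by starting Solr with an explicit home; a sketch, assuming the paths shown in the log:

cd /opt/Applications/solr-4.4.0/example
java -Dsolr.solr.home=/opt/Applications/solr-4.4.0/example/solr -jar start.jar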

Not able to deploy SOLR after applying OpenNLP patch

2013-09-12 Thread rashi gandhi
Hi,



My Question is related to OpenNLP Integration with SOLR.

I have successfully applied the OpenNLP LUCENE-2899-x.patch to the latest Solr
branch, checked out from here:

http://svn.apache.org/repos/asf/lucene/dev/branches/branch_4x

I am also able to compile the source code, generate all the related binaries,
and create the war file.

But I am facing issues while deploying SOLR.

Here is the error:

Caused by: org.apache.solr.common.SolrException: Plugin init failure for
[schema.xml] fieldType "text_opennlp": Plugin init failure for [schema.xml]
analyzer/tokenizer: Error loading class 'solr.OpenNLPTokenizerFactory'

at
org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:177)

at
org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:467)

... 15 more

Caused by: org.apache.solr.common.SolrException: Plugin init failure for
[schema.xml] analyzer/tokenizer: Error loading class
'solr.OpenNLPTokenizerFactory'

at
org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:177)

at
org.apache.solr.schema.FieldTypePluginLoader.readAnalyzer(FieldTypePluginLoader.java:362)

at
org.apache.solr.schema.FieldTypePluginLoader.create(FieldTypePluginLoader.java:95)

at
org.apache.solr.schema.FieldTypePluginLoader.create(FieldTypePluginLoader.java:43)

at
org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:151)

... 16 more

Caused by: org.apache.solr.common.SolrException: Error loading class
'solr.OpenNLPTokenizerFactory'

at
org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:449)

at
org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:543)

at
org.apache.solr.schema.FieldTypePluginLoader$2.create(FieldTypePluginLoader.java:342)

at
org.apache.solr.schema.FieldTypePluginLoader$2.create(FieldTypePluginLoader.java:335)

at
org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:151)

... 20 more

Caused by: java.lang.ClassNotFoundException: solr.OpenNLPTokenizerFactory

at java.net.URLClassLoader$1.run(URLClassLoader.java:366)

at java.net.URLClassLoader$1.run(URLClassLoader.java:355)

at java.security.AccessController.doPrivileged(Native Method)

at java.net.URLClassLoader.findClass(URLClassLoader.java:354)

at java.lang.ClassLoader.loadClass(ClassLoader.java:423)

at java.net.FactoryURLClassLoader.loadClass(URLClassLoader.java:789)

at java.lang.ClassLoader.loadClass(ClassLoader.java:356)

at java.lang.Class.forName0(Native Method)

at java.lang.Class.forName(Class.java:264)

at
org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:433)

... 24 more

4446 [coreLoadExecutor-3-thread-1] ERROR
org.apache.solr.core.CoreContainer -
null:org.apache.solr.common.SolrException: Unable to create core: collection1

at
org.apache.solr.core.CoreContainer.recordAndThrow(CoreContainer.java:931)

at org.apache.solr.core.CoreContainer.create(CoreContainer.java:563)

at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:244)

at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:236)

at
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)

at java.util.concurrent.FutureTask.run(FutureTask.java:166)

at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)

at
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)

at java.util.concurrent.FutureTask.run(FutureTask.java:166)

at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)

at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)

at java.lang.Thread.run(Thread.java:722)

Please help me on this.



Waiting for your reply.
Thanks in advance.
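A ClassNotFoundException for solr.OpenNLPTokenizerFactory at core load usually means the jars produced by the patched build (the OpenNLP analyzer module plus its opennlp-tools/opennlp-maxent dependencies) are not on the core's classpath, for example because they were not packed into the war. One way to rule that out, assuming the stock Solr 4.x solrconfig.xml layout, is to point <lib> directives at wherever the patched build placed those jars; the dir/regex values below are placeholders, not the actual paths from this build:

<!-- in solrconfig.xml -->
<lib dir="../../../contrib/opennlp/lib" regex=".*\.jar" />
<lib dir="../../../dist/" regex="solr-opennlp-.*\.jar" />

Alternatively, copying the same jars into the core's lib/ directory before starting Solr has the same effect.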


Re: Regarding improving performance of the solr

2013-09-12 Thread prabu palanisamy
Hi

I tried to reindex Solr and ran into a regular-expression problem. The
steps I followed are:

I started Solr with java -jar start.jar and ran:
http://localhost:8983/solr/update?stream.body=<delete><query>*:*</query></delete>
http://localhost:8983/solr/update?stream.body=<commit/>
I stopped the Solr server.

I changed the indexed and stored attributes to false for some of the fields in
schema.xml:
 













id


My data-config.xml


















   



I tried http://localhost:8983/solr/dataimport?command=full-import. At around
50,000 documents, I get an error related to regular expressions:

at java.util.regex.Pattern$Loop.match(Pattern.java:4295)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4227)
at java.util.regex.Pattern$BranchConn.match(Pattern.java:4078)
at java.util.regex.Pattern$CharProperty.match(Pattern.java:3345)
at java.util.regex.Pattern$Branch.match(Pattern.java:4114)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4168)
at java.util.regex.Pattern$Loop.match(Pattern.java:4295)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4227)
at java.util.regex.Pattern$BranchConn.match(Pattern.java:4078)
at java.util.regex.Pattern$CharProperty.match(Pattern.java:3345)
at java.util.regex.Pattern$Branch.match(Pattern.java:4114)

I do not know how to proceed. Please help me out.

Thanks and Regards
Prabu


On Wed, Sep 11, 2013 at 11:31 AM, Erick Erickson wrote:

> Be a little careful when extrapolating from disk to memory.
> Any fields where you've set stored="true" will put data in
> segment files with extensions .fdt and .fdx (see the link below).
> These are the compressed verbatim copy of the data
> for stored fields and have very little impact on
> memory required for searching. I've seen indexes where
> 75% of the data is stored and indexes where 5% of the
> data is stored.
>
> "Summary of File Extensions" here:
>
> http://lucene.apache.org/core/4_0_0/core/org/apache/lucene/codecs/lucene40/package-summary.html
>
> Best,
> Erick
>
>
> On Wed, Sep 11, 2013 at 2:57 AM, prabu palanisamy  >wrote:
>
> > @Shawn: Correct, I am trying to reduce the index size. I am working on
> > reindexing Solr with some of the fields indexed but not stored.
> >
> > @Jean: I tried with  different caches. It did not show much improvement.
> >
> >
> > On Fri, Sep 6, 2013 at 3:17 PM, Shawn Heisey  wrote:
> >
> > > On 9/6/2013 2:54 AM, prabu palanisamy wrote:
> > > > I am currently using Solr 3.5.0, and have indexed a wikipedia dump (50 GB)
> > > > with Java 1.6.
> > > > I am searching Solr with text (which is actually twitter tweets).
> > > > Currently it takes an average of 210 milliseconds per post, out of
> > > > which 200 milliseconds is consumed by the Solr server (QTime). I used the
> > > > jconsole monitoring tool.
> > >
> > > If the size of all your Solr indexes on disk is in the 50GB range of
> > > your wikipedia dump, then for ideal performance, you'll want to have
> > > 50GB of free memory so the OS can cache your index.  You might be able
> > > to get by with 25-30GB of free memory, depending on your index
> > composition.
> > >
> > > Note that this is memory over and above what you allocate to the Solr
> > > JVM, and memory used by other processes on the machine.  If you do have
> > > other services on the same machine, note that those programs might ALSO
> > > require OS disk cache RAM.
> > >
> > > http://wiki.apache.org/solr/SolrPerformanceProblems#OS_Disk_Cache
> > >
> > > Thanks,
> > > Shawn
> > >
> > >
> >
>


Re: Storing/indexing speed drops quickly

2013-09-12 Thread Per Steffensen

Seems like the attachments didn't make it through to this mailing list:

https://dl.dropboxusercontent.com/u/25718039/doccount.png
https://dl.dropboxusercontent.com/u/25718039/iowait.png


On 9/12/13 8:25 AM, Per Steffensen wrote:

Hi

SolrCloud 4.0: 6 machines, quadcore, 8GB ram, 1T disk, one Solr-node 
on each, one collection across the 6 nodes, 4 shards per node
Storing/indexing from 100 threads on external machines, each thread sending
one doc at a time, at full speed (they always have a new doc to
store/index)

See attached images
* iowait.png: Measured I/O wait on the Solr machines
* doccount.png: Measured number of docs in the Solr collection

Starting from an empty collection. Things are fine wrt 
storing/indexing speed for the first two-three hours (100M docs per 
hour), then speed goes down dramatically, to an, for us, unacceptable 
level (max 10M per hour). At the same time as speed goes down, we see 
that I/O wait increases dramatically. I am not 100% sure, but quick 
investigation has shown that this is due to almost constant merging.


What to do about this problem?
I know that you can play around with mergeFactor and commit-rate, but
earlier tests show that this really does not seem to do the job - it
might postpone the point where the problem occurs, but basically it is
just a matter of time before merging exhausts the system.
Is there a way to totally avoid merging, and keep indexing speed at a
high level, while still making sure that searches will perform fairly
well when the amount of data becomes big? (I guess without merging you
will end up with lots and lots of "small" files, and I guess this is not
good for search response-time)


Regards, Per Steffensen
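
For reference, the knobs mentioned above live in the <indexConfig> and <updateHandler> sections of solrconfig.xml. A rough sketch of making merges bigger and less frequent on a 4.x setup follows; the values are purely illustrative, not recommendations tuned for this workload:

<indexConfig>
  <ramBufferSizeMB>256</ramBufferSizeMB>
  <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
    <int name="maxMergeAtOnce">20</int>
    <int name="segmentsPerTier">20</int>
  </mergePolicy>
</indexConfig>

<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxTime>60000</maxTime>            <!-- hard commit once a minute -->
    <openSearcher>false</openSearcher>
  </autoCommit>
</updateHandler>

Raising segmentsPerTier/maxMergeAtOnce means fewer, larger merges but more segments on disk, so it trades some search speed for indexing throughput; merging cannot be switched off entirely without the segment count growing without bound.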




Re: Storing/indexing speed drops quickly

2013-09-12 Thread Per Steffensen
Maybe the fact that we are never ever going to delete or update
documents can be used for something. If we delete, we will delete entire
collections.


Regards, Per Steffensen

On 9/12/13 8:25 AM, Per Steffensen wrote:

Hi

SolrCloud 4.0: 6 machines, quadcore, 8GB ram, 1T disk, one Solr-node 
on each, one collection across the 6 nodes, 4 shards per node
Storing/indexing from 100 threads on external machines, each thread sending
one doc at a time, at full speed (they always have a new doc to
store/index)

See attached images
* iowait.png: Measured I/O wait on the Solr machines
* doccount.png: Measured number of docs in the Solr collection

Starting from an empty collection. Things are fine wrt 
storing/indexing speed for the first two-three hours (100M docs per 
hour), then speed goes down dramatically, to an, for us, unacceptable 
level (max 10M per hour). At the same time as speed goes down, we see 
that I/O wait increases dramatically. I am not 100% sure, but quick 
investigation has shown that this is due to almost constant merging.


What to do about this problem?
I know that you can play around with mergeFactor and commit-rate, but
earlier tests show that this really does not seem to do the job - it
might postpone the point where the problem occurs, but basically it is
just a matter of time before merging exhausts the system.
Is there a way to totally avoid merging, and keep indexing speed at a
high level, while still making sure that searches will perform fairly
well when the amount of data becomes big? (I guess without merging you
will end up with lots and lots of "small" files, and I guess this is not
good for search response-time)


Regards, Per Steffensen




Re: DataImportHandler oddity

2013-09-12 Thread Shalin Shekhar Mangar
This is probably a bug with Oracle thin JDBC driver. Google found a
similar issue:
http://stackoverflow.com/questions/4168494/resultset-getstring-on-varchar2-column-returns-empty-string

I don't think this is specific to DataImportHandler.
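
One quick way to confirm it is the driver (and not DIH) is to run the same query through plain JDBC outside Solr and check what getString() returns there. A rough sketch, with a placeholder JDBC URL, credentials and row limit (use the same ojdbc jar as the DataImportHandler setup):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class JdbcCheck {
    public static void main(String[] args) throws Exception {
        // Placeholder connection details for the same Oracle instance.
        try (Connection con = DriverManager.getConnection(
                "jdbc:oracle:thin:@dbhost:1521:ORCL", "user", "password");
             Statement st = con.createStatement();
             ResultSet rs = st.executeQuery(
                 "select 'APPLICATION' as sourceid, case_title from synx.dw_fast where rownum <= 5")) {
            while (rs.next()) {
                // Empty strings here too => the JDBC driver / container wrapper is
                // at fault, not DataImportHandler.
                System.out.println(rs.getString("SOURCEID") + " | " + rs.getString("CASE_TITLE"));
            }
        }
    }
}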


On Thu, Sep 12, 2013 at 12:43 PM, Raymond Wiker  wrote:
> Followup: I just tried modifying the select with
>
> select CAST('APPLICATION' as varchar2(100)) as sourceid, ...
>
> and that caused the sourceid field to be empty. CASTing to char(100) gave
> me the expected value ('APPLICATION', right-padded to 100 characters).
>
> Meanwhile, google gave me this: http://bugs.caucho.com/view.php?id=4224 (via
> http://forum.caucho.com/showthread.php?t=27574).
>
>
> On Thu, Sep 12, 2013 at 8:25 AM, Raymond Wiker  wrote:
>
>> I'm trying to index a view in an Oracle database, and have come across
>> some strange behaviour: all the VARCHAR2 fields are being returned as empty
>> strings; this also applies to a datetime field converted to a string via
>> TO_CHAR, and the url field built by concatenating two constant strings and
>> a numeric field converted via TO_CHAR.
>>
>> If I cast the columns to CHAR(N), I get values back, but this is
>> not an acceptable workaround (the maximum length of CHAR(N) is less than
>> VARCHAR2(N), and the result is padded to the specified length).
>>
>> Note that this query works as it should in sqldeveloper, and also in some
>> code that uses the .NET sqlclient api.
>>
>> The query I'm using is
>>
>> select 'APPLICATION' as sourceid,
>>   'http://app.company.com' || '/app/report.aspx?trsid=' ||
>> to_char(incident_no) as "URL",
>>   incident_no, trans_date, location,
>>   responsible_unit, process_eng, product_eng,
>>   case_title, case_description,
>>   index_lob,
>>   investigated, investigated_eng,
>>   to_char(modified_date, '-MM-DD"T"HH24:MI:SS"Z"') as modified_date
>>   from synx.dw_fast
>>   where (investigated <> 3)
>>
>> while the view is
>> INCIDENT_NO          NUMBER(38)
>> TRANS_DATE           VARCHAR2(8)
>> LOCATION             VARCHAR2(4000)
>> RESPONSIBLE_UNIT     VARCHAR2(4000)
>> PROCESS_ENG          VARCHAR2(4000)
>> PROCESS_NO           VARCHAR2(4000)
>> PRODUCT_ENG          VARCHAR2(4000)
>> PRODUCT_NO           VARCHAR2(4000)
>> CASE_TITLE           VARCHAR2(4000)
>> CASE_DESCRIPTION     VARCHAR2(4000)
>> INDEX_LOB            CLOB
>> INVESTIGATED         NUMBER(38)
>> INVESTIGATED_ENG     VARCHAR2(254)
>> INVESTIGATED_NO      VARCHAR2(254)
>> MODIFIED_DATE        DATE
>>
>>
>>



-- 
Regards,
Shalin Shekhar Mangar.


Re: charset encoding

2013-09-12 Thread Andreas Owen
could it have something to do with the fact that the meta encoding tag is iso-8859-1 but the
http header says utf-8, and firefox interprets it as utf-8?

On 12. Sep 2013, at 8:36 AM, Andreas Owen wrote:

> no jetty, and yes for tomcat i've seen a couple of answers
> 
> On 12. Sep 2013, at 3:12 AM, Otis Gospodnetic wrote:
> 
>> Using tomcat by any chance? The ML archive has the solution. May be on
>> Wiki, too.
>> 
>> Otis
>> Solr & ElasticSearch Support
>> http://sematext.com/
>> On Sep 11, 2013 8:56 AM, "Andreas Owen"  wrote:
>> 
>>> i'm using solr 4.3.1 with tika to index html-pages. the html files are
>>> iso-8859-1 (ansi) encoded and the meta tag "content-encoding" as well. the
>>> server-http-header says it's utf8 and firefox-webdeveloper agrees.
>>> 
>>> when i index a page with special chars like ä,ö,ü solr outputs completely
>>> foreign signs, not the normal wrong chars with 1/4 or the flag in them.
>>> so it seems that it's not simply the normal utf8/iso-8859-1 discrepancy.
>>> has anyone got an idea what's wrong?
>>> 
>>> 



Re: DataImportHandler oddity

2013-09-12 Thread Raymond Wiker
Followup: I just tried modifying the select with

select CAST('APPLICATION' as varchar2(100)) as sourceid, ...

and that caused the sourceid field to be empty. CASTing to char(100) gave
me the expected value ('APPLICATION', right-padded to 100 characters).

Meanwhile, google gave me this: http://bugs.caucho.com/view.php?id=4224 (via
http://forum.caucho.com/showthread.php?t=27574).


On Thu, Sep 12, 2013 at 8:25 AM, Raymond Wiker  wrote:

> I'm trying to index a view in an Oracle database, and have come across
> some strange behaviour: all the VARCHAR2 fields are being returned as empty
> strings; this also applies to a datetime field converted to a string via
> TO_CHAR, and the url field built by concatenating two constant strings and
> a numeric field converted via TO_CHAR.
>
> If I cast the columns to CHAR(N), I get values back, but this is
> not an acceptable workaround (the maximum length of CHAR(N) is less than
> VARCHAR2(N), and the result is padded to the specified length).
>
> Note that this query works as it should in sqldeveloper, and also in some
> code that uses the .NET sqlclient api.
>
> The query I'm using is
>
> select 'APPLICATION' as sourceid,
>   'http://app.company.com' || '/app/report.aspx?trsid=' ||
> to_char(incident_no) as "URL",
>   incident_no, trans_date, location,
>   responsible_unit, process_eng, product_eng,
>   case_title, case_description,
>   index_lob,
>   investigated, investigated_eng,
>   to_char(modified_date, '-MM-DD"T"HH24:MI:SS"Z"') as modified_date
>   from synx.dw_fast
>   where (investigated <> 3)
>
> while the view is
> INCIDENT_NO          NUMBER(38)
> TRANS_DATE           VARCHAR2(8)
> LOCATION             VARCHAR2(4000)
> RESPONSIBLE_UNIT     VARCHAR2(4000)
> PROCESS_ENG          VARCHAR2(4000)
> PROCESS_NO           VARCHAR2(4000)
> PRODUCT_ENG          VARCHAR2(4000)
> PRODUCT_NO           VARCHAR2(4000)
> CASE_TITLE           VARCHAR2(4000)
> CASE_DESCRIPTION     VARCHAR2(4000)
> INDEX_LOB            CLOB
> INVESTIGATED         NUMBER(38)
> INVESTIGATED_ENG     VARCHAR2(254)
> INVESTIGATED_NO      VARCHAR2(254)
> MODIFIED_DATE        DATE
>
>
>


Re: number of replicas in Cloud

2013-09-12 Thread Anshum Gupta
Can you specify what you mean by 'problem'? I don't think there should
be any issues with that.
Hope this is what you followed in your attempt so far:

http://wiki.apache.org/solr/SolrCloud#Example_B:_Simple_two_shard_cluster_with_shard_replicas
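
With numShards=2 and replicationFactor=2, each shard ends up with two copies (the leader counts as one of them), i.e. the four cores you describe. If you create the collection through the Collections API instead of pre-configuring cores, the call looks roughly like this (host, port and collection name are placeholders):

http://localhost:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=2&replicationFactor=2&maxShardsPerNode=2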



On Thu, Sep 12, 2013 at 11:31 AM, Prasi S  wrote:

> Hi Anshum,
> I'm using Solr 4.4. Is there a problem with using a replicationFactor of 2?
>
>
>
>
> On Thu, Sep 12, 2013 at 11:20 AM, Anshum Gupta  >wrote:
>
> > Prasi, a replicationFactor of 2 is what you want. However, as of the
> > current releases, this is not persisted.
> >
> >
> >
> > On Thu, Sep 12, 2013 at 11:17 AM, Prasi S  wrote:
> >
> > > Hi,
> > > I want to setup solrcloud with 2 shards and 1 replica for each shard.
> > >
> > > MyCollection
> > >
> > > shard1 , shard2
> > > shard1-replica , shard2-replica
> > >
> > > In this case, I would give "numShards=2". For replicationFactor, should I
> > > give replicationFactor=1 or replicationFactor=2?
> > >
> > >
> > > Pls suggest me.
> > >
> > > thanks,
> > > Prasi
> > >
> >
> >
> >
> > --
> >
> > Anshum Gupta
> > http://www.anshumgupta.net
> >
>



-- 

Anshum Gupta
http://www.anshumgupta.net