Re: solr 4.7.2 mergeFactor/ Merge policy issue

2015-03-06 Thread Summer Shire
Hi All,

Here's an update on where I am with this.
I enabled infoStream logging and quickly figured out that I need to get rid of
maxBufferedDocs. So Erick, you
were absolutely right on that.
I increased my ramBufferSize to 100MB
and reduced maxMergeAtOnce to 3 and segmentsPerTier to 3 as well.
My config looks like this:

<indexConfig>
  <useCompoundFile>false</useCompoundFile>
  <ramBufferSizeMB>100</ramBufferSizeMB>

  <!-- <maxMergeSizeForForcedMerge>9223372036854775807</maxMergeSizeForForcedMerge> -->
  <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
    <int name="maxMergeAtOnce">3</int>
    <int name="segmentsPerTier">3</int>
  </mergePolicy>
  <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler"/>
  <infoStream file="/tmp/INFOSTREAM.txt">true</infoStream>
</indexConfig>

I am attaching a sample infostream log file.
In the infoStream logs, though, you can see how the segments keep on adding
up; it shows (just an example):
allowedSegmentCount=10 vs count=9 (eligible count=9) tooBigCount=0

I looked at TieredMergePolicy.java to see how allowedSegmentCount is
calculated:
// Compute max allowed segs in the index
long levelSize = minSegmentBytes;
long bytesLeft = totIndexBytes;
double allowedSegCount = 0;
while(true) {
  final double segCountLevel = bytesLeft / (double) levelSize;
  if (segCountLevel < segsPerTier) {
    allowedSegCount += Math.ceil(segCountLevel);
    break;
  }
  allowedSegCount += segsPerTier;
  bytesLeft -= segsPerTier * levelSize;
  levelSize *= maxMergeAtOnce;
}
int allowedSegCountInt = (int) allowedSegCount;
and minSegmentBytes is calculated as follows:
// Compute total index bytes & print details about the index
long totIndexBytes = 0;
long minSegmentBytes = Long.MAX_VALUE;
for(SegmentInfoPerCommit info : infosSorted) {
  final long segBytes = size(info);
  if (verbose()) {
    String extra = merging.contains(info) ? " [merging]" : "";
    if (segBytes >= maxMergedSegmentBytes/2.0) {
      extra += " [skip: too large]";
    } else if (segBytes < floorSegmentBytes) {
      extra += " [floored]";
    }
    message("  seg=" + writer.get().segString(info) + " size=" +
        String.format(Locale.ROOT, "%.3f", segBytes/1024/1024.) + " MB" + extra);
  }

  minSegmentBytes = Math.min(segBytes, minSegmentBytes);
  // Accum total byte size
  totIndexBytes += segBytes;
}
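To see how this arithmetic can produce a log line like "allowedSegmentCount=10 vs count=9", the two excerpts above can be re-run standalone against made-up segment sizes. This is a simplified sketch, not the real TieredMergePolicy class: it ignores floorSegmentBytes and the too-large-segment handling, and the sizes are invented for illustration.

```java
public class AllowedSegCountDemo {

    // Same loop as the TieredMergePolicy excerpt quoted above,
    // fed with explicit segment sizes instead of live index state.
    static int computeAllowedSegCount(long[] segSizes, int segsPerTier, int maxMergeAtOnce) {
        long totIndexBytes = 0;
        long minSegmentBytes = Long.MAX_VALUE;
        for (long segBytes : segSizes) {
            minSegmentBytes = Math.min(segBytes, minSegmentBytes);
            totIndexBytes += segBytes;
        }
        long levelSize = minSegmentBytes;
        long bytesLeft = totIndexBytes;
        double allowedSegCount = 0;
        while (true) {
            final double segCountLevel = bytesLeft / (double) levelSize;
            if (segCountLevel < segsPerTier) {
                allowedSegCount += Math.ceil(segCountLevel);
                break;
            }
            allowedSegCount += segsPerTier;
            bytesLeft -= segsPerTier * levelSize;
            levelSize *= maxMergeAtOnce;
        }
        return (int) allowedSegCount;
    }

    public static void main(String[] args) {
        // Nine invented segments (sizes in MB): three big, three medium, three small
        long[] sizesMb = {100, 100, 100, 30, 30, 30, 10, 10, 10};
        int allowed = computeAllowedSegCount(sizesMb, 3, 3);
        // prints allowedSegmentCount=10 vs count=9
        System.out.println("allowedSegmentCount=" + allowed + " vs count=" + sizesMb.length);
    }
}
```

With these nine invented sizes the policy allows 10 segments, so a 9-segment index is already within budget and no merge is selected, which matches the log line above.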


Any input is welcome.




thanks,
Summer


 On Mar 5, 2015, at 8:11 AM, Erick Erickson erickerick...@gmail.com wrote:
 
 I would, BTW, either just get rid of maxBufferedDocs altogether or
 make it much higher, i.e. 10. I don't think this is really your
 problem, but you're creating a lot of segments here.
 
 But I'm kind of at a loss as to what would be different about your setup.
 Is there _any_ chance that you have some secondary process looking at
 your index that's maintaining open searchers? Any custom code that's
 perhaps failing to close searchers? Is this a Unix or Windows system?
 
 And just to be really clear, you're _only_ seeing more segments being
 added, right? If you're only counting files in the index directory, it's
 _possible_ that merging is happening and you're just seeing new files take
 the place of old ones.
 
 Best,
 Erick
 
 On Wed, Mar 4, 2015 at 7:12 PM, Shawn Heisey apa...@elyograg.org wrote:
 On 3/4/2015 4:12 PM, Erick Erickson wrote:
 I _think_, but don't know for sure, that the merging stuff doesn't get
 triggered until you commit, it doesn't just happen.
 
 Shot in the dark...
 
 I believe that new segments are created when the indexing buffer
 (ramBufferSizeMB) fills up, even without commits.  I'm pretty sure that
 anytime a new segment is created, the merge policy is checked to see
 whether a merge is needed.
 
 Thanks,
 Shawn
 



Re: solr cloud does not start with many collections

2015-03-06 Thread didier deshommes
It would be a huge step forward if one could have several hundreds of Solr
collections, but only have a small portion of them opened/loaded at the
same time. This is similar to ElasticSearch's close index api, listed here:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-open-close.html
. I opened an issue a few months ago to implement the same in Solr:
https://issues.apache.org/jira/browse/SOLR-6399

On Thu, Mar 5, 2015 at 4:42 PM, Damien Kamerman dami...@gmail.com wrote:

 I've tried a few variations, with 3 x ZK, 6 X nodes, solr 4.10.3, solr 5.0
 without any success and no real difference. There is a tipping point at
 around 3,000-4,000 cores (varies depending on hardware) from where I can
 restart the cloud OK within ~4min, to the cloud not working and
 continuous 'conflicting
 information about the leader of shard' warnings.

 On 5 March 2015 at 14:15, Shawn Heisey apa...@elyograg.org wrote:

  On 3/4/2015 5:37 PM, Damien Kamerman wrote:
   I'm running on Solaris x86, I have plenty of memory and no real limits
   # plimit 15560
   15560:  /opt1/jdk/bin/java -d64 -server -Xss512k -Xms32G -Xmx32G
   -XX:MaxMetasp
  resource  current maximum
 time(seconds) unlimited   unlimited
 file(blocks)  unlimited   unlimited
 data(kbytes)  unlimited   unlimited
 stack(kbytes) unlimited   unlimited
 coredump(blocks)  unlimited   unlimited
 nofiles(descriptors)  65536   65536
 vmemory(kbytes)   unlimited   unlimited
  
   I've been testing with 3 nodes, and that seems OK up to around 3,000
  cores
   total. I'm thinking of testing with more nodes.
 
  I have opened an issue for the problems I encountered while recreating a
  config similar to yours, which I have been doing on Linux.
 
  https://issues.apache.org/jira/browse/SOLR-7191
 
  It's possible that the only thing the issue will lead to is improvements
  in the documentation, but I'm hopeful that there will be code
  improvements too.
 
  Thanks,
  Shawn
 
 


 --
 Damien Kamerman



Solr query to match document templates - sort of a reverse wildcard match

2015-03-06 Thread Robert Stewart
If I have a SOLR document with a field value such as:

"a ? c ? e"

And I want a phrase query such as "a b c d e" to match that document.

So:

q:"a b c d e"  -- should return the doc with "a ? c ? e" as the field value for the q field.

Is this possible, or is there a way it can be done with a plug-in using the
lower-level Lucene SDK?  Maybe some custom implementation of TermQuery
where the value "?" always matches any term in the query?

Thanks!
Robert Stewart


Re: ExpandComponent not expanding

2015-03-06 Thread Dario Rigolin
I did more testing following your question... and now it all makes sense,
though I think a clearer explanation in the documentation could help.
I was using grouping, where a group is created even if only one element
is present, and I had inferred that the expanded section would show ALL
collapsed records, rather than the collapsed records after the one returned
as the group head.

At this point ExpandComponent works well; sorry for the false alarm.

Regards.

Dario

On 6/03/2015 14:26, Joel Bernstein wrote:

The expand component only displays the groups heads when it finds expanded
documents in the group. And it only expands for the current page.

Are you finding situations where there are group heads on the page, that
have child documents that are not being expanded?

Joel Bernstein
Search Engineer at Heliosearch

On Fri, Mar 6, 2015 at 7:17 AM, Dario Rigolin da...@comperio.it wrote:


I'm using Solr 4.10.1 and FieldCollapsing but when adding expand=true and
activating ExpandComponent the expanded section into result contains only
one group head and not all group heads present into the result.
I don't know if this is the intended behaviour. Using a query q=*:* the
expanded section increase the number of group heads but not all 10 heads
group are present. Also removing max= parameter on !collapse makes display
couple of more heads but not all .

Regards

Example of response with only one group head into expanded but 10 are
returned.

<response>
<script/>
<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">20</int>
  <lst name="params">
    <str name="expand.rows">2</str>
    <str name="expand.sort">sdate asc</str>
    <str name="fl">id</str>
    <str name="q">title:(test search)</str>
    <str name="expand">true</str>
    <str name="fq">{!collapse field=group_key max=sdate}</str>
  </lst>
</lst>
<result name="response" numFound="120" start="0">
  <doc><str name="id">test:catalog:713515</str></doc>
  <doc><str name="id">test:catalog:126861</str></doc>
  <doc><str name="id">test:catalog:88797</str></doc>
  <doc><str name="id">test:catalog:91760</str></doc>
  <doc><str name="id">test:catalog:14095</str></doc>
  <doc><str name="id">test:catalog:60616</str></doc>
  <doc><str name="id">test:catalog:31539</str></doc>
  <doc><str name="id">test:catalog:29449</str></doc>
  <doc><str name="id">test:catalog:146638</str></doc>
  <doc><str name="id">test:catalog:137554</str></doc>
</result>
<lst name="expanded">
  <result name="collapse_value_2342" numFound="3" start="0">
    <doc><str name="id">test:catalog:21</str></doc>
    <doc><str name="id">test:catalog:330659</str></doc>
  </result>
</lst>
<head/>
</response>





Re: Solrcloud Index corruption

2015-03-06 Thread Erick Erickson
bq: You say in our case some docs didn't made it to the node, but
that's not really true: the docs can be found on the corrupted nodes
when I search on ID. The docs are also complete. The problem is that
the docs do not appear when I filter on certain fields

this _sounds_ like you somehow don't have indexed=true set for the
field in question. But it also sounds like you're saying that search
on that field works on some nodes but not on others; I'm assuming
you're adding distrib=false to verify this. It shouldn't be
possible to have different schema.xml files on the different nodes,
but you might try checking through the admin UI.

Network burps shouldn't be related here. If the content is stored,
then the info made it to Solr intact, so this issue shouldn't be
related to that.

Sounds like it may just be the bugs Mark is referencing, sorry I don't
have the JIRA numbers right off.

Best,
Erick

On Thu, Mar 5, 2015 at 4:46 PM, Shawn Heisey apa...@elyograg.org wrote:
 On 3/5/2015 3:13 PM, Martin de Vries wrote:
 I understand there is not a master in SolrCloud. In our case we use
 haproxy as a load balancer for every request. So when indexing every
 document will be sent to a different solr server, immediately after
 each other. Maybe SolrCloud is not able to handle that correctly?

 SolrCloud can handle that correctly, but currently sending index updates
 to a core that is not the leader of the shard will incur a significant
 performance hit, compared to always sending updates to the correct
 core.  A small performance penalty would be understandable, because the
 request must be redirected, but what actually happens is a much larger
 penalty than anyone expected.  We have an issue in Jira to investigate
 that performance issue and make it work as efficiently as possible.

 Indexing batches of documents is recommended, not sending one document
 per update request.
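Shawn's batching advice can be sketched generically. This is an illustration of the pattern only, with no SolrJ dependency: flush() stands in for one real update request (e.g. SolrJ's add(Collection<SolrInputDocument>)), and all class and method names here are hypothetical.

```java
import java.util.*;

public class BatchingIndexer {
    private final int batchSize;
    private final List<String> buffer = new ArrayList<>();
    private int requests = 0;                 // how many "update requests" we issued

    BatchingIndexer(int batchSize) { this.batchSize = batchSize; }

    // Buffer a document; send the whole batch once the buffer is full.
    void add(String doc) {
        buffer.add(doc);
        if (buffer.size() >= batchSize) flush();
    }

    // One HTTP round trip for the whole batch (simulated here by a counter).
    void flush() {
        if (buffer.isEmpty()) return;
        requests++;
        buffer.clear();
    }

    int requestCount() { return requests; }

    public static void main(String[] args) {
        BatchingIndexer indexer = new BatchingIndexer(100);
        for (int i = 0; i < 1000; i++) indexer.add("doc-" + i);
        indexer.flush();                      // push any trailing partial batch
        System.out.println(indexer.requestCount()); // 10 requests instead of 1000
    }
}
```

The point is simply that 1000 documents cost 10 round trips instead of 1000, which matters even more in SolrCloud where a misrouted request must also be forwarded to the leader.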

 General performance problems with Solr itself can lead to extremely odd
 and unpredictable behavior from SolrCloud.  Most often these kinds of
 performance problems are related in some way to memory, either the java
 heap or available memory in the system.

 http://wiki.apache.org/solr/SolrPerformanceProblems

 Thanks,
 Shawn



Re: Frequency of Suggestion are varying from original Frequency in index

2015-03-06 Thread gaohang wang
Do you use SolrCloud? Maybe your suggester does not support distributed mode.

2015-03-04 22:39 GMT+08:00 Nitin Solanki nitinml...@gmail.com:

 Hi..
I have a term ("who") where the original frequency of "who" is 191, but
 when I get a suggestion for "who" it gives me 90. Why?

 Example :

 *Original Frequency* comes like:

 "spellcheck":{
   "suggestions":[
     "who",{
       "numFound":1,
       "startOffset":1,
       "endOffset":4,
       "origFreq":191},
     "correctlySpelled",false]}

 While in *Suggestion*, it gives like:

 "spellcheck":{
   "suggestions":[
     "whs",{
       "numFound":1,
       "startOffset":1,
       "endOffset":4,
       "origFreq":0,
       "suggestion":[{
         "word":"who",
         "freq":90}]},
     "correctlySpelled",false]}



 Why is it so?

 I am using StandardTokenizerFactory with ShingleFilterFactory in
 Schema.xml..



Re: Core admin: create new core

2015-03-06 Thread Erik Hatcher
Try -

   bin/solr create -c inventory




 On Mar 6, 2015, at 05:25, manju16832003 manju16832...@gmail.com wrote:
 
 Solr 5 has been released. I was just giving it a try and came across the same
 issue. As I gathered from some documentation, Solr 5 doesn't come with a
 default core (an example core, as in earlier versions), and this requires us to
 create a core from the Solr Admin. When I tried to create the core, I got the
 following error:
 
 Error CREATEing SolrCore 'inventory': Unable to create core [inventory]
 Caused by: Can't find resource 'solrconfig.xml' in classpath or
 '/Users/manjunath.reddy/Programming/Solr/solr-5.0.0/server/solr/inventory/conf'
 
 
 So I had to create the core manually, based on my previous experience with
 Solr 4.10.
 I guess it's quite misleading for new users of Solr. I liked that the older
 versions of Solr came with default cores, which made it easier to
 follow along.
 
 I have attached screen shots for the reference. Is there a work around for
 this?
 http://lucene.472066.n3.nabble.com/file/n4191378/solr-1.png 
 http://lucene.472066.n3.nabble.com/file/n4191378/solr-2.png 
 
 
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Core-admin-create-new-core-tp4099127p4191378.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Calling solr Page with search query

2015-03-06 Thread Jochen Lenz

Hello,
I'm looking for a solution for the following situation:
I have a website consisting of two sites.
One site is called "home" and one is called "search".
On the search site I have embedded Solr via an iframe.
On the home site there should be a search field.
When the search field is fired, it should open the search site with Solr
showing the search query and result (it should look as if I had
used the search field directly on the search site).

I would be very thankful for any hints on connecting these parts!
Regards
Jochen


RE: Cores and and ranking (search quality)

2015-03-06 Thread johnmunir
Help me understand this better (regarding ranking).

If I have two docs that are 100% identical with the exception of uid (which is
stored but not indexed): in a single-core setup, I search "xyz" such that
those 2 docs end up ranking as #1 and #2.  When I switch over to a two-core
setup, doc-A goes to core-A (which has 10 records) and doc-B goes to core-B
(which has 100,000 records).

Now, are you saying that in the 2-core setup, if I search on "xyz" (just like in
the single-core setup), this time I will not see doc-A and doc-B as #1 and #2 in
the ranking?  That is, are you saying doc-A may now be somewhere at the top /
bottom, far away from doc-B?  If so, which will be #1: the doc off core-A (that
has 10 records) or doc-B off core-B (that has 100,000 records)?

If I got all this right, are you saying SOLR-1632 will fix this issue such that 
the end result will now be as if I had 1 core?

- MJ


-Original Message-
From: Toke Eskildsen [mailto:t...@statsbiblioteket.dk] 
Sent: Thursday, March 5, 2015 9:06 AM
To: solr-user@lucene.apache.org
Subject: Re: Cores and and ranking (search quality)

On Thu, 2015-03-05 at 14:34 +0100, johnmu...@aol.com wrote:
 My question is this: if I put my data in multiple cores and use 
 distributed search, will the ranking be different than if I had all my data 
 in a single core?

Yes, it will be different. The practical impact depends on how homogeneous your 
data are across the shards and how large your shards are. If you have small and 
dissimilar shards, your ranking will suffer a lot.

Work is being done to remedy this:
https://issues.apache.org/jira/browse/SOLR-1632
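A toy calculation (numbers invented, not taken from this thread) shows the mechanism behind that: classic Lucene TF-IDF computes idf = 1 + ln(numDocs / (docFreq + 1)) from per-shard statistics, so the same term can score very differently on a small shard than it would against the whole corpus.

```java
public class ShardIdfDemo {
    // Lucene's classic idf formula, evaluated against whatever
    // doc counts the local shard happens to have.
    static double idf(long numDocs, long docFreq) {
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    public static void main(String[] args) {
        // Invented corpus: 100,010 docs total, term occurs in 100 of them
        double global = idf(100_010, 100);      // ~7.90
        // small shard: 10 docs, term occurs in 1
        double smallShard = idf(10, 1);         // ~2.61
        // large shard: 100,000 docs, term occurs in 99
        double largeShard = idf(100_000, 99);   // ~7.91
        System.out.printf("global=%.2f small=%.2f large=%.2f%n",
                global, smallShard, largeShard);
        // The small shard understates the term's rarity, so its hits are
        // scored very differently than they would be in a single-core index.
    }
}
```

This is exactly why small, dissimilar shards hurt ranking the most, and why SOLR-1632 (distributed IDF) is the remedy.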

 Also, will facet and more-like-this quality / result be the same?

It is not formally guaranteed, but for most practical purposes, faceting on 
multi-shards will give you the same results as single-shards.

I don't know about more-like-this. My guess is that it will be affected in the 
same way that standard searches are.

 Also, reading the distributed search wiki
 (http://wiki.apache.org/solr/DistributedSearch) it looks like Solr 
 does the search and result merging (all I have to do is issue a 
 search), is this correct?

Yes. From a user-perspective, searches are no different.

- Toke Eskildsen, State and University Library, Denmark



Check the return of suggestions

2015-03-06 Thread ale42
Hello everyone.

I'm working with Solr 4.3. I use the Spellchecker component, which gives me
suggestions as I expect.

I will explain my problem with an example:

I am querying /cartouchhe/ instead of /cartouche/.

I obtain these suggestions:

array (size=5)
  0 =>
    array (size=2)
      'word' => string 'cartouche' (length=9)
      'freq' => int 1519
  1 =>
    array (size=2)
      'word' => string 'touches' (length=7)
      'freq' => int 55
  2 =>
    array (size=2)
      'word' => string 'cartouches' (length=10)
      'freq' => int 32
  3 =>
    array (size=2)
      'word' => string 'caoutchoucs' (length=11)
      'freq' => int 16
  4 =>
    array (size=2)
      'word' => string 'cartonnees' (length=10)
      'freq' => int 15

This is what I want == OK.

The problem is that when I query /cartouche/ or /cartouches/, I get exactly
the same results, because for both queries the term that is
searched in my index is /cartouch/.
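A toy stemmer makes that collision concrete. This is not the real analyzer — just a crude suffix stripper standing in for the stemming filter in the field's analysis chain:

```java
public class StemCollisionDemo {
    // Assumed toy rule: strip a trailing "s", then a trailing "e".
    // Real French stemmers are far more elaborate, but collide the same way.
    static String toyStem(String word) {
        if (word.endsWith("s")) word = word.substring(0, word.length() - 1);
        if (word.endsWith("e")) word = word.substring(0, word.length() - 1);
        return word;
    }

    public static void main(String[] args) {
        System.out.println(toyStem("cartouche"));   // cartouch
        System.out.println(toyStem("cartouches"));  // cartouch
    }
}
```

Since both query forms reduce to the same indexed term, any collation built on that term necessarily returns the same result set — hence the duplicate collations.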

Is there a way with Solr to fix this kind of problem, i.e. to check that 2
collations will not return exactly the same results?

Thanks for your answers,
Alex.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Check-the-return-of-suggestions-tp4191383.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Frequency of Suggestion are varying from original Frequency in index

2015-03-06 Thread ale42
I think these frequencies are not the frequency of the term in the same index:

- the original frequency represents the number of results that you have in the
Lucene index when you query "who".

- the suggestion frequency is the number of results for this term in the
spellcheck dictionary.

I guess you're using /solr.IndexBasedSpellChecker/ !



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Frequency-of-Suggestion-are-varying-from-original-Frequency-in-index-tp4190927p4191397.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: ExpandComponent not expanding

2015-03-06 Thread Joel Bernstein
The expand component only displays the groups heads when it finds expanded
documents in the group. And it only expands for the current page.

Are you finding situations where there are group heads on the page, that
have child documents that are not being expanded?

Joel Bernstein
Search Engineer at Heliosearch

On Fri, Mar 6, 2015 at 7:17 AM, Dario Rigolin da...@comperio.it wrote:

 I'm using Solr 4.10.1 and FieldCollapsing but when adding expand=true and
 activating ExpandComponent the expanded section into result contains only
 one group head and not all group heads present into the result.
 I don't know if this is the intended behaviour. Using a query q=*:* the
 expanded section increase the number of group heads but not all 10 heads
 group are present. Also removing max= parameter on !collapse makes display
 couple of more heads but not all .

 Regards




Re: Core admin: create new core

2015-03-06 Thread Shawn Heisey
On 3/6/2015 3:25 AM, manju16832003 wrote:
 Solr 5 has been released. I was just giving it a try and came across the same
 issue. As I gathered from some documentation, Solr 5 doesn't come with a
 default core (an example core, as in earlier versions), and this requires us to
 create a core from the Solr Admin. When I tried to create the core, I got the
 following error:
 
 Error CREATEing SolrCore 'inventory': Unable to create core [inventory]
 Caused by: Can't find resource 'solrconfig.xml' in classpath or
 '/Users/manjunath.reddy/Programming/Solr/solr-5.0.0/server/solr/inventory/conf'
 
 
 So I had to create the core manually, based on my previous experience with
 Solr 4.10.
 I guess it's quite misleading for new users of Solr. I liked that the older
 versions of Solr came with default cores, which made it easier to
 follow along.

Unless you are in SolrCloud mode, creating cores via the admin UI (or
the /admin/cores HTTP API) requires that the core directory and its conf
subdirectory with solrconfig.xml, schema.xml, and other potential files
must already exist in the indicated location.  There's a note right on
the "Add Core" screen that says this: "instanceDir and dataDir need
to exist before you can create the core".  This was the case for 4.x as
well as 5.0.

That note is slightly misleading ... the dataDir does not need to exist,
just instanceDir and the conf directory.  Solr will create the dataDir
and its contents, if the user running Solr has permission.

There is a configsets functionality that's new in recent versions which
very likely will make it possible to create a core completely from
scratch within the admin UI in non-cloud mode, but I do not know
anything about using it, and I do not think the functionality is exposed
in the admin UI yet.

Learning about cores and/or collections and how to create them is a
hugely important part of using Solr.  In 4.x, users did not need to do
anything to get their first core, and that fact has led to many
problems.  New users don't know how to add a core, and many do not even
know about cores at all.  This requires that they must learn about the
core/collection concepts, and many of them cannot find any info about
the procedure, so they ask for help.  I am glad to help out both here
and on the IRC channel, but it improves the experience of everyone
involved if users become familiar with the concept and methods on their own.

Thanks,
Shawn



ExpandComponent not expanding

2015-03-06 Thread Dario Rigolin
I'm using Solr 4.10.1 and FieldCollapsing but when adding expand=true 
and activating ExpandComponent the expanded section into result contains 
only one group head and not all group heads present into the result.
I don't know if this is the intended behaviour. Using a query q=*:* the 
expanded section increase the number of group heads but not all 10 heads 
group are present. Also removing max= parameter on !collapse makes 
display couple of more heads but not all .


Regards



Order of defining fields and dynamic fields in schema.xml

2015-03-06 Thread Tom Devel
Hi,

I am running Solr 5 using basic_configs and have a question about the
order of defining fields and dynamic fields in the schema.xml file.

For example, there is a field hierarchy.of.fields.Project that I am capturing
as text_en_splitting (below), but the rest of the fields in this
hierarchy I would like as text_en.

Since the dynamicField with * technically spans the Project
field, should its definition go above or below the Project field?

<field name="hierarchy.of.fields.Project" type="text_en_splitting"
       indexed="true" stored="true" multiValued="true" required="false" />
<dynamicField name="hierarchy.of.fields.*" type="text_en"
       indexed="true" stored="true" multiValued="true" required="false" />


Or in this case: I have a hierarchy where currently only one field,
another.hierarchy.of.fields.Description, should be captured, and the rest for
now should just be ignored. Is there any significance to which definition comes
first?

<dynamicField name="another.hierarchy.of.*" type="text_en"
       indexed="false" stored="false" multiValued="true" required="false" />
<dynamicField name="another.hierarchy.of.fields.Description"
       type="text_en" indexed="true" stored="true" multiValued="true"
       required="false" />

Thanks for any hints,
Tom


Re: Order of defining fields and dynamic fields in schema.xml

2015-03-06 Thread Alexandre Rafalovitch
I don't believe the order in the file matters for anything apart from the
initParams section. The longer (more specific) pattern matches first.
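A toy illustration of that rule (not Solr's actual resolution code): an explicit field wins outright, and among matching dynamicField patterns the longest — most specific — one wins regardless of where it appears in schema.xml. Note that a dynamicField name must contain a wildcard, so the Description rule from the question is written here with a trailing *.

```java
import java.util.*;

public class DynamicFieldDemo {
    // Explicit fields always win; among wildcard patterns the longest wins.
    static String resolve(String name, Map<String,String> explicitFields,
                          Map<String,String> dynamicFields) {
        if (explicitFields.containsKey(name)) return explicitFields.get(name);
        String best = null;
        for (String pat : dynamicFields.keySet()) {
            boolean matches = pat.endsWith("*")
                ? name.startsWith(pat.substring(0, pat.length() - 1))
                : name.equals(pat);
            if (matches && (best == null || pat.length() > best.length())) best = pat;
        }
        return best == null ? null : dynamicFields.get(best);
    }

    public static void main(String[] args) {
        Map<String,String> explicitFields =
            Map.of("hierarchy.of.fields.Project", "text_en_splitting");
        Map<String,String> dynamicFields = Map.of(
            "hierarchy.of.fields.*", "text_en",
            "another.hierarchy.of.*", "ignored",
            "another.hierarchy.of.fields.Description*", "text_en");
        // explicit field beats the shorter wildcard, wherever it is defined
        System.out.println(resolve("hierarchy.of.fields.Project", explicitFields, dynamicFields));
        // falls through to the wildcard
        System.out.println(resolve("hierarchy.of.fields.Title", explicitFields, dynamicFields));
        // the longer Description* pattern beats another.hierarchy.of.*
        System.out.println(resolve("another.hierarchy.of.fields.Description", explicitFields, dynamicFields));
    }
}
```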


Regards,
   Alex.

Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
http://www.solr-start.com/


On 6 March 2015 at 11:21, Tom Devel deve...@gmail.com wrote:
 Hi,

 I am running solr 5 using basic_configs and have a questions about the
 order of defining fields and dynamic fields in the schema.xml file?

 For example, there is a field hierarchy.of.fields.Project I am capturing
 as below as text_en_splitting, but the rest of the fields in this
 hierarchy, I would like as text_en

 Since the dynamicField with * is technically spanning over the Project
 field, should its definition go above, or below the Project field?

  <field name="hierarchy.of.fields.Project" type="text_en_splitting"
         indexed="true" stored="true" multiValued="true" required="false" />
  <dynamicField name="hierarchy.of.fields.*" type="text_en"
         indexed="true" stored="true" multiValued="true" required="false" />


 Or this case, I have a hierarchy where currently only one field should be
 captured another.hierarchy.of.fields.Description, the rest for now should
 be just ignored. Is here any significance of which definition comes first?

  <dynamicField name="another.hierarchy.of.*" type="text_en"
         indexed="false" stored="false" multiValued="true" required="false" />
  <dynamicField name="another.hierarchy.of.fields.Description"
         type="text_en" indexed="true" stored="true" multiValued="true"
         required="false" />

 Thanks for any hints,
 Tom


Re: SolrCloud default shard assignment order not correct

2015-03-06 Thread Shawn Heisey
On 3/6/2015 1:34 AM, Shawn Heisey wrote:
 In Solr 5.0, the cloud graph is sorting the collections by name.  The
 shard names also appear to be sorted -- all the collections I have on
 the example cloud setup only have two shards, so I really can't be sure.
  It might also be sorting the replicas within each shard.

I built a collection that would tell me what exactly is sorted in Solr
5.0.  The collections are sorted and the shards are sorted, but the
replicas are NOT sorted.  Because there are normally only a few replicas
and the leader is clearly marked, I don't see that as a problem, but if
you really want them sorted, feel free to open an issue in Jira.

Screenshot:

https://www.dropbox.com/s/yzkubdbj86dbkda/solr5-cloud-graph-sorting.png?dl=0

SOLR project in Jira:

https://issues.apache.org/jira/browse/SOLR

Thanks,
Shawn



Apache Solr Reference Guide 5.0

2015-03-06 Thread Patrick Durusau

Greetings,

I was looking at the PDF version of the Apache Solr Reference Guide 5.0 
and noticed that it has no TOC nor any section numbering. 
http://apache.claz.org/lucene/solr/ref-guide/apache-solr-ref-guide-5.0.pdf


The lack of a TOC and section headings makes navigation difficult.

I have just started making suggestions on the documentation and was 
wondering if there is a reason why the TOC and section headings are 
missing? (that isn't apparent from the document)


Thanks!

Hope everyone is near a great weekend!

Patrick


Re: Order of defining fields and dynamic fields in schema.xml

2015-03-06 Thread Tom Devel
That's good to know.

On http://wiki.apache.org/solr/SchemaXml it also states that with dynamicFields
you can create "field rules that Solr will use to understand what
datatype should be used whenever it is given a field name that is not
explicitly defined, but matches a prefix or suffix used in a dynamicField".

Thanks

On Fri, Mar 6, 2015 at 10:43 AM, Alexandre Rafalovitch arafa...@gmail.com
wrote:

 I don't believe the order in file matters for anything apart from
 initParams section. The longer - more specific one - matches first.


 Regards,
Alex.
 
 Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
 http://www.solr-start.com/


 On 6 March 2015 at 11:21, Tom Devel deve...@gmail.com wrote:
  Hi,
 
  I am running solr 5 using basic_configs and have a questions about the
  order of defining fields and dynamic fields in the schema.xml file?
 
  For example, there is a field hierarchy.of.fields.Project I am
 capturing
  as below as text_en_splitting, but the rest of the fields in this
  hierarchy, I would like as text_en
 
  Since the dynamicField with * is technically spanning over the Project
  field, should its definition go above, or below the Project field?
 
   <field name="hierarchy.of.fields.Project" type="text_en_splitting"
          indexed="true" stored="true" multiValued="true" required="false" />
   <dynamicField name="hierarchy.of.fields.*" type="text_en"
          indexed="true" stored="true" multiValued="true" required="false" />
 
 
  Or this case, I have a hierarchy where currently only one field should be
  captured another.hierarchy.of.fields.Description, the rest for now
 should
  be just ignored. Is here any significance of which definition comes
 first?
 
   <dynamicField name="another.hierarchy.of.*" type="text_en"
          indexed="false" stored="false" multiValued="true" required="false" />
   <dynamicField name="another.hierarchy.of.fields.Description"
          type="text_en" indexed="true" stored="true" multiValued="true"
          required="false" />
 
  Thanks for any hints,
  Tom



Re: How to start solr in solr cloud mode using external zookeeper ?

2015-03-06 Thread Rajesh Hazari
The zkhost=<hostnames> and port=<some port> variables in your solr.xml should
work. I have tested this with Tomcat, not with Jetty; this stays with your
config.

Rajesh.
On Mar 5, 2015 9:20 PM, Aman Tandon amantandon...@gmail.com wrote:

 Thanks shamik :)

 With Regards
 Aman Tandon

 On Fri, Mar 6, 2015 at 3:30 AM, shamik sham...@gmail.com wrote:

  The other way you can do that is to specify the startup parameters in
  solr.in.sh.
 
  Example :
 
  SOLR_MODE=solrcloud
 
  ZK_HOST=zoohost1:2181,zoohost2:2181,zoohost3:2181
 
  SOLR_PORT=4567
 
  You can simply start solr by running ./solr start
 
 
 
  --
  View this message in context:
 
 http://lucene.472066.n3.nabble.com/How-to-start-solr-in-solr-cloud-mode-using-external-zookeeper-tp4190630p4191286.html
  Sent from the Solr - User mailing list archive at Nabble.com.
 



Re: Core admin: create new core

2015-03-06 Thread manju16832003
Solr 5 has been released. I was just giving it a try and came across the same
issue. From what I've read in the documentation, Solr 5 doesn't come with a
default core (the example core of earlier versions), which requires us to
create a core from the Solr Admin UI. When I tried to create the core, I got
the following error:

Error CREATEing SolrCore 'inventory': Unable to create core [inventory]
Caused by: Can't find resource 'solrconfig.xml' in classpath or
'/Users/manjunath.reddy/Programming/Solr/solr-5.0.0/server/solr/inventory/conf'


So I had to create the core manually, based on my previous experience with
Solr 4.10.
I guess it's quite misleading for new users of Solr. I liked that the older
versions of Solr came with a default core, which made it easier to follow
along.

I have attached screenshots for reference. Is there a workaround for this?
http://lucene.472066.n3.nabble.com/file/n4191378/solr-1.png 
http://lucene.472066.n3.nabble.com/file/n4191378/solr-2.png 
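
One workaround, sketched against the stock Solr 5.0 layout (basic_configs is
one of the bundled configsets; command options and paths may differ slightly
across 5.x releases, so treat this as a sketch, not gospel):

```shell
# Option 1: have the start script create the core, copying a bundled configset
bin/solr create_core -c inventory -d basic_configs

# Option 2: seed the conf directory by hand, then use the Admin UI's "Add Core"
mkdir -p server/solr/inventory
cp -r server/solr/configsets/basic_configs/conf server/solr/inventory/conf
```

The Admin UI's core-creation call expects solrconfig.xml and the schema to
already exist on disk, which is why creating the core from the UI alone fails.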



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Core-admin-create-new-core-tp4099127p4191378.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Apache Solr Reference Guide 5.0

2015-03-06 Thread Patrick Durusau

Shawn,

Thanks!

I was using Document Viewer, not Adobe Acrobat, so that was unclear to me.

The TOC I meant was the kind found in a traditional print publication, with 
section numbers, etc., not a navigation TOC sans numbering as in Adobe.


The Confluence documentation (I don't think I can see the actual stylesheet 
in use) here:


https://confluence.atlassian.com/display/DOC/Customising+Exports+to+PDF

Says:

*
Disabling the Table of Contents

To prevent the table of contents from being generated in your PDF 
document, add the div.toc-macro rule to the PDF Stylesheet and set its 
display property to none:

*

Which is why I was asking if there was a reason for the TOC and section 
numbering not appearing.


They can be disabled, but that doesn't appear to be the default setting.

This came up because a section said it would cover topics N - S and I 
could not determine if all those topics fell in that section or not.


Thanks!

Hope you are having a great day!

Patrick

On 03/06/2015 12:28 PM, Shawn Heisey wrote:

On 3/6/2015 10:20 AM, Patrick Durusau wrote:

I was looking at the PDF version of the Apache Solr Reference Guide
5.0 and noticed that it has no TOC nor any section numbering.
http://apache.claz.org/lucene/solr/ref-guide/apache-solr-ref-guide-5.0.pdf

The lack of a TOC and section headings makes navigation difficult.

I have just started making suggestions on the documentation and was
wondering if there is a reason why the TOC and section headings are
missing? (that isn't apparent from the document)

The TOC is built into the PDF and it's up to the PDF viewer to display it.

Here's a screenshot of the ref guide in Adobe Reader with a clickable
TOC open.

https://www.dropbox.com/s/3ajuri1emj61imu/refguide-5.0-TOC.png?dl=0

Section numbering might be a good idea, if it's not too intrusive or
difficult.

Thanks,
Shawn






RE: Delimited payloads input issue

2015-03-06 Thread Markus Jelsma
Well, the only work-around we found to actually work properly is to override 
the problem-causing tokenizer implementations one by one. Regarding the 
WordDelimiterFilter, the quickest fix is enabling keepOriginal; if you don't 
want the original to stick around, the filter implementation must be modified 
to carry the original PayloadAttribute over to its descendants.
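
In schema.xml terms, that quick fix could look like the sketch below. The
attribute on the stock factory is spelled preserveOriginal, and the payload
filter must run before the word delimiter so the |5 suffix is stripped and
attached first; the field type name and extra WDF flags here are assumptions:

```xml
<fieldType name="text_payloads" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- strips the trailing |5 and stores it as the token's payload -->
    <filter class="solr.DelimitedPayloadTokenFilterFactory"
            delimiter="|" encoder="float"/>
    <!-- preserveOriginal keeps the unsplit token, which retains its payload -->
    <filter class="solr.WordDelimiterFilterFactory" preserveOriginal="1"
            generateWordParts="1" generateNumberParts="1"/>
  </analyzer>
</fieldType>
```

The split parts ("Hello" from "Hello,") still lose the payload; only the
preserved original carries it through.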

Markus
 
 
-Original message-
 From:Markus Jelsma markus.jel...@openindex.io
 Sent: Friday 27th February 2015 17:28
 To: solr-user solr-user@lucene.apache.org
 Subject: Delimited payloads input issue
 
 Hi - we attempt to use payloads to identify different parts of extracted HTML 
 pages and use the DelimitedPayloadTokenFilter to assign the correct payload 
 to the tokens. However, we are having issues for some language analyzers and 
 issues with some types of content for most regular analyzers.
 
 If we, for example, want to assign payloads to the text within an H1 field 
 that contains non-alphanumerics such as `Hello, i am a heading!`, and use |5 
 as delimiter and payload, we send the following to Solr, `Hello,|5 i|5 am|5 
 a|5 heading!|5`.
 This is not going to work because, due to a WordDelimiterFilter, the tokens 
 Hello and heading obviously lose their payload. We also cannot put the 
 payload between the last alphanumeric and the following comma or exclamation 
 mark, because then those characters would become part of the payload if we 
 use the identity encoder, or it would fail if we use another encoder. We 
 could solve this with a custom encoder that only takes the first character 
 and ignores the rest, but that seems rather ugly.
 
 On the other hand, we have issues using language-specific tokenizers such as 
 Kuromoji, which will immediately dump the delimited payload so it never 
 reaches the DelimitedPayloadTokenFilter. And if we try Chinese and have the 
 StandardTokenizer enabled, we also lose the delimited payload.
 
 Any of you have dealt with this before? Hints to share?
 
 Many thanks,
 Markus
 


Re: Apache Solr Reference Guide 5.0

2015-03-06 Thread Shawn Heisey
On 3/6/2015 10:20 AM, Patrick Durusau wrote:
 I was looking at the PDF version of the Apache Solr Reference Guide
 5.0 and noticed that it has no TOC nor any section numbering.
 http://apache.claz.org/lucene/solr/ref-guide/apache-solr-ref-guide-5.0.pdf

 The lack of a TOC and section headings makes navigation difficult.

 I have just started making suggestions on the documentation and was
 wondering if there is a reason why the TOC and section headings are
 missing? (that isn't apparent from the document)

The TOC is built into the PDF and it's up to the PDF viewer to display it.

Here's a screenshot of the ref guide in Adobe Reader with a clickable
TOC open.

https://www.dropbox.com/s/3ajuri1emj61imu/refguide-5.0-TOC.png?dl=0

Section numbering might be a good idea, if it's not too intrusive or
difficult.

Thanks,
Shawn



How to direct SOLR 4.9 log output to regular Tomcat logs

2015-03-06 Thread tuxedomoon
I want Solr 4.9 to log to my rolling Tomcat logs like
catalina.2015-03-06.log.  Instead I'm just getting a solr.log with no
timestamp.  Maybe this is just the way it has to be now?

I'm also not sure if I need to copy more SOLR jars into my tomcat lib.  

This is my setup.


tomcat6/conf/log4j.properties

log4j.rootLogger=debug, R
log4j.appender.R=org.apache.log4j.RollingFileAppender
log4j.appender.R.File=${catalina.home}/logs/tomcat.log
log4j.appender.R.MaxFileSize=10MB
log4j.appender.R.MaxBackupIndex=10
log4j.appender.R.layout=org.apache.log4j.PatternLayout
log4j.appender.R.layout.ConversionPattern=%p %t %c - %m%n
log4j.logger.org.apache.catalina=DEBUG, R
log4j.logger.org.apache.catalina.core.ContainerBase.[Catalina].[localhost]=DEBUG,
R
log4j.logger.org.apache.catalina.core=DEBUG, R
log4j.logger.org.apache.catalina.session=DEBUG, R


tomcat6/conf/logging.properties
-
handlers = 1catalina.org.apache.juli.FileHandler,
2localhost.org.apache.juli.FileHandler,
3manager.org.apache.juli.FileHandler,
4host-manager.org.apache.juli.FileHandler, java.util.logging.ConsoleHandler

.handlers = 1catalina.org.apache.juli.FileHandler,
java.util.logging.ConsoleHandler

1catalina.org.apache.juli.FileHandler.level = FINE
1catalina.org.apache.juli.FileHandler.directory = /data/tomcatlogs
1catalina.org.apache.juli.FileHandler.prefix = catalina.

2localhost.org.apache.juli.FileHandler.level = FINE
2localhost.org.apache.juli.FileHandler.directory = /data/tomcatlogs
2localhost.org.apache.juli.FileHandler.prefix = localhost.

3manager.org.apache.juli.FileHandler.level = FINE
3manager.org.apache.juli.FileHandler.directory = /data/tomcatlogs
3manager.org.apache.juli.FileHandler.prefix = manager.

4host-manager.org.apache.juli.FileHandler.level = FINE
4host-manager.org.apache.juli.FileHandler.directory = /data/tomcatlogs
4host-manager.org.apache.juli.FileHandler.prefix = host-manager.

java.util.logging.ConsoleHandler.level = FINE
java.util.logging.ConsoleHandler.formatter =
java.util.logging.SimpleFormatter

org.apache.catalina.core.ContainerBase.[Catalina].[localhost].level = INFO
org.apache.catalina.core.ContainerBase.[Catalina].[localhost].handlers =
2localhost.org.apache.juli.FileHandler

org.apache.catalina.core.ContainerBase.[Catalina].[localhost].[/manager].level
= INFO
org.apache.catalina.core.ContainerBase.[Catalina].[localhost].[/manager].handlers
= 3manager.org.apache.juli.FileHandler

org.apache.catalina.core.ContainerBase.[Catalina].[localhost].[/host-manager].level
= INFO
org.apache.catalina.core.ContainerBase.[Catalina].[localhost].[/host-manager].handlers
= 4host-manager.org.apache.juli.FileHandler


copied solr-4.9.0/example/lib/ext/*.jar to tomcat6/lib, not the solrj-lib +
dist jars as some tutorials suggested
--
jcl-over-slf4j-1.7.6.jar
jul-to-slf4j-1.7.6.jar
log4j-1.2.17.jar
slf4j-api-1.7.6.jar
slf4j-log4j12-1.7.6.jar


copied ./solr-4.9.0/example/resources/log4j.properties to tomcat6/lib and
pointed solr.log to my chosen directory.  I also have a
tomcat6/conf/log4j.properties and don't know if I should delete it.
--
#  Logging level
solr.log=/data/tomcatlogs
log4j.rootLogger=INFO, file, CONSOLE

log4j.appender.CONSOLE=org.apache.log4j.ConsoleAppender

log4j.appender.CONSOLE.layout=org.apache.log4j.PatternLayout
log4j.appender.CONSOLE.layout.ConversionPattern=%-4r [%t] %-5p %c %x - %m%n

#- size rotation with log cleanup.
log4j.appender.file=org.apache.log4j.RollingFileAppender
log4j.appender.file.MaxFileSize=4MB
log4j.appender.file.MaxBackupIndex=9

#- File to log to and log format
log4j.appender.file.File=${solr.log}/solr.log
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%-5p - %d{-MM-dd
HH:mm:ss.SSS}; %C; %m\n

log4j.logger.org.apache.zookeeper=WARN
log4j.logger.org.apache.hadoop=WARN

# set to INFO to enable infostream log messages
log4j.logger.org.apache.solr.update.LoggingInfoStream=OFF 
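
For the dated-filename question, one possible direction is to point Solr's
loggers at a date-rolling appender shared with Tomcat. This is an untested
sketch, assuming a single log4j.properties on the classpath: log4j 1.2's
DailyRollingFileAppender gives the catalina-style dated suffix, at the cost
of size-based rollover:

```properties
# roll the shared file daily, producing e.g. tomcat.log.2015-03-06
log4j.appender.R=org.apache.log4j.DailyRollingFileAppender
log4j.appender.R.File=${catalina.home}/logs/tomcat.log
log4j.appender.R.DatePattern='.'yyyy-MM-dd
log4j.appender.R.layout=org.apache.log4j.PatternLayout
log4j.appender.R.layout.ConversionPattern=%p %t %c - %m%n
# route Solr's loggers into the same appender Tomcat uses
log4j.logger.org.apache.solr=INFO, R
```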



--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-direct-SOLR-4-9-log-output-to-regular-Tomcat-logs-tp4191502.html
Sent from the Solr - User mailing list archive at Nabble.com.


PostBind method for DocumentObjectBinder?

2015-03-06 Thread Karl Kildén
Hello,

DocumentObjectBinder would benefit from a post bind call imo. Something
like:

  public <T> List<T> getBeans(Class<T> clazz, SolrDocumentList solrDocList,
      boolean postBind) {
    List<DocField> fields = getDocFields(clazz);
    List<T> result = new ArrayList<>(solrDocList.size());

    for (SolrDocument sdoc : solrDocList) {
      T bean = getBean(clazz, fields, sdoc);
      if (postBind) {
        runAnnotatedMethod(bean, PostBind.class);
      }
      result.add(bean);
    }
    return result;
  }

  private void runAnnotatedMethod(final Object instance,
      Class<? extends Annotation> annotation) {
    for (Method m : instance.getClass().getDeclaredMethods()) {
      if (m.isAnnotationPresent(annotation)) {
        m.setAccessible(true);
        try {
          m.invoke(instance, new Object[] {});
        } catch (Exception e) {
          throw new BindingException("Could not run postbind " + instance.getClass(),
              e);
        }
      }
    }
  }


It will probably take some thought to keep the API clean while staying
backwards compatible, and the found annotated method should be cached.
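
A self-contained sketch of the reflection part. The @PostBind annotation
defined here is hypothetical (a real version would ship alongside SolrJ's
@Field annotation), and the Bean class stands in for whatever
DocumentObjectBinder would populate:

```java
import java.lang.annotation.Annotation;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

public class PostBindDemo {

    // Hypothetical marker annotation; a real version would ship with SolrJ.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface PostBind {}

    // Stand-in for a bean that DocumentObjectBinder would populate.
    static class Bean {
        boolean initialized = false;

        @PostBind
        void afterBind() {
            initialized = true;
        }
    }

    // Mirrors the proposed runAnnotatedMethod: invoke every declared
    // method on the instance that carries the given annotation.
    static void runAnnotatedMethod(Object instance,
            Class<? extends Annotation> annotation) {
        for (Method m : instance.getClass().getDeclaredMethods()) {
            if (m.isAnnotationPresent(annotation)) {
                m.setAccessible(true);
                try {
                    m.invoke(instance);
                } catch (Exception e) {
                    throw new RuntimeException(
                        "Could not run postbind " + instance.getClass(), e);
                }
            }
        }
    }

    public static void main(String[] args) {
        Bean bean = new Bean();
        runAnnotatedMethod(bean, PostBind.class);
        System.out.println(bean.initialized); // prints "true"
    }
}
```

Caching the Method lookup per class (as suggested above) would avoid
re-scanning getDeclaredMethods() for every document bound.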

WDYT?


Re: SolrCloud default shard assignment order not correct

2015-03-06 Thread Shawn Heisey
On 3/2/2015 2:12 PM, spillane wrote:
 Since the order is consistently 1,4,2,3 it sounds like I can start the
 leaders in 1,4,2,3 order and then replicas in 1,4,2,3 order and expect the
 relationships to stick 
 
 leader1 - replica1
 leader4 - replica4
 leader2 - replica2
 leader3 - replica3

In Solr 5.0, the cloud graph is sorting the collections by name.  The
shard names also appear to be sorted -- all the collections I have on
the example cloud setup only have two shards, so I really can't be sure.
 It might also be sorting the replicas within each shard.

I looked for an issue so I would know what version first included the
sort, but I could not find one.  I only know that 4.2 does not have the
sort, and 5.0 does.

Thanks,
Shawn



Re: Labels for facets on Velocity

2015-03-06 Thread gaohang wang
You can write a macro in your Velocity template, which is used to show your
query response.
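
A sketch of such a macro (the $facet.name reference and the template placement
are assumptions about the surrounding templates; the raw value should still be
used in the filter-query link so navigation keeps working):

```velocity
## e.g. in VM_global_library.vm: map raw facet values to display labels
#macro(prettyFacet $raw)
#if($raw == "uglyfacet1")Pretty Facet 1#elseif($raw == "uglyfacet2")Pretty Facet 2#elseif($raw == "uglyfacet3")Pretty Facet 3#else$raw#end
#end

## in the facet loop: keep the raw value in the fq link, display the label
#prettyFacet($facet.name)
```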

2015-03-06 1:14 GMT+08:00 Henrique O. Santos hensan...@gmail.com:

 Hello,

 I’ve been trying to get pretty names for my facets with the Velocity Response
 Writer. Do you know how I can do that?

 For example, suppose that I am faceting on field1. My query returns 3 facets:
 uglyfacet1, uglyfacet2 and uglyfacet3. I want to show them to the user with
 pretty names, like Pretty Facet 1, Pretty Facet 2 and Pretty Facet 3.

 The thing is that linking on velocity should still work, so the user can
 navigate the results.

 Thank you.
 Henrique.