Negative Query Behaviour in Solr 3.2

2013-07-31 Thread karanjindal
Hi All,

I am using Solr 3.2 and am confused about how a particular query is executed.
q=name:memory OR -name:encoded
Firing q=name:memory separately gives 3 results,
and q=-name:encoded gives 25 results, and the result sets are disjoint.

Since I am doing an OR query it should return 28 results, but it only
returns 3 results, the same as the query (name:memory).

Can anyone explain?

-Karan






Re: How might one search for dupe IDs other than faceting on the ID field?

2013-07-31 Thread Dotan Cohen
On Wed, Jul 31, 2013 at 4:56 AM, Bill Bell billnb...@gmail.com wrote:
 On Jul 30, 2013, at 12:34 PM, Dotan Cohen dotanco...@gmail.com wrote:
 On Tue, Jul 30, 2013 at 9:21 PM, Aloke Ghoshal alghos...@gmail.com wrote:
 Does adding facet.mincount=2 help?

 In fact, when adding facet.mincount=20 (I know that some dupes are in
 the hundreds) I got the OutOfMemoryError in seconds instead of
 minutes.

 Dotan Cohen

 This seems like a fairly large issue. Can you create a Jira issue ?

 Bill Bell

I'll file an issue, but on what? What information should I include?
How is this different from what you would expect?

Thanks.

-- 
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com


Re: Solr Cloud Setup

2013-07-31 Thread Flavio Pompermaier
What was the problem..?


On Tue, Jul 30, 2013 at 10:33 PM, AdityaR aditya.ravinuth...@gmail.com wrote:

 I was able to get the setup to work.






EmbeddedSolrServer Solr 4.4.0 bug?

2013-07-31 Thread Luis Cappa Banda
Hello guys,

Since I upgraded from the 4.1.0 to the 4.4.0 version I've noticed that
the way EmbeddedSolrServer is constructed has changed a little:

Solr 4.1.0 style:

CoreContainer coreContainer = new CoreContainer(solrHome, new File(solrHome + "/solr.xml"));
EmbeddedSolrServer localSolrServer = new EmbeddedSolrServer(coreContainer, "core");

Solr 4.4.0 new style:

CoreContainer coreContainer = new CoreContainer(solrHome);
EmbeddedSolrServer localSolrServer = new EmbeddedSolrServer(coreContainer, "core");


However, it's not working. I've got the following solr.xml configuration
file:

<solr>
  <cores adminPath="/admin/cores" defaultCoreName="core" host="${host:}"
         hostPort="${jetty.port:8983}" hostContext="${hostContext:solr}"
         zkClientTimeout="${zkClientTimeout:15000}">
    <core name="core" instanceDir="core" />
  </cores>
</solr>


And resources appear to be loaded correctly:

2013-07-31 09:46:37,583 47889 [main] INFO  org.apache.solr.core.ConfigSolr  - Loading container configuration from /opt/solr/solr.xml


But when indexing into the core named 'core', it throws an exception:

2013-07-31 09:50:49,409 5189 [main] ERROR com.buguroo.solr.index.WriteIndex  - No such core: core

Either I am sleepy, which is quite possible, or there is some kind of bug
here.

Best regards,

-- 
- Luis Cappa


SimplePostTool: FATAL: Solr returned an error #400 Bad Request

2013-07-31 Thread Vineet Mishra
Hi All

I am currently in the middle of a project which indexes some data into
multiple Solr instances.

My configuration is as follows: on the same machine I have set up multiple
instances of Solr:

http://localhost:8080/solr1
http://localhost:8080/solr2
http://localhost:8080/solr3
http://localhost:8080/solr4
http://localhost:8080/solr5
http://localhost:8080/solr6

I am posting the data to Solr through SimplePostTool by passing an
XML file to the spt.postFile(file) method and committing thereafter.

This whole process is multithreaded and works fine up to about 1 million
records, but thereafter it suddenly stops with:

SimplePostTool: FATAL: Solr returned an error #400 Bad Request

In the Tomcat Catalina log I found:

WARNING: Failed to register info bean: searcher
javax.management.InstanceAlreadyExistsException: solr/:type=searcher,id=org.apache.solr.search.SolrIndexSearcher
    at com.sun.jmx.mbeanserver.Repository.addMBean(Repository.java:437)
    at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerWithRepository(DefaultMBeanServerInterceptor.java:1898)
    at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerDynamicMBean(DefaultMBeanServerInterceptor.java:966)
    at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerObject(DefaultMBeanServerInterceptor.java:900)
    at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerMBean(DefaultMBeanServerInterceptor.java:324)
    at com.sun.jmx.mbeanserver.JmxMBeanServer.registerMBean(JmxMBeanServer.java:513)
    at org.apache.solr.core.JmxMonitoredMap.put(JmxMonitoredMap.java:141)
    at org.apache.solr.core.JmxMonitoredMap.put(JmxMonitoredMap.java:47)
    at org.apache.solr.search.SolrIndexSearcher.register(SolrIndexSearcher.java:220)
    at org.apache.solr.core.SolrCore.registerSearcher(SolrCore.java:1349)
    at org.apache.solr.core.SolrCore.access$000(SolrCore.java:84)
    at org.apache.solr.core.SolrCore$5.call(SolrCore.java:1247)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:722)

Jul 31, 2013 12:46:00 PM org.apache.solr.core.SolrCore registerSearcher
INFO: [] Registered new searcher Searcher@5fa1891b main
Jul 31, 2013 12:46:00 PM org.apache.solr.search.SolrIndexSearcher close

Has anybody traced such an issue? This is really urgent and important for
us; waiting for your response.

Thanks and Regards
Vineet


Re: EmbeddedSolrServer Solr 4.4.0 bug?

2013-07-31 Thread Alan Woodward
Hi Luis,

You need to call coreContainer.load() after construction for it to load the 
cores.  Previously the CoreContainer(solrHome, configFile) constructor also 
called load(), but this was the only constructor to do that.
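So in 4.4 the snippet becomes (a minimal sketch; the core name "core" is taken 
from your solr.xml):

CoreContainer coreContainer = new CoreContainer(solrHome);
coreContainer.load(); // explicitly load the cores defined in solr.xml
EmbeddedSolrServer localSolrServer = new EmbeddedSolrServer(coreContainer, "core");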

I probably need to put something in CHANGES.txt to point this out...

Alan Woodward
www.flax.co.uk


On 31 Jul 2013, at 08:53, Luis Cappa Banda wrote:

 Hello guys,
 
 Since I upgrade from 4.1.0 to 4.4.0 version I've noticed that
 EmbeddedSolrServer has changed a little the way of construction:
 
 Solr 4.1.0 style:
 
 CoreContainer coreContainer = new CoreContainer(solrHome, new File(solrHome + "/solr.xml"));
 EmbeddedSolrServer localSolrServer = new EmbeddedSolrServer(coreContainer, "core");
 
 Solr 4.4.0 new style:
 
 CoreContainer coreContainer = new CoreContainer(solrHome);
 EmbeddedSolrServer localSolrServer = new EmbeddedSolrServer(coreContainer, "core");
 
 
 However, it's not working. I've got the following solr.xml configuration
 file:
 
 <solr>
   <cores adminPath="/admin/cores" defaultCoreName="core" host="${host:}"
          hostPort="${jetty.port:8983}" hostContext="${hostContext:solr}"
          zkClientTimeout="${zkClientTimeout:15000}">
     <core name="core" instanceDir="core" />
   </cores>
 </solr>
 
 
 And resources appears to be loaded correctly:
 
 *2013-07-31 09:46:37,583 47889 [main] INFO  org.apache.solr.core.ConfigSolr
 - Loading container configuration from /opt/solr/solr.xml*
 
 
 But when indexing into core with coreName 'core', it throws an Exception:
 
 *2013-07-31 09:50:49,409 5189 [main] ERROR
 com.buguroo.solr.index.WriteIndex  - No such core: core*
 
 Or I am sleppy, something that's possible, or there is some kind of bug
 here.
 
 Best regards,
 
 -- 
 - Luis Cappa



result grouping and paging, solr 4.21

2013-07-31 Thread Gunnar

Hello,

I'm trying to page results with grouping/field collapsing. My query is:

?q=myKeywords&start=0&rows=100&group=true&group.field=myGroupField&group.format=simple&group.limit=1

The result contains 70 groups. Is there a way to get 100 records
returned, i.e. the first doc from each of the 70 groups plus the second
doc from the first 30 groups?

Thanks,

Gunnar


Re: EmbeddedSolrServer Solr 4.4.0 bug?

2013-07-31 Thread Luis Cappa Banda
Thank you very much, Alan. Now it's working! I agree with you: this kind of
thing should be documented at least in CHANGES.txt. When upgrading from one
version to another everything should ideally stay compatible between
versions, but since that is not the case here, people should be notified of it.

Regards,


2013/7/31 Alan Woodward a...@flax.co.uk

 Hi Luis,

 You need to call coreContainer.load() after construction for it to load
 the cores.  Previously the CoreContainer(solrHome, configFile) constructor
 also called load(), but this was the only constructor to do that.

 I probably need to put something in CHANGES.txt to point this out...

 Alan Woodward
 www.flax.co.uk


 On 31 Jul 2013, at 08:53, Luis Cappa Banda wrote:

  Hello guys,
 
  Since I upgrade from 4.1.0 to 4.4.0 version I've noticed that
  EmbeddedSolrServer has changed a little the way of construction:
 
  Solr 4.1.0 style:
  
  CoreContainer coreContainer = new CoreContainer(solrHome, new File(solrHome + "/solr.xml"));
  EmbeddedSolrServer localSolrServer = new EmbeddedSolrServer(coreContainer, "core");
  
  Solr 4.4.0 new style:
  
  CoreContainer coreContainer = new CoreContainer(solrHome);
  EmbeddedSolrServer localSolrServer = new EmbeddedSolrServer(coreContainer, "core");
 
 
  However, it's not working. I've got the following solr.xml configuration
  file:
 
  <solr>
    <cores adminPath="/admin/cores" defaultCoreName="core" host="${host:}"
           hostPort="${jetty.port:8983}" hostContext="${hostContext:solr}"
           zkClientTimeout="${zkClientTimeout:15000}">
      <core name="core" instanceDir="core" />
    </cores>
  </solr>
 
 
  And resources appears to be loaded correctly:
 
  *2013-07-31 09:46:37,583 47889 [main] INFO
  org.apache.solr.core.ConfigSolr
  - Loading container configuration from /opt/solr/solr.xml*
 
 
  But when indexing into core with coreName 'core', it throws an Exception:
 
  *2013-07-31 09:50:49,409 5189 [main] ERROR
  com.buguroo.solr.index.WriteIndex  - No such core: core*
 
  Or I am sleppy, something that's possible, or there is some kind of bug
  here.
 
  Best regards,
 
  --
  - Luis Cappa




-- 
- Luis Cappa


Solr PolyField

2013-07-31 Thread Luís Portela Afonso
Hi, I'm trying to create a field with multiple fields inside, that is:

"origin": {
    "htmlUrl": "http://www.gazzetta.it/",
    "streamId": "feed/http://www.gazzetta.it/rss/Home.xml",
    "title": "Gazzetta.it"
},

I want to get something like this. Is that possible? I'm using Solr 4.4.0.

Thanks



Sharding with a SolrCloud

2013-07-31 Thread Oliver Goldschmidt
Hi list,

I have a Solr server which uses sharding to run distributed searches
against another Solr server. That other Solr server is now migrating to a
SolrCloud system. I've recently been trying to keep searching the SolrCloud
as a shard for my Solr server, but this fails with mysterious effects. When
I perform a search I get a result with a number of hits, but the documents
themselves are not returned at all. This is the response header I am
getting from Solr:

{
  "responseHeader": {
    "status": 0,
    "QTime": 305,
    "params": {
      "facet": "true",
      "indent": "yes",
      "facet.mincount": "1",
      "facet.limit": "30",
      "qf": "title_short^750 title_full_unstemmed^600",
      "json.nl": "arrarr",
      "wt": "json",
      "rows": "20",
      "shards": "ourindex.nowhere.de/solr/index",
      "bq": "format:Book^500",
      "fl": "*,score",
      "facet.sort": "count",
      "start": "0",
      "q": "xml",
      "shards.info": "true",
      "facet.prefix": "",
      "facet.field": ["publishDate"],
      "qt": "dismax"}},
  "shards.info": {
    "ourindex.nowhere.de/solr/index": {
      "numFound": 10076,
      "maxScore": 8.507474,
      "time": 263}},
  "response": {"numFound": 10056, "start": 0, "maxScore": 8.507474, "docs": []}
}

As you can see, there are no docs in the result. This result is not 100%
reproducible: sometimes I get no results displayed, other times it works
(with the same query URL!). As you can also see in the result, the
number of hits in the response is a little bit lower than the number of
hits reported by the shard.

This makes me wonder whether it is simply not possible to use a SolrCloud
as a shard for another standalone Solr server?

Any hint is appreciated!

Best
- Oliver

-- 
Oliver Goldschmidt
TU Hamburg-Harburg / Universitätsbibliothek / Digitale Dienste
Denickestr. 22
21071 Hamburg - Harburg
Tel.+49 (0)40 / 428 78 - 32 91
eMail   o.goldschm...@tuhh.de
--
GPG/PGP-Schlüssel: 
http://www.tub.tu-harburg.de/keys/Oliver_Marahrens_pub.asc



Solr show total row count in response of full import

2013-07-31 Thread Sandro Zbinden
Hey there

Is there a way to show the total row count (documents that will be inserted)
when executing a full import through the Data Import request handler?

Currently, after executing a full import and pointing to solrcore/dataimport,
you can get the total rows processed:

<str name="Total Documents Processed">6354</str>

It would be nice if you could also receive a total row count like

<str name="Total Documents">10100</str>

With this information we could derive further information like

<str name="Imported in Percent">62.91</str>

(e.g. 6354 / 10100 ≈ 62.91%). This would make it easier to generate a progress bar for the end user.


Best regards

Sandro Zbinden



Re: Negative Query Behaviour in Solr 3.2

2013-07-31 Thread Mikhail Khludnev
Can you try:

q=+name:memory -name:encoded
or
q=name:memory AND -name:encoded



On Wed, Jul 31, 2013 at 10:14 AM, karanjindal 
karan_jin...@students.iiit.ac.in wrote:

 Hi All,

 I am using solr 3.2 and confused how a particular query is executed.
 q=name:memory OR -name:encoded
 separately firing q=name:memory gives 3 results
 and q=-name:encoded gives 25 results and result sets are disjoint sets.

 Since I am doing OR query it should return 28 results, but it is only
 returning 3 results same as query (name:memory).

 Can anyone explain?

 -Karan








-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

http://www.griddynamics.com
 mkhlud...@griddynamics.com


TrieField and FieldCache confusion

2013-07-31 Thread Paul Masurel
Hello everyone,

I have a question about Solr TrieField and Lucene FieldCache.

From my understanding, Solr added the TrieField implementation to
perform faster range queries. For each value it indexes multiple terms,
the n-th term being a masked version of the value,
showing only its first (precisionStep * n) bits.
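To check that I read the indexing side right, here is my mental model in
pseudo-Java (a sketch of my understanding only, not actual Lucene code;
index() and termFor() are made-up helpers):

void addTrieTerms(long value, int precisionStep) {
    for (int shift = 0; shift < 64; shift += precisionStep) {
        // each successive term drops 'shift' low-order bits, keeping only
        // a high-order prefix of the value; the shift itself is encoded
        // into the term so that the different precisions don't collide
        long masked = value & ~((1L << shift) - 1);
        index(termFor(shift, masked));
    }
}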

When uninverting the field to populate a FieldCache, the last value in
lexicographical order will be retained, which from my understanding should
be the term with the highest precision.

Can I expect Lucene's FieldCache to return the correct values when working
with a TrieField whose precisionStep is higher than 0? If not, what did I get
wrong?

Regards,

Paul Masurel
e-mail: paul.masu...@gmail.com


Re: How might one search for dupe IDs other than faceting on the ID field?

2013-07-31 Thread Mikhail Khludnev
fwiw,

this code won't capture uncommitted duplicates.


On Wed, Jul 31, 2013 at 9:41 AM, Dotan Cohen dotanco...@gmail.com wrote:

 On Tue, Jul 30, 2013 at 11:14 PM, Jack Krupansky
 j...@basetechnology.com wrote:
  The Solr SignatureUpdateProcessorFactory is designed to facilitate
 dedupe...
  any particular reason you did not use it?
 
  See:
  http://wiki.apache.org/solr/Deduplication
 
  and
 
  https://cwiki.apache.org/confluence/display/solr/De-Duplication
 

 Actually, the guy who made the changes (a coworker) did in fact write
 an alternative UpdateHandler. I've just noticed that there are a bunch
 of dupes right now, though.

 public class DiscoAPIUpdateHandler extends DirectUpdateHandler2 {

     public DiscoAPIUpdateHandler(SolrCore core) {
         super(core);
     }

     @Override
     public int addDoc(AddUpdateCommand cmd) throws IOException {

         // if overwrite is set to false we'll use DirectUpdateHandler2's
         // behaviour; this is done for debugging, to insert duplicates
         // into solr
         if (!cmd.overwrite) return super.addDoc(cmd);

         // when using ref-counted objects you have!! to decrement the
         // ref count when you're done
         RefCounted<SolrIndexSearcher> indexSearcher =
                 this.core.getNewestSearcher(false);

         // the idea is like this: we'll make an internal lucene query
         // and check if that id already exists
         Term updateTerm = null;

         if (cmd.updateTerm != null) {
             updateTerm = cmd.updateTerm;
         } else {
             updateTerm = new Term("id", cmd.getIndexedId());
         }

         Query query = new TermQuery(updateTerm);
         TopDocs docs = indexSearcher.get().search(query, 2);

         if (docs.totalHits > 0) {
             // index searcher is no longer needed
             indexSearcher.decref();
             // don't add the new document
             return 0;
         }

         // index searcher is no longer needed
         indexSearcher.decref();

         // if I'm here then it's a new document
         return super.addDoc(cmd);
     }
 }


  And I give a bunch of examples in my book.
 

 I anticipate the book with esteem!

 --
 Dotan Cohen

 http://gibberish.co.il
 http://what-is-what.com




-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

http://www.griddynamics.com
 mkhlud...@griddynamics.com


Re: Performance question on Spatial Search

2013-07-31 Thread Mikhail Khludnev
On Wed, Jul 31, 2013 at 1:10 AM, Steven Bower sbo...@alcyon.net wrote:


 not sure what you mean by good hit raitio?


I mean such queries are really expensive (even on a cache hit), so if the
list of ids changes every time, it never hits the cache and hence executes
these heavy queries every time. It's a well-known performance problem.


 Here are the stacks...

They seem like hotspots, and show index reading, which is reasonable. But I
can't see what caused these reads; to get that I need the whole stack of the
hot thread.



   Name Time (ms) Own Time (ms)

 org.apache.lucene.search.MultiTermQueryWrapperFilter.getDocIdSet(AtomicReaderContext,
 Bits) 300879 203478

 org.apache.lucene.codecs.lucene41.Lucene41PostingsReader$BlockDocsEnum.nextDoc()
 45539 19

 org.apache.lucene.codecs.lucene41.Lucene41PostingsReader$BlockDocsEnum.refillDocs()
 45519 40

 org.apache.lucene.codecs.lucene41.Lucene41PostingsReader.readVIntBlock(IndexInput,
 int[], int[], int, boolean) 24352 0
 org.apache.lucene.store.DataInput.readVInt() 24352 24352
 org.apache.lucene.codecs.lucene41.ForUtil.readBlock(IndexInput, byte[],
 int[]) 21126 14976
 org.apache.lucene.store.ByteBufferIndexInput.readBytes(byte[], int, int)
 6150 0  java.nio.DirectByteBuffer.get(byte[], int, int)
 6150 0
 java.nio.Bits.copyToArray(long, Object, long, long, long) 6150 6150

 org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum.docs(Bits,
 DocsEnum, int) 35342 421

 org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum$Frame.decodeMetaData()
 34920 27939

 org.apache.lucene.codecs.lucene41.Lucene41PostingsReader.nextTerm(FieldInfo,
 BlockTermState) 6980 6980

 org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum.next()
 14129 1053

 org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum$Frame.loadNextFloorBlock()
 5948 261

 org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum$Frame.loadBlock()
 5686 199
 org.apache.lucene.store.ByteBufferIndexInput.readBytes(byte[], int, int)
 3606 0  java.nio.DirectByteBuffer.get(byte[], int, int)
 3606 0
 java.nio.Bits.copyToArray(long, Object, long, long, long) 3606 3606

 org.apache.lucene.codecs.lucene41.Lucene41PostingsReader.readTermsBlock(IndexInput,
 FieldInfo, BlockTermState) 1879 80
 org.apache.lucene.store.ByteBufferIndexInput.readBytes(byte[], int, int)
 1798 0java.nio.DirectByteBuffer.get(byte[], int, int)
 1798 0
 java.nio.Bits.copyToArray(long, Object, long, long, long) 1798 1798

 org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum$Frame.next()
 4010 3324

 org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum$Frame.nextNonLeaf()
 685 685

 org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum$Frame.loadBlock()
 3117 144
 org.apache.lucene.store.ByteBufferIndexInput.readBytes(byte[], int, int)
 1861 0java.nio.DirectByteBuffer.get(byte[], int, int) 1861
 0
 java.nio.Bits.copyToArray(long, Object, long, long, long) 1861 1861

 org.apache.lucene.codecs.lucene41.Lucene41PostingsReader.readTermsBlock(IndexInput,
 FieldInfo, BlockTermState) 1090 19
 org.apache.lucene.store.ByteBufferIndexInput.readBytes(byte[], int, int)
 1070 0  java.nio.DirectByteBuffer.get(byte[], int, int)
 1070 0
 java.nio.Bits.copyToArray(long, Object, long, long, long) 1070 1070

 org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum.initIndexInput()
 20 0org.apache.lucene.store.ByteBufferIndexInput.clone()
 20 0
 org.apache.lucene.store.ByteBufferIndexInput.clone() 20 0
 org.apache.lucene.store.ByteBufferIndexInput.buildSlice(long, long) 20
 0
 org.apache.lucene.util.WeakIdentityMap.put(Object, Object) 20 0
 org.apache.lucene.util.WeakIdentityMap$IdentityWeakReference.init(Object,
 ReferenceQueue) 20 0
 java.lang.System.identityHashCode(Object) 20 20
 org.apache.lucene.index.FilteredTermsEnum.docs(Bits, DocsEnum, int)
 1485 527

 org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum.docs(Bits,
 DocsEnum, int) 957 0

 org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum$Frame.decodeMetaData()
 957 513

 org.apache.lucene.codecs.lucene41.Lucene41PostingsReader.nextTerm(FieldInfo,
 BlockTermState) 443 443
 org.apache.lucene.index.FilteredTermsEnum.next() 874 324

 org.apache.lucene.search.NumericRangeQuery$NumericRangeTermsEnum.accept(BytesRef)
 368 0

 org.apache.lucene.util.BytesRef$UTF8SortedAsUnicodeComparator.compare(Object,
 Object) 368 368

 org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum.next()
 160 0

 org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum$Frame.loadNextFloorBlock()
 160 0

 org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum$Frame.loadBlock()
 160 0
 org.apache.lucene.store.ByteBufferIndexInput.readBytes(byte[], int, int)
 120
 0

 

Re: Improper shutdown of Solr in Jetty 9

2013-07-31 Thread Artem Karpenko

Hello Dmitry,

it's Windows 7. I'm starting Jetty with java -jar start.jar

31.07.2013 12:36, Dmitry Kan wrote:

Artem,

Whats the OS are using?
So far jetty 9 with solr 4.3.1 works ok under ubuntu 12.04.
On 30 Jul 2013 17:23, Alexandre Rafalovitch arafa...@gmail.com wrote:


Of course, I meant Jetty (not Tomcat). So apologies for spam and confusion
of my own. The rest of the statement stands.

Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


On Tue, Jul 30, 2013 at 10:20 AM, Alexandre Rafalovitch
arafa...@gmail.com wrote:


Thanks for letting us know. See if you can add it to the documentation
somewhere.

Solr is not using Tomcat 9, but I believe that was primarily because
Tomcat 9 requires Java 7 and Solr 4.x is staying with Java 6 as minimum
requirement.

Regards,
   Alex.

Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


On Tue, Jul 30, 2013 at 10:09 AM, Artem Karpenko a.karpe...@oxseed.com
wrote:


Uh, sorry for spamming, but if anyone is interested, there is a way to
properly shut down Jetty when it's launched with the --exec flag:
you can use JMX to invoke the stop() method on Jetty's Server MBean.
This triggers a proper shutdown with all of Solr's close() callbacks executed.
I wonder why it's not noted at least in the Jetty documentation.
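For reference, here is roughly what I do from a small client (a sketch only:
the JMX port and the ObjectName are taken from my setup as shown in JConsole,
and may differ for other Jetty versions; Jetty must be started with its JMX
support enabled):

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class JettyJmxStop {
    public static void main(String[] args) throws Exception {
        // assumes an RMI connector on localhost:1099 (setup-specific)
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://localhost:1099/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            // the Server MBean name as it appears in my JConsole;
            // verify yours, it can differ between Jetty versions
            ObjectName server = new ObjectName(
                    "org.eclipse.jetty.server:type=server,id=0");
            // invoking stop() triggers a graceful shutdown, so Solr's
            // close() callbacks get executed
            mbs.invoke(server, "stop", new Object[0], new String[0]);
        } finally {
            connector.close();
        }
    }
}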

Regards,
Artem Karpenko.

30.07.2013 16:58, Artem Karpenko wrote:

 After some investigation I found that the problem is not with Jetty's
 version but with the usage of the --exec flag.
 Namely, when --exec is used (to specify JVM args) the shutdown is not
 graceful; it seems that the Java process is simply killed.
 Not sure how to handle this...

Regards,
Artem Karpenko.

29.07.2013 16:51, Artem Karpenko wrote:


Hi,

I can't make Solr shut down properly when using Jetty 9. I tested this
with a simple plugin that only extends DirectUpdateHandler2, creates a
file in the constructor and deletes it in close(). While it works fine
in the example installation (the one that can be downloaded from the Solr
site) and in a simple custom installation with Jetty 8, it won't in
Jetty 9. There is not much logging at shutdown at all, just Jetty's
'closing selector' or something, unlike with Jetty 8 where it prints
various 'Graceful shutdown' messages from Solr.

The installation procedure I used for both Jettys is rather simple: just
put solr.war into the webapps/ directory, the plugin JAR into {core}/lib/,
and configure the update handler in solrconfig.xml.
OS is Windows 7, Solr 4.4.
I tried to stop Jetty with both Ctrl+C and java -jar start.jar [port/key
params] --stop. For Jetty 8 it works fine even with Ctrl+C.

Did anybody stumble on this issue?

Best regards,
Artem.






Re: Trying to determine the benefit of spellcheck-based suggester vs. using terms component?

2013-07-31 Thread Erick Erickson
The biggest thing is that the spellchecker has lots of knobs
to tune, all the stuff in
http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.collate

TermsComponent, on the other hand, just gives you
what's in the index with essentially no knobs to tune.

So it depends on your goal: typeahead or spelling
correction? In the first case I'd go for TermsComponent,
and in the second for spellcheck, for example.
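For typeahead, a TermsComponent request can be as simple as this
(assuming the /terms handler from the example solrconfig.xml; the field
and prefix are just placeholders):

http://localhost:8983/solr/terms?terms.fl=name&terms.prefix=mem&terms.limit=10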

Best
Erick

On Tue, Jul 30, 2013 at 2:07 PM, Timothy Potter thelabd...@gmail.com wrote:
 Going over the comments in SOLR-1316, I seem to have lost the
 forest for the trees. What is the benefit of using the spellcheck
 based suggester over something like the terms component to get
 suggestions as the user types?

 Maybe it is faster because it builds the in-memory data structure on
 commit? Seems like the terms component is pretty fast too.

 I'd appreciate any additional insights about this. There are so many
 solutions to auto-suggest for Solr, it's hard to decide what
 approach to take.

 Cheers,
 Tim


Re: SimplePostTool: FATAL: Solr returned an error #400 Bad Request

2013-07-31 Thread Erick Erickson
Probably not the root of your problem, but
bq: and committing it there after.

Does that mean you're calling commit after every
document? This is usually poor practice; I'd set
the autocommit intervals in solrconfig.xml and NOT
call commit explicitly.
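Something like this inside <updateHandler> in solrconfig.xml, for example
(the numbers are only illustrative, tune them to your indexing load):

<autoCommit>
  <maxDocs>10000</maxDocs>
  <maxTime>60000</maxTime> <!-- milliseconds -->
  <openSearcher>false</openSearcher>
</autoCommit>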

Does the same document fail every time? What does
it look like?

You really haven't provided much information
to go on.

Best
Erick

On Wed, Jul 31, 2013 at 3:55 AM, Vineet Mishra clearmido...@gmail.com wrote:
 Hi All

 Currently I am in a mid of a project which Index some data to Solrs
 multiple instance.

 I have the Configuration as, on the same machine I have made multiple
 instances of Solr

 http://localhost:8080/solr1
 http://localhost:8080/solr2
 http://localhost:8080/solr3
 http://localhost:8080/solr4
 http://localhost:8080/solr5
 http://localhost:8080/solr6

 Now when I am posting the Data to Solr through SimplePostTool by passing a
 xml file in spt.postFile(file) method and committing it there after.

 This all process is Multithreaded and works fine till 1 Million of data
 record but there after it suddenly stops saying,

 *SimplePostTool: FATAL: Solr returned an error #400 Bad Request*
 *
 *
 in the Tomcat Catalina I found

 *WARNING: Failed to register info bean: searcher*
 *javax.management.InstanceAlreadyExistsException:
 solr/:type=searcher,id=org.apache.solr.search.SolrIndexSearcher*
 * at com.sun.jmx.mbeanserver.Repository.addMBean(Repository.java:437)*
 * at
 com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerWithRepository(DefaultMBeanServerInterceptor.java:1898)
 *
 * at
 com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerDynamicMBean(DefaultMBeanServerInterceptor.java:966)
 *
 * at
 com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerObject(DefaultMBeanServerInterceptor.java:900)
 *
 * at
 com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerMBean(DefaultMBeanServerInterceptor.java:324)
 *
 * at
 com.sun.jmx.mbeanserver.JmxMBeanServer.registerMBean(JmxMBeanServer.java:513)
 *
 * at org.apache.solr.core.JmxMonitoredMap.put(JmxMonitoredMap.java:141)*
 * at org.apache.solr.core.JmxMonitoredMap.put(JmxMonitoredMap.java:47)*
 * at
 org.apache.solr.search.SolrIndexSearcher.register(SolrIndexSearcher.java:220)
 *
 * at org.apache.solr.core.SolrCore.registerSearcher(SolrCore.java:1349)*
 * at org.apache.solr.core.SolrCore.access$000(SolrCore.java:84)*
 * at org.apache.solr.core.SolrCore$5.call(SolrCore.java:1247)*
 * at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)*
 * at java.util.concurrent.FutureTask.run(FutureTask.java:166)*
 * at
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 *
 * at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 *
 * at java.lang.Thread.run(Thread.java:722)*
 *
 *
 *Jul 31, 2013 12:46:00 PM org.apache.solr.core.SolrCore registerSearcher*
 *INFO: [] Registered new searcher Searcher@5fa1891b main*
 *Jul 31, 2013 12:46:00 PM org.apache.solr.search.SolrIndexSearcher close*

 Has anybody traced such issue. Please this is really very Urgent and
 Important. Waiting for your response.

 Thanks and Regards
 Vineet


Re: result grouping and paging, solr 4.21

2013-07-31 Thread Erick Erickson
Not that I know of. Grouping pretty much treats all groups the same...

Best
Erick

On Wed, Jul 31, 2013 at 4:14 AM, Gunnar glus...@akitogo.com wrote:
 Hello,

 I'm trying to page results with grouping /field collapsing. My query is:

 ?q=myKeywords&start=0&rows=100&group=true&group.field=myGroupField&group.format=simple&group.limit=1

 The result will contain 70 groups, is there a way to get 100 records
 returned, means 70 from each group first doc and second docs
 from the first 30 groups?

 Thanks,

 Gunnar


Working with solr over two different db schemas

2013-07-31 Thread Mysurf Mail
I've been working on it for quite some time.

this is my config



<dataConfig>
  <dataSource type="JdbcDataSource" name="ds1"
              driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
              url="jdbc:sqlserver://...:1433;databaseName=A"
              user="XX" password="XX" />
  <document>
    <entity name="PackageVersion" pk="PackageVersionId"
            query="/*PackageVersion.Query*/ select PackageVersion.Id PackageVersionId,
                   PackageVersion.VersionNumber, CONVERT(char(19),
                   PackageVersion.LastModificationTime, 126) + 'Z' LastModificationTime,
                   Package.Id PackageId, Package.Name PackageName,
                   PackageVersion.Comments PackageVersionComments,
                   Package.CreatedBy CreatedBy
                   from [dbo].[Package] Package inner join [dbo].[PackageVersion]
                   PackageVersion on Package.Id = PackageVersion.PackageId
                   where Package.RecordStatusId=0 and PackageVersion.RecordStatusId=0">
      <entity name="PackageTag" pk="ResourceId"
              processor="CachedSqlEntityProcessor" cacheKey="ResourceId"
              cacheLookup="PackageVersion.PackageId"
              query="/*PackageTag.Query*/
                     select ResourceId,[Text] PackageTag
                     from [dbo].[Tag] Tag
                     Where ResourceType = 0" />
    </entity>
  </document>
</dataConfig>

Now, this runs in my test environment, and the only thing I do is change the
configuration to another DB (and, as a result, also the schema name from
[dbo] to another one).
This results in totally different behavior.
In the first configuration the selects were executed in this order: inner
entity first, then outer entity, which means that the cache works.
In the second configuration, over the other DB, the order was outer first
and then inner; the cache did not work at all, and the inner query result
is not cached at all.

What could be the problem?


queryResultCache showing all zeros

2013-07-31 Thread Jean-Sebastien Vachon
Hi,

We just configured a new Solr cloud (5 nodes) running Solr 4.3, ran about
200,000 queries taken from our production environment, and measured the
performance of the cloud over a collection of 14M documents with the default
Solr settings. We are now trying to tune the different caches, and when I look
at each node of the cloud, all of them show no activity (see below) regarding
the queryResultCache... all other caches show some activity. Any idea what
could cause this?


org.apache.solr.search.LRUCache
version: 1.0
description: LRU Cache(maxSize=512, initialSize=512)
src: $URL: https://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_4_3/solr/core/src/java/org/apache/solr/search/LRUCache.java $
stats:
  lookups: 0
  hits: 0
  hitratio: 0.00
  inserts: 0
  evictions: 0
  size: 0
  warmupTime: 0
  cumulative_lookups: 0
  cumulative_hits: 0
  cumulative_hitratio: 0.00
  cumulative_inserts: 0
  cumulative_evictions: 0




Re: Negative Query Behaviour in Solr 3.2

2013-07-31 Thread Jack Krupansky
Since there are no parentheses, the terms and operators are all at the same
level, and the OR is essentially a redundant operator and is ignored, so:


q=name:memory OR -name:encoded

is treated as:

q=name:memory -name:encoded

When what you probably want is:

q=name:memory OR (-name:encoded)

BUT... a bug/deficiency prevents Solr from handling pure-negative 
sub-queries properly, so you have to add a *:*:


q=name:memory OR (*:* -name:encoded)

So that reads "... or any documents that do not contain encoded in the name
field", which is equivalent to "... or all documents except those that have
encoded in the name field". With your numbers, that query should give you
the expected 3 + 25 = 28 results.


-- Jack Krupansky

-Original Message- 
From: karanjindal

Sent: Wednesday, July 31, 2013 2:14 AM
To: solr-user@lucene.apache.org
Subject: Negative Query Behaviour in Solr 3.2

Hi All,

I am using solr 3.2 and confused how a particular query is executed.
q=name:memory OR -name:encoded
separately firing q=name:memory gives 3 results
and q=-name:encoded gives 25 results and result sets are disjoint sets.

Since I am doing OR query it should return 28 results, but it is only
returning 3 results same as query (name:memory).

Can anyone explain?

-Karan







Re: SimplePostTool: FATAL: Solr returned an error #400 Bad Request

2013-07-31 Thread Vineet Mishra
I got it resolved; actually the error trace was even further above this one.
It was just that the posting XML was not formed properly for the Solr
field Date, which takes the format

2006-07-15T22:18:48Z

This is the standard format for the Solr date datatype, which specifically
follows one of the patterns mentioned below:


   - 1995-12-31T23:59:59Z
   - 1995-12-31T23:59:59.9Z
   - 1995-12-31T23:59:59.99Z
   - 1995-12-31T23:59:59.999Z


As documented by Solr (www.meticent.com/DAt).
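In other words, in the posting XML the field simply has to look like this
(field name as in my schema):

<field name="Date">2006-07-15T22:18:48Z</field>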

By the way thanks!
Vineet


On Wed, Jul 31, 2013 at 4:47 PM, Erick Erickson erickerick...@gmail.com wrote:

 Probably not the root of your problem, but
 bq: and committing it there after.

 Does that mean you're calling  commit after every
 document? This is usually poor practice, I'd set
 the autocommit intervals on solrconfig.xml and NOT
 call commit explicitly.

 Does the same document fail every time? What does
 it look like?

 You really haven't provided much information
 to go on.

 Best
 Erick

 On Wed, Jul 31, 2013 at 3:55 AM, Vineet Mishra clearmido...@gmail.com
 wrote:
  Hi All
 
  Currently I am in a mid of a project which Index some data to Solrs
  multiple instance.
 
  I have the Configuration as, on the same machine I have made multiple
  instances of Solr
 
  http://localhost:8080/solr1
  http://localhost:8080/solr2
  http://localhost:8080/solr3
  http://localhost:8080/solr4
  http://localhost:8080/solr5
  http://localhost:8080/solr6
 
  Now when I am posting the Data to Solr through SimplePostTool by passing
 a
  xml file in spt.postFile(file) method and committing it there after.
 
  This all process is Multithreaded and works fine till 1 Million of data
  record but there after it suddenly stops saying,
 
  *SimplePostTool: FATAL: Solr returned an error #400 Bad Request*
  *
  *
  in the Tomcat Catalina I found
 
  *WARNING: Failed to register info bean: searcher*
  *javax.management.InstanceAlreadyExistsException:
  solr/:type=searcher,id=org.apache.solr.search.SolrIndexSearcher*
  * at com.sun.jmx.mbeanserver.Repository.addMBean(Repository.java:437)*
  * at
 
 com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerWithRepository(DefaultMBeanServerInterceptor.java:1898)
  *
  * at
 
 com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerDynamicMBean(DefaultMBeanServerInterceptor.java:966)
  *
  * at
 
 com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerObject(DefaultMBeanServerInterceptor.java:900)
  *
  * at
 
 com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerMBean(DefaultMBeanServerInterceptor.java:324)
  *
  * at
 
 com.sun.jmx.mbeanserver.JmxMBeanServer.registerMBean(JmxMBeanServer.java:513)
  *
  * at org.apache.solr.core.JmxMonitoredMap.put(JmxMonitoredMap.java:141)*
  * at org.apache.solr.core.JmxMonitoredMap.put(JmxMonitoredMap.java:47)*
  * at
 
 org.apache.solr.search.SolrIndexSearcher.register(SolrIndexSearcher.java:220)
  *
  * at org.apache.solr.core.SolrCore.registerSearcher(SolrCore.java:1349)*
  * at org.apache.solr.core.SolrCore.access$000(SolrCore.java:84)*
  * at org.apache.solr.core.SolrCore$5.call(SolrCore.java:1247)*
  * at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)*
  * at java.util.concurrent.FutureTask.run(FutureTask.java:166)*
  * at
 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  *
  * at
 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  *
  * at java.lang.Thread.run(Thread.java:722)*
  *
  *
  *Jul 31, 2013 12:46:00 PM org.apache.solr.core.SolrCore registerSearcher*
  *INFO: [] Registered new searcher Searcher@5fa1891b main*
  *Jul 31, 2013 12:46:00 PM org.apache.solr.search.SolrIndexSearcher close*
 
  Has anybody traced such issue. Please this is really very Urgent and
  Important. Waiting for your response.
 
  Thanks and Regards
  Vineet



Unexpected character '<' (code 60) expected '='

2013-07-31 Thread Vineet Mishra
Hi All

I am currently stuck on a Solr issue while posting some data to the Solr server.

I have some records from HBase which I am posting to Solr, but after posting
some 1 million records it suddenly stopped. Checking the Catalina
log trace, it showed:

org.apache.solr.common.SolrException: Unexpected character '<' (code 60) expected '='

I am not sure whether it's an issue with some malformed data in the post,
because I have tried posting the specific XML file that I generate to Solr
directly, and that goes well.

Below is the whole log trace:


SEVERE: org.apache.solr.common.SolrException: Unexpected character '<' (code 60) expected '='
 at [row,col {unknown-source}]: [20281,18]
    at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:81)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:58)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1398)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:103)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
    at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:861)
    at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:606)
    at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
    at java.lang.Thread.run(Thread.java:722)
Caused by: com.ctc.wstx.exc.WstxUnexpectedCharException: Unexpected character '<' (code 60) expected '='
 at [row,col {unknown-source}]: [20281,18]
    at com.ctc.wstx.sr.StreamScanner.throwUnexpectedChar(StreamScanner.java:648)
    at com.ctc.wstx.sr.BasicStreamReader.handleNsAttrs(BasicStreamReader.java:3001)
    at com.ctc.wstx.sr.BasicStreamReader.handleStartElem(BasicStreamReader.java:2936)
    at com.ctc.wstx.sr.BasicStreamReader.nextFromTree(BasicStreamReader.java:2848)
    at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1019)
    at org.apache.solr.handler.XMLLoader.readDoc(XMLLoader.java:295)
    at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:157)
    at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:79)
    ... 17 more
Has anybody faced this issue?

Thanks and Regards
Vineet


RE: Unexpected character '<' (code 60) expected '='

2013-07-31 Thread Markus Jelsma
This file is malformed:

SEVERE: org.apache.solr.common.SolrException: Unexpected character '<' (code 60) expected '='
 at [row,col {unknown-source}]: [20281,18]

Check row 20281, column 18.
 
 
-Original message-
 From:Vineet Mishra clearmido...@gmail.com
 Sent: Wednesday 31st July 2013 15:05
 To: solr-user@lucene.apache.org
 Subject: Unexpected character '<' (code 60) expected '='
 
 Hi All
 
 I am currently stuck in a Solr Issue while Posting some data to Solr Server.
 
 I have some record from Hbase which I am posting to Solr, but after posting
 some 1 Million of data records, it suddenly stopped. Checking the Catalina
 log trace it showed,
 
 *org.apache.solr.common.SolrException: Unexpected character '' (code 60)
 expected '='*
 *
 *
 *
 *
 I am not sure whether its the issue with some malformed data for the
 posting, because whatever xml file which I am generating before posting I
 have tried posting that specific file to the solr and its going well.
 
 Below is the whole log trace,
 
 
 *SEVERE: org.apache.solr.common.SolrException: Unexpected character ''
 (code 60) expected '='*
 * at [row,col {unknown-source}]: [20281,18]*
 * at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:81)*
 * at
 org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:58)
 *
 * at
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
 *
 * at org.apache.solr.core.SolrCore.execute(SolrCore.java:1398)*
 * at
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
 *
 * at
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
 *
 * at
 org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
 *
 * at
 org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
 *
 * at
 org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
 *
 * at
 org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
 *
 * at
 org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
 *
 * at
 org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:103)
 *
 * at
 org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
 *
 * at
 org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)*
 * at
 org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:861)*
 * at
 org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:606)
 *
 * at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
 *
 * at java.lang.Thread.run(Thread.java:722)*
 *Caused by: com.ctc.wstx.exc.WstxUnexpectedCharException: Unexpected
 character '' (code 60) expected '='*
 * at [row,col {unknown-source}]: [20281,18]*
 * at
 com.ctc.wstx.sr.StreamScanner.throwUnexpectedChar(StreamScanner.java:648)*
 * at
 com.ctc.wstx.sr.BasicStreamReader.handleNsAttrs(BasicStreamReader.java:3001)
 *
 * at
 com.ctc.wstx.sr.BasicStreamReader.handleStartElem(BasicStreamReader.java:2936)
 *
 * at
 com.ctc.wstx.sr.BasicStreamReader.nextFromTree(BasicStreamReader.java:2848)*
 * at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1019)*
 * at org.apache.solr.handler.XMLLoader.readDoc(XMLLoader.java:295)*
 * at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:157)*
 * at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:79)*
 * ... 17 more*
 *
 *
 Has anybody faced this issue.
 
 Thanks and Regards
 Vineet
 


RE: new field type - enum field

2013-07-31 Thread Elran Dvir
Hi,

I have managed to attach the patch in Jira.

Thanks.

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Monday, July 29, 2013 2:15 PM
To: solr-user@lucene.apache.org
Subject: Re: new field type - enum field

OK, if you can attach it to an e-mail, I'll attach it.

Just to check, though, make sure you're logged in. I've been fooled once or 
twice by being automatically signed out...

Erick

On Mon, Jul 29, 2013 at 3:17 AM, Elran Dvir elr...@checkpoint.com wrote:
 Thanks, Erick.

 I have tried it four times. It keeps failing.
 The problem reoccurred today.

 Thanks.

 -Original Message-
 From: Erick Erickson [mailto:erickerick...@gmail.com]
 Sent: Monday, July 29, 2013 2:44 AM
 To: solr-user@lucene.apache.org
 Subject: Re: new field type - enum field

You should be able to attach a patch; I wonder if there was some temporary 
glitch in JIRA. Is this persisting?

 Let us know if this continues...

 Erick

 On Sun, Jul 28, 2013 at 12:11 PM, Elran Dvir elr...@checkpoint.com wrote:
 Hi,

 I have created an issue:
 https://issues.apache.org/jira/browse/SOLR-5084
 I tried to attach my patch, but it failed:  Cannot attach file 
 Solr-5084.patch: Unable to communicate with JIRA.
 What am I doing wrong?

 Thanks.

 -Original Message-
 From: Erick Erickson [mailto:erickerick...@gmail.com]
 Sent: Thursday, July 25, 2013 3:25 PM
 To: solr-user@lucene.apache.org
 Subject: Re: new field type - enum field

 Start here: http://wiki.apache.org/solr/HowToContribute

Then, when your patch is ready, submit a JIRA and attach your patch. Then 
nudge (gently) if none of the committers picks it up and applies it.

 NOTE: It is _not_ necessary that the first version of your patch is 
 completely polished. I often put up partial/incomplete patches (comments 
 with //nocommit are explicitly caught by the ant precommit target for 
 instance) to see if anyone has any comments before polishing.

 Best
 Erick

 On Thu, Jul 25, 2013 at 5:04 AM, Elran Dvir elr...@checkpoint.com wrote:
 Hi,

 I have implemented like Chris described it:
 The field is indexed as numeric, but displayed as string, according to 
 configuration.
 It applies to facet, pivot, group and query.

 How do we proceed? How do I contribute it?

 Thanks.

 -Original Message-
 From: Chris Hostetter [mailto:hossman_luc...@fucit.org]
 Sent: Thursday, July 25, 2013 4:40 AM
 To: solr-user@lucene.apache.org
 Subject: Re: new field type - enum field


 : Doable at Lucene level by any chance?

 Given how well the Trie fields compress (ByteField and ShortField have been 
 deprecated in favor of TrieIntField for this reason) it probably just makes 
 sense to treat it as a numeric at the Lucene level.

 :  If there's positive feedback, I'll open an issue with a patch for the 
 functionality.

I've typically dealt with this sort of thing at the client layer 
using a simple numeric field in Solr, or used an UpdateProcessor to 
convert the String-numeric mapping when indexing & used client logic or a
DocTransformer to handle the stored value at query time -- but having a 
built-in FieldType that handles that for you automatically (and helps 
ensure the indexed values conform to the enum) would certainly be cool if 
you'd like to contribute it.


 -Hoss



Re: How might one search for dupe IDs other than faceting on the ID field?

2013-07-31 Thread Jack Krupansky

Good to note!

But... any search will not detect dupe IDs for uncommitted documents.

-- Jack Krupansky

-Original Message- 
From: Mikhail Khludnev

Sent: Wednesday, July 31, 2013 6:11 AM
To: solr-user
Subject: Re: How might one search for dupe IDs other than faceting on the ID 
field?


fwiw,

this code won't capture uncommitted duplicates.


On Wed, Jul 31, 2013 at 9:41 AM, Dotan Cohen dotanco...@gmail.com wrote:


On Tue, Jul 30, 2013 at 11:14 PM, Jack Krupansky
j...@basetechnology.com wrote:
 The Solr SignatureUpdateProcessorFactory is designed to facilitate
dedupe...
 any particular reason you did not use it?

 See:
 http://wiki.apache.org/solr/Deduplication

 and

 https://cwiki.apache.org/confluence/display/solr/De-Duplication


Actually, the guy who made the changes (a coworker) did in fact write
an alternative UpdateHandler. I've just noticed that there are a bunch
of dupes right now, though.

public class DiscoAPIUpdateHandler extends DirectUpdateHandler2 {

    public DiscoAPIUpdateHandler(SolrCore core) {
        super(core);
    }

    @Override
    public int addDoc(AddUpdateCommand cmd) throws IOException {

        // if overwrite is set to false we'll use DirectUpdateHandler2's
        // behaviour; this is done for debugging, to insert duplicates
        // into solr
        if (!cmd.overwrite) return super.addDoc(cmd);

        // when using ref-counted objects you have!! to decrement the
        // ref count when you're done
        RefCounted<SolrIndexSearcher> indexSearcher =
                this.core.getNewestSearcher(false);

        // the idea is like this: we'll make an internal lucene query
        // and check if that id already exists
        Term updateTerm = null;

        if (cmd.updateTerm != null) {
            updateTerm = cmd.updateTerm;
        } else {
            updateTerm = new Term("id", cmd.getIndexedId());
        }

        Query query = new TermQuery(updateTerm);
        TopDocs docs = indexSearcher.get().search(query, 2);

        if (docs.totalHits > 0) {
            // index searcher is no longer needed
            indexSearcher.decref();
            // don't add the new document
            return 0;
        }

        // index searcher is no longer needed
        indexSearcher.decref();

        // if I'm here then it's a new document
        return super.addDoc(cmd);
    }
}


 And I give a bunch of examples in my book.


I anticipate the book with esteem!

--
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com





--
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

http://www.griddynamics.com
mkhlud...@griddynamics.com 



Re: Unexpected character '<' (code 60) expected '='

2013-07-31 Thread Vineet Mishra
I checked the file... nothing is wrong there. I mean, the formatting is
correct; it's a valid XML file.


On Wed, Jul 31, 2013 at 6:38 PM, Markus Jelsma
markus.jel...@openindex.io wrote:

 This file is malformed:

 *SEVERE: org.apache.solr.common.SolrException: Unexpected character ''
 (code 60) expected '='*
 * at [row,col {unknown-source}]: [20281,18]*

 Check row 20281 column 18


 -Original message-
  From:Vineet Mishra clearmido...@gmail.com
  Sent: Wednesday 31st July 2013 15:05
  To: solr-user@lucene.apache.org
  Subject: Unexpected character '<' (code 60) expected '='
 
  Hi All
 
  I am currently stuck in a Solr Issue while Posting some data to Solr
 Server.
 
  I have some record from Hbase which I am posting to Solr, but after
 posting
  some 1 Million of data records, it suddenly stopped. Checking the
 Catalina
  log trace it showed,
 
  *org.apache.solr.common.SolrException: Unexpected character '' (code 60)
  expected '='*
  *
  *
  *
  *
  I am not sure whether its the issue with some malformed data for the
  posting, because whatever xml file which I am generating before posting I
  have tried posting that specific file to the solr and its going well.
 
  Below is the whole log trace,
 
 
  *SEVERE: org.apache.solr.common.SolrException: Unexpected character ''
  (code 60) expected '='*
  * at [row,col {unknown-source}]: [20281,18]*
  * at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:81)*
  * at
 
 org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:58)
  *
  * at
 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
  *
  * at org.apache.solr.core.SolrCore.execute(SolrCore.java:1398)*
  * at
 
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
  *
  * at
 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
  *
  * at
 
 org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
  *
  * at
 
 org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
  *
  * at
 
 org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
  *
  * at
 
 org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
  *
  * at
 
 org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
  *
  * at
 
 org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:103)
  *
  * at
 
 org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
  *
  * at
 
 org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)*
  * at
 
 org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:861)*
  * at
 
 org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:606)
  *
  * at
 org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
  *
  * at java.lang.Thread.run(Thread.java:722)*
  *Caused by: com.ctc.wstx.exc.WstxUnexpectedCharException: Unexpected
  character '' (code 60) expected '='*
  * at [row,col {unknown-source}]: [20281,18]*
  * at
 
 com.ctc.wstx.sr.StreamScanner.throwUnexpectedChar(StreamScanner.java:648)*
  * at
 
 com.ctc.wstx.sr.BasicStreamReader.handleNsAttrs(BasicStreamReader.java:3001)
  *
  * at
 
 com.ctc.wstx.sr.BasicStreamReader.handleStartElem(BasicStreamReader.java:2936)
  *
  * at
 
 com.ctc.wstx.sr.BasicStreamReader.nextFromTree(BasicStreamReader.java:2848)*
  * at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1019)*
  * at org.apache.solr.handler.XMLLoader.readDoc(XMLLoader.java:295)*
  * at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:157)*
  * at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:79)*
  * ... 17 more*
  *
  *
  Has anybody faced this issue.
 
  Thanks and Regards
  Vineet
 



Re: Trying to determine the benefit of spellcheck-based suggester vs. using terms component?

2013-07-31 Thread Timothy Potter
Thanks for the reply, Erick. I'm looking for type-ahead support, using
spell checking too via the DirectSolrSpellChecker. It seems like the
spellcheck-based suggester is designed for type-ahead, or am I not
understanding something? Here's my config:

<requestHandler class="org.apache.solr.handler.component.SearchHandler"
                name="/suggest">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <str name="wt">json</str>
    <str name="indent">true</str>
    <str name="df">suggest</str>
    <str name="spellcheck">true</str>
    <str name="spellcheck.dictionary">suggestDictionary</str>
    <str name="spellcheck.onlyMorePopular">true</str>
    <str name="spellcheck.count">5</str>
    <str name="spellcheck.collate">false</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>

<searchComponent class="solr.SpellCheckComponent" name="suggest">
  <lst name="spellchecker">
    <str name="name">suggestDictionary</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
    <str name="field">suggest</str>
    <float name="threshold">0.</float>
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>

I was confused why this approach was needed because using terms
component is so easy and doesn't require any build step. From your
answer, it seems like either approach is valid in Solr 4.4 but the
spellcheck based suggester has more knobs, such as loading in an
external dictionary in addition to data in my index, etc.
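
For reference, the terms request I'm comparing against is as simple as this
(a sketch; the core name and the /terms handler are the stock ones from the
example solrconfig, and "suggest" is my field):

http://localhost:8983/solr/collection1/terms?terms.fl=suggest&terms.prefix=mem&terms.limit=5&wt=json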

Cheers,
Tim


On Wed, Jul 31, 2013 at 5:08 AM, Erick Erickson erickerick...@gmail.com wrote:
 The biggest thing is that the spellchecker has lots of knobs
 to tune, all the stuff in
 http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.collate

 TermsComponent, on the other hand, just gives you
 what's in the index with essentially no knobs to tune.

 So it depends on your goal. Typeahead or spelling
 correction? In the first case I'd go for TermsComponent
 and the second spell check as an example.

 Best
 Erick

 On Tue, Jul 30, 2013 at 2:07 PM, Timothy Potter thelabd...@gmail.com wrote:
 Going over the comments in SOLR-1316, I seemed to have lost the
 forrest for the trees. What is the benefit of using the spellcheck
 based suggester over something like the terms component to get
 suggestions as the user types?

 Maybe it is faster because it builds the in-memory data structure on
 commit? Seems like the terms component is pretty fast too.

 I'd appreciate any additional insights about this. There are so many
 solutions to auto-suggest for Solr, it's hard to decide what
 approach to take.

 Cheers,
 Tim


Autowarming last 15 days data

2013-07-31 Thread Cool Techi
Hi,

We have a solr master slave set up with close to 30 million records. Our index 
changes/updates very frequently and replication is set up at 60 seconds delay.

Now every time replication completes, new searches take a while. How can 
this be improved? I have read that warming would help in this scenario; in 
our case we cannot warm with specific queries, but most of the users query 
only the last 15 days of data. 

So would it be possible to autowarm only the last 15 days of data?

Regards,
Ayush
  

Re: Measuring SOLR performance

2013-07-31 Thread Dmitry Kan
Hi Roman,

What  version and config of SOLR does the tool expect?

Tried to run, but got:

**ERROR**
  File solrjmeter.py, line 1390, in module
main(sys.argv)
  File solrjmeter.py, line 1296, in main
check_prerequisities(options)
  File solrjmeter.py, line 351, in check_prerequisities
error('Cannot contact: %s' % options.query_endpoint)
  File solrjmeter.py, line 66, in error
traceback.print_stack()
Cannot contact: http://localhost:8983/solr


complains about URL, clicking which leads properly to the admin page...
solr 4.3.1, 2 cores shard

Dmitry


On Wed, Jul 31, 2013 at 3:59 AM, Roman Chyla roman.ch...@gmail.com wrote:

 Hello,

 I have been wanting some tools for measuring performance of SOLR, similar
 to Mike McCandles' lucene benchmark.

 so yet another monitor was born, is described here:
 http://29min.wordpress.com/2013/07/31/measuring-solr-query-performance/

 I tested it on the problem of garbage collectors (see the blogs for
 details) and so far I can't conclude whether highly customized G1 is better
 than highly customized CMS, but I think interesting details can be seen
 there.

 Hope this helps someone, and of course, feel free to improve the tool and
 share!

 roman



Re: Improper shutdown of Solr in Jetty 9

2013-07-31 Thread Dmitry Kan
OK. On Ubuntu there are shell scripts that come with Jetty 9. They seem to
do the job properly (disclaimer: no extensive testing with Solr done yet,
but it looks good so far).
Not sure how well Jetty supports the Windows environment on the life-cycle
automation side.


On Wed, Jul 31, 2013 at 1:43 PM, Artem Karpenko a.karpe...@oxseed.comwrote:

 Hello Dmitry,

 it's Windows 7. I'm starting Jetty with java -jar start.jar

 31.07.2013 12:36, Dmitry Kan пишет:

  Artem,

 What OS are you using?
 So far jetty 9 with solr 4.3.1 works ok under ubuntu 12.04.
 On 30 Jul 2013 17:23, Alexandre Rafalovitch arafa...@gmail.com wrote:

  Of course, I meant Jetty (not Tomcat). So apologies for spam and
 confusion
 of my own. The rest of the statement stands.

 Personal website: http://www.outerthoughts.com/
 LinkedIn: 
 http://www.linkedin.com/in/alexandrerafalovitch
 - Time is the quality of nature that keeps events from happening all at
 once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


 On Tue, Jul 30, 2013 at 10:20 AM, Alexandre Rafalovitch
 arafa...@gmail.comwrote:

  Thanks for letting us know. See if you can add it to the documentation
 somewhere.

 Solr is not using Tomcat 9, but I believe that was primarily because
 Tomcat 9 requires Java 7 and Solr 4.x is staying with Java 6 as minimum
 requirement.

 Regards,
Alex.

 Personal website: http://www.outerthoughts.com/
 LinkedIn: 
  http://www.linkedin.com/in/alexandrerafalovitch
 - Time is the quality of nature that keeps events from happening all at
 once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
 book)


 On Tue, Jul 30, 2013 at 10:09 AM, Artem Karpenko a.karpe...@oxseed.com
 wrote:

  Uh, sorry for spamming, but if anyone interested there is a way to
 properly shutdown Jetty when it's launched with --exec flag.
 You can use JMX to invoke method stop() on the Jetty's Server MBean.

 This

 triggers a proper shutdown with all Solr's close() callbacks executed.
 I wonder why it's not noted at least in Jetty documentation.

 Regards,
 Artem Karpenko.

 30.07.2013 16:58, Artem Karpenko пишет:

   After some investigation I found that the problem is not with Jetty's

 version but usage of --exec flag.
 Namely, when --exec is used (to specify JVM args) then shutdown is not
 graceful; it seems the Java process is just killed.
 Not sure how to handle this...

 Regards,
 Artem Karpenko.

 29.07.2013 16:51, Artem Karpenko пишет:

  Hi,

 I can't make Solr shut down properly when using Jetty 9. Tested this
 with a simple plugin that only extends DirectUpdateHandler2, creates
 a
 file in constructor and deletes it in close(). While it's working
 fine
 in the example installation (the one that can be downloaded from Solr
 site) and in the simple custom installation with Jetty 8, it won't in
 Jetty 9. There is not much logging at shutdown at all, just Jetty's
 closing selector or smth., unlike with Jetty 8 where it prints

 various

 Graceful shutdown messages from Solr.

 Installation procedure I used for both Jettys is rather simple: just

 put

 solr.war into webapps/ directory, plugin JAR into {core}/lib/ and
 configure update handler in solrconfig.xml.
 OS is Windows 7, Solr 4.4.
 I tried to stop Jetty with both Ctrl+C and java start.jar
 [port/key
 params] --stop. For Jetty 8 it works fine even with Ctrl+C.

 Did anybody stumble on this issue?

 Best regards,
 Artem.






Re: Measuring SOLR performance

2013-07-31 Thread Dmitry Kan
Ok, got the error fixed by modifying the base Solr URL in solrjmeter.py
(added the core name after the /solr part).
Next error is:

WARNING: no test name(s) supplied nor found in:
['/home/dmitry/projects/lab/solrjmeter/demo/queries/demo.queries']

It is a 'slow start with new tool' symptom I guess.. :)


On Wed, Jul 31, 2013 at 4:39 PM, Dmitry Kan solrexp...@gmail.com wrote:

 Hi Roman,

 What  version and config of SOLR does the tool expect?

 Tried to run, but got:

 **ERROR**
   File solrjmeter.py, line 1390, in module
 main(sys.argv)
   File solrjmeter.py, line 1296, in main
 check_prerequisities(options)
   File solrjmeter.py, line 351, in check_prerequisities
 error('Cannot contact: %s' % options.query_endpoint)
   File solrjmeter.py, line 66, in error
 traceback.print_stack()
 Cannot contact: http://localhost:8983/solr


 complains about URL, clicking which leads properly to the admin page...
 solr 4.3.1, 2 cores shard

 Dmitry


 On Wed, Jul 31, 2013 at 3:59 AM, Roman Chyla roman.ch...@gmail.comwrote:

 Hello,

 I have been wanting some tools for measuring performance of SOLR, similar
 to Mike McCandles' lucene benchmark.

 so yet another monitor was born, is described here:
 http://29min.wordpress.com/2013/07/31/measuring-solr-query-performance/

 I tested it on the problem of garbage collectors (see the blogs for
 details) and so far I can't conclude whether highly customized G1 is
 better
 than highly customized CMS, but I think interesting details can be seen
 there.

 Hope this helps someone, and of course, feel free to improve the tool and
 share!

 roman





Re: SolrCloud Exception

2013-07-31 Thread Shawn Heisey
On 7/31/2013 4:27 AM, Sinduja Rajendran wrote:
 I am running solr 4.0 in a cloud. We have close to 100M documents. The data
 is from a single DB table. I use dih.
 Our solrCloud has 3 zookeepers, one tomcat, 2 solr instances in same
 tomcat. We have 8 GB Ram.
 
 After indexing 14M, my indexing fails witht the below exception.
 
 solr org.apache.lucene.index.MergePolicy$MergeException:
 java.lang.OutOfMemoryError: GC overhead limit exceeded
 
 I tried increasing the GC value to the App server
 
  -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=80
 
 But after applying those options, my indexing rate went down drastically. It
 was indexing only 15k documents per 20 minutes. Earlier it was 300k
 per 20 min.

First thing to mention is that Solr 4.0 was extremely buggy, upgrading
would be advisable.  In the meantime:

An OutOfMemoryError means that Solr needs more heap memory than the JVM
is allowed to use.  The Solr Admin UI dashboard will tell you how much
memory is allocated to your JVM, which you can increase with the -Xmx
parameter.  Real RAM must be available from the system in order to
increase the heap size.
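
For Tomcat that usually means something like the following in setenv.sh or
wherever you set CATALINA_OPTS (the 4g figure is only an example; size it to
your data and leave RAM for the OS disk cache):

CATALINA_OPTS="$CATALINA_OPTS -Xms4g -Xmx4g"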

The options you have given just change the GC collector and tune one
aspect of the new collector, they don't increase anything.  Here are
some things that may help you:

http://wiki.apache.org/solr/SolrPerformanceProblems
http://wiki.apache.org/solr/ShawnHeisey#GC_Tuning

After looking over that information and making adjustments, if you are
still having trouble, we can go over your config and all your details to
see what can be done.

You said that both of your Solr instances are running in the same
tomcat.  Just FYI - because you aren't running all functions on separate
hardware, your setup is not fault tolerant.  Machine failures DO happen,
no matter how much redundancy you build into that server.  If you are
running all this on a redundant VM solution that has live migration of
running VMs, then my statement isn't accurate.

Thanks,
Shawn



SolrCloud - Replica 'down'. How to get it back as 'active'? - Solr 4.3.0

2013-07-31 Thread Jeroen Steggink

Hi,

After the following error, one of the replicas of the leader went down.
Error opening new searcher. exceeded limit of maxWarmingSearchers=2, 
try again later.

I increased the autoCommit time to 5000ms and restarted Solr.

However, the status is still set to down.
How do I get it back to active?

Regards,
Jeroen




Re: Solr PolyField

2013-07-31 Thread Erick Erickson
Nope. Solr fields are flat. Why do you want to do this? I'm
asking because this might be an XY problem and there
may be other possibilities.

Best
Erick

On Wed, Jul 31, 2013 at 5:09 AM, Luís Portela Afonso
meligalet...@gmail.com wrote:
 Hi, I'm trying to create a field with multiple fields inside, that is:

 "origin": {
   "htmlUrl": "http://www.gazzetta.it/",
   "streamId": "feed/http://www.gazzetta.it/rss/Home.xml",
   "title": "Gazzetta.it"
 },


 Get something like this. Is that possible? I'm using Solr 4.4.0.

 Thanks


Re: Sharding with a SolrCloud

2013-07-31 Thread Erick Erickson
You're in uncharted territory. I can imagine you use
a SolrCloud cluster as a separate Solr for a federated
search, but using it as a single shard just seems wrong.

If nothing else, indexing to the shards will require that
the documents be routed correctly. But having one
shard in SolrCloud and another shard managed
externally seems ripe for getting the docs indexed
to various shards you're not expecting, unless you're
using explicit routing

All in all, this _really_ sounds like something you should
not be attempting. Why are you trying to do this? Is it
possible to just set up a SolrCloud cluster and index
all the docs to it and be done with it?

'cause I think you'll end up with endless problems given
what you've described.

Best
Erick

On Wed, Jul 31, 2013 at 5:16 AM, Oliver Goldschmidt
o.goldschm...@tuhh.de wrote:
 Hi list,

 I have a Solr server, which uses sharding to make distributed search
 with another Solr server. The other Solr server now migrates to a Solr
 Cloud system. I've been trying recently to continue searching the Solr
 Cloud as a shard for my Solr server, but this is failing with mysterious
 effects. I am getting a result with a number of hits, when I perform a
 search, but the results are not displayed at all. This is the response
 header I am getting from Solr:

 {
   "responseHeader":{
     "status":0,
     "QTime":305,
     "params":{
       "facet":"true",
       "indent":"yes",
       "facet.mincount":"1",
       "facet.limit":"30",
       "qf":"title_short^750 title_full_unstemmed^600",
       "json.nl":"arrarr",
       "wt":"json",
       "rows":"20",
       "shards":"ourindex.nowhere.de/solr/index",
       "bq":"format:Book^500",
       "fl":"*,score",
       "facet.sort":"count",
       "start":"0",
       "q":"xml",
       "shards.info":"true",
       "facet.prefix":"",
       "facet.field":["publishDate"],
       "qt":"dismax"}},
   "shards.info":{
     "ourindex.nowhere.de/solr/index":{
       "numFound":10076,
       "maxScore":8.507474,
       "time":263}},
   "response":{"numFound":10056,"start":0,"maxScore":8.507474,"docs":[]
   }

 As you can see, there are no docs in the result. This result is not 100%
 reproducible: sometimes I get no results displayed, other times it works
 (with the same query URL!). As you also can see in the result, the
 number of hits in the response is a little bit less than the number of
 hits sent from the shard.

 This makes me wonder if it is not possible to use a Solr Cloud as a
 shard for another standalone Solr server?

 Any hint is appreciated!

 Best
 - Oliver

 --
 Oliver Goldschmidt
 TU Hamburg-Harburg / Universitätsbibliothek / Digitale Dienste
 Denickestr. 22
 21071 Hamburg - Harburg
 Tel.+49 (0)40 / 428 78 - 32 91
 eMail   o.goldschm...@tuhh.de
 --
 GPG/PGP-Schlüssel:
 http://www.tub.tu-harburg.de/keys/Oliver_Marahrens_pub.asc



Re: SolrCloud - Replica 'down'. How to get it back as 'active'? - Solr 4.3.0

2013-07-31 Thread Anshum Gupta
It is perhaps just replaying the transaction logs and coming up. Waiting
for it is what I'd suggest.
The admin UI as of now doesn't show replaying of transaction log as
'recovering', it does so only during peer sync.

Also, you may want to add autoSoftCommit and increase the autoCommit to a
few minutes.
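
Something along these lines in solrconfig.xml; the values here are only
illustrative, not a recommendation for your exact setup:

<autoCommit>
  <maxTime>180000</maxTime>        <!-- hard commit every 3 minutes -->
  <openSearcher>false</openSearcher>
</autoCommit>
<autoSoftCommit>
  <maxTime>5000</maxTime>          <!-- soft commit for near-real-time visibility -->
</autoSoftCommit>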


On Wed, Jul 31, 2013 at 7:55 PM, Jeroen Steggink jer...@stegg-inc.comwrote:

 Hi,

 After the following error, one of the replicas of the leader went down.
 Error opening new searcher. exceeded limit of maxWarmingSearchers=2, try
 again later.
 I increased the autoCommit time to 5000ms and restarted Solr.

 However, the status is still set to down.
 How do I get it back to active?

 Regards,
 Jeroen





-- 

Anshum Gupta
http://www.anshumgupta.net


Re: Solr PolyField

2013-07-31 Thread Luís Portela Afonso
Hi,

I'm trying to index information of RSS Feeds.

So in a more detailed explanation:

The RSS feed has something like: 
<enclosure url="http://www.engadget.com/podcasts/Engadget_Podcast_353.mp3" 
length="32642192" type="audio/mpeg"/>

With my current configuration, this is working and i get a result like that:

"enclosure": [
  "audio/mpeg",
  "http://www.engadget.com/podcasts/EngadgetHD_Podcast_359.mp3",
  37521428
],

BUT, this is not the result that I'm trying to reach. With that I'm not able to 
tell reliably whether "audio/mpeg" is the type, the url, or the length.

I want to reach something like:

"enclosure": {
  "type": "audio/mpeg",
  "url": "http://www.engadget.com/podcasts/EngadgetHD_Podcast_359.mp3",
  "length": 37521428
},


So, the way I intend this, these should be 3 fields inside another field, no?


Many Thanks for the answer and the help.


On Jul 31, 2013, at 3:34 PM, Erick Erickson erickerick...@gmail.com wrote:

 Nope. Solr fields are flat. Why do you want to do this? I'm
 asking because this might be an XY problem and there
 may be other possibilities.
 
 Best
 Erick
 
 On Wed, Jul 31, 2013 at 5:09 AM, Luís Portela Afonso
 meligalet...@gmail.com wrote:
 Hi, I'm trying to create a field with multiple fields inside, that is:
 
 origin:
 {
 
 htmlUrl: http://www.gazzetta.it/;,
 streamId: feed/http://www.gazzetta.it/rss/Home.xml;,
 title: Gazzetta.it
 
 },
 
 
 Get something like this. Is that possible? I'm using Solr 4.4.0.
 
 Thanks





Re: Autowarming last 15 days data

2013-07-31 Thread Shawn Heisey
On 7/31/2013 7:30 AM, Cool Techi wrote:
 We have a solr master slave set up with close to 30 million records. Our 
 index changes/updates very frequently and replication is set up at 60 seconds 
 delay.
 
 Now every time replication completes, the new searches take a time. How can 
 this be improved? I have come across that warming would help this scenario, I 
 our case we cannot warm some queries, but most of the users use the last 15 
 days data only. 
 
 So would it be possible to auto warm only last 15 days data?

Autowarming is generally done automatically when a new searcher is
opened, according to the cache config.  It will take the most recent N
queries in the cache (according to the autowarmCount) and re-execute
those queries against the index to populate the cache.  The document
cache cannot be warmed directly, but when the query result cache is
warmed, that will also populate the document cache.

Because you have a potentially very frequent interval for opening new
searchers (possibly replicating every 60 seconds), you will want to
avoid large autowarmCount values.  If your autowarming ends up taking
too long, the system will try to open a new searcher while the previous
one is being warmed, which can lead to problems.  I have found that the
filterCache is particularly slow to warm.
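
The autowarmCount knob lives on the cache definitions in solrconfig.xml; a
sketch with deliberately small values (sizes and counts here are illustrative,
not a recommendation):

<filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="32"/>
<queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="32"/>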

Thanks,
Shawn



Re: Solr PolyField

2013-07-31 Thread Michael Della Bitta
Luís,

Is there a reason why splitting this up into enclosure_type, enclosure_url,
and enclosure_length would not work?
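
For example, schema.xml definitions along these lines (a sketch; the field
names and types are my assumption, not from your schema):

<field name="enclosure_type" type="string" indexed="true" stored="true" multiValued="true"/>
<field name="enclosure_url" type="string" indexed="true" stored="true" multiValued="true"/>
<field name="enclosure_length" type="long" indexed="true" stored="true" multiValued="true"/>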


Michael Della Bitta

Applications Developer

o: +1 646 532 3062  | c: +1 917 477 7906

appinions inc.

“The Science of Influence Marketing”

18 East 41st Street

New York, NY 10017

t: @appinions https://twitter.com/Appinions | g+:
plus.google.com/appinions
w: appinions.com http://www.appinions.com/


On Wed, Jul 31, 2013 at 10:43 AM, Luís Portela Afonso 
meligalet...@gmail.com wrote:

 Hi,

 I'm trying to index information of RSS Feeds.

 So in a more detailed explanation:

 The RSS feed has something like:
 <enclosure url="http://www.engadget.com/podcasts/Engadget_Podcast_353.mp3"
  length="32642192" type="audio/mpeg"/>

 With my current configuration, this is working and i get a result like
 that:

 "enclosure": [
   "audio/mpeg",
   "http://www.engadget.com/podcasts/EngadgetHD_Podcast_359.mp3",
   37521428
 ],

 BUT, this is not the result that i'm trying to reach. With that i'm not
 able to tell reliably whether "audio/mpeg" is the type, the url, or the
 length.

 I want to reach something like:

 "enclosure": {
   "type": "audio/mpeg",
   "url": "http://www.engadget.com/podcasts/EngadgetHD_Podcast_359.mp3",
   "length": 37521428
 },

 So, the way I intend this, these should be 3 fields inside another field, no?


 Many Thanks for the answer and the help.


 On Jul 31, 2013, at 3:34 PM, Erick Erickson erickerick...@gmail.com
 wrote:

 Nope. Solr fields are flat. Why do you want to do this? I'm
 asking because this might be an XY problem and there
 may be other possibilities.

 Best
 Erick

 On Wed, Jul 31, 2013 at 5:09 AM, Luís Portela Afonso
 meligalet...@gmail.com wrote:

 Hi, I'm trying to create a field with multiple fields inside, that is:

 origin:
 {

 htmlUrl: http://www.gazzetta.it/;,
 streamId: feed/http://www.gazzetta.it/rss/Home.xml;,
 title: Gazzetta.it

 },


 Get something like this. Is that possible? I'm using Solr 4.4.0.

 Thanks





Re: Solr PolyField

2013-07-31 Thread Luís Portela Afonso
These fields can be multiValued.
In the RSS standard it is not correct to do that, but some sources do and I 
would like to grab it all. Is there any way to make that possible?

Once again, Many thanks :)

On Jul 31, 2013, at 3:54 PM, Michael Della Bitta 
michael.della.bi...@appinions.com wrote:

 Luís,
 
 Is there a reason why splitting this up into enclosure_type, enclosure_url,
 and enclosure_length would not work?
 
 
 Michael Della Bitta
 
 Applications Developer
 
 o: +1 646 532 3062  | c: +1 917 477 7906
 
 appinions inc.
 
 “The Science of Influence Marketing”
 
 18 East 41st Street
 
 New York, NY 10017
 
 t: @appinions https://twitter.com/Appinions | g+:
 plus.google.com/appinions
 w: appinions.com http://www.appinions.com/
 
 
 On Wed, Jul 31, 2013 at 10:43 AM, Luís Portela Afonso 
 meligalet...@gmail.com wrote:
 
 Hi,
 
 I'm trying to index information of RSS Feeds.
 
 So in a more detailed explanation:
 
 The RSS feed has something like:
 enclosure url=http://www.engadget.com/podcasts/Engadget_Podcast_353.mp3;
 length=32642192 type=audio/mpeg/
 
 *With my current configuration, this is working and i get a result like
 that:*
 
 
   - enclosure:
   [
  - audio/mpeg,
  - http://www.engadget.com/podcasts/EngadgetHD_Podcast_359.mp3;,
  - 37521428
  ],
 
 
 *BUT,* this is not the result that i'm trying to reach. With that i'm not
 able to know in a correct way, if audio/mpeg is the *type*, or the *
 url,* or the *length*.
 *
 *
 *I want to reach something like:*
 
   -
   - enclosure:
   {
  - type: a http://www.gazzetta.it/udio/mpeg,
  - url:
  http://www.engadget.com/podcasts/EngadgetHD_Podcast_359.mp3;,
  - length: 37521428
  },
 
 
 
 So, how i intend this, this should be 3 fields inside of another field, no?
 
 
 Many Thanks for the answer and the help.
 
 
 On Jul 31, 2013, at 3:34 PM, Erick Erickson erickerick...@gmail.com
 wrote:
 
 Nope. Solr fields are flat. Why do you want to do this? I'm
  asking because this might be an XY problem and there
 may be other possibilities.
 
 Best
 Erick
 
 On Wed, Jul 31, 2013 at 5:09 AM, Luís Portela Afonso
 meligalet...@gmail.com wrote:
 
 Hi, I'm trying to create a field with multiple fields inside, that is:
 
 origin:
 {
 
 htmlUrl: http://www.gazzetta.it/;,
 streamId: feed/http://www.gazzetta.it/rss/Home.xml;,
 title: Gazzetta.it
 
 },
 
 
 Get something like this. Is that possible? I'm using Solr 4.4.0.
 
 Thanks
 
 
 





Re: Sharding with a SolrCloud

2013-07-31 Thread Oliver Goldschmidt
Thank you very much for that information, Erick. That was what I was
fearing...

Well, the problem, why I am trying to do this is, that the SolrCloud is
managed by someone else. We are indexing some content to a pretty small
local index. To this index we have complete access and can do whatever
we want to do. But we also need the seperate index, which is now moving
into the cloud. Its not possible to put our local content into the
cloud, because we are not maintaining it and have no write permission to it.

But why shouldn't that work? Isn't Solr Cloud acting like one solr
server? The indices have to be maintained seperately - can't I just
continue using them as shards and get one result list from both of them
(thats how I did it before they wanted to switch to Solr Cloud)?

Though, if there is no way to use the cloud as a shard, we will have to
think about how to solve that. Of course we can split up the queries and
make two queries (one for the cloud and one for our local index). But
this might be a bit confusing for the user.

Thank you again, best
- Oliver

Am 31.07.2013 16:39, schrieb Erick Erickson:
 You're in uncharted territory. I can imagine you use
 a SolrCloud cluster as a separate Solr for a federated
 search, but using it as a single shard just seems wrong.

 If nothing else, indexing to the shards will require that
 the documents be routed correctly. But having one
 shard in SolrCloud and another shard managed
 externally seems ripe for getting the docs indexed
 to various shards you're not expecting, unless you're
 using explicit routing

 All in all, this _really_ sounds like something you should
 not be attempting. Why are you trying to do this? Is it
 possible to just set up a SolrCloud cluster and index
 all the docs to it and be done with it?

 'cause I think you'll end up with endless problems given
 what you've described.

 Best
 Erick

 On Wed, Jul 31, 2013 at 5:16 AM, Oliver Goldschmidt
 o.goldschm...@tuhh.de wrote:
 Hi list,

 I have a Solr server, which uses sharding to make distributed search
 with another Solr server. The other Solr server now migrates to a Solr
 Cloud system. I've been trying recently to continue searching the Solr
 Cloud as a shard for my Solr server, but this is failing with mysterious
 effects. I am getting a result with a number of hits, when I perform a
 search, but the results are not displayed at all. This is the resonse
 header I am getting from Solr:

 {
   responseHeader:{
 status:0,
 QTime:305,
 params:{
   facet:true,
   indent:yes,
   facet.mincount:1,
   facet.limit:30,
   qf:title_short^750 title_full_unstemmed^600,
   json.nl:arrarr,
   wt:json,
   rows:20,
   shards:ourindex.nowhere.de/solr/index,
   bq:format:Book^500,
   fl:*,score,
   facet.sort:count,
   start:0,
   q:xml,
   shards.info:true,
   facet.prefix:,
   facet.field:[publishDate],
   qt:dismax}},
   shards.info:{
 ourindex.nowhere.de/solr/index:{
   numFound:10076,
   maxScore:8.507474,
   time:263}},
   response:{numFound:10056,start:0,maxScore:8.507474,docs:[]
   }

 As you can see, there are no docs in the result. This result is not 100%
 reproducable: sometimes I get no results displayed, other times it works
 (with the same query URL!). As you also can see in the result, the
 number of hits in the response is a little bit less than the number of
 hits sent from the shard.

 This makes me wonder if it is not possible to use a Solr Cloud as a
 shard for another standalone Solr server?

 Any hint is appreciated!

 Best
 - Oliver

 --
 Oliver Goldschmidt
 TU Hamburg-Harburg / Universitätsbibliothek / Digitale Dienste
 Denickestr. 22
 21071 Hamburg - Harburg
 Tel.+49 (0)40 / 428 78 - 32 91
 eMail   o.goldschm...@tuhh.de
 --
 GPG/PGP-Schlüssel:
 http://www.tub.tu-harburg.de/keys/Oliver_Marahrens_pub.asc



-- 
Oliver Goldschmidt
TU Hamburg-Harburg / Universitätsbibliothek / Digitale Dienste
Denickestr. 22
21071 Hamburg - Harburg
Tel.+49 (0)40 / 428 78 - 32 91
eMail   o.goldschm...@tuhh.de
--
GPG/PGP-Schlüssel: 
http://www.tub.tu-harburg.de/keys/Oliver_Marahrens_pub.asc



Re: Measuring SOLR performance

2013-07-31 Thread Shawn Heisey
On 7/31/2013 12:24 AM, William Bell wrote:
 But that link does not tell me which one you are using?
 
 You are listing like 4 versions on your site.
 
 Also, what did it fix? Pause times?
 
 
 Any other words of wisdom ?

I'm not sure whether that was directed at me or Roman, but here's my
answers:

I run one copy of my index on Solr 3.5.0 and another copy on Solr 4.2.1.
 I have a completely separate (and much smaller) index using SolrCloud
on 4.2.1.

I was seeing GC pause times of 8-10 seconds on both 3.5.0 and 4.2.1 with
an untuned CMS collector.  When I switched that to G1 (also untuned), I
was seeing pause times of 12 seconds.  The average GC time did go down,
but the long stop-the-world pauses were worse.  I used the jHiccup tool
to see the problem.

I went to a CMS config much like what Roman used on his benchmarks, and
that improved things greatly, but I was still seeing occasional pauses
long enough to make my load balancer ping check (5 second timeout) think
that the index had gone down.

I later tried the CMS config that's on my wiki page.  That seems to have
fixed my load balancer problem.  I do still see pauses of up to a
second, but they are not frequent.  We have more page load delay from
our webapp than we do from Solr, so users aren't noticing when searches
occasionally take a little longer.
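
For context, the CMS-flavored tuning being discussed is built from flags like
these (illustrative only; see the GC_Tuning wiki page mentioned in this list
for my actual settings):

-XX:+UseConcMarkSweepGC
-XX:+UseParNewGC
-XX:+CMSParallelRemarkEnabled
-XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly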

Thanks,
Shawn



Re: Solr PolyField

2013-07-31 Thread Luís Portela Afonso
As a single record? Hum, no.

So an RSS feed has /rss/channel/ and then lots of /rss/channel/item, right?
Each /rss/channel/item is a new document in Solr. I started with the Solr RSS 
example, but I changed it to have more fields and to get the feed url 
from a database.

So each /rss/channel/item is a document for indexing, but each 
/rss/channel/item can have more than one enclosure tag.

Many thanks

On Jul 31, 2013, at 4:05 PM, Michael Della Bitta 
michael.della.bi...@appinions.com wrote:

 So you're trying to index a RSS feed as a single record, but you want to be
 able to search for and retrieve individual entries from within the feed? Is
 that the issue?
 
 Michael Della Bitta
 
 Applications Developer
 
 o: +1 646 532 3062  | c: +1 917 477 7906
 
 appinions inc.
 
 “The Science of Influence Marketing”
 
 18 East 41st Street
 
 New York, NY 10017
 
 t: @appinions https://twitter.com/Appinions | g+:
 plus.google.com/appinions
 w: appinions.com http://www.appinions.com/
 
 
 On Wed, Jul 31, 2013 at 10:59 AM, Luís Portela Afonso 
 meligalet...@gmail.com wrote:
 
 This fields can be multiValued.
 I the rss standart there is not correct to do that, but some sources do
 and i like to grab it all. Is there any way that make it possible?
 
 Once again, Many thanks :)
 
 On Jul 31, 2013, at 3:54 PM, Michael Della Bitta 
 michael.della.bi...@appinions.com wrote:
 
 Luís,
 
 Is there a reason why splitting this up into enclosure_type,
 enclosure_url,
 and enclosure_length would not work?
 
 
 Michael Della Bitta
 
 Applications Developer
 
 o: +1 646 532 3062  | c: +1 917 477 7906
 
 appinions inc.
 
 “The Science of Influence Marketing”
 
 18 East 41st Street
 
 New York, NY 10017
 
 t: @appinions https://twitter.com/Appinions | g+:
 plus.google.com/appinions
 
 w: appinions.com http://www.appinions.com/
 
 
 On Wed, Jul 31, 2013 at 10:43 AM, Luís Portela Afonso 
 meligalet...@gmail.com wrote:
 
 Hi,
 
 I'm trying to index information of RSS Feeds.
 
 So in a more detailed explanation:
 
 The RSS feed has something like:
 enclosure url=
 http://www.engadget.com/podcasts/Engadget_Podcast_353.mp3;
 length=32642192 type=audio/mpeg/
 
 *With my current configuration, this is working and i get a result like
 that:*
 
 
  - enclosure:
  [
 - audio/mpeg,
 - http://www.engadget.com/podcasts/EngadgetHD_Podcast_359.mp3;,
 - 37521428
 ],
 
 
 *BUT,* this is not the result that i'm trying to reach. With that i'm
 not
 able to know in a correct way, if audio/mpeg is the *type*, or the *
 url,* or the *length*.
 *
 *
 *I want to reach something like:*
 
  -
  - enclosure:
  {
 - type: a http://www.gazzetta.it/udio/mpeg,
 - url:
 http://www.engadget.com/podcasts/EngadgetHD_Podcast_359.mp3;,
 - length: 37521428
 },
 
 
 
 So, how i intend this, this should be 3 fields inside of another field,
 no?
 
 
 Many Thanks for the answer and the help.
 
 
 On Jul 31, 2013, at 3:34 PM, Erick Erickson erickerick...@gmail.com
 wrote:
 
 Nope. Solr fields are flat. Why do you want to do this? I'm
  asking because this might be an XY problem and there
 may be other possibilities.
 
 Best
 Erick
 
 On Wed, Jul 31, 2013 at 5:09 AM, Luís Portela Afonso
 meligalet...@gmail.com wrote:
 
 Hi, I'm trying to create a field with multiple fields inside, that is:
 
 origin:
 {
 
 htmlUrl: http://www.gazzetta.it/;,
 streamId: feed/http://www.gazzetta.it/rss/Home.xml;,
 title: Gazzetta.it
 
 },
 
 
 Get something like this. Is that possible? I'm using Solr 4.4.0.
 
 Thanks
 
 
 
 
 





Re: SolrCloud - Replica 'down'. How to get it back as 'active'? - Solr 4.3.0

2013-07-31 Thread Jeroen Steggink

Thanks Anshum,

autoSoftCommit was already set to 1000ms, but I changed the autoCommit to 
3 minutes.


I'll wait for it to come back. The index contains about 200.000 
documents and the last commit was 14 hours ago. So I wonder how long it 
will take.

I would have thought it would be back up already.

On 31-7-2013 16:40, Anshum Gupta wrote:

It perhaps is just replaying the transaction logs and coming up. Wait for
it is what I'd say.
The admin UI as of now doesn't show replaying of transaction log as
'recovering', it does so only during peer sync.

Also, you may want to add autoSoftCommit and increase the autoCommit to a
few minutes.


On Wed, Jul 31, 2013 at 7:55 PM, Jeroen Steggink jer...@stegg-inc.comwrote:


Hi,

After the following error, one of the replicas of the leader went down.
Error opening new searcher. exceeded limit of maxWarmingSearchers=2, try
again later.
I increased the autoCommit time to 5000ms and restarted Solr.

However, the status is still set to down.
How do I get it back to active?

Regards,
Jeroen










Re: Solr PolyField

2013-07-31 Thread Luís Portela Afonso
Hum, ok.

Is it possible to add static text to a field? Text that I write in the 
configuration and then append another field to? I saw something like 
CloneFieldProcessor, but when I start Solr, it says it could not find the 
class.
I was trying to use processors to copy one field to another.

I saw this:
<processor class="solr.FieldCopyProcessorFactory">
  <str name="source">lastname firstname</str>
  <str name="dest">fullname</str>
  <bool name="append">true</bool>
  <str name="append.delim">, </str>
</processor>

But when I try to use it, Solr says it cannot find 
solr.FieldCopyProcessorFactory. I'm using Solr 4.4.0.

Thanks ;)

On Jul 31, 2013, at 4:16 PM, Michael Della Bitta 
michael.della.bi...@appinions.com wrote:

 OK,
 
 Then I would suggest creating multiValued enclosure_type, etc. tags for
 searching, and then one string-typed field to store the JSON snippet you've
 been showing.
 
 Michael Della Bitta
 
 Applications Developer
 
 o: +1 646 532 3062  | c: +1 917 477 7906
 
 appinions inc.
 
 “The Science of Influence Marketing”
 
 18 East 41st Street
 
 New York, NY 10017
 
 t: @appinions https://twitter.com/Appinions | g+:
 plus.google.com/appinionshttps://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts
 w: appinions.com http://www.appinions.com/
 
 
 On Wed, Jul 31, 2013 at 11:11 AM, Luís Portela Afonso 
 meligalet...@gmail.com wrote:
 
 As a single record? Hum, no.
 
 So an Rss has /rss/channel/ and then lot of /rss/channel/item, right?
 Each /rss/channel/item is a new document on Solr. I start with the solr
 example rss, but i change that to has more fields, other fields and get the
 feed url from a database.
 
 So each /rss/channel/item is a document to the indexing, bue each
 /rss/channel/item can have more than on enclosure tag.
 
 Many thanks
 
 On Jul 31, 2013, at 4:05 PM, Michael Della Bitta 
 michael.della.bi...@appinions.com wrote:
 
 So you're trying to index a RSS feed as a single record, but you want to
 be
 able to search for and retrieve individual entries from within the feed?
 Is
 that the issue?
 
 Michael Della Bitta
 
 Applications Developer
 
 o: +1 646 532 3062  | c: +1 917 477 7906
 
 appinions inc.
 
 “The Science of Influence Marketing”
 
 18 East 41st Street
 
 New York, NY 10017
 
 t: @appinions https://twitter.com/Appinions | g+:
 plus.google.com/appinions
 
 w: appinions.com http://www.appinions.com/
 
 
 On Wed, Jul 31, 2013 at 10:59 AM, Luís Portela Afonso 
 meligalet...@gmail.com wrote:
 
 This fields can be multiValued.
 I the rss standart there is not correct to do that, but some sources do
 and i like to grab it all. Is there any way that make it possible?
 
 Once again, Many thanks :)
 
 On Jul 31, 2013, at 3:54 PM, Michael Della Bitta 
 michael.della.bi...@appinions.com wrote:
 
 Luís,
 
 Is there a reason why splitting this up into enclosure_type,
 enclosure_url,
 and enclosure_length would not work?
 
 
 Michael Della Bitta
 
 Applications Developer
 
 o: +1 646 532 3062  | c: +1 917 477 7906
 
 appinions inc.
 
 “The Science of Influence Marketing”
 
 18 East 41st Street
 
 New York, NY 10017
 
 t: @appinions https://twitter.com/Appinions | g+:
 plus.google.com/appinions
 
 
 w: appinions.com http://www.appinions.com/
 
 
 On Wed, Jul 31, 2013 at 10:43 AM, Luís Portela Afonso 
 meligalet...@gmail.com wrote:
 
 Hi,
 
 I'm trying to index information of RSS Feeds.
 
 So in a more detailed explanation:
 
 The RSS feed has something like:
 enclosure url=
 http://www.engadget.com/podcasts/Engadget_Podcast_353.mp3;
 length=32642192 type=audio/mpeg/
 
 *With my current configuration, this is working and i get a result
 like
 that:*
 
 
 - enclosure:
 [
- audio/mpeg,
- http://www.engadget.com/podcasts/EngadgetHD_Podcast_359.mp3;,
- 37521428
],
 
 
 *BUT,* this is not the result that i'm trying to reach. With that i'm
 not
 able to know in a correct way, if audio/mpeg is the *type*, or
 the *
 url,* or the *length*.
 *
 *
 *I want to reach something like:*
 
 -
 - enclosure:
 {
- type: a http://www.gazzetta.it/udio/mpeg,
- url:
http://www.engadget.com/podcasts/EngadgetHD_Podcast_359.mp3;,
- length: 37521428
},
 
 
 
 So, how i intend this, this should be 3 fields inside of another
 field,
 no?
 
 
 Many Thanks for the answer and the help.
 
 
 On Jul 31, 2013, at 3:34 PM, Erick Erickson erickerick...@gmail.com
 wrote:
 
 Nope. Solr fields are flat. Why do you want to do this? I'm
  asking because this might be an XY problem and there
 may be other possibilities.
 
 Best
 Erick
 
 On Wed, Jul 31, 2013 at 5:09 AM, Luís Portela Afonso
 meligalet...@gmail.com wrote:
 
 Hi, I'm trying to create a field with multiple fields inside, that is:
 
 origin:
 {
 
 htmlUrl: http://www.gazzetta.it/;,
 streamId: feed/http://www.gazzetta.it/rss/Home.xml;,
 title: Gazzetta.it
 
 

Re: Sharding with a SolrCloud

2013-07-31 Thread Erick Erickson
Well, assuming you have solved the differences
in statistics between the index you maintain and
the one in the cloud with respect to the scoring...

My comment about indexing is probably
irrelevant, you're not indexing anything to the
SolrCloud cluster.

But still doubt this will work. Here's the problem:

Internally, the round-trip looks like this:
node 1 receives request
node 1 sends requests to all the shards
node 1 receives the top N docs from each shard
node 1 collates those to the real top N
node1 then queries each shard for the docs hosted on those shards.

This last step is where I'd expect just adding a shard that
happens to be a separate SolrCloud instance to the list
to fall down: the originating node would expect to just get
the documents from the shard it knew about.

And if you list _all_ the shards in the SolrCloud instance,
then each of them will distribute the request to all shards
in the SolrCloud instance, confusing things even more.

Much of this is speculation, but I can imagine a number
of ways this scenario would go bad, it wasn't one of the
design goals as far as I know.

Best
Erick

On Wed, Jul 31, 2013 at 11:01 AM, Oliver Goldschmidt
o.goldschm...@tuhh.de wrote:
 Thank you very much for that information, Erick. That was what I was
 fearing...

 Well, the problem, why I am trying to do this is, that the SolrCloud is
 managed by someone else. We are indexing some content to a pretty small
 local index. To this index we have complete access and can do whatever
 we want to do. But we also need the seperate index, which is now moving
 into the cloud. Its not possible to put our local content into the
 cloud, because we are not maintaining it and have no write permission to it.

 But why shouldn't that work? Isn't Solr Cloud acting like one solr
 server? The indices have to be maintained seperately - can't I just
 continue using them as shards and get one result list from both of them
 (thats how I did it before they wanted to switch to Solr Cloud)?

 Though, if there is no way to use the cloud as a shard, we will have to
 think about how to solve that. Of course we can split up the queries and
 make two queries (one for the cloud and one for our local index). But
 this might be a bit confusing for the user.

 Thank you again, best
 - Oliver

 Am 31.07.2013 16:39, schrieb Erick Erickson:
 You're in uncharted territory. I can imagine you use
 a SolrCloud cluster as a separate Solr for a federated
 search, but using it as a single shard just seems wrong.

 If nothing else, indexing to the shards will require that
 the documents be routed correctly. But having one
 shard in SolrCloud and another shard managed
 externally seems ripe for getting the docs indexed
 to various shards you're not expecting, unless you're
 using explicit routing

 All in all, this _really_ sounds like something you should
 not be attempting. Why are you trying to do this? Is it
 possible to just set up a SolrCloud cluster and index
 all the docs to it and be done with it?

 'cause I think you'll end up with endless problems given
 what you've described.

 Best
 Erick

 On Wed, Jul 31, 2013 at 5:16 AM, Oliver Goldschmidt
 o.goldschm...@tuhh.de wrote:
 Hi list,

 I have a Solr server, which uses sharding to make distributed search
 with another Solr server. The other Solr server now migrates to a Solr
 Cloud system. I've been trying recently to continue searching the Solr
 Cloud as a shard for my Solr server, but this is failing with mysterious
 effects. I am getting a result with a number of hits, when I perform a
 search, but the results are not displayed at all. This is the resonse
 header I am getting from Solr:

 {
   responseHeader:{
 status:0,
 QTime:305,
 params:{
   facet:true,
   indent:yes,
   facet.mincount:1,
   facet.limit:30,
   qf:title_short^750 title_full_unstemmed^600,
   json.nl:arrarr,
   wt:json,
   rows:20,
   shards:ourindex.nowhere.de/solr/index,
   bq:format:Book^500,
   fl:*,score,
   facet.sort:count,
   start:0,
   q:xml,
   shards.info:true,
   facet.prefix:,
   facet.field:[publishDate],
   qt:dismax}},
   shards.info:{
 ourindex.nowhere.de/solr/index:{
   numFound:10076,
   maxScore:8.507474,
   time:263}},
   response:{numFound:10056,start:0,maxScore:8.507474,docs:[]
   }

 As you can see, there are no docs in the result. This result is not 100%
 reproducable: sometimes I get no results displayed, other times it works
 (with the same query URL!). As you also can see in the result, the
 number of hits in the response is a little bit less than the number of
 hits sent from the shard.

 This makes me wonder if it is not possible to use a Solr Cloud as a
 shard for another standalone Solr server?

 Any hint is appreciated!

 Best
 - Oliver

 --
 Oliver Goldschmidt
 TU Hamburg-Harburg / Universitätsbibliothek / Digitale Dienste
 Denickestr. 22
 21071 Hamburg - Harburg
 Tel.+49 (0)40 

Re: Autowarming last 15 days data

2013-07-31 Thread Shawn Heisey
On 7/31/2013 9:21 AM, Cool Techi wrote:
 Would it make sense if we open a newSearcher with the last 15 days' documents, 
 since those are the documents mostly used by the users? Also, how 
 could I do this if it is possible?

When you open a searcher, it's for the entire index.  You may want to go
distributed and keep the newest 15 days of data in a separate index from
the rest.  For my own index, I use this hot/cold shard setup.  I have a
nightly process that indexes data that needs to be moved into the cold
shards and deletes it from the hot shard.

http://wiki.apache.org/solr/DistributedSearch
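
With that kind of layout, a query spanning the hot and cold shards is just a
shards parameter listing them (host and core names here are made up):

http://host:8983/solr/hot/select?q=*:*&shards=host:8983/solr/hot,host:8983/solr/cold1,host:8983/solr/cold2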

SolrCloud is the future of distributed search, but it does not have
built-in support for a hot/cold shard setup.  You'd need to manage that
yourself with manual sharding.  A custom sharding plugin to automate
indexing would likely be very very involved, it would probably be easier
to manage it outside of SolrCloud.

Thanks,
Shawn



solr 4.4 multiple datasource connection

2013-07-31 Thread Carmine Paternoster
in my db-data-config.xml i have configured two datasource, each with his
parameter name, for example:

<dataSource name="test1"
            type="JdbcDataSource"
            driver="com.mysql.jdbc.Driver"
            url="jdbc:mysql://localhost/firstdb"
            user="username1"
            password="psw1"/>

<dataSource name="test2"
            type="JdbcDataSource"
            driver="com.mysql.jdbc.Driver"
            url="jdbc:mysql://localhost/seconddb"
            user="username2"
            password="psw2"/>

<document name="content">
  <entity name="news" datasource="test1" query="select...">
    <field column="OTYPE_ID" name="otypeID" />
    <field column="NWS_ID" name="cntID" />
  </entity>

  <entity name="news_update" datasource="test2" query="select...">
    <field column="OTYPE_ID" name="otypeID" />
    <field column="NWS_ID" name="cntID" />
  </entity>
</document>
</dataConfig>

but when I execute the second entity's query from dataimport in Solr, it
throws an exception:

Table 'firstdb.secondTable' doesn't exist

Could someone help me? Thank you in advance.


http://stackoverflow.com/questions/17974029/solr-4-4-multiple-datasource-connection


Re: solr 4.4 multiple datasource connection

2013-07-31 Thread Alexandre Rafalovitch
On Wed, Jul 31, 2013 at 11:49 AM, Carmine Paternoster
carmine...@gmail.comwrote:

 entity name=news datasource=test1 query=select...


Try changing datasource= to dataSource= in:
<entity name="news" datasource="test1" query="select...">
<entity name="news_update" datasource="test2" query="select...">
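
i.e., the attribute name is case-sensitive; corrected, the two entities would
read (only the attribute changes):

<entity name="news" dataSource="test1" query="select...">
<entity name="news_update" dataSource="test2" query="select...">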

Regards,
   Alex.
P.s. This check will be (eventually) part of SolrLint:
https://github.com/arafalov/SolrLint/issues/7

Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


Re: Solr PolyField

2013-07-31 Thread Jack Krupansky
See:
https://builds.apache.org/job/Solr-Artifacts-4.x/javadoc/solr-core/org/apache/solr/update/processor/CloneFieldUpdateProcessorFactory.html

I have more examples in my book.
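
A minimal sketch of wiring it into an update chain (the field names are 
placeholders, not from your config):

<updateRequestProcessorChain name="clone-enclosure">
  <processor class="solr.CloneFieldUpdateProcessorFactory">
    <str name="source">enclosure_url</str>
    <str name="dest">enclosure_all</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>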

-- Jack Krupansky

From: Luís Portela Afonso 
Sent: Wednesday, July 31, 2013 11:41 AM
To: solr-user@lucene.apache.org 
Subject: Re: Solr PolyField

Hum, ok. 

It's possible to add to a field, static text? Text that i write on the 
configuration and then append another field? I saw something like 
CloneFieldProcessor but when i'm starting solr, it says that could not find the 
class.
I was trying to use processors to move one field to another.

I saw this:
<processor class="solr.FieldCopyProcessorFactory">
  <str name="source">lastname firstname</str>
  <str name="dest">fullname</str>
  <bool name="append">true</bool>
  <str name="append.delim">, </str>
</processor>
But when i try to use it solr says that he cannot find the 
solr.FieldCopyProcessorFactory. I'm using solr 4.4.0

Thanks ;)

On Jul 31, 2013, at 4:16 PM, Michael Della Bitta 
michael.della.bi...@appinions.com wrote:


  OK,

  Then I would suggest creating multiValued enclosure_type, etc. tags for
  searching, and then one string-typed field to store the JSON snippet you've
  been showing.

  Michael Della Bitta

  Applications Developer

  o: +1 646 532 3062  | c: +1 917 477 7906

  appinions inc.

  “The Science of Influence Marketing”

  18 East 41st Street

  New York, NY 10017

  t: @appinions https://twitter.com/Appinions | g+:
  
plus.google.com/appinions
  w: appinions.com http://www.appinions.com/


  On Wed, Jul 31, 2013 at 11:11 AM, Luís Portela Afonso 
  meligalet...@gmail.com wrote:


As a single record? Hum, no.

So an Rss has /rss/channel/ and then lot of /rss/channel/item, right?
Each /rss/channel/item is a new document on Solr. I start with the solr
example rss, but i change that to has more fields, other fields and get the
feed url from a database.

So each /rss/channel/item is a document to the indexing, bue each
/rss/channel/item can have more than on enclosure tag.

Many thanks

On Jul 31, 2013, at 4:05 PM, Michael Della Bitta 
michael.della.bi...@appinions.com wrote:


  So you're trying to index a RSS feed as a single record, but you want to

be

  able to search for and retrieve individual entries from within the feed?

Is

  that the issue?

  Michael Della Bitta

  Applications Developer

  o: +1 646 532 3062  | c: +1 917 477 7906

  appinions inc.

  “The Science of Influence Marketing”

  18 East 41st Street

  New York, NY 10017

  t: @appinions https://twitter.com/Appinions | g+:
  plus.google.com/appinions




  w: appinions.com http://www.appinions.com/


  On Wed, Jul 31, 2013 at 10:59 AM, Luís Portela Afonso 
  meligalet...@gmail.com wrote:


This fields can be multiValued.
I the rss standart there is not correct to do that, but some sources do
and i like to grab it all. Is there any way that make it possible?

Once again, Many thanks :)

On Jul 31, 2013, at 3:54 PM, Michael Della Bitta 
michael.della.bi...@appinions.com wrote:


  Luís,

  Is there a reason why splitting this up into enclosure_type,

enclosure_url,

  and enclosure_length would not work?


  Michael Della Bitta

  Applications Developer

  o: +1 646 532 3062  | c: +1 917 477 7906

  appinions inc.

  “The Science of Influence Marketing”

  18 East 41st Street

  New York, NY 10017

  t: @appinions https://twitter.com/Appinions | g+:
  plus.google.com/appinions






  w: appinions.com http://www.appinions.com/


  On Wed, Jul 31, 2013 at 10:43 AM, Luís Portela Afonso 
  meligalet...@gmail.com wrote:


Hi,

I'm trying to index information of RSS Feeds.

So in a more detailed explanation:

The RSS feed has something like:
enclosure url=

http://www.engadget.com/podcasts/Engadget_Podcast_353.mp3;

length=32642192 type=audio/mpeg/

*With my current configuration, this is working and i get a result

like

that:*


- enclosure:
[
   - audio/mpeg,
   - http://www.engadget.com/podcasts/EngadgetHD_Podcast_359.mp3;,
   - 37521428
   ],


*BUT,* this is not the result that i'm trying to reach. With that 
i'm

not

able to know in a correct way, if audio/mpeg is the *type*, or

the *

url,* or the *length*.
*
*
*I want to reach 

Re: Solr Cloud Setup

2013-07-31 Thread AdityaR
Flavio, 

There was a problem with the solrconfig and schema files. 

One of the team members had deleted some entries in the solrconfig.xml, and I
was picking up the same Solr configuration every time. I got the latest version
of Solr and carefully edited the solrconfig and schema files, and it worked.

We have the cloud up and running, testing is in progress and it looks good. 



Thanks for all your help. 

-Aditya



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Cloud-Setup-tp4080182p4081654.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Measuring SOLR performance

2013-07-31 Thread Roman Chyla
Hi Dmitry,
probably a mistake in the readme; try calling it with -q
/home/dmitry/projects/lab/solrjmeter/queries/demo/demo.queries

as for the base_url, I was testing it on Solr 4.0, where it tries contacting
/solr/admin/system - is it different in 4.3? I guess I should make it
configurable (it already is; the endpoint is set in check_options())

thanks

roman


On Wed, Jul 31, 2013 at 10:01 AM, Dmitry Kan solrexp...@gmail.com wrote:

 Ok, got the error fixed by modifying the base solr ulr in solrjmeter.py
 (added core name after /solr part).
 Next error is:

 WARNING: no test name(s) supplied nor found in:
 ['/home/dmitry/projects/lab/solrjmeter/demo/queries/demo.queries']

 It is a 'slow start with new tool' symptom I guess.. :)


 On Wed, Jul 31, 2013 at 4:39 PM, Dmitry Kan solrexp...@gmail.com wrote:

 Hi Roman,

 What  version and config of SOLR does the tool expect?

 Tried to run, but got:

 **ERROR**
   File solrjmeter.py, line 1390, in module
 main(sys.argv)
   File solrjmeter.py, line 1296, in main
 check_prerequisities(options)
   File solrjmeter.py, line 351, in check_prerequisities
 error('Cannot contact: %s' % options.query_endpoint)
   File solrjmeter.py, line 66, in error
 traceback.print_stack()
 Cannot contact: http://localhost:8983/solr


 complains about URL, clicking which leads properly to the admin page...
 solr 4.3.1, 2 cores shard

 Dmitry


 On Wed, Jul 31, 2013 at 3:59 AM, Roman Chyla roman.ch...@gmail.comwrote:

 Hello,

 I have been wanting some tools for measuring performance of SOLR, similar
 to Mike McCandles' lucene benchmark.

 so yet another monitor was born, is described here:
 http://29min.wordpress.com/2013/07/31/measuring-solr-query-performance/

 I tested it on the problem of garbage collectors (see the blogs for
 details) and so far I can't conclude whether highly customized G1 is
 better
 than highly customized CMS, but I think interesting details can be seen
 there.

 Hope this helps someone, and of course, feel free to improve the tool and
 share!

 roman






RE: monitor jvm heap size for solrcloud

2013-07-31 Thread Joshi, Shital
Thanks for all answers. We decided to use VisualVM with multiple remote 
connections. 
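
For the record, we expose JMX to VisualVM with the standard remote flags on
each Solr JVM (the port is arbitrary; authentication is disabled here only
because the boxes sit on a closed network):

-Dcom.sun.management.jmxremote
-Dcom.sun.management.jmxremote.port=18983
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.ssl=false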

-Original Message-
From: Utkarsh Sengar [mailto:utkarsh2...@gmail.com] 
Sent: Friday, July 26, 2013 6:19 PM
To: solr-user@lucene.apache.org
Subject: Re: monitor jvm heap size for solrcloud

We have been using newrelic (they have a free plan too) and gives all
needed info like: jvm heap usage in eden space, survivor space and old gen.
Garbage collection info, detailed info about the solr requests and its
response times, error rates etc.

I highly recommend using newrelic to monitor your solr cluster:
http://blog.newrelic.com/2010/05/11/got-apache-solr-search-server-use-rpm-to-monitor-troubleshoot-and-tune-solr-operations/

Thanks,
-Utkarsh


On Fri, Jul 26, 2013 at 2:38 PM, SolrLover bbar...@gmail.com wrote:

 I have used JMX with SOLR before..

 http://docs.lucidworks.com/display/solr/Using+JMX+with+Solr



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/monitor-jvm-heap-size-for-solrcloud-tp4080713p4080725.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 
Thanks,
-Utkarsh


Re: Does solr cloud support rename or swap function for collection?

2013-07-31 Thread thzinc
This is awesome news. I had been looking for the ability to do this with
SolrCloud since 4.0.0-ALPHA. We're on 4.1.0 right now, so this is a great
reason to plan for an upgrade.

Just to be clear, CREATEALIAS both creates and updates an alias, right?
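
For anyone else finding this thread, the call I mean looks like this
(collection and alias names are just examples):

http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=catalog&collections=catalog_v2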



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Does-solr-cloud-support-rename-or-swap-function-for-collection-tp4054193p4081660.html
Sent from the Solr - User mailing list archive at Nabble.com.


upgrade from 4.3 to 4.4

2013-07-31 Thread Joshi, Shital
We have SolrCloud (4.3.0) cluster (5 shards and 2 replicas) on 10 boxes. We 
have about 450 million documents. We're planning to upgrade to Solr 4.4.0. Do 
we need to re-index already indexed documents?

Thanks!




Re: Measuring SOLR performance

2013-07-31 Thread Roman Chyla
I'll try to run it with the new parameters and let you know how it goes.
I've rechecked details for the G1 (default) garbage collector run and I can
confirm that 2 out of 3 runs were showing high max response times, in some
cases even 10secs, but the customized G1 never - so definitely the
parameters had effect because the max time for the customized G1 never went
higher than 1.5secs (and that happened for 2 query classes only). The
cms-custom and G1-custom results are similar; the G1 seems to have higher
values in the max fields, but that may be random. So, yes, I now regard the
default G1 as 'bad', and these G1 parameters, even if they don't seem
G1-specific, have a real effect.
Thanks,

roman


On Tue, Jul 30, 2013 at 11:01 PM, Shawn Heisey s...@elyograg.org wrote:

 On 7/30/2013 6:59 PM, Roman Chyla wrote:
  I have been wanting some tools for measuring performance of SOLR, similar
  to Mike McCandles' lucene benchmark.
 
  so yet another monitor was born, is described here:
  http://29min.wordpress.com/2013/07/31/measuring-solr-query-performance/
 
  I tested it on the problem of garbage collectors (see the blogs for
  details) and so far I can't conclude whether highly customized G1 is
 better
  than highly customized CMS, but I think interesting details can be seen
  there.
 
  Hope this helps someone, and of course, feel free to improve the tool and
  share!

 I have a CMS config that's even more tuned than before, and it has made
 things MUCH better.  This new config is inspired by more info that I got
 on IRC:

 http://wiki.apache.org/solr/ShawnHeisey#GC_Tuning

 The G1 customizations in your blog post don't look like they are really
 G1-specific - they may be useful with CMS as well.  This statement also
 applies to some of the CMS parameters, so I would use those with G1 as
 well for any testing.

 UseNUMA looks interesting for machines that actually are NUMA.  All the
 information that I can find says it is only for the throughput
 (parallel) collector, so it's probably not doing anything for G1.

 The pause parameters you've got for G1 are targets only.  It will *try*
 to stick within those parameters, but if a collection requires more than
 50 milliseconds or has to happen more often than once a second, the
 collector will ignore what you have told it.

 Thanks,
 Shawn




Re: SolrCloud Exception

2013-07-31 Thread Sinduja Rajendran
Thanks, Shawn, for the reply. I will upgrade to Solr 4.3 and check that.
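
For anyone else hitting this: the -Xmx increase Shawn describes below is just a
JVM startup flag, e.g. (4g is purely an illustrative size; it has to fit in the
machine's real RAM):

    java -Xmx4g -Xms4g -jar start.jar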



On Wed, Jul 31, 2013 at 4:13 PM, Shawn Heisey s...@elyograg.org wrote:

 On 7/31/2013 4:27 AM, Sinduja Rajendran wrote:
  I am running solr 4.0 in a cloud. We have close to 100M documents. The
 data
  is from a single DB table. I use dih.
  Our solrCloud has 3 zookeepers, one tomcat, 2 solr instances in same
  tomcat. We have 8 GB Ram.
 
  After indexing 14M, my indexing fails with the below exception.
 
  solr org.apache.lucene.index.MergePolicy$MergeException:
  java.lang.OutOfMemoryError: GC overhead limit exceeded
 
  I tried increasing the GC value to the App server
 
   -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=80
 
  But after adding those options, my indexing slowed drastically. It
  was indexing only 15k documents per 20 minutes. Earlier it was 300k
  per 20 min.

 First thing to mention is that Solr 4.0 was extremely buggy, upgrading
 would be advisable.  In the meantime:

 An OutOfMemoryError means that Solr needs more heap memory than the JVM
 is allowed to use.  The Solr Admin UI dashboard will tell you how much
 memory is allocated to your JVM, which you can increase with the -Xmx
 parameter.  Real RAM must be available from the system in order to
 increase the heap size.

 The options you have given just change the GC collector and tune one
 aspect of the new collector, they don't increase anything.  Here are
 some things that may help you:

 http://wiki.apache.org/solr/SolrPerformanceProblems
 http://wiki.apache.org/solr/ShawnHeisey#GC_Tuning

 After looking over that information and making adjustments, if you are
 still having trouble, we can go over your config and all your details to
 see what can be done.

 You said that both of your Solr instances are running in the same
 tomcat.  Just FYI - because you aren't running all functions on separate
 hardware, your setup is not fault tolerant.  Machine failures DO happen,
 no matter how much redundancy you build into that server.  If you are
 running all this on a redundant VM solution that has live migration of
 running VMs, then my statement isn't accurate.

 Thanks,
 Shawn




Re: upgrade from 4.3 to 4.4

2013-07-31 Thread Jack Krupansky
A dot release should never require reindexing, unless... there is some 
change in a field type analyzer or update processor that your data depends 
on.


For example, some changes occurred in the ngram filter, so whether that 
would impact your data is up to you to decide.


See:
https://issues.apache.org/jira/browse/LUCENE-4955

There were a few other changes as well - you need to review each change 
yourself.


-- Jack Krupansky

-Original Message- 
From: Joshi, Shital

Sent: Wednesday, July 31, 2013 12:31 PM
To: 'solr-user@lucene.apache.org'
Subject: upgrade from 4.3 to 4.4

We have SolrCloud (4.3.0) cluster (5 shards and 2 replicas) on 10 boxes. We 
have about 450 million documents. We're planning to upgrade to Solr 4.4.0. 
Do we need to re-index already indexed documents?


Thanks!




RE: Measuring SOLR performance

2013-07-31 Thread Markus Jelsma
Did you also test indexing speed? With default G1GC settings we're seeing a 
slightly higher latency for queries than CMS. However, G1GC allows for much 
higher throughput than CMS when indexing. I haven't got the raw numbers here 
but it is roughly 45 minutes against 60 in favour of G1GC!

Load is obviously higher with G1GC.
 
 
-Original message-
 From:Roman Chyla roman.ch...@gmail.com
 Sent: Wednesday 31st July 2013 18:32
 To: solr-user@lucene.apache.org
 Subject: Re: Measuring SOLR performance
 
 I'll try to run it with the new parameters and let you know how it goes.
 I've rechecked details for the G1 (default) garbage collector run and I can
 confirm that 2 out of 3 runs were showing high max response times, in some
 cases even 10secs, but the customized G1 never - so definitely the
 parameters had effect because the max time for the customized G1 never went
 higher than 1.5secs (and that happened for 2 query classes only). Both the
 cms-custom and G1-custom are similar, the G1 seems to have higher values in
 the max fields, but that may be random. So, yes, now I am sure what to
 think of default G1 as 'bad', and that these G1 parameters, even if they
 don't seem G1 specific, have real effect.
 Thanks,
 
 roman
 
 
 On Tue, Jul 30, 2013 at 11:01 PM, Shawn Heisey s...@elyograg.org wrote:
 
  On 7/30/2013 6:59 PM, Roman Chyla wrote:
   I have been wanting some tools for measuring performance of SOLR, similar
   to Mike McCandles' lucene benchmark.
  
   so yet another monitor was born, is described here:
   http://29min.wordpress.com/2013/07/31/measuring-solr-query-performance/
  
   I tested it on the problem of garbage collectors (see the blogs for
   details) and so far I can't conclude whether highly customized G1 is
  better
   than highly customized CMS, but I think interesting details can be seen
   there.
  
   Hope this helps someone, and of course, feel free to improve the tool and
   share!
 
  I have a CMS config that's even more tuned than before, and it has made
  things MUCH better.  This new config is inspired by more info that I got
  on IRC:
 
  http://wiki.apache.org/solr/ShawnHeisey#GC_Tuning
 
  The G1 customizations in your blog post don't look like they are really
  G1-specific - they may be useful with CMS as well.  This statement also
  applies to some of the CMS parameters, so I would use those with G1 as
  well for any testing.
 
  UseNUMA looks interesting for machines that actually are NUMA.  All the
  information that I can find says it is only for the throughput
  (parallel) collector, so it's probably not doing anything for G1.
 
  The pause parameters you've got for G1 are targets only.  It will *try*
  to stick within those parameters, but if a collection requires more than
  50 milliseconds or has to happen more often than once a second, the
  collector will ignore what you have told it.
 
  Thanks,
  Shawn
 
 
 


Re: Solr PolyField

2013-07-31 Thread Luís Portela Afonso
Ok, thanks. I will check it.
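
For anyone who finds this thread later, here is a rough, untested sketch of the
chain Jack is pointing to (field names are from my earlier attempt; using
ConcatFieldUpdateProcessorFactory for the ", " delimiter is my assumption):

    <updateRequestProcessorChain name="clone-fullname">
      <!-- copy lastname and firstname into the multiValued fullname field -->
      <processor class="solr.CloneFieldUpdateProcessorFactory">
        <str name="source">lastname</str>
        <str name="source">firstname</str>
        <str name="dest">fullname</str>
      </processor>
      <!-- then join the values with ", " -->
      <processor class="solr.ConcatFieldUpdateProcessorFactory">
        <str name="fieldName">fullname</str>
        <str name="delimiter">, </str>
      </processor>
      <processor class="solr.RunUpdateProcessorFactory"/>
    </updateRequestProcessorChain>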

On Jul 31, 2013, at 5:08 PM, Jack Krupansky j...@basetechnology.com wrote:

 See:
 https://builds.apache.org/job/Solr-Artifacts-4.x/javadoc/solr-core/org/apache/solr/update/processor/CloneFieldUpdateProcessorFactory.html
 
 I have more examples in my book.
 
 -- Jack Krupansky
 
 From: Luís Portela Afonso 
 Sent: Wednesday, July 31, 2013 11:41 AM
 To: solr-user@lucene.apache.org 
 Subject: Re: Solr PolyField
 
 Hum, ok. 
 
 Is it possible to add static text to a field? Text that I write in the 
 configuration and then append another field to? I saw something like 
 CloneFieldProcessor, but when I start Solr it says that it could not find 
 the class.
 I was trying to use processors to move one field to another.
 
 I saw this:
 <processor class="solr.FieldCopyProcessorFactory">
   <str name="source">lastname firstname</str>
   <str name="dest">fullname</str>
   <bool name="append">true</bool>
   <str name="append.delim">, </str>
 </processor>
 But when I try to use it, Solr says that it cannot find 
 solr.FieldCopyProcessorFactory. I'm using Solr 4.4.0.
 
 Thanks ;)
 
 On Jul 31, 2013, at 4:16 PM, Michael Della Bitta 
 michael.della.bi...@appinions.com wrote:
 
 
  OK,
 
  Then I would suggest creating multiValued enclosure_type, etc. tags for
  searching, and then one string-typed field to store the JSON snippet you've
  been showing.
 
  Michael Della Bitta
 
  Applications Developer
 
  o: +1 646 532 3062  | c: +1 917 477 7906
 
  appinions inc.
 
  “The Science of Influence Marketing”
 
  18 East 41st Street
 
  New York, NY 10017
 
  t: @appinions https://twitter.com/Appinions | g+:
  
 plus.google.com/appinions https://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts
  w: appinions.com http://www.appinions.com/
 
 
  On Wed, Jul 31, 2013 at 11:11 AM, Luís Portela Afonso 
  meligalet...@gmail.com wrote:
 
 
As a single record? Hum, no.
 
 So an RSS feed has /rss/channel/ and then lots of /rss/channel/item elements, right?
 Each /rss/channel/item is a new document in Solr. I started with the Solr
 RSS example, but I changed it to have more fields and to get the
 feed url from a database.
 
 So each /rss/channel/item is a document for indexing, but each
 /rss/channel/item can have more than one enclosure tag.
 
Many thanks
 
On Jul 31, 2013, at 4:05 PM, Michael Della Bitta 
michael.della.bi...@appinions.com wrote:
 
 
  So you're trying to index a RSS feed as a single record, but you want to
 
be
 
  able to search for and retrieve individual entries from within the feed?
 
Is
 
  that the issue?
 
  Michael Della Bitta
 
  Applications Developer
 
  o: +1 646 532 3062  | c: +1 917 477 7906
 
  appinions inc.
 
  “The Science of Influence Marketing”
 
  18 East 41st Street
 
  New York, NY 10017
 
  t: @appinions https://twitter.com/Appinions | g+:
  plus.google.com/appinions
 

 https://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts
 
 
  w: appinions.com http://www.appinions.com/
 
 
  On Wed, Jul 31, 2013 at 10:59 AM, Luís Portela Afonso 
  meligalet...@gmail.com wrote:
 
 
 These fields can be multiValued.
 In the RSS standard it is not correct to do that, but some sources do,
 and I'd like to grab it all. Is there any way to make that possible?
 
Once again, Many thanks :)
 
On Jul 31, 2013, at 3:54 PM, Michael Della Bitta 
michael.della.bi...@appinions.com wrote:
 
 
  Luís,
 
  Is there a reason why splitting this up into enclosure_type,
 
enclosure_url,
 
  and enclosure_length would not work?
 
 
  Michael Della Bitta
 
  Applications Developer
 
  o: +1 646 532 3062  | c: +1 917 477 7906
 
  appinions inc.
 
  “The Science of Influence Marketing”
 
  18 East 41st Street
 
  New York, NY 10017
 
  t: @appinions https://twitter.com/Appinions | g+:
  plus.google.com/appinions
 
 
 

 https://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts
 
 
  w: appinions.com http://www.appinions.com/
 
 
  On Wed, Jul 31, 2013 at 10:43 AM, Luís Portela Afonso 
  meligalet...@gmail.com wrote:
 
 
Hi,
 
I'm trying to index information of RSS Feeds.
 
So in a more detailed explanation:
 
The RSS feed has something like:
 <enclosure url="http://www.engadget.com/podcasts/Engadget_Podcast_353.mp3" length="32642192" type="audio/mpeg"/>
 
 *With my current configuration, this is working and I get a result like that:*
 
 
- enclosure:
[
   - audio/mpeg,
    - http://www.engadget.com/podcasts/EngadgetHD_Podcast_359.mp3,
   - 37521428
   ],
 
 
 *BUT,* this is not the result that I'm trying to 

RE: Ingesting geo data into Solr very slow

2013-07-31 Thread Simonian, Marta M (US SSA)
Hi guys,

Here is the reply I got from the solr group. I'll change those settings. It's 
good to know that it doesn't matter if we use the bean vs solr doc.

-Marta 

-Original Message-
From: David Smiley (@MITRE.org) [mailto:dsmi...@mitre.org] 
Sent: Tuesday, July 30, 2013 9:08 PM
To: solr-user@lucene.apache.org
Subject: Re: Ingesting geo data into Solr very slow

Hi Marta,

Presumably you are indexing polygons -- I suspect complex ones.  There isn't 
too much that you can do about this right now other than index them in 
parallel.  I see you are doing this in 2 threads; try 4, or maybe even 6. 
Also, ensure that maxDistErr is reflective of the smallest distance you need to 
distinguish between.  It may help a little but not much.  I can think of some 
internal code details that might be improved but that doesn't help you now.

There's some generic Solr things you can do to improve indexing performance too 
like increasing the indexing buffer size (100MB - 200MB) and the mergeFactor 
(10-20 albeit temporarily and/or issue optimize), both in solrconfig.xml.
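
As a rough illustration, those two knobs live under indexConfig in
solrconfig.xml (the values below are just the ranges suggested above):

    <indexConfig>
      <ramBufferSizeMB>200</ramBufferSizeMB>
      <mergeFactor>20</mergeFactor>
    </indexConfig>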

Changing the servlet engine won't help. Calling server.addBean(item) isn't a 
problem either.

~ David


Simonian, Marta M (US SSA) wrote
 Hi,
 
 We are using Solr 4.4 to ingest geo data and it's really slow. When we 
 don't index the geo it takes seconds to ingest 100,000 records, but as 
 soon as we add it, it takes 2 hours.
 
 Also we found that when changing the distErrPct from 0.025 to 0.1, 
 1000 rows are ingested in 20 sec vs 2 min. But we can't change that 
 setting as we want our search to be as accurate as possible.
 
 About the environment we are running Solr on 6 CPUs and 8GB of memory.
 We've been monitoring the VMs and they seem to be ok.
 
 We are running on Tomcat but we might switch to Jetty to see if that 
 will increase the performance.
 
 We use ConcurrentUpdateSolrServer(httpSolrServer, 5000, 2);
 
 We are saving a bean rather than a solr document (server.addBean(item)).
 I'm not sure if that could make it slow as it's going to do some 
 conversion?
 
 Can you please let me know what are the best settings for Solr? Maybe 
 some changes in the solrconfig.xml or the schema.xml?
 What are the preferred environment settings and resources?
 
 Thank you!
 Marta





-
 Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Ingesting-geo-data-into-Solr-very-slow-tp4081484p4081527.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Ingesting geo data into Solr very slow

2013-07-31 Thread Simonian, Marta M (US SSA)
Does anybody know if Solr performs better on Jetty vs Tomcat?

-Original Message-
From: David Smiley (@MITRE.org) [mailto:dsmi...@mitre.org] 
Sent: Tuesday, July 30, 2013 9:08 PM
To: solr-user@lucene.apache.org
Subject: Re: Ingesting geo data into Solr very slow

Hi Marta,

Presumably you are indexing polygons -- I suspect complex ones.  There isn't 
too much that you can do about this right now other than index them in 
parallel.  I see you are doing this in 2 threads; try 4, or maybe even 6. 
Also, ensure that maxDistErr is reflective of the smallest distance you need to 
distinguish between.  It may help a little but not much.  I can think of some 
internal code details that might be improved but that doesn't help you now.

There's some generic Solr things you can do to improve indexing performance too 
like increasing the indexing buffer size (100MB - 200MB) and the mergeFactor 
(10-20 albeit temporarily and/or issue optimize), both in solrconfig.xml.

Changing the servlet engine won't help. Calling server.addBean(item) isn't a 
problem either.

~ David


Simonian, Marta M (US SSA) wrote
 Hi,
 
 We are using Solr 4.4 to ingest geo data and it's really slow. When we 
 don't index the geo it takes seconds to ingest 100,000 records, but as 
 soon as we add it, it takes 2 hours.
 
 Also we found that when changing the distErrPct from 0.025 to 0.1, 
 1000 rows are ingested in 20 sec vs 2 min. But we can't change that 
 setting as we want our search to be as accurate as possible.
 
 About the environment we are running Solr on 6 CPUs and 8GB of memory.
 We've been monitoring the VMs and they seem to be ok.
 
 We are running on Tomcat but we might switch to Jetty to see if that 
 will increase the performance.
 
 We use ConcurrentUpdateSolrServer(httpSolrServer, 5000, 2);
 
 We are saving a bean rather than a solr document (server.addBean(item)).
 I'm not sure if that could make it slow as it's going to do some 
 conversion?
 
 Can you please let me know what are the best settings for Solr? Maybe 
 some changes in the solrconfig.xml or the schema.xml?
 What are the preferred environment settings and resources?
 
 Thank you!
 Marta





-
 Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Ingesting-geo-data-into-Solr-very-slow-tp4081484p4081527.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Ingesting geo data into Solr very slow

2013-07-31 Thread Shawn Heisey

On 7/31/2013 11:20 AM, Simonian, Marta M (US SSA) wrote:

Does anybody know if Solr performs better on Jetty vs Tomcat?


Jetty has less complexity than tomcat.  It is likely to use less memory. 
 If you went with default settings for both, jetty is likely to perform 
better, but the difference would probably be very small.


If you understand how to tune your servlet container, then there's no 
way to answer that question.  You should use whatever you are 
comfortable with.  A well-tuned tomcat server would probably perform 
better than the default example jetty - but you have to do that tuning.


The only concrete information I can give you is this:  Solr tests use 
jetty, so jetty is the only container that is fully tested with Solr. 
Bugs *have* been found with other containers, and they get fixed as fast 
as possible.


The other point worth reiterating: Unless you carefully tune your 
container, something this list can't really help you with, the container 
choice probably isn't going to affect performance much.


Thanks,
Shawn



Alternative searches

2013-07-31 Thread Mark
Can someone explain how one would go about providing alternative searches for a 
query… similar to Amazon.

For example say I search for Red Dump Truck

- 0 results for Red Dump Truck
- 500 results for  Red Truck
- 350 results for Dump Truck

Does this require multiple searches? 

Thanks

Re: Solr list all records but fq matching records first

2013-07-31 Thread Jack Krupansky
I was going to say 10, but frequently people find that they need a really 
big boost.


Normally, a boost might be 1.5 or 2 or 5, or something like that.

A fractional boost, like 0.5, 0.25, 0.1, or even 0.01 can de-emphasize 
terms.


If you add debugQuery=true to your query request and look at the explain 
section, you can see all the scores and intermediate scores to get an idea 
how big a boost a document needs to make it move as desired.
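
For instance (the field name and value here are made up), a query like

    q=*:* OR category:electronics^100&debugQuery=true

still returns all records but boosts the matching ones to the top, and the
explain section shows exactly how the ^100 factors into each score.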


-- Jack Krupansky

-Original Message- 
From: Thyagaraj

Sent: Wednesday, July 31, 2013 1:34 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr list all records but fq matching records first

Awesome Jack Krupansky-2!!!. It seems to work!.

What I didn't understand is *^100*. Could you give some explanation of ^100,
please? Could it be any number other than 100?


Thanks a lot! I was working on this for the past 3 days!



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-list-all-records-but-fq-matching-records-first-tp4081572p4081677.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: Solr list all records but fq matching records first

2013-07-31 Thread Thyagaraj
Awesome Jack Krupansky-2!!!. It seems to work!. 

What I didn't understand is *^100*. Could you give some explanation of ^100,
please? Could it be any number other than 100?


Thanks a lot! I was working on this for the past 3 days!



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-list-all-records-but-fq-matching-records-first-tp4081572p4081677.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: TrieField and FieldCache confusion

2013-07-31 Thread Chris Hostetter

: Can I expect the FieldCache of Lucene to return the correct values when
: working
: with TrieField with the precisionStep higher than 0. If not, what did I get
: wrong?

Yes -- the code for building FieldCaches from Trie fields is smart enough 
to ensure that only the real original values are used to populate the 
cache.

(See for example: FieldCache.NUMERIC_UTILS_INT_PARSER and the classes 
linked to from its javadocs...

https://lucene.apache.org/core/4_4_0/core/org/apache/lucene/search/FieldCache.html#NUMERIC_UTILS_INT_PARSER
https://lucene.apache.org/core/4_4_0/core/org/apache/lucene/util/NumericUtils.html
https://lucene.apache.org/core/4_4_0/core/org/apache/lucene/document/IntField.html

(Solr's Trie fields are backed by the various numeric fields in lucene -- 
ie: solr:TrieIntField -> lucene:IntField.  The Trie* prefix is used in 
solr because there were already classes named IntField, DoubleField, etc... 
when the Trie-based impls were added to lucene.)
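
For the record, a minimal sketch of reading such a field through the FieldCache
(Lucene 4.4 API; the field name is made up):

    import java.io.IOException;
    import org.apache.lucene.index.AtomicReader;
    import org.apache.lucene.search.FieldCache;

    // "price" is assumed to be a TrieIntField-backed field with precisionStep > 0
    static int readPrice(AtomicReader reader, int docId) throws IOException {
      FieldCache.Ints values = FieldCache.DEFAULT.getInts(
          reader, "price",
          FieldCache.NUMERIC_UTILS_INT_PARSER, // ignores the lower-precision helper terms
          false);
      return values.get(docId);
    }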


-Hoss


Re: Improper shutdown of Solr in Jetty 9

2013-07-31 Thread Chris Hostetter

: it's Windows 7. I'm starting Jetty with java -jar start.jar

Not sure if you are using cygwin, or if this is related but...

https://issues.apache.org/jira/browse/SOLR-3884
https://issues.apache.org/jira/browse/SOLR-3884?focusedCommentId=13462996page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13462996
https://issues.apache.org/jira/browse/SOLR-3884?focusedCommentId=13463332page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13463332

http://cygwin.com/ml/cygwin/2012-07/msg00250.html
http://cygwin.com/ml/cygwin/2012-05/msg00482.html


-Hoss


Re: queryResultCache showing all zeros

2013-07-31 Thread Chris Hostetter


: We just configured a new Solr cloud (5 nodes) running Solr 4.3, ran 
: about 200 000 queries taken from our production environment and measured 
: the performance of the cloud over a collection of 14M documents with the 
: default Solr settings. We are now trying to tune the different caches 
: and when I look at each node of the cloud, all of them are showing no 
: activity (see below) regarding the queryResultCache... all other caches 
: are showing some activity. Any idea what could cause this?

Can you show us some examples of the types of queries you are executing?

Do you have useFilterForSortedQuery in your solrconfig.xml ?



-Hoss


Re: FieldCollapsing issues in SolrCloud 4.4

2013-07-31 Thread Ali, Saqib
Hello Paul,

Can you please explain what you mean by:
To get the exact number of groups, you need to shard along your grouping
field

Thanks! :)


On Wed, Jul 31, 2013 at 3:08 AM, Paul Masurel paul.masu...@gmail.com wrote:

 Do you mean you get different results with group=true?
  numFound is supposed to return the number of ungrouped hits.

 To get the number of groups, you are expected to set
 set group.ngroups=true.
 Even then, the result will only give you an upperbound
 in a distributed environment.
 To get the exact number of groups, you need to shard along
 your grouping field.

 If you have many groups, you may also experience a huge performance
  hit, as the current implementation has been heavily optimized for low
 number of groups (e.g. e-commerce categories).

 Paul



 On Wed, Jul 31, 2013 at 1:59 AM, Ali, Saqib docbook@gmail.com wrote:

  Hello all,
 
  Is anyone experiencing issues with the numFound when using group=true in
  SolrCloud 4.4?
 
  Sometimes the results are off for us.
 
  I will post more details shortly.
 
  Thanks.
 



 --
 __

  Masurel Paul
  e-mail: paul.masu...@gmail.com



Re: Sending shard requests to all replicas

2013-07-31 Thread Isaac Hebsh
Thanks to Ryan Ernst, my issue is a duplicate of SOLR-4449.
I think that this proposal might be very useful (some supporting links are
attached there. worth reading..)


On Tue, Jul 30, 2013 at 11:49 PM, Isaac Hebsh isaac.he...@gmail.com wrote:

 Hi,
 I submitted a new JIRA for this:
 https://issues.apache.org/jira/browse/SOLR-5092

 A (very initial) patch is already attached. Reviews are very welcome.


 On Sun, Jul 28, 2013 at 4:50 PM, Erick Erickson 
 erickerick...@gmail.comwrote:

 You'd probably start in CloudSolrServer in SolrJ code,
 as far as I know that's where the request is sent out.

 I'd think that would be better than changing Solr itself
 since if you found that this was useful you wouldn't
 be patching your Solr release, just keeping your client
 up to date.

 Best
 Erick

 On Sat, Jul 27, 2013 at 7:28 PM, Isaac Hebsh isaac.he...@gmail.com
 wrote:
  Shawn, thank you for the tips.
  I know the significant cons of virtualization, but I don't want to move
  this thread into a virtualization pros/cons in the Solr(Cloud) case.
 
  I've just asked what is the minimal code change should be made, in
 order to
  examine whether this is a possible solution or not.. :)
 
 
  On Sun, Jul 28, 2013 at 1:06 AM, Shawn Heisey s...@elyograg.org
 wrote:
 
  On 7/27/2013 3:33 PM, Isaac Hebsh wrote:
   I have about 40 shards. repFactor=2.
   The cause of slower shards is very interesting, and this is the main
   approach we took.
   Note that in every query, it is another shard which is the slowest.
 In
  20%
   of the queries, the slowest shard takes about 4 times more than the
  average
   shard qtime.
   While continuing investigation, remember it might be the
 virtualization /
   storage-access / network / gc /..., so I thought that reducing the
 effect
   of the slow shards might be a good (temporary or permanent) solution.
 
  Virtualization is not the best approach for Solr.  Assuming you're
  dealing with your own hardware and not something based in the cloud
 like
  Amazon, you can get better results by running on bare metal and having
  multiple shards per host.
 
  Garbage collection is a very likely source of this problem.
 
  http://wiki.apache.org/solr/SolrPerformanceProblems#GC_pause_problems
 
   I thought it should be an almost trivial code change (for proving the
   concept). Isn't it?
 
  I have no idea what you're saying/asking here.  Can you clarify?
 
  It seems to me that sending requests to all replicas would just
 increase
  the overall load on the cluster, with no real benefit.
 
  Thanks,
  Shawn
 
 





RE: queryResultCache showing all zeros

2013-07-31 Thread Jean-Sebastien Vachon
Looks like the problem might not be related to Solr but to a proprietary system 
we have on top of it. 
I made some queries with facets and the cache was updated. We are looking into 
this... I should not have assumed that the problem was coming from Solr ;)

I'll let you know if there is anything

From: Chris Hostetter
Sent: Wednesday, July 31, 2013 1:58 PM
To: solr-user@lucene.apache.org
Subject: Re: queryResultCache showing all zeros

: We just configured a new Solr cloud (5 nodes) running Solr 4.3, ran
: about 200 000 queries taken from our production environment and measured
: the performance of the cloud over a collection of 14M documents with the
: default Solr settings. We are now trying to tune the different caches
: and when I look at each node of the cloud, all of them are showing no
: activity (see below) regarding the queryResultCache... all other caches
: are showing some activity. Any idea what could cause this?

Can you show us some examples of the types of queries you are executing?

Do you have useFilterForSortedQuery in your solrconfig.xml ?



-Hoss


RE: Highlighting externally stored text

2013-07-31 Thread JohnRodey
Hey Bryan, thanks for the response!  To make use of the FastVectorHighlighter
you need to enable termVectors, termPositions, and termOffsets, correct?
That takes a considerable amount of space, but it's good to know, and I may
possibly pursue this solution as well.  I'm just starting to look at the code
now; do you remember how substantial the change was?

Are there any other options?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Highlighting-externally-stored-text-tp4078387p4081719.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Alternative searches

2013-07-31 Thread Petersen, Robert
Hi Mark

Yes, it is something we implemented also.  We just try various subsets of the 
search terms when there are zero results.  To increase performance for all 
these searches we return only the first three results and no facets so we can 
simply display the result counts for the various subsets of the original search 
terms.  We only do this if the first search had zero results and then a double 
metaphone search (which is how we handle misspelled terms) also returned 
nothing.  We also apply various heuristics to the alternative searches being 
performed like no one word searches if the original search had many words etc
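
In SolrJ terms the fallback loop is roughly this (a fragment, not production
code; the URL and the hard-coded subsets are placeholders for whatever your
heuristics produce):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;

    HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr");
    String[] subsets = { "red truck", "dump truck" }; // derived from "red dump truck"
    for (String subset : subsets) {
      SolrQuery q = new SolrQuery(subset);
      q.setRows(3);      // only the first three results are displayed
      q.setFacet(false); // no facets, we mainly want the counts
      QueryResponse rsp = server.query(q);
      long hits = rsp.getResults().getNumFound();
      if (hits > 0) {
        System.out.println(hits + " results for \"" + subset + "\"");
      }
    }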

Thanks
Robi

-Original Message-
From: Mark [mailto:static.void@gmail.com] 
Sent: Wednesday, July 31, 2013 10:35 AM
To: solr-user@lucene.apache.org
Subject: Alternative searches

Can someone explain how one would go about providing alternative searches for a 
query... similar to Amazon.

For example say I search for Red Dump Truck

- 0 results for Red Dump Truck
- 500 results for  Red Truck
- 350 results for Dump Truck

Does this require multiple searches? 

Thanks



RE: queryResultCache showing all zeros

2013-07-31 Thread Jean-Sebastien Vachon
Ok I might have found a Solr issue after I fixed a problem in our system.

This the kind of query we are making:

http://10.0.5.214:8201/solr/Current/select?fq=position_refreshed_date_id:[2747%20TO%203501]&fq=position_soc_2011_8_code:41101100&fq=country_id:1&fq=position_job_type_id:4&fq=position_education_level_id:8&fq=position_salary_range_id:2&fq=is_dirty:false&fq=is_staffing:false&fq=-position_soc_2011_2_code:99&fq=-covering_source_id:(839%20OR%201145%20OR%2025%20OR%20802%20OR%20777%20OR%2085%20OR%20881%20OR%20775%20OR%201558%20OR%20743%20OR%20800%20OR%201580%20OR%201147%20OR%201690%20OR%20674%20OR%20894%20OR%20791)&q=%20(title:photographer%20OR%20ad_description:photographer%20OR%20super_alias:photographer)%20AND%20(_val_:%22sum(product(75,div(5000,sum(50,sub(3500,position_refreshed_date_id,product(0.75,job_score),product(0.75,source_score))%22)&facet=true&facet.mincount=1&f.state_id.facet.limit=10&facet.field=state_id&facet.field=position_salary_range_id&facet.field=position_job_type_id&facet.field=position_naics_6_code&facet.field=place_id&facet.field=position_education_level_id&facet.field=position_soc_2011_8_code&f.position_salary_range_id.facet.limit=10&f.position_job_type_id.facet.limit=10&f.position_naics_6_code.facet.limit=10&f.place_id.facet.limit=10&f.position_education_level_id.facet.limit=10&f.position_soc_2011_8_code.facet.limit=10&rows=10&start=0&fl=job_id,position_id,super_alias_id,advertiser,super_alias,credited_source_id,position_first_seen_date_id,position_last_seen_date_id,%20position_posted_date_id,%20position_refreshed_date_id,%20position_job_type_id,%20position_function_id,position_green_code,title_id,semi_clean_title_id,clean_title_id,position_empl_count,place_id,%20state_id,county_id,msa_id,country_id,position_id,position_job_type_mva,%20ad_activity_status_id,%20position_score,%20ad_score,position_salary,position_salary_range_id,position_salary_source,position_naics_6_code,position_education_level_id,%20is_staffing,is_bulk,is_anonymous,is_third_party,is_dirty,ref_num,tags,lat,long,position_duns_number,url,advertiser_id,%20title,%20semi_clean_title,%20ad_description,%20position_description,%20ad_bls_salary,%20position_bls_salary,%20covering_source_id,%20content_model_id,position_soc_2011_8_code,position_noc_2006_4_id&group.field=position_id&group=true&group.ngroups=true&group.main=true&sort=score%20desc

it's quite long but this request uses both faceting and grouping. If I remove 
the grouping then the cache is used. Is this normal behavior or a bug?

Thanks

From: Jean-Sebastien Vachon
Sent: Wednesday, July 31, 2013 2:38 PM
To: solr-user@lucene.apache.org
Subject: RE: queryResultCache showing all zeros

Looks like the problem might not be related to Solr but to a proprietary system 
we have on top of it.
I made some queries with facets and the cache was updated. We are looking into 
this... I should not have assumed that the problem was coming from Solr ;)

I'll let you know if there is anything

From: Chris Hostetter
Sent: Wednesday, July 31, 2013 1:58 PM
To: solr-user@lucene.apache.org
Subject: Re: queryResultCache showing all zeros

: We just configured a new Solr cloud (5 nodes) running Solr 4.3, ran
: about 200 000 queries taken from our production environment and measured
: the performance of the cloud over a collection of 14M documents with the
: default Solr settings. We are now trying to tune the different caches
: and when I look at each node of the cloud, all of them are showing no
: activity (see below) regarding the queryResultCache... all other caches
: are showing some activity. Any idea what could cause this?

Can you show us some examples of the types of queries you are executing?

Do you have useFilterForSortedQuery in your solrconfig.xml ?



-Hoss

RE: queryResultCache showing all zeros

2013-07-31 Thread Jean-Sebastien Vachon
Also we do not have any useFilterForSortedQuery in our config. So we are 
relying on the default which I guess is false.




From: Jean-Sebastien Vachon
Sent: Wednesday, July 31, 2013 3:44 PM
To: solr-user@lucene.apache.org
Subject: RE: queryResultCache showing all zeros

Ok I might have found a Solr issue after I fixed a problem in our system.

This the kind of query we are making:

http://10.0.5.214:8201/solr/Current/select?fq=position_refreshed_date_id:[2747%20TO%203501]&fq=position_soc_2011_8_code:41101100&fq=country_id:1&fq=position_job_type_id:4&fq=position_education_level_id:8&fq=position_salary_range_id:2&fq=is_dirty:false&fq=is_staffing:false&fq=-position_soc_2011_2_code:99&fq=-covering_source_id:(839%20OR%201145%20OR%2025%20OR%20802%20OR%20777%20OR%2085%20OR%20881%20OR%20775%20OR%201558%20OR%20743%20OR%20800%20OR%201580%20OR%201147%20OR%201690%20OR%20674%20OR%20894%20OR%20791)&q=%20(title:photographer%20OR%20ad_description:photographer%20OR%20super_alias:photographer)%20AND%20(_val_:%22sum(product(75,div(5000,sum(50,sub(3500,position_refreshed_date_id,product(0.75,job_score),product(0.75,source_score))%22)&facet=true&facet.mincount=1&f.state_id.facet.limit=10&facet.field=state_id&facet.field=position_salary_range_id&facet.field=position_job_type_id&facet.field=position_naics_6_code&facet.field=place_id&facet.field=position_education_level_id&facet.field=position_soc_2011_8_code&f.position_salary_range_id.facet.limit=10&f.position_job_type_id.facet.limit=10&f.position_naics_6_code.facet.limit=10&f.place_id.facet.limit=10&f.position_education_level_id.facet.limit=10&f.position_soc_2011_8_code.facet.limit=10&rows=10&start=0&fl=job_id,position_id,super_alias_id,advertiser,super_alias,credited_source_id,position_first_seen_date_id,position_last_seen_date_id,%20position_posted_date_id,%20position_refreshed_date_id,%20position_job_type_id,%20position_function_id,position_green_code,title_id,semi_clean_title_id,clean_title_id,position_empl_count,place_id,%20state_id,county_id,msa_id,country_id,position_id,position_job_type_mva,%20ad_activity_status_id,%20position_score,%20ad_score,position_salary,position_salary_range_id,position_salary_source,position_naics_6_code,position_education_level_id,%20is_staffing,is_bulk,is_anonymous,is_third_party,is_dirty,ref_num,tags,lat,long,position_duns_number,url,advertiser_id,%20title,%20semi_clean_title,%20ad_description,%20position_description,%20ad_bls_salary,%20position_bls_salary,%20covering_source_id,%20content_model_id,position_soc_2011_8_code,position_noc_2006_4_id&group.field=position_id&group=true&group.ngroups=true&group.main=true&sort=score%20desc

it's quite long but this request uses both faceting and grouping. If I remove 
the grouping then the cache is used. Is this normal behavior or a bug?

Thanks

From: Jean-Sebastien Vachon
Sent: Wednesday, July 31, 2013 2:38 PM
To: solr-user@lucene.apache.org
Subject: RE: queryResultCache showing all zeros

Looks like the problem might not be related to Solr but to a proprietary system 
we have on top of it.
I made some queries with facets and the cache was updated. We are looking into 
this... I should not have assumed that the problem was coming from Solr ;)

I'll let you know if there is anything

From: Chris Hostetter
Sent: Wednesday, July 31, 2013 1:58 PM
To: solr-user@lucene.apache.org
Subject: Re: queryResultCache showing all zeros

: We just configured a new Solr cloud (5 nodes) running Solr 4.3, ran
: about 200 000 queries taken from our production environment and measured
: the performance of the cloud over a collection of 14M documents with the
: default Solr settings. We are now trying to tune the different caches
: and when I look at each node of the cloud, all of them are showing no
: activity (see below) regarding the queryResultCache... all other caches
: are showing some activity. Any idea what could cause this?

Can you show us some examples of the types of queries you are executing?

Do you have useFilterForSortedQuery in your solrconfig.xml ?



-Hoss

RE: queryResultCache showing all zeros

2013-07-31 Thread Chris Hostetter

: it's quite long but this request uses both faceting and grouping. If I 
: remove the grouping then the cache is used. Is this a normal behavior or 
: a bug?

I believe that is expected -- i don't think grouping can take advantage of 
the queryResultCache because of how it collects documents.

there is however a group.cache.percent option that you might look into -- 
but I honestly have no idea if that toggles the use of queryResultCache or 
something else; I haven't played with it before...

https://wiki.apache.org/solr/FieldCollapsing#Request_Parameters

-Hoss


Re: Performance question on Spatial Search

2013-07-31 Thread Steven Bower
the list of IDs does change relatively frequently, but this doesn't seem to
have very much impact on the performance of the query as far as I can tell.

attached are the stacks

thanks,

steve


On Wed, Jul 31, 2013 at 6:33 AM, Mikhail Khludnev 
mkhlud...@griddynamics.com wrote:

 On Wed, Jul 31, 2013 at 1:10 AM, Steven Bower sbo...@alcyon.net wrote:

 
   not sure what you mean by good hit ratio?
 

 I mean such queries are really expensive (even on cache hit), so if the
  list of ids changes every time, it never hits the cache and hence executes
  these heavy queries every time. It's a well-known performance problem.


  Here are the stacks...
 
 they seem like hotspots, and show index reading, which is reasonable. But I
 can't see what caused these reads; to get that I need the whole stack of the
 hot thread.


 
Name Time (ms) Own Time (ms)
 
 
 org.apache.lucene.search.MultiTermQueryWrapperFilter.getDocIdSet(AtomicReaderContext,
  Bits) 300879 203478
 
 
 org.apache.lucene.codecs.lucene41.Lucene41PostingsReader$BlockDocsEnum.nextDoc()
  45539 19
 
 
 org.apache.lucene.codecs.lucene41.Lucene41PostingsReader$BlockDocsEnum.refillDocs()
  45519 40
 
 
 org.apache.lucene.codecs.lucene41.Lucene41PostingsReader.readVIntBlock(IndexInput,
  int[], int[], int, boolean) 24352 0
  org.apache.lucene.store.DataInput.readVInt() 24352 24352
  org.apache.lucene.codecs.lucene41.ForUtil.readBlock(IndexInput, byte[],
  int[]) 21126 14976
  org.apache.lucene.store.ByteBufferIndexInput.readBytes(byte[], int, int)
  6150 0  java.nio.DirectByteBuffer.get(byte[], int, int)
  6150 0
  java.nio.Bits.copyToArray(long, Object, long, long, long) 6150 6150
 
 
 org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum.docs(Bits,
  DocsEnum, int) 35342 421
 
 
 org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum$Frame.decodeMetaData()
  34920 27939
 
 
 org.apache.lucene.codecs.lucene41.Lucene41PostingsReader.nextTerm(FieldInfo,
  BlockTermState) 6980 6980
 
 
 org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum.next()
  14129 1053
 
 
 org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum$Frame.loadNextFloorBlock()
  5948 261
 
 
 org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum$Frame.loadBlock()
  5686 199
  org.apache.lucene.store.ByteBufferIndexInput.readBytes(byte[], int, int)
  3606 0  java.nio.DirectByteBuffer.get(byte[], int, int)
  3606 0
  java.nio.Bits.copyToArray(long, Object, long, long, long) 3606 3606
 
 
 org.apache.lucene.codecs.lucene41.Lucene41PostingsReader.readTermsBlock(IndexInput,
  FieldInfo, BlockTermState) 1879 80
  org.apache.lucene.store.ByteBufferIndexInput.readBytes(byte[], int, int)
  1798 0java.nio.DirectByteBuffer.get(byte[], int, int)
  1798 0
  java.nio.Bits.copyToArray(long, Object, long, long, long) 1798 1798
 
 
 org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum$Frame.next()
  4010 3324
 
 
 org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum$Frame.nextNonLeaf()
  685 685
 
 
 org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum$Frame.loadBlock()
  3117 144
  org.apache.lucene.store.ByteBufferIndexInput.readBytes(byte[], int, int)
  1861 0java.nio.DirectByteBuffer.get(byte[], int, int) 1861
  0
  java.nio.Bits.copyToArray(long, Object, long, long, long) 1861 1861
 
 
 org.apache.lucene.codecs.lucene41.Lucene41PostingsReader.readTermsBlock(IndexInput,
  FieldInfo, BlockTermState) 1090 19
  org.apache.lucene.store.ByteBufferIndexInput.readBytes(byte[], int, int)
  1070 0  java.nio.DirectByteBuffer.get(byte[], int, int)
  1070 0
  java.nio.Bits.copyToArray(long, Object, long, long, long) 1070 1070
 
 
 org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum.initIndexInput()
  20 0org.apache.lucene.store.ByteBufferIndexInput.clone()
  20 0
  org.apache.lucene.store.ByteBufferIndexInput.clone() 20 0
  org.apache.lucene.store.ByteBufferIndexInput.buildSlice(long, long) 20
  0
  org.apache.lucene.util.WeakIdentityMap.put(Object, Object) 20 0
 
 org.apache.lucene.util.WeakIdentityMap$IdentityWeakReference.init(Object,
  ReferenceQueue) 20 0
  java.lang.System.identityHashCode(Object) 20 20
  org.apache.lucene.index.FilteredTermsEnum.docs(Bits, DocsEnum, int)
  1485 527
 
 
 org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum.docs(Bits,
  DocsEnum, int) 957 0
 
 
 org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum$Frame.decodeMetaData()
  957 513
 
 
 org.apache.lucene.codecs.lucene41.Lucene41PostingsReader.nextTerm(FieldInfo,
  BlockTermState) 443 443
  org.apache.lucene.index.FilteredTermsEnum.next() 874 324
 
 
 org.apache.lucene.search.NumericRangeQuery$NumericRangeTermsEnum.accept(BytesRef)
  368 0
 
 
 org.apache.lucene.util.BytesRef$UTF8SortedAsUnicodeComparator.compare(Object,
  Object) 368 

Re: Auto Correction of Solr Query

2013-07-31 Thread Otis Gospodnetic
Hi Siva,

I think I mentioned this several days ago... DYM ReSearcher will do that:
http://sematext.com/products/dym-researcher/index.html

Otis


On Tuesday, July 30, 2013, sivaprasad wrote:

 Hi,

 Is there any way to auto correct the Solr query and get the results? For
 example, a user tries to search for beats by dre, but by mistake he typed
 beats bt dre. In this case, Solr should correct the query and return the
 results for beats by dre.

 Is there any suggestions, how we can achieve this?

 -Siva



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Auto-Correction-of-Solr-Query-tp4081220.html
 Sent from the Solr - User mailing list archive at Nabble.com.



-- 
Otis
--
Solr  ElasticSearch Support -- http://sematext.com/
Performance Monitoring -- http://sematext.com/spm


RE: Highlighting externally stored text

2013-07-31 Thread JohnRodey
Just an update.  The change was pretty straightforward (at least for my simple
test case); just a few lines in the getBestFragments method seemed to do the
trick.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Highlighting-externally-stored-text-tp4078387p4081748.html
Sent from the Solr - User mailing list archive at Nabble.com.


Inconsistent facet ranges when using distributed search in Solr 4.3

2013-07-31 Thread Jose Aguilar
Hi all,

I am seeing some inconsistent behavior with facets, specifically range facets, 
on Solr 4.3. Running the same query several times (pressing F5 on the browser) 
produces different facet ranges when doing distributed searches; sometimes 
it doesn't include some of the buckets. The results of the search are always 
correct as far as I can tell; it is just the range facets that sometimes miss 
ranges.

Has anyone seen this behavior in Solr before? Any recommendations on how to 
troubleshoot this issue?

Here are some details and an example:

As an example of what I am seeing, take this query, in which I'll be faceting 
on the docnumber field:

http://SERVER:8081/solr/shard1/myhandler?

shards=SERVER:8081/solr/shard1,SERVER:8081/solr/shard2,SERVER:8081/solr/shard3
shards.qt=myhandler
facet=true
facet.field=docnumber
f.docnumber.facet.sort=index
facet.range=docnumber
f.docnumber.facet.range.start=0
f.docnumber.facet.range.gap=100
f.docnumber.facet.range.end=10
f.docnumber.facet.limit=1000
facet.mincount=1
q=type:document
wt=xml

When I run it, I get one of the following three responses, seemingly at random 
(haven't been able to notice a pattern so far):

1. Get 859 results (correct), but nothing on the facet ranges:

...
<result name="response" numFound="859" start="0" maxScore="8.006225">
...
<lst name="facet_ranges">
  <lst name="docnumber">
    <lst name="counts"/>
    <int name="gap">100</int>
    <int name="start">0</int>
    <int name="end">10</int>
  </lst>
</lst>

2. Get 859 results (correct), and the correct number of facets come up in the 
facet ranges (118+109+119+122+134+100+100+57=859):

...
<result name="response" numFound="859" start="0" maxScore="8.006225">
...
<lst name="facet_ranges">
  <lst name="docnumber">
    <lst name="counts">
      <int name="0">118</int>
      <int name="100">109</int>
      <int name="200">119</int>
      <int name="300">122</int>
      <int name="400">134</int>
      <int name="500">100</int>
      <int name="600">100</int>
      <int name="700">57</int>
    </lst>
    <int name="gap">100</int>
    <int name="start">0</int>
    <int name="end">10</int>
  </lst>
</lst>

3. Get 859 results (correct), and only a partial number of facet ranges 
(118+109+119+122+134=602 vs. 859 results):

...
<result name="response" numFound="859" start="0" maxScore="8.006225">
...
<lst name="facet_ranges">
  <lst name="docnumber">
    <lst name="counts">
      <int name="0">118</int>
      <int name="100">109</int>
      <int name="200">119</int>
      <int name="300">122</int>
      <int name="400">134</int>
    </lst>
    <int name="gap">100</int>
    <int name="start">0</int>
    <int name="end">10</int>
  </lst>
</lst>

I am using Solr 4.3 (4.3.0 1477023), with these parameters:

Facet-related:
facet=true
facet.field=docnumber
f.docnumber.facet.sort=index
facet.range=docnumber
f.docnumber.facet.range.start=0
f.docnumber.facet.range.gap=100
f.docnumber.facet.range.end=10
f.docnumber.facet.limit=1000
facet.mincount=1

For distributed search (environment has 3 cores in the same box):

shards=SERVER:8081/solr/shard1,SERVER:8081/solr/shard2,SERVER:8081/solr/shard3
shards.qt=myhandler

And the query:
q=type:document
wt=xml

It is also worth noting that the facet field section does come up with the 
correct facets; the issue seems to be related only to the facet ranges (unless 
I am missing something). In the responses for all three examples above, the 
facet_fields list has all the values for docnumber, from 1 to 756, even if the 
facet ranges are missing buckets.

<lst name="facet_fields">
  <lst name="docnumber">
    <int name="1">1</int>
    <int name="2">2</int>
    ... (continues on from 3 to 754) ...
    <int name="755">1</int>
    <int name="756">1</int>
  </lst>
</lst>


Thanks,


Jose. 

RE: Highlighting externally stored text

2013-07-31 Thread Bryan Loofbourrow
 Hey Bryan, Thanks for the response!  To make use of the
 FastVectorHighlighter
 you need to enable termVectors, termPositions, and termOffsets correct?
 Which takes a considerable amount of space, but is good to know and I
may
 possibly pursue this solution as well.  Just starting to look at the
code
 now, do you remember how substantial the change was?

 Are there any other options?

John,

Yes, you do need to enable those, and yes, it takes a considerable amount
of space.
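
For reference, the schema.xml side of that looks roughly like this (field name
and type are placeholders):

    <field name="content" type="text_en" indexed="true" stored="true"
           termVectors="true" termPositions="true" termOffsets="true"/>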

It has been a while, but the change itself was not too bad, mostly at the
top level, isolating an interface that returns the structure you need, and
transposing that into something for Solr to return.

The only other issues are around queries. If FVH supports all the queries
you use, great. If it's just missing something simple to deal with, like
DisjunctionMaxQuery, then it's just adding another rewrite call.

But if you are using the SpanQuery hierarchy, it's much trickier. I did in
fact do an implementation for that, but it was not very satisfactory --
transposing unordered SpanNearQuery into the representation used by FVH
was an O(n!) operation, and the complexity of the implementation was quite
high, for a number of reasons including lack of FVH representation for
mixed-slop phrases.

I don't know of other options -- except for the one I finally wound up
doing, which was writing my own highlighter, which unfortunately I am not
in a position to share for reasons not my own. But the main reason for
that was the SpanNearQuery support, which may not be a problem you have.

It's possible that something similar could be done with the Postings
highlighter, but I did not look too deeply into that, because the lack of
phrase support was a blocker.

-- Bryan


Re: Measuring SOLR performance

2013-07-31 Thread Roman Chyla
No, I haven't had time for that (and likely won't have for the next few
weeks), but it is on the list - if it is 25% improvement, it would be
really worth of the change to G1.
Thanks,

roman


On Wed, Jul 31, 2013 at 1:00 PM, Markus Jelsma
markus.jel...@openindex.io wrote:

 Did you also test indexing speed? With default G1GC settings we're seeing
 a slightly higher latency for queries than CMS. However, G1GC allows for
 much higher throughput than CMS when indexing. I haven't got the raw
 numbers here but it is roughly 45 minutes against 60 in favour of G1GC!

 Load is obviously higher with G1GC.


 -Original message-
  From:Roman Chyla roman.ch...@gmail.com
  Sent: Wednesday 31st July 2013 18:32
  To: solr-user@lucene.apache.org
  Subject: Re: Measuring SOLR performance
 
  I'll try to run it with the new parameters and let you know how it goes.
  I've rechecked details for the G1 (default) garbage collector run and I
 can
  confirm that 2 out of 3 runs were showing high max response times, in
 some
  cases even 10secs, but the customized G1 never - so definitely the
  parameters had effect because the max time for the customized G1 never
 went
  higher than 1.5secs (and that happened for 2 query classes only). Both the
  cms-custom and G1-custom are similar, the G1 seems to have higher values
 in
  the max fields, but that may be random. So, yes, now I am sure what to
  think of default G1 as 'bad', and that these G1 parameters, even if they
  don't seem G1 specific, have real effect.
  Thanks,
 
  roman
 
 
  On Tue, Jul 30, 2013 at 11:01 PM, Shawn Heisey s...@elyograg.org
 wrote:
 
   On 7/30/2013 6:59 PM, Roman Chyla wrote:
I have been wanting some tools for measuring performance of SOLR,
 similar
to Mike McCandles' lucene benchmark.
   
so yet another monitor was born, is described here:
   
 http://29min.wordpress.com/2013/07/31/measuring-solr-query-performance/
   
I tested it on the problem of garbage collectors (see the blogs for
details) and so far I can't conclude whether highly customized G1 is
   better
than highly customized CMS, but I think interesting details can be
 seen
there.
   
Hope this helps someone, and of course, feel free to improve the
 tool and
share!
  
   I have a CMS config that's even more tuned than before, and it has made
   things MUCH better.  This new config is inspired by more info that I
 got
   on IRC:
  
   http://wiki.apache.org/solr/ShawnHeisey#GC_Tuning
  
   The G1 customizations in your blog post don't look like they are really
   G1-specific - they may be useful with CMS as well.  This statement also
   applies to some of the CMS parameters, so I would use those with G1 as
   well for any testing.
  
   UseNUMA looks interesting for machines that actually are NUMA.  All the
   information that I can find says it is only for the throughput
   (parallel) collector, so it's probably not doing anything for G1.
  
   The pause parameters you've got for G1 are targets only.  It will *try*
   to stick within those parameters, but if a collection requires more
 than
   50 milliseconds or has to happen more often than once a second, the
   collector will ignore what you have told it.
  
   Thanks,
   Shawn
  
  
 



Re: queryResultCache showing all zeros

2013-07-31 Thread Yonik Seeley
On Wed, Jul 31, 2013 at 3:49 PM, Chris Hostetter
hossman_luc...@fucit.org wrote:
 there is however a group.cache.percent option that you might look into --
 but I honestly have no idea if that toggles the use of queryResultCache or
 something else; I haven't played with it before...

That's only a single-request cache (caches some ids/scores within a
single request and is not reused across different requests).

-Yonik
http://lucidworks.com


Re: SolrCloud and Joins

2013-07-31 Thread David Larochelle
Thanks Walter,

Existing media sets will rarely change but new media sets will be added
relatively frequently. (There is a many-to-many relationship between media
sets and media sources.) Given the size of data, a new Media Set that only
includes 1% of the collection would include 6 million rows.

Our data is stored in a Postgresql database and imported using the
dataImportHandler. It takes around 3 days to fully import the data.
In the single shard case, the nice thing about using joins is that the
media set to source mapping data could be updated using an hourly cron job
while the sentence data could be updated using a delta query.

The obvious alternative to joins is to add the media_sets_id to the
sentence data as a multi-valued field. We'll benchmark this. But my concern
is that importing the full data will take even longer and that there will
be no easy way to automatically update each affected row when a new media
set is created. (I could write a separate one-off query for
DataImportHandler each time a new media set is added but this requires a
lot of manual interaction.)

Does SolrCloud really not have a simple way to specify which shard to put a
document on? I'm considering randomly generating document ID prefixes and
then taking their murmurhash to determine what shards they correspond to. I
could then explicitly send documents to a particular shard by specifying a
document ID prefix. However, this seems like a hackish approach. Is there a
better way?



On Mon, Jul 29, 2013 at 12:45 PM, Walter Underwood wun...@wunderwood.org wrote:

 A join may seem clean, but it will be slow and (currently) doesn't work in
 a cluster.

 You find all the sentences in a media set by searching for that set id and
 requesting only the sentence_id (yes, you need that). Then you reindex
 them. With small documents like this, it is probably fairly fast.

 If you can't estimate how often the media sets will change or the size of
 the changes, then you aren't ready to choose a design.

 wunder

 On Jul 29, 2013, at 8:41 AM, David Larochelle wrote:

  We'd like to be able to easily update the media set to source mapping.
 I'm
  concerned that if we store the media_sets_id in the sentence documents,
 it
  will be very difficult to add additional media set to source mapping. I
  imagine that adding a new media set would either require reimporting all
  600 million documents or writing complicated application logic to find
 out
  which sentences to update. Hence joins seem like a cleaner solution.
 
  --
 
  David
 
 
  On Mon, Jul 29, 2013 at 11:22 AM, Walter Underwood 
 wun...@wunderwood.org wrote:
 
  Denormalize. Add media_set_id to each sentence document. Done.
 
  wunder
 
  On Jul 29, 2013, at 7:58 AM, David Larochelle wrote:
 
  I'm setting up SolrCloud with around 600 million documents. The basic
  structure of each document is:
 
  stories_id: integer, media_id: integer, sentence: text_en
 
  We have a number of stories from different media and we treat each
  sentence
  as a separate document because we need to run sentence level analytics.
 
  We also have a concept of groups or sets of sources. We've imported
 this
  media source to media sets mapping into Solr using the following
  structure:
 
  media_id_inner: integer, media_sets_id: integer
 
  For the single node case, we're able to filter our sources by
  media_set_id
  using a join query like the following:
 
 
 
  http://localhost:8983/solr/select?q={!join+from=media_id_inner+to=media_id}media_sets_id:1
 
 
   However, this does not work correctly with SolrCloud. The problem is
   that the join query is performed separately on each of the shards and
   no shard has the complete media set to source mapping data. So
   SolrCloud returns incomplete results.
 
   Since the complete media set to source mapping data is comparatively
   small (~50,000 rows), I would like to replicate it on every shard, so
   that the results of the individual join queries on separate shards
   would be equivalent to performing the same query on a single-shard
   system.
 
   However, I can't figure out how to replicate documents on separate
   shards. The compositeID router has the ability to colocate documents
   based on a prefix in the document ID, but this isn't what I need. What
   I would like is some way to either have the media set to source data
   replicated on every shard or to be able to explicitly upload this data
   to the individual shards. (For the rest of the data I like the
   compositeID autorouting.)
 
  Any suggestions?
 
  --
 
  Thanks,
 
 
  David
 
  --
  Walter Underwood
  wun...@wunderwood.org
 
 
 
 

 --
 Walter Underwood
 wun...@wunderwood.org






Re: FieldCollapsing issues in SolrCloud 4.4

2013-07-31 Thread Paul Masurel
If your issue is that you want to retrieve the number of groups:
group.ngroups returns the sum of the per-shard group counts.

This is not the overall number of groups, because a group that is present
on more than one shard is counted once per shard.

To make sure that this does not happen, one can choose to distribute
documents so that all the documents with the same group key go to the same
shard.

(Disclaimer: before doing so, you need to make sure that your documents
will still be spread roughly evenly.)

You can check out how to do that here
https://cwiki.apache.org/confluence/display/solr/Shards+and+Indexing+Data+in+SolrCloud
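
A rough sketch of the two halves together, assuming documents were indexed
with IDs like "groupkey!docid" under the default compositeId router; the
ZooKeeper address, collection, and group_key field below are placeholders:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.CloudSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class GroupCount {
        public static void main(String[] args) throws Exception {
            CloudSolrServer server = new CloudSolrServer("localhost:2181");
            server.setDefaultCollection("collection1");

            SolrQuery q = new SolrQuery("*:*");
            q.set("group", "true");
            q.set("group.field", "group_key");
            // Only exact when no group spans more than one shard.
            q.set("group.ngroups", "true");

            QueryResponse rsp = server.query(q);
            Integer nGroups = rsp.getGroupResponse().getValues().get(0).getNGroups();
            System.out.println("ngroups: " + nGroups);
        }
    }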





On Wed, Jul 31, 2013 at 8:02 PM, Ali, Saqib docbook@gmail.com wrote:

 Hello Paul,

 Can you please explain what you mean by:
 To get the exact number of groups, you need to shard along your grouping
 field

 Thanks! :)


 On Wed, Jul 31, 2013 at 3:08 AM, Paul Masurel paul.masu...@gmail.com
 wrote:

  Do you mean you get different results with group=true?
  numFound is supposed to return the number of ungrouped hits.
 
  To get the number of groups, you are expected to set
  group.ngroups=true.
  Even then, the result will only give you an upperbound
  in a distributed environment.
  To get the exact number of groups, you need to shard along
  your grouping field.
 
  If you have many groups, you may also experience a huge performance
  hit, as the current implementation has been heavily optimized for a low
  number of groups (e.g. e-commerce categories).
 
  Paul
 
 
 
  On Wed, Jul 31, 2013 at 1:59 AM, Ali, Saqib docbook@gmail.com
 wrote:
 
   Hello all,
  
    Is anyone experiencing issues with numFound when using group=true in
    SolrCloud 4.4?
  
   Sometimes the results are off for us.
  
   I will post more details shortly.
  
   Thanks.
  
 
 
 
  --
  __
 
   Masurel Paul
   e-mail: paul.masu...@gmail.com
 




-- 
__

 Masurel Paul
 e-mail: paul.masu...@gmail.com


no servers hosting shard

2013-07-31 Thread smanad
I have set up SolrCloud, and when I try to access documents I get this error:

<lst name="error"><str name="msg">no servers hosting shard:</str><int name="code">503</int></lst>

However, if I add the shards=shard1 param, it works.





debian package for solr with jetty

2013-07-31 Thread smanad
Hi, 

I am trying to create a Debian package for Solr 4.3 (the default
installation with Jetty).
Is there anything already available?

Also, I need 3 different cores, so I plan to create a corresponding package
for each of them that creates its Solr core using the admin/cores or
Collections API.

I also want to use a SolrCloud setup with an external ZooKeeper ensemble;
what's the best way to create a Debian package for updating the ZooKeeper
config files as well?

Please suggest. Any pointers will be helpful.

Thanks, 
-Manasi
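
For the core-creation step mentioned above, a minimal SolrJ sketch of the
kind of CoreAdmin call a package's post-install script could make; the
host, core name, and instanceDir are placeholders:

    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.request.CoreAdminRequest;

    public class CreateCore {
        public static void main(String[] args) throws Exception {
            // Talks to the CoreAdmin handler under /solr/admin/cores.
            HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr");
            CoreAdminRequest.createCore("core1", "/var/lib/solr/core1", server);
            server.shutdown();
        }
    }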







Proposal/request for comments: Solr schema annotation

2013-07-31 Thread Steve Rowe
In thinking about making the entire Solr schema REST-API-addressable 
(SOLR-4898), I'd like to be able to add arbitrary metadata at both the top 
level of the schema and at each leaf node, and allow read/write access to that 
metadata via the REST API.

Some uses I've thought of for such a facility: 

1. The managed schema now drops XML comments from schema.xml upon conversion to 
managed-schema format, but it would be much better if these were somehow 
preserved, as well as round-trippable when retrieving the schema and its 
constituents via the REST API.

2. Some comments in the example schemas don't refer to just one or to all leaf 
nodes, but rather to a group of them. I'd like to be able to group nodes by 
adding same-named tags to multiple nodes, and also have a top-level 
(optional) tag description - this description could then be presented with 
tagged nodes in various output formats.

3. Some comments in the example schema are documentation about a feature, e.g. 
copyFields.  A top-level documentation annotation could take a leaf node 
element name (or maybe an XPath? probably overkill) and apply to all matching 
elements. 

4. When modifying the schema via REST API, a last-modified annotation could 
be automatically added.

5. There were a couple of user complaints recently when schema.xml parsing was 
tightened to disallow unknown attributes on field declarations (SOLR-4641): 
people were storing their own information there.  User-level metadata would 
support this in a round-trippable way - I'm thinking we could restrict it to 
flat string-typed key/value pairs, with no nested structure.

W3C XML Schema has a similar facility: 
http://www.w3.org/TR/2004/REC-xmlschema-1-20041028/structures.html#element-annotation.

Thoughts?

Some concrete examples of what I'm thinking of in schema.xml format 
(syntax/naming as yet unsettled):

<schema name="example" version="1.5">
  <annotation>
    <description element="tag" content="plain-numeric-field-types">
      Plain numeric field types store and index the text value verbatim.
    </description>
    <documentation element="copyField">
      copyField commands copy one field to another at the time a document
      is added to the index.  It's used either to index the same field differently,
      or to add multiple fields to the same field for easier/faster searching.
    </documentation>
    <last-modified>2014-03-08T12:14:02Z</last-modified>
    …
  </annotation>
  …
  <fieldType name="pint" class="solr.IntField">
    <annotation>
      <tag>plain-numeric-field-types</tag>
    </annotation>
  </fieldType>
  <fieldType name="plong" class="solr.LongField">
    <annotation>
      <tag>plain-numeric-field-types</tag>
    </annotation>
  </fieldType>
  …
  <copyField source="cat" dest="text">
    <annotation>
      <todo>Should this field really be copied to the catchall text field?</todo>
    </annotation>
  </copyField>
  …
  <field name="text" type="text_general">
    <annotation>
      <description>catchall field</description>
      <visibility>public</visibility>
    </annotation>
  </field>
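
To make the intended REST round trip concrete, here is a purely
hypothetical read of one field's annotation; the
/schema/fields/.../annotation path and the response shape are invented for
illustration and exist in no Solr release:

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.HttpURLConnection;
    import java.net.URL;

    public class ReadAnnotation {
        public static void main(String[] args) throws Exception {
            // Hypothetical endpoint for the "text" field's annotation.
            URL url = new URL(
                "http://localhost:8983/solr/collection1/schema/fields/text/annotation");
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setRequestProperty("Accept", "application/json");
            BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), "UTF-8"));
            String line;
            // Hypothetical body:
            // {"annotation":{"description":"catchall field","visibility":"public"}}
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
            in.close();
        }
    }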



Re: Proposal/request for comments: Solr schema annotation

2013-07-31 Thread Walter Underwood
An annotation field would be much better than the current anything-goes,
schema-less schema.xml.

Has anyone built an XML Schema for schema.xml? I know it is extensible, but it 
would be worth a try.

wunder

On Jul 31, 2013, at 6:21 PM, Steve Rowe wrote:

 In thinking about making the entire Solr schema REST-API-addressable 
 (SOLR-4898), I'd like to be able to add arbitrary metadata at both the top 
 level of the schema and at each leaf node, and allow read/write access to 
 that metadata via the REST API.
 
 [...]
 

--
Walter Underwood
wun...@wunderwood.org