Negative Query Behaviour in Solr 3.2

2013-07-31 Thread karanjindal
Hi All,

I am using Solr 3.2 and am confused about how a particular query is executed.
q=name:memory OR -name:encoded
Firing q=name:memory separately gives 3 results,
and q=-name:encoded gives 25 results, and the result sets are disjoint.

Since I am doing an OR query it should return 28 results, but it only
returns 3 results, the same as the query (name:memory).

Can anyone explain?

-Karan






Re: How might one search for dupe IDs other than faceting on the ID field?

2013-07-31 Thread Dotan Cohen
On Wed, Jul 31, 2013 at 4:56 AM, Bill Bell billnb...@gmail.com wrote:
 On Jul 30, 2013, at 12:34 PM, Dotan Cohen dotanco...@gmail.com wrote:
 On Tue, Jul 30, 2013 at 9:21 PM, Aloke Ghoshal alghos...@gmail.com wrote:
 Does adding facet.mincount=2 help?

 In fact, when adding facet.mincount=20 (I know that some dupes are in
 the hundreds) I got the OutOfMemoryError in seconds instead of
 minutes.

 Dotan Cohen

 This seems like a fairly large issue. Can you create a Jira issue ?

 Bill Bell

I'll file an issue, but on what? What information should I include?
How is this different from what you would expect?

Thanks.

-- 
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com


Re: Solr Cloud Setup

2013-07-31 Thread Flavio Pompermaier
What was the problem..?


On Tue, Jul 30, 2013 at 10:33 PM, AdityaR aditya.ravinuth...@gmail.com wrote:

 I was able to get the setup to work.






EmbeddedSolrServer Solr 4.4.0 bug?

2013-07-31 Thread Luis Cappa Banda
Hello guys,

Since I upgraded from the 4.1.0 to the 4.4.0 version I've noticed that
the way EmbeddedSolrServer is constructed has changed a little:

Solr 4.1.0 style:

CoreContainer coreContainer = new CoreContainer(solrHome, new File(solrHome + "/solr.xml"));
EmbeddedSolrServer localSolrServer = new EmbeddedSolrServer(coreContainer, "core");

Solr 4.4.0 new style:

CoreContainer coreContainer = new CoreContainer(solrHome);
EmbeddedSolrServer localSolrServer = new EmbeddedSolrServer(coreContainer, "core");


However, it's not working. I've got the following solr.xml configuration
file:

<solr>
  <cores adminPath="/admin/cores" defaultCoreName="core" host="${host:}"
         hostPort="${jetty.port:8983}" hostContext="${hostContext:solr}"
         zkClientTimeout="${zkClientTimeout:15000}">
    <core name="core" instanceDir="core" />
  </cores>
</solr>


And resources appear to be loaded correctly:

2013-07-31 09:46:37,583 47889 [main] INFO  org.apache.solr.core.ConfigSolr  - Loading container configuration from /opt/solr/solr.xml


But when indexing into the core named 'core', it throws an exception:

2013-07-31 09:50:49,409 5189 [main] ERROR com.buguroo.solr.index.WriteIndex  - No such core: core

Either I am sleepy, which is quite possible, or there is some kind of bug
here.

Best regards,

-- 
- Luis Cappa


SimplePostTool: FATAL: Solr returned an error #400 Bad Request

2013-07-31 Thread Vineet Mishra
Hi All

I am currently in the middle of a project which indexes some data into
multiple Solr instances.

My configuration is as follows: on the same machine I have set up multiple
instances of Solr:

http://localhost:8080/solr1
http://localhost:8080/solr2
http://localhost:8080/solr3
http://localhost:8080/solr4
http://localhost:8080/solr5
http://localhost:8080/solr6

I am posting the data to Solr through SimplePostTool by passing an
XML file to the spt.postFile(file) method and committing thereafter.

This whole process is multithreaded and works fine up to about 1 million
records, but thereafter it suddenly stops with:

SimplePostTool: FATAL: Solr returned an error #400 Bad Request

In the Tomcat Catalina log I found:

WARNING: Failed to register info bean: searcher
javax.management.InstanceAlreadyExistsException: solr/:type=searcher,id=org.apache.solr.search.SolrIndexSearcher
    at com.sun.jmx.mbeanserver.Repository.addMBean(Repository.java:437)
    at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerWithRepository(DefaultMBeanServerInterceptor.java:1898)
    at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerDynamicMBean(DefaultMBeanServerInterceptor.java:966)
    at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerObject(DefaultMBeanServerInterceptor.java:900)
    at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerMBean(DefaultMBeanServerInterceptor.java:324)
    at com.sun.jmx.mbeanserver.JmxMBeanServer.registerMBean(JmxMBeanServer.java:513)
    at org.apache.solr.core.JmxMonitoredMap.put(JmxMonitoredMap.java:141)
    at org.apache.solr.core.JmxMonitoredMap.put(JmxMonitoredMap.java:47)
    at org.apache.solr.search.SolrIndexSearcher.register(SolrIndexSearcher.java:220)
    at org.apache.solr.core.SolrCore.registerSearcher(SolrCore.java:1349)
    at org.apache.solr.core.SolrCore.access$000(SolrCore.java:84)
    at org.apache.solr.core.SolrCore$5.call(SolrCore.java:1247)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:722)

Jul 31, 2013 12:46:00 PM org.apache.solr.core.SolrCore registerSearcher
INFO: [] Registered new searcher Searcher@5fa1891b main
Jul 31, 2013 12:46:00 PM org.apache.solr.search.SolrIndexSearcher close

Has anybody traced such an issue? This is really urgent and important for
us; waiting for your response.

Thanks and Regards
Vineet


Re: EmbeddedSolrServer Solr 4.4.0 bug?

2013-07-31 Thread Alan Woodward
Hi Luis,

You need to call coreContainer.load() after construction for it to load the 
cores.  Previously the CoreContainer(solrHome, configFile) constructor also 
called load(), but this was the only constructor to do that.
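So in 4.4 the snippet becomes (a minimal sketch; the core name "core" is taken 
from your solr.xml):

CoreContainer coreContainer = new CoreContainer(solrHome);
coreContainer.load(); // explicitly load the cores defined in solr.xml
EmbeddedSolrServer localSolrServer = new EmbeddedSolrServer(coreContainer, "core");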

I probably need to put something in CHANGES.txt to point this out...

Alan Woodward
www.flax.co.uk


On 31 Jul 2013, at 08:53, Luis Cappa Banda wrote:

 Hello guys,
 
 Since I upgrade from 4.1.0 to 4.4.0 version I've noticed that
 EmbeddedSolrServer has changed a little the way of construction:
 
 Solr 4.1.0 style:
 
 CoreContainer coreContainer = new CoreContainer(solrHome, new File(solrHome + "/solr.xml"));
 EmbeddedSolrServer localSolrServer = new EmbeddedSolrServer(coreContainer, "core");
 
 Solr 4.4.0 new style:
 
 CoreContainer coreContainer = new CoreContainer(solrHome);
 EmbeddedSolrServer localSolrServer = new EmbeddedSolrServer(coreContainer, "core");
 
 
 However, it's not working. I've got the following solr.xml configuration
 file:
 
 <solr>
   <cores adminPath="/admin/cores" defaultCoreName="core" host="${host:}"
          hostPort="${jetty.port:8983}" hostContext="${hostContext:solr}"
          zkClientTimeout="${zkClientTimeout:15000}">
     <core name="core" instanceDir="core" />
   </cores>
 </solr>
 
 
 And resources appears to be loaded correctly:
 
 *2013-07-31 09:46:37,583 47889 [main] INFO  org.apache.solr.core.ConfigSolr
 - Loading container configuration from /opt/solr/solr.xml*
 
 
 But when indexing into core with coreName 'core', it throws an Exception:
 
 *2013-07-31 09:50:49,409 5189 [main] ERROR
 com.buguroo.solr.index.WriteIndex  - No such core: core*
 
 Or I am sleppy, something that's possible, or there is some kind of bug
 here.
 
 Best regards,
 
 -- 
 - Luis Cappa



result grouping and paging, solr 4.21

2013-07-31 Thread Gunnar

Hello,

I'm trying to page results with grouping/field collapsing. My query is:

?q=myKeywords&start=0&rows=100&group=true&group.field=myGroupField&group.format=simple&group.limit=1

The result contains 70 groups. Is there a way to get 100 records
returned, i.e. the first doc from each of the 70 groups plus the second
doc from the first 30 groups?

Thanks,

Gunnar


Re: EmbeddedSolrServer Solr 4.4.0 bug?

2013-07-31 Thread Luis Cappa Banda
Thank you very much, Alan. Now it's working! I agree with you: this kind of
thing should be documented at least in CHANGES.txt. When upgrading from one
version to another everything should ideally stay compatible between
versions, but since that is not the case here, people should be notified of it.

Regards,


2013/7/31 Alan Woodward a...@flax.co.uk

 Hi Luis,

 You need to call coreContainer.load() after construction for it to load
 the cores.  Previously the CoreContainer(solrHome, configFile) constructor
 also called load(), but this was the only constructor to do that.

 I probably need to put something in CHANGES.txt to point this out...

 Alan Woodward
 www.flax.co.uk


 On 31 Jul 2013, at 08:53, Luis Cappa Banda wrote:

  Hello guys,
 
  Since I upgrade from 4.1.0 to 4.4.0 version I've noticed that
  EmbeddedSolrServer has changed a little the way of construction:
 
  Solr 4.1.0 style:
  
  CoreContainer coreContainer = new CoreContainer(solrHome, new File(solrHome + "/solr.xml"));
  EmbeddedSolrServer localSolrServer = new EmbeddedSolrServer(coreContainer, "core");
  
  Solr 4.4.0 new style:
  
  CoreContainer coreContainer = new CoreContainer(solrHome);
  EmbeddedSolrServer localSolrServer = new EmbeddedSolrServer(coreContainer, "core");
 
 
  However, it's not working. I've got the following solr.xml configuration
  file:
 
  <solr>
    <cores adminPath="/admin/cores" defaultCoreName="core" host="${host:}"
           hostPort="${jetty.port:8983}" hostContext="${hostContext:solr}"
           zkClientTimeout="${zkClientTimeout:15000}">
      <core name="core" instanceDir="core" />
    </cores>
  </solr>
 
 
  And resources appears to be loaded correctly:
 
  *2013-07-31 09:46:37,583 47889 [main] INFO
  org.apache.solr.core.ConfigSolr
  - Loading container configuration from /opt/solr/solr.xml*
 
 
  But when indexing into core with coreName 'core', it throws an Exception:
 
  *2013-07-31 09:50:49,409 5189 [main] ERROR
  com.buguroo.solr.index.WriteIndex  - No such core: core*
 
  Or I am sleppy, something that's possible, or there is some kind of bug
  here.
 
  Best regards,
 
  --
  - Luis Cappa




-- 
- Luis Cappa


Solr PolyField

2013-07-31 Thread Luís Portela Afonso
Hi, I'm trying to create a field with multiple fields inside, that is:

"origin": {
    "htmlUrl": "http://www.gazzetta.it/",
    "streamId": "feed/http://www.gazzetta.it/rss/Home.xml",
    "title": "Gazzetta.it"
},

I want to get something like this. Is that possible? I'm using Solr 4.4.0.

Thanks



Sharding with a SolrCloud

2013-07-31 Thread Oliver Goldschmidt
Hi list,

I have a Solr server which uses sharding to run distributed searches
against another Solr server. That other Solr server is now migrating to a
SolrCloud system. I've recently been trying to keep searching the SolrCloud
as a shard for my Solr server, but this fails with mysterious effects. When
I perform a search I get a result with a number of hits, but the documents
themselves are not returned at all. This is the response header I am
getting from Solr:

{
  "responseHeader": {
    "status": 0,
    "QTime": 305,
    "params": {
      "facet": "true",
      "indent": "yes",
      "facet.mincount": "1",
      "facet.limit": "30",
      "qf": "title_short^750 title_full_unstemmed^600",
      "json.nl": "arrarr",
      "wt": "json",
      "rows": "20",
      "shards": "ourindex.nowhere.de/solr/index",
      "bq": "format:Book^500",
      "fl": "*,score",
      "facet.sort": "count",
      "start": "0",
      "q": "xml",
      "shards.info": "true",
      "facet.prefix": "",
      "facet.field": ["publishDate"],
      "qt": "dismax"}},
  "shards.info": {
    "ourindex.nowhere.de/solr/index": {
      "numFound": 10076,
      "maxScore": 8.507474,
      "time": 263}},
  "response": {"numFound": 10056, "start": 0, "maxScore": 8.507474, "docs": []}
}

As you can see, there are no docs in the result. This result is not 100%
reproducible: sometimes I get no results displayed, other times it works
(with the same query URL!). As you can also see in the result, the
number of hits in the response is a little bit lower than the number of
hits reported by the shard.

This makes me wonder whether it is simply not possible to use a SolrCloud
as a shard for another standalone Solr server?

Any hint is appreciated!

Best
- Oliver

-- 
Oliver Goldschmidt
TU Hamburg-Harburg / Universitätsbibliothek / Digitale Dienste
Denickestr. 22
21071 Hamburg - Harburg
Tel.+49 (0)40 / 428 78 - 32 91
eMail   o.goldschm...@tuhh.de
--
GPG/PGP-Schlüssel: 
http://www.tub.tu-harburg.de/keys/Oliver_Marahrens_pub.asc



Solr show total row count in response of full import

2013-07-31 Thread Sandro Zbinden
Hey there

Is there a way to show the total row count (documents that will be inserted)
when executing a full import through the Data Import request handler?

Currently, after executing a full import and pointing to solrcore/dataimport,
you can get the total rows processed:

<str name="Total Documents Processed">6354</str>

It would be nice if you could also receive a total row count like

<str name="Total Documents">10100</str>

With this information we could derive further information like

<str name="Imported in Percent">62.91</str>

(e.g. 6354 / 10100 ≈ 62.91%). This would make it easier to generate a progress bar for the end user.


Best regards

Sandro Zbinden



Re: Negative Query Behaviour in Solr 3.2

2013-07-31 Thread Mikhail Khludnev
Can you try:

q=+name:memory -name:encoded
or
q=name:memory AND -name:encoded



On Wed, Jul 31, 2013 at 10:14 AM, karanjindal 
karan_jin...@students.iiit.ac.in wrote:

 Hi All,

 I am using solr 3.2 and confused how a particular query is executed.
 q=name:memory OR -name:encoded
 separately firing q=name:memory gives 3 results
 and q=-name:encoded gives 25 results and result sets are disjoint sets.

 Since I am doing OR query it should return 28 results, but it is only
 returning 3 results same as query (name:memory).

 Can anyone explain?

 -Karan








-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

http://www.griddynamics.com
 mkhlud...@griddynamics.com


TrieField and FieldCache confusion

2013-07-31 Thread Paul Masurel
Hello everyone,

I have a question about Solr TrieField and Lucene FieldCache.

From my understanding, Solr added the TrieField implementation to
perform faster range queries. For each value it indexes multiple terms,
the n-th term being a masked version of the value,
showing only its first (precisionStep * n) bits.
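To check that I read the indexing side right, here is my mental model in
pseudo-Java (a sketch of my understanding only, not actual Lucene code;
index() and termFor() are made-up helpers):

void addTrieTerms(long value, int precisionStep) {
    for (int shift = 0; shift < 64; shift += precisionStep) {
        // each successive term drops 'shift' low-order bits, keeping only
        // a high-order prefix of the value; the shift itself is encoded
        // into the term so that the different precisions don't collide
        long masked = value & ~((1L << shift) - 1);
        index(termFor(shift, masked));
    }
}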

When uninverting the field to populate a FieldCache, the last value in
lexicographical order will be retained, which from my understanding should
be the term with the highest precision.

Can I expect Lucene's FieldCache to return the correct values when working
with a TrieField whose precisionStep is higher than 0? If not, what did I get
wrong?

Regards,

Paul Masurel
e-mail: paul.masu...@gmail.com


Re: How might one search for dupe IDs other than faceting on the ID field?

2013-07-31 Thread Mikhail Khludnev
fwiw,

this code won't capture uncommitted duplicates.


On Wed, Jul 31, 2013 at 9:41 AM, Dotan Cohen dotanco...@gmail.com wrote:

 On Tue, Jul 30, 2013 at 11:14 PM, Jack Krupansky
 j...@basetechnology.com wrote:
  The Solr SignatureUpdateProcessorFactory is designed to facilitate
 dedupe...
  any particular reason you did not use it?
 
  See:
  http://wiki.apache.org/solr/Deduplication
 
  and
 
  https://cwiki.apache.org/confluence/display/solr/De-Duplication
 

 Actually, the guy who made the changes (a coworker) did in fact write
 an alternative UpdateHandler. I've just noticed that there are a bunch
 of dupes right now, though.

 public class DiscoAPIUpdateHandler extends DirectUpdateHandler2 {

     public DiscoAPIUpdateHandler(SolrCore core) {
         super(core);
     }

     @Override
     public int addDoc(AddUpdateCommand cmd) throws IOException {

         // if overwrite is set to false we'll use DirectUpdateHandler2's
         // behaviour; this is done for debugging, to insert duplicates
         // into solr
         if (!cmd.overwrite) return super.addDoc(cmd);

         // when using ref-counted objects you have!! to decrement the
         // ref count when you're done
         RefCounted<SolrIndexSearcher> indexSearcher =
                 this.core.getNewestSearcher(false);

         // the idea is like this: we'll make an internal lucene query
         // and check if that id already exists
         Term updateTerm = null;

         if (cmd.updateTerm != null) {
             updateTerm = cmd.updateTerm;
         } else {
             updateTerm = new Term("id", cmd.getIndexedId());
         }

         Query query = new TermQuery(updateTerm);
         TopDocs docs = indexSearcher.get().search(query, 2);

         if (docs.totalHits > 0) {
             // index searcher is no longer needed
             indexSearcher.decref();
             // don't add the new document
             return 0;
         }

         // index searcher is no longer needed
         indexSearcher.decref();

         // if I'm here then it's a new document
         return super.addDoc(cmd);
     }
 }


  And I give a bunch of examples in my book.
 

 I anticipate the book with esteem!

 --
 Dotan Cohen

 http://gibberish.co.il
 http://what-is-what.com




-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

http://www.griddynamics.com
 mkhlud...@griddynamics.com


Re: Performance question on Spatial Search

2013-07-31 Thread Mikhail Khludnev
On Wed, Jul 31, 2013 at 1:10 AM, Steven Bower sbo...@alcyon.net wrote:


 not sure what you mean by good hit raitio?


I mean such queries are really expensive (even on a cache hit), so if the
list of ids changes every time, it never hits the cache and hence executes
these heavy queries every time. It's a well-known performance problem.


 Here are the stacks...

They seem like hotspots, and show index reading, which is reasonable. But I
can't see what caused these reads; to get that I need the whole stack of the
hot thread.



   Name Time (ms) Own Time (ms)

 org.apache.lucene.search.MultiTermQueryWrapperFilter.getDocIdSet(AtomicReaderContext,
 Bits) 300879 203478

 org.apache.lucene.codecs.lucene41.Lucene41PostingsReader$BlockDocsEnum.nextDoc()
 45539 19

 org.apache.lucene.codecs.lucene41.Lucene41PostingsReader$BlockDocsEnum.refillDocs()
 45519 40

 org.apache.lucene.codecs.lucene41.Lucene41PostingsReader.readVIntBlock(IndexInput,
 int[], int[], int, boolean) 24352 0
 org.apache.lucene.store.DataInput.readVInt() 24352 24352
 org.apache.lucene.codecs.lucene41.ForUtil.readBlock(IndexInput, byte[],
 int[]) 21126 14976
 org.apache.lucene.store.ByteBufferIndexInput.readBytes(byte[], int, int)
 6150 0  java.nio.DirectByteBuffer.get(byte[], int, int)
 6150 0
 java.nio.Bits.copyToArray(long, Object, long, long, long) 6150 6150

 org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum.docs(Bits,
 DocsEnum, int) 35342 421

 org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum$Frame.decodeMetaData()
 34920 27939

 org.apache.lucene.codecs.lucene41.Lucene41PostingsReader.nextTerm(FieldInfo,
 BlockTermState) 6980 6980

 org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum.next()
 14129 1053

 org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum$Frame.loadNextFloorBlock()
 5948 261

 org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum$Frame.loadBlock()
 5686 199
 org.apache.lucene.store.ByteBufferIndexInput.readBytes(byte[], int, int)
 3606 0  java.nio.DirectByteBuffer.get(byte[], int, int)
 3606 0
 java.nio.Bits.copyToArray(long, Object, long, long, long) 3606 3606

 org.apache.lucene.codecs.lucene41.Lucene41PostingsReader.readTermsBlock(IndexInput,
 FieldInfo, BlockTermState) 1879 80
 org.apache.lucene.store.ByteBufferIndexInput.readBytes(byte[], int, int)
 1798 0java.nio.DirectByteBuffer.get(byte[], int, int)
 1798 0
 java.nio.Bits.copyToArray(long, Object, long, long, long) 1798 1798

 org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum$Frame.next()
 4010 3324

 org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum$Frame.nextNonLeaf()
 685 685

 org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum$Frame.loadBlock()
 3117 144
 org.apache.lucene.store.ByteBufferIndexInput.readBytes(byte[], int, int)
 1861 0java.nio.DirectByteBuffer.get(byte[], int, int) 1861
 0
 java.nio.Bits.copyToArray(long, Object, long, long, long) 1861 1861

 org.apache.lucene.codecs.lucene41.Lucene41PostingsReader.readTermsBlock(IndexInput,
 FieldInfo, BlockTermState) 1090 19
 org.apache.lucene.store.ByteBufferIndexInput.readBytes(byte[], int, int)
 1070 0  java.nio.DirectByteBuffer.get(byte[], int, int)
 1070 0
 java.nio.Bits.copyToArray(long, Object, long, long, long) 1070 1070

 org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum.initIndexInput()
 20 0org.apache.lucene.store.ByteBufferIndexInput.clone()
 20 0
 org.apache.lucene.store.ByteBufferIndexInput.clone() 20 0
 org.apache.lucene.store.ByteBufferIndexInput.buildSlice(long, long) 20
 0
 org.apache.lucene.util.WeakIdentityMap.put(Object, Object) 20 0
 org.apache.lucene.util.WeakIdentityMap$IdentityWeakReference.init(Object,
 ReferenceQueue) 20 0
 java.lang.System.identityHashCode(Object) 20 20
 org.apache.lucene.index.FilteredTermsEnum.docs(Bits, DocsEnum, int)
 1485 527

 org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum.docs(Bits,
 DocsEnum, int) 957 0

 org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum$Frame.decodeMetaData()
 957 513

 org.apache.lucene.codecs.lucene41.Lucene41PostingsReader.nextTerm(FieldInfo,
 BlockTermState) 443 443
 org.apache.lucene.index.FilteredTermsEnum.next() 874 324

 org.apache.lucene.search.NumericRangeQuery$NumericRangeTermsEnum.accept(BytesRef)
 368 0

 org.apache.lucene.util.BytesRef$UTF8SortedAsUnicodeComparator.compare(Object,
 Object) 368 368

 org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum.next()
 160 0

 org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum$Frame.loadNextFloorBlock()
 160 0

 org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum$Frame.loadBlock()
 160 0
 org.apache.lucene.store.ByteBufferIndexInput.readBytes(byte[], int, int)
 120
 0

 

Re: Improper shutdown of Solr in Jetty 9

2013-07-31 Thread Artem Karpenko

Hello Dmitry,

it's Windows 7. I'm starting Jetty with java -jar start.jar

31.07.2013 12:36, Dmitry Kan wrote:

Artem,

Whats the OS are using?
So far jetty 9 with solr 4.3.1 works ok under ubuntu 12.04.
On 30 Jul 2013 17:23, Alexandre Rafalovitch arafa...@gmail.com wrote:


Of course, I meant Jetty (not Tomcat). So apologies for spam and confusion
of my own. The rest of the statement stands.

Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


On Tue, Jul 30, 2013 at 10:20 AM, Alexandre Rafalovitch
arafa...@gmail.com wrote:


Thanks for letting us know. See if you can add it to the documentation
somewhere.

Solr is not using Tomcat 9, but I believe that was primarily because
Tomcat 9 requires Java 7 and Solr 4.x is staying with Java 6 as minimum
requirement.

Regards,
   Alex.

Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


On Tue, Jul 30, 2013 at 10:09 AM, Artem Karpenko a.karpe...@oxseed.com
wrote:


Uh, sorry for spamming, but if anyone is interested, there is a way to
properly shut down Jetty when it's launched with the --exec flag:
you can use JMX to invoke the stop() method on Jetty's Server MBean.
This triggers a proper shutdown with all of Solr's close() callbacks executed.
I wonder why it's not noted at least in the Jetty documentation.
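For reference, here is roughly what I do from a small client (a sketch only:
the JMX port and the ObjectName are taken from my setup as shown in JConsole,
and may differ for other Jetty versions; Jetty must be started with its JMX
support enabled):

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class JettyJmxStop {
    public static void main(String[] args) throws Exception {
        // assumes an RMI connector on localhost:1099 (setup-specific)
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://localhost:1099/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            // the Server MBean name as it appears in my JConsole;
            // verify yours, it can differ between Jetty versions
            ObjectName server = new ObjectName(
                    "org.eclipse.jetty.server:type=server,id=0");
            // invoking stop() triggers a graceful shutdown, so Solr's
            // close() callbacks get executed
            mbs.invoke(server, "stop", new Object[0], new String[0]);
        } finally {
            connector.close();
        }
    }
}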

Regards,
Artem Karpenko.

30.07.2013 16:58, Artem Karpenko wrote:

 After some investigation I found that the problem is not with Jetty's
 version but with the usage of the --exec flag.
 Namely, when --exec is used (to specify JVM args) the shutdown is not
 graceful; it seems that the Java process is simply killed.
 Not sure how to handle this...

Regards,
Artem Karpenko.

29.07.2013 16:51, Artem Karpenko wrote:


Hi,

I can't make Solr shut down properly when using Jetty 9. I tested this
with a simple plugin that only extends DirectUpdateHandler2, creates a
file in the constructor and deletes it in close(). While it works fine
in the example installation (the one that can be downloaded from the Solr
site) and in a simple custom installation with Jetty 8, it won't in
Jetty 9. There is not much logging at shutdown at all, just Jetty's
'closing selector' or something, unlike with Jetty 8 where it prints
various 'Graceful shutdown' messages from Solr.

The installation procedure I used for both Jettys is rather simple: just
put solr.war into the webapps/ directory, the plugin JAR into {core}/lib/,
and configure the update handler in solrconfig.xml.
OS is Windows 7, Solr 4.4.
I tried to stop Jetty with both Ctrl+C and java -jar start.jar [port/key
params] --stop. For Jetty 8 it works fine even with Ctrl+C.

Did anybody stumble on this issue?

Best regards,
Artem.






Re: Trying to determine the benefit of spellcheck-based suggester vs. using terms component?

2013-07-31 Thread Erick Erickson
The biggest thing is that the spellchecker has lots of knobs
to tune, all the stuff in
http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.collate

TermsComponent, on the other hand, just gives you
what's in the index with essentially no knobs to tune.

So it depends on your goal: typeahead or spelling
correction? In the first case I'd go for TermsComponent,
and in the second for spellcheck, for example.
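For typeahead, a TermsComponent request can be as simple as this
(assuming the /terms handler from the example solrconfig.xml; the field
and prefix are just placeholders):

http://localhost:8983/solr/terms?terms.fl=name&terms.prefix=mem&terms.limit=10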

Best
Erick

On Tue, Jul 30, 2013 at 2:07 PM, Timothy Potter thelabd...@gmail.com wrote:
 Going over the comments in SOLR-1316, I seem to have lost the
 forest for the trees. What is the benefit of using the spellcheck
 based suggester over something like the terms component to get
 suggestions as the user types?

 Maybe it is faster because it builds the in-memory data structure on
 commit? Seems like the terms component is pretty fast too.

 I'd appreciate any additional insights about this. There are so many
 solutions to auto-suggest for Solr, it's hard to decide what
 approach to take.

 Cheers,
 Tim


Re: SimplePostTool: FATAL: Solr returned an error #400 Bad Request

2013-07-31 Thread Erick Erickson
Probably not the root of your problem, but
bq: and committing it there after.

Does that mean you're calling commit after every
document? This is usually poor practice; I'd set
the autocommit intervals in solrconfig.xml and NOT
call commit explicitly.
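Something like this inside <updateHandler> in solrconfig.xml, for example
(the numbers are only illustrative, tune them to your indexing load):

<autoCommit>
  <maxDocs>10000</maxDocs>
  <maxTime>60000</maxTime> <!-- milliseconds -->
  <openSearcher>false</openSearcher>
</autoCommit>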

Does the same document fail every time? What does
it look like?

You really haven't provided much information
to go on.

Best
Erick

On Wed, Jul 31, 2013 at 3:55 AM, Vineet Mishra clearmido...@gmail.com wrote:
 Hi All

 Currently I am in a mid of a project which Index some data to Solrs
 multiple instance.

 I have the Configuration as, on the same machine I have made multiple
 instances of Solr

 http://localhost:8080/solr1
 http://localhost:8080/solr2
 http://localhost:8080/solr3
 http://localhost:8080/solr4
 http://localhost:8080/solr5
 http://localhost:8080/solr6

 Now when I am posting the Data to Solr through SimplePostTool by passing a
 xml file in spt.postFile(file) method and committing it there after.

 This all process is Multithreaded and works fine till 1 Million of data
 record but there after it suddenly stops saying,

 *SimplePostTool: FATAL: Solr returned an error #400 Bad Request*
 *
 *
 in the Tomcat Catalina I found

 *WARNING: Failed to register info bean: searcher*
 *javax.management.InstanceAlreadyExistsException:
 solr/:type=searcher,id=org.apache.solr.search.SolrIndexSearcher*
 * at com.sun.jmx.mbeanserver.Repository.addMBean(Repository.java:437)*
 * at
 com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerWithRepository(DefaultMBeanServerInterceptor.java:1898)
 *
 * at
 com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerDynamicMBean(DefaultMBeanServerInterceptor.java:966)
 *
 * at
 com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerObject(DefaultMBeanServerInterceptor.java:900)
 *
 * at
 com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerMBean(DefaultMBeanServerInterceptor.java:324)
 *
 * at
 com.sun.jmx.mbeanserver.JmxMBeanServer.registerMBean(JmxMBeanServer.java:513)
 *
 * at org.apache.solr.core.JmxMonitoredMap.put(JmxMonitoredMap.java:141)*
 * at org.apache.solr.core.JmxMonitoredMap.put(JmxMonitoredMap.java:47)*
 * at
 org.apache.solr.search.SolrIndexSearcher.register(SolrIndexSearcher.java:220)
 *
 * at org.apache.solr.core.SolrCore.registerSearcher(SolrCore.java:1349)*
 * at org.apache.solr.core.SolrCore.access$000(SolrCore.java:84)*
 * at org.apache.solr.core.SolrCore$5.call(SolrCore.java:1247)*
 * at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)*
 * at java.util.concurrent.FutureTask.run(FutureTask.java:166)*
 * at
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 *
 * at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 *
 * at java.lang.Thread.run(Thread.java:722)*
 *
 *
 *Jul 31, 2013 12:46:00 PM org.apache.solr.core.SolrCore registerSearcher*
 *INFO: [] Registered new searcher Searcher@5fa1891b main*
 *Jul 31, 2013 12:46:00 PM org.apache.solr.search.SolrIndexSearcher close*

 Has anybody traced such issue. Please this is really very Urgent and
 Important. Waiting for your response.

 Thanks and Regards
 Vineet


Re: result grouping and paging, solr 4.21

2013-07-31 Thread Erick Erickson
Not that I know of. Grouping pretty much treats all groups the same...

Best
Erick

On Wed, Jul 31, 2013 at 4:14 AM, Gunnar glus...@akitogo.com wrote:
 Hello,

 I'm trying to page results with grouping /field collapsing. My query is:

 ?q=myKeywords&start=0&rows=100&group=true&group.field=myGroupField&group.format=simple&group.limit=1

 The result will contain 70 groups, is there a way to get 100 records
 returned, means 70 from each group first doc and second docs
 from the first 30 groups?

 Thanks,

 Gunnar


Working with solr over two different db schemas

2013-07-31 Thread Mysurf Mail
I've been working on it for quite some time.

this is my config



<dataConfig>
  <dataSource type="JdbcDataSource" name="ds1"
              driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
              url="jdbc:sqlserver://...:1433;databaseName=A"
              user="XX" password="XX" />
  <document>
    <entity name="PackageVersion" pk="PackageVersionId"
            query="/*PackageVersion.Query*/ select PackageVersion.Id PackageVersionId,
                   PackageVersion.VersionNumber, CONVERT(char(19),
                   PackageVersion.LastModificationTime, 126) + 'Z' LastModificationTime,
                   Package.Id PackageId, Package.Name PackageName,
                   PackageVersion.Comments PackageVersionComments,
                   Package.CreatedBy CreatedBy
                   from [dbo].[Package] Package inner join [dbo].[PackageVersion]
                   PackageVersion on Package.Id = PackageVersion.PackageId
                   where Package.RecordStatusId=0 and PackageVersion.RecordStatusId=0">
      <entity name="PackageTag" pk="ResourceId"
              processor="CachedSqlEntityProcessor" cacheKey="ResourceId"
              cacheLookup="PackageVersion.PackageId"
              query="/*PackageTag.Query*/
                     select ResourceId,[Text] PackageTag
                     from [dbo].[Tag] Tag
                     Where ResourceType = 0" />
    </entity>
  </document>
</dataConfig>

Now, this runs in my test environment, and the only thing I do is change the
configuration to another DB (and, as a result, also the schema name from
[dbo] to another one).
This results in totally different behavior.
In the first configuration the selects were executed in this order: inner
entity first, then outer entity, which means that the cache works.
In the second configuration, over the other DB, the order was outer first
and then inner; the cache did not work at all, and the inner query result
is not cached at all.

What could be the problem?


queryResultCache showing all zeros

2013-07-31 Thread Jean-Sebastien Vachon
Hi,

We just configured a new Solr cloud (5 nodes) running Solr 4.3, ran about
200,000 queries taken from our production environment, and measured the
performance of the cloud over a collection of 14M documents with the default
Solr settings. We are now trying to tune the different caches, and when I look
at each node of the cloud, all of them show no activity (see below) regarding
the queryResultCache... all other caches show some activity. Any idea what
could cause this?


org.apache.solr.search.LRUCache
version: 1.0
description: LRU Cache(maxSize=512, initialSize=512)
src: $URL: https://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_4_3/solr/core/src/java/org/apache/solr/search/LRUCache.java $
stats:
  lookups: 0
  hits: 0
  hitratio: 0.00
  inserts: 0
  evictions: 0
  size: 0
  warmupTime: 0
  cumulative_lookups: 0
  cumulative_hits: 0
  cumulative_hitratio: 0.00
  cumulative_inserts: 0
  cumulative_evictions: 0




Re: Negative Query Behaviour in Solr 3.2

2013-07-31 Thread Jack Krupansky
Since there are no parentheses, the terms and operators are all at the same
level, and the OR is essentially a redundant operator and is ignored, so:


q=name:memory OR -name:encoded

is treated as:

q=name:memory -name:encoded

When what you probably want is:

q=name:memory OR (-name:encoded)

BUT... a bug/deficiency prevents Solr from handling pure-negative 
sub-queries properly, so you have to add a *:*:


q=name:memory OR (*:* -name:encoded)

So that reads "... or any documents that do not contain encoded in the name
field", which is equivalent to "... or all documents except those that have
encoded in the name field". With your numbers, that query should give you
the expected 3 + 25 = 28 results.


-- Jack Krupansky

-Original Message- 
From: karanjindal

Sent: Wednesday, July 31, 2013 2:14 AM
To: solr-user@lucene.apache.org
Subject: Negative Query Behaviour in Solr 3.2

Hi All,

I am using solr 3.2 and confused how a particular query is executed.
q=name:memory OR -name:encoded
separately firing q=name:memory gives 3 results
and q=-name:encoded gives 25 results and result sets are disjoint sets.

Since I am doing OR query it should return 28 results, but it is only
returning 3 results same as query (name:memory).

Can anyone explain?

-Karan







Re: SimplePostTool: FATAL: Solr returned an error #400 Bad Request

2013-07-31 Thread Vineet Mishra
I got it resolved; actually the error trace was even further above this one.
It was just that the posting XML was not formed properly for the Solr
field Date, which takes the format

2006-07-15T22:18:48Z

This is the standard format for the Solr date datatype, which specifically
follows one of the patterns mentioned below:


   - 1995-12-31T23:59:59Z
   - 1995-12-31T23:59:59.9Z
   - 1995-12-31T23:59:59.99Z
   - 1995-12-31T23:59:59.999Z


As documented by Solr (www.meticent.com/DAt).
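In other words, in the posting XML the field simply has to look like this
(field name as in my schema):

<field name="Date">2006-07-15T22:18:48Z</field>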

By the way thanks!
Vineet


On Wed, Jul 31, 2013 at 4:47 PM, Erick Erickson erickerick...@gmail.com wrote:

 Probably not the root of your problem, but
 bq: and committing it there after.

 Does that mean you're calling  commit after every
 document? This is usually poor practice, I'd set
 the autocommit intervals on solrconfig.xml and NOT
 call commit explicitly.

 Does the same document fail every time? What does
 it look like?

 You really haven't provided much information
 to go on.

 Best
 Erick

 On Wed, Jul 31, 2013 at 3:55 AM, Vineet Mishra clearmido...@gmail.com
 wrote:
  Hi All
 
  Currently I am in a mid of a project which Index some data to Solrs
  multiple instance.
 
  I have the Configuration as, on the same machine I have made multiple
  instances of Solr
 
  http://localhost:8080/solr1
  http://localhost:8080/solr2
  http://localhost:8080/solr3
  http://localhost:8080/solr4
  http://localhost:8080/solr5
  http://localhost:8080/solr6
 
  Now when I am posting the Data to Solr through SimplePostTool by passing
 a
  xml file in spt.postFile(file) method and committing it there after.
 
  This all process is Multithreaded and works fine till 1 Million of data
  record but there after it suddenly stops saying,
 
  *SimplePostTool: FATAL: Solr returned an error #400 Bad Request*
  *
  *
  in the Tomcat Catalina I found
 
  *WARNING: Failed to register info bean: searcher*
  *javax.management.InstanceAlreadyExistsException:
  solr/:type=searcher,id=org.apache.solr.search.SolrIndexSearcher*
  * at com.sun.jmx.mbeanserver.Repository.addMBean(Repository.java:437)*
  * at
 
 com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerWithRepository(DefaultMBeanServerInterceptor.java:1898)
  *
  * at
 
 com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerDynamicMBean(DefaultMBeanServerInterceptor.java:966)
  *
  * at
 
 com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerObject(DefaultMBeanServerInterceptor.java:900)
  *
  * at
 
 com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerMBean(DefaultMBeanServerInterceptor.java:324)
  *
  * at
 
 com.sun.jmx.mbeanserver.JmxMBeanServer.registerMBean(JmxMBeanServer.java:513)
  *
  * at org.apache.solr.core.JmxMonitoredMap.put(JmxMonitoredMap.java:141)*
  * at org.apache.solr.core.JmxMonitoredMap.put(JmxMonitoredMap.java:47)*
  * at
 
 org.apache.solr.search.SolrIndexSearcher.register(SolrIndexSearcher.java:220)
  *
  * at org.apache.solr.core.SolrCore.registerSearcher(SolrCore.java:1349)*
  * at org.apache.solr.core.SolrCore.access$000(SolrCore.java:84)*
  * at org.apache.solr.core.SolrCore$5.call(SolrCore.java:1247)*
  * at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)*
  * at java.util.concurrent.FutureTask.run(FutureTask.java:166)*
  * at
 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  *
  * at
 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  *
  * at java.lang.Thread.run(Thread.java:722)*
  *
  *
  *Jul 31, 2013 12:46:00 PM org.apache.solr.core.SolrCore registerSearcher*
  *INFO: [] Registered new searcher Searcher@5fa1891b main*
  *Jul 31, 2013 12:46:00 PM org.apache.solr.search.SolrIndexSearcher close*
 
  Has anybody traced such issue. Please this is really very Urgent and
  Important. Waiting for your response.
 
  Thanks and Regards
  Vineet



Unexpected character '<' (code 60) expected '='

2013-07-31 Thread Vineet Mishra
Hi All

I am currently stuck on a Solr issue while posting some data to the Solr server.

I have some records from HBase which I am posting to Solr, but after posting
some 1 million records it suddenly stopped. Checking the Catalina
log trace, it showed:

org.apache.solr.common.SolrException: Unexpected character '<' (code 60) expected '='

I am not sure whether it's an issue with some malformed data in the post,
because I have tried posting the specific XML file that I generate to Solr
directly, and that goes well.

Below is the whole log trace:


SEVERE: org.apache.solr.common.SolrException: Unexpected character '<' (code 60) expected '='
 at [row,col {unknown-source}]: [20281,18]
    at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:81)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:58)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1398)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:103)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
    at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:861)
    at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:606)
    at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
    at java.lang.Thread.run(Thread.java:722)
Caused by: com.ctc.wstx.exc.WstxUnexpectedCharException: Unexpected character '<' (code 60) expected '='
 at [row,col {unknown-source}]: [20281,18]
    at com.ctc.wstx.sr.StreamScanner.throwUnexpectedChar(StreamScanner.java:648)
    at com.ctc.wstx.sr.BasicStreamReader.handleNsAttrs(BasicStreamReader.java:3001)
    at com.ctc.wstx.sr.BasicStreamReader.handleStartElem(BasicStreamReader.java:2936)
    at com.ctc.wstx.sr.BasicStreamReader.nextFromTree(BasicStreamReader.java:2848)
    at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1019)
    at org.apache.solr.handler.XMLLoader.readDoc(XMLLoader.java:295)
    at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:157)
    at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:79)
    ... 17 more
Has anybody faced this issue?

Thanks and Regards
Vineet


RE: Unexpected character '<' (code 60) expected '='

2013-07-31 Thread Markus Jelsma
This file is malformed:

SEVERE: org.apache.solr.common.SolrException: Unexpected character '<' (code 60) expected '='
 at [row,col {unknown-source}]: [20281,18]

Check row 20281, column 18.
 
 
-Original message-
 From:Vineet Mishra clearmido...@gmail.com
 Sent: Wednesday 31st July 2013 15:05
 To: solr-user@lucene.apache.org
 Subject: Unexpected character '<' (code 60) expected '='
 
 Hi All
 
 I am currently stuck in a Solr Issue while Posting some data to Solr Server.
 
 I have some record from Hbase which I am posting to Solr, but after posting
 some 1 Million of data records, it suddenly stopped. Checking the Catalina
 log trace it showed,
 
 *org.apache.solr.common.SolrException: Unexpected character '' (code 60)
 expected '='*
 *
 *
 *
 *
 I am not sure whether its the issue with some malformed data for the
 posting, because whatever xml file which I am generating before posting I
 have tried posting that specific file to the solr and its going well.
 
 Below is the whole log trace,
 
 
 *SEVERE: org.apache.solr.common.SolrException: Unexpected character ''
 (code 60) expected '='*
 * at [row,col {unknown-source}]: [20281,18]*
 * at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:81)*
 * at
 org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:58)
 *
 * at
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
 *
 * at org.apache.solr.core.SolrCore.execute(SolrCore.java:1398)*
 * at
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
 *
 * at
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
 *
 * at
 org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
 *
 * at
 org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
 *
 * at
 org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
 *
 * at
 org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
 *
 * at
 org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
 *
 * at
 org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:103)
 *
 * at
 org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
 *
 * at
 org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)*
 * at
 org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:861)*
 * at
 org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:606)
 *
 * at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
 *
 * at java.lang.Thread.run(Thread.java:722)*
 *Caused by: com.ctc.wstx.exc.WstxUnexpectedCharException: Unexpected
 character '' (code 60) expected '='*
 * at [row,col {unknown-source}]: [20281,18]*
 * at
 com.ctc.wstx.sr.StreamScanner.throwUnexpectedChar(StreamScanner.java:648)*
 * at
 com.ctc.wstx.sr.BasicStreamReader.handleNsAttrs(BasicStreamReader.java:3001)
 *
 * at
 com.ctc.wstx.sr.BasicStreamReader.handleStartElem(BasicStreamReader.java:2936)
 *
 * at
 com.ctc.wstx.sr.BasicStreamReader.nextFromTree(BasicStreamReader.java:2848)*
 * at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1019)*
 * at org.apache.solr.handler.XMLLoader.readDoc(XMLLoader.java:295)*
 * at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:157)*
 * at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:79)*
 * ... 17 more*
 *
 *
 Has anybody faced this issue.
 
 Thanks and Regards
 Vineet
 


RE: new field type - enum field

2013-07-31 Thread Elran Dvir
Hi,

I have managed to attach the patch in Jira.

Thanks.

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Monday, July 29, 2013 2:15 PM
To: solr-user@lucene.apache.org
Subject: Re: new field type - enum field

OK, if you can attach it to an e-mail, I'll attach it.

Just to check, though, make sure you're logged in. I've been fooled once or 
twice by being automatically signed out...

Erick

On Mon, Jul 29, 2013 at 3:17 AM, Elran Dvir elr...@checkpoint.com wrote:
 Thanks, Erick.

 I have tried it four times. It keeps failing.
 The problem reoccurred today.

 Thanks.

 -Original Message-
 From: Erick Erickson [mailto:erickerick...@gmail.com]
 Sent: Monday, July 29, 2013 2:44 AM
 To: solr-user@lucene.apache.org
 Subject: Re: new field type - enum field

You should be able to attach a patch; I wonder if there was some temporary 
glitch in JIRA. Is this persisting?

 Let us know if this continues...

 Erick

 On Sun, Jul 28, 2013 at 12:11 PM, Elran Dvir elr...@checkpoint.com wrote:
 Hi,

 I have created an issue:
 https://issues.apache.org/jira/browse/SOLR-5084
 I tried to attach my patch, but it failed:  Cannot attach file 
 Solr-5084.patch: Unable to communicate with JIRA.
 What am I doing wrong?

 Thanks.

 -Original Message-
 From: Erick Erickson [mailto:erickerick...@gmail.com]
 Sent: Thursday, July 25, 2013 3:25 PM
 To: solr-user@lucene.apache.org
 Subject: Re: new field type - enum field

 Start here: http://wiki.apache.org/solr/HowToContribute

Then, when your patch is ready, submit a JIRA and attach your patch. Then 
nudge (gently) if none of the committers picks it up and applies it.

 NOTE: It is _not_ necessary that the first version of your patch is 
 completely polished. I often put up partial/incomplete patches (comments 
 with //nocommit are explicitly caught by the ant precommit target for 
 instance) to see if anyone has any comments before polishing.

 Best
 Erick

 On Thu, Jul 25, 2013 at 5:04 AM, Elran Dvir elr...@checkpoint.com wrote:
 Hi,

 I have implemented like Chris described it:
 The field is indexed as numeric, but displayed as string, according to 
 configuration.
 It applies to facet, pivot, group and query.

 How do we proceed? How do I contribute it?

 Thanks.

 -Original Message-
 From: Chris Hostetter [mailto:hossman_luc...@fucit.org]
 Sent: Thursday, July 25, 2013 4:40 AM
 To: solr-user@lucene.apache.org
 Subject: Re: new field type - enum field


 : Doable at Lucene level by any chance?

 Given how well the Trie fields compress (ByteField and ShortField have been 
 deprecated in favor of TrieIntField for this reason) it probably just makes 
 sense to treat it as a numeric at the Lucene level.

 :  If there's positive feedback, I'll open an issue with a patch for the 
 functionality.

I've typically dealt with this sort of thing at the client layer 
using a simple numeric field in Solr, or used an UpdateProcessor to 
convert the String-numeric mapping when indexing & used client logic or a
DocTransformer to handle the stored value at query time -- but having a 
built-in FieldType that handles that for you automatically (and helps 
ensure the indexed values conform to the enum) would certainly be cool if 
you'd like to contribute it.


 -Hoss



Re: How might one search for dupe IDs other than faceting on the ID field?

2013-07-31 Thread Jack Krupansky

Good to note!

But... any search will not detect dupe IDs for uncommitted documents.

-- Jack Krupansky

-Original Message- 
From: Mikhail Khludnev

Sent: Wednesday, July 31, 2013 6:11 AM
To: solr-user
Subject: Re: How might one search for dupe IDs other than faceting on the ID 
field?


fwiw,

this code won't capture uncommitted duplicates.


On Wed, Jul 31, 2013 at 9:41 AM, Dotan Cohen dotanco...@gmail.com wrote:


On Tue, Jul 30, 2013 at 11:14 PM, Jack Krupansky
j...@basetechnology.com wrote:
 The Solr SignatureUpdateProcessorFactory is designed to facilitate
dedupe...
 any particular reason you did not use it?

 See:
 http://wiki.apache.org/solr/Deduplication

 and

 https://cwiki.apache.org/confluence/display/solr/De-Duplication


Actually, the guy who made the changes (a coworker) did in fact write
an alternative UpdateHandler. I've just noticed that there are a bunch
of dupes right now, though.

public class DiscoAPIUpdateHandler extends DirectUpdateHandler2 {

    public DiscoAPIUpdateHandler(SolrCore core) {
        super(core);
    }

    @Override
    public int addDoc(AddUpdateCommand cmd) throws IOException {

        // if overwrite is set to false we'll use DirectUpdateHandler2's
        // behaviour; this is done for debugging, to insert duplicates
        // into solr
        if (!cmd.overwrite) return super.addDoc(cmd);

        // when using ref-counted objects you have!! to decrement the
        // ref count when you're done
        RefCounted<SolrIndexSearcher> indexSearcher =
                this.core.getNewestSearcher(false);

        // the idea is like this: we'll make an internal lucene query
        // and check if that id already exists
        Term updateTerm = null;

        if (cmd.updateTerm != null) {
            updateTerm = cmd.updateTerm;
        } else {
            updateTerm = new Term("id", cmd.getIndexedId());
        }

        Query query = new TermQuery(updateTerm);
        TopDocs docs = indexSearcher.get().search(query, 2);

        if (docs.totalHits > 0) {
            // index searcher is no longer needed
            indexSearcher.decref();
            // don't add the new document
            return 0;
        }

        // index searcher is no longer needed
        indexSearcher.decref();

        // if I'm here then it's a new document
        return super.addDoc(cmd);
    }
}


 And I give a bunch of examples in my book.


I anticipate the book with esteem!

--
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com





--
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

http://www.griddynamics.com
mkhlud...@griddynamics.com 



Re: Unexpected character '<' (code 60) expected '='

2013-07-31 Thread Vineet Mishra
I checked the file... nothing is wrong there. I mean, the formatting is
correct; it's a valid XML file.


On Wed, Jul 31, 2013 at 6:38 PM, Markus Jelsma
markus.jel...@openindex.io wrote:

 This file is malformed:

 *SEVERE: org.apache.solr.common.SolrException: Unexpected character ''
 (code 60) expected '='*
 * at [row,col {unknown-source}]: [20281,18]*

 Check row 20281 column 18


 -Original message-
  From:Vineet Mishra clearmido...@gmail.com
  Sent: Wednesday 31st July 2013 15:05
  To: solr-user@lucene.apache.org
  Subject: Unexpected character '<' (code 60) expected '='
 
  Hi All
 
  I am currently stuck in a Solr Issue while Posting some data to Solr
 Server.
 
  I have some record from Hbase which I am posting to Solr, but after
 posting
  some 1 Million of data records, it suddenly stopped. Checking the
 Catalina
  log trace it showed,
 
  *org.apache.solr.common.SolrException: Unexpected character '' (code 60)
  expected '='*
  *
  *
  *
  *
  I am not sure whether its the issue with some malformed data for the
  posting, because whatever xml file which I am generating before posting I
  have tried posting that specific file to the solr and its going well.
 
  Below is the whole log trace,
 
 
  *SEVERE: org.apache.solr.common.SolrException: Unexpected character ''
  (code 60) expected '='*
  * at [row,col {unknown-source}]: [20281,18]*
  * at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:81)*
  * at
 
 org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:58)
  *
  * at
 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
  *
  * at org.apache.solr.core.SolrCore.execute(SolrCore.java:1398)*
  * at
 
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
  *
  * at
 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
  *
  * at
 
 org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
  *
  * at
 
 org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
  *
  * at
 
 org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
  *
  * at
 
 org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
  *
  * at
 
 org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
  *
  * at
 
 org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:103)
  *
  * at
 
 org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
  *
  * at
 
 org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)*
  * at
 
 org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:861)*
  * at
 
 org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:606)
  *
  * at
 org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
  *
  * at java.lang.Thread.run(Thread.java:722)*
  *Caused by: com.ctc.wstx.exc.WstxUnexpectedCharException: Unexpected
  character '' (code 60) expected '='*
  * at [row,col {unknown-source}]: [20281,18]*
  * at
 
 com.ctc.wstx.sr.StreamScanner.throwUnexpectedChar(StreamScanner.java:648)*
  * at
 
 com.ctc.wstx.sr.BasicStreamReader.handleNsAttrs(BasicStreamReader.java:3001)
  *
  * at
 
 com.ctc.wstx.sr.BasicStreamReader.handleStartElem(BasicStreamReader.java:2936)
  *
  * at
 
 com.ctc.wstx.sr.BasicStreamReader.nextFromTree(BasicStreamReader.java:2848)*
  * at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1019)*
  * at org.apache.solr.handler.XMLLoader.readDoc(XMLLoader.java:295)*
  * at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:157)*
  * at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:79)*
  * ... 17 more*
  *
  *
  Has anybody faced this issue.
 
  Thanks and Regards
  Vineet
 



Re: Trying to determine the benefit of spellcheck-based suggester vs. using terms component?

2013-07-31 Thread Timothy Potter
Thanks for the reply, Erick. I'm looking for type-ahead support, using
spell checking too via the DirectSolrSpellChecker. It seems like the
spellcheck-based suggester is designed for type-ahead, or am I not
understanding something? Here's my config:

<requestHandler class="org.apache.solr.handler.component.SearchHandler"
                name="/suggest">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <str name="wt">json</str>
    <str name="indent">true</str>
    <str name="df">suggest</str>
    <str name="spellcheck">true</str>
    <str name="spellcheck.dictionary">suggestDictionary</str>
    <str name="spellcheck.onlyMorePopular">true</str>
    <str name="spellcheck.count">5</str>
    <str name="spellcheck.collate">false</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>

<searchComponent class="solr.SpellCheckComponent" name="suggest">
  <lst name="spellchecker">
    <str name="name">suggestDictionary</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
    <str name="field">suggest</str>
    <float name="threshold">0.</float>
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>

I was confused why this approach was needed because using terms
component is so easy and doesn't require any build step. From your
answer, it seems like either approach is valid in Solr 4.4 but the
spellcheck based suggester has more knobs, such as loading in an
external dictionary in addition to data in my index, etc.
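
For reference, the terms request I'm comparing against is as simple as this
(a sketch; the core name and the /terms handler are the stock ones from the
example solrconfig, and "suggest" is my field):

http://localhost:8983/solr/collection1/terms?terms.fl=suggest&terms.prefix=mem&terms.limit=5&wt=json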

Cheers,
Tim


On Wed, Jul 31, 2013 at 5:08 AM, Erick Erickson erickerick...@gmail.com wrote:
 The biggest thing is that the spellchecker has lots of knobs
 to tune, all the stuff in
 http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.collate

 TermsComponent, on the other hand, just gives you
 what's in the index with essentially no knobs to tune.

 So it depends on your goal. Typeahead or spelling
 correction? In the first case I'd go for TermsComponent
 and the second spell check as an example.

 Best
 Erick

 On Tue, Jul 30, 2013 at 2:07 PM, Timothy Potter thelabd...@gmail.com wrote:
 Going over the comments in SOLR-1316, I seemed to have lost the
 forrest for the trees. What is the benefit of using the spellcheck
 based suggester over something like the terms component to get
 suggestions as the user types?

 Maybe it is faster because it builds the in-memory data structure on
 commit? Seems like the terms component is pretty fast too.

 I'd appreciate any additional insights about this. There are so many
 solutions to auto-suggest for Solr, it's hard to decide what
 approach to take.

 Cheers,
 Tim


Autowarming last 15 days data

2013-07-31 Thread Cool Techi
Hi,

We have a solr master slave set up with close to 30 million records. Our index 
changes/updates very frequently and replication is set up at 60 seconds delay.

Now every time replication completes, new searches take a while. How can 
this be improved? I have read that warming would help in this scenario; in 
our case we cannot warm with specific queries, but most of the users query 
only the last 15 days of data. 

So would it be possible to autowarm only the last 15 days of data?

Regards,
Ayush
  

Re: Measuring SOLR performance

2013-07-31 Thread Dmitry Kan
Hi Roman,

What  version and config of SOLR does the tool expect?

Tried to run, but got:

**ERROR**
  File solrjmeter.py, line 1390, in module
main(sys.argv)
  File solrjmeter.py, line 1296, in main
check_prerequisities(options)
  File solrjmeter.py, line 351, in check_prerequisities
error('Cannot contact: %s' % options.query_endpoint)
  File solrjmeter.py, line 66, in error
traceback.print_stack()
Cannot contact: http://localhost:8983/solr


complains about URL, clicking which leads properly to the admin page...
solr 4.3.1, 2 cores shard

Dmitry


On Wed, Jul 31, 2013 at 3:59 AM, Roman Chyla roman.ch...@gmail.com wrote:

 Hello,

 I have been wanting some tools for measuring performance of SOLR, similar
 to Mike McCandles' lucene benchmark.

 so yet another monitor was born, is described here:
 http://29min.wordpress.com/2013/07/31/measuring-solr-query-performance/

 I tested it on the problem of garbage collectors (see the blogs for
 details) and so far I can't conclude whether highly customized G1 is better
 than highly customized CMS, but I think interesting details can be seen
 there.

 Hope this helps someone, and of course, feel free to improve the tool and
 share!

 roman



Re: Improper shutdown of Solr in Jetty 9

2013-07-31 Thread Dmitry Kan
OK. On Ubuntu there are shell scripts that come with Jetty 9. They seem to
do the job properly (disclaimer: no extensive testing with Solr done yet,
but it looks good so far).
Not sure how well Jetty supports the Windows environment on the life-cycle
automation side.


On Wed, Jul 31, 2013 at 1:43 PM, Artem Karpenko a.karpe...@oxseed.comwrote:

 Hello Dmitry,

 it's Windows 7. I'm starting Jetty with java -jar start.jar

 31.07.2013 12:36, Dmitry Kan пишет:

  Artem,

 What OS are you using?
 So far jetty 9 with solr 4.3.1 works ok under ubuntu 12.04.
 On 30 Jul 2013 17:23, Alexandre Rafalovitch arafa...@gmail.com wrote:

  Of course, I meant Jetty (not Tomcat). So apologies for spam and
 confusion
 of my own. The rest of the statement stands.

 Personal website: http://www.outerthoughts.com/
 LinkedIn: 
 http://www.linkedin.com/in/alexandrerafalovitch
 - Time is the quality of nature that keeps events from happening all at
 once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


 On Tue, Jul 30, 2013 at 10:20 AM, Alexandre Rafalovitch
 arafa...@gmail.comwrote:

  Thanks for letting us know. See if you can add it to the documentation
 somewhere.

 Solr is not using Tomcat 9, but I believe that was primarily because
 Tomcat 9 requires Java 7 and Solr 4.x is staying with Java 6 as minimum
 requirement.

 Regards,
Alex.

 Personal website: http://www.outerthoughts.com/
 LinkedIn: 
  http://www.linkedin.com/in/alexandrerafalovitch
 - Time is the quality of nature that keeps events from happening all at
 once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
 book)


 On Tue, Jul 30, 2013 at 10:09 AM, Artem Karpenko a.karpe...@oxseed.com
 wrote:

  Uh, sorry for spamming, but if anyone interested there is a way to
 properly shutdown Jetty when it's launched with --exec flag.
 You can use JMX to invoke method stop() on the Jetty's Server MBean.

 This

 triggers a proper shutdown with all Solr's close() callbacks executed.
 I wonder why it's not noted at least in Jetty documentation.

 Regards,
 Artem Karpenko.

 30.07.2013 16:58, Artem Karpenko пишет:

   After some investigation I found that the problem is not with Jetty's

 version but usage of --exec flag.
 Namely, when --exec is used (to specify JVM args) then shutdown is not
 graceful; it seems the Java process is just killed.
 Not sure how to handle this...

 Regards,
 Artem Karpenko.

 29.07.2013 16:51, Artem Karpenko пишет:

  Hi,

 I can't make Solr shut down properly when using Jetty 9. Tested this
 with a simple plugin that only extends DirectUpdateHandler2, creates
 a
 file in constructor and deletes it in close(). While it's working
 fine
 in the example installation (the one that can be downloaded from Solr
 site) and in the simple custom installation with Jetty 8, it won't in
 Jetty 9. There is not much logging at shutdown at all, just Jetty's
 closing selector or smth., unlike with Jetty 8 where it prints

 various

 Graceful shutdown messages from Solr.

 Installation procedure I used for both Jettys is rather simple: just

 put

 solr.war into webapps/ directory, plugin JAR into {core}/lib/ and
 configure update handler in solrconfig.xml.
 OS is Windows 7, Solr 4.4.
 I tried to stop Jetty with both Ctrl+C and java start.jar
 [port/key
 params] --stop. For Jetty 8 it works fine even with Ctrl+C.

 Did anybody stumble on this issue?

 Best regards,
 Artem.






Re: Measuring SOLR performance

2013-07-31 Thread Dmitry Kan
Ok, got the error fixed by modifying the base Solr URL in solrjmeter.py
(added the core name after the /solr part).
Next error is:

WARNING: no test name(s) supplied nor found in:
['/home/dmitry/projects/lab/solrjmeter/demo/queries/demo.queries']

It is a 'slow start with new tool' symptom I guess.. :)


On Wed, Jul 31, 2013 at 4:39 PM, Dmitry Kan solrexp...@gmail.com wrote:

 Hi Roman,

 What  version and config of SOLR does the tool expect?

 Tried to run, but got:

 **ERROR**
   File solrjmeter.py, line 1390, in module
 main(sys.argv)
   File solrjmeter.py, line 1296, in main
 check_prerequisities(options)
   File solrjmeter.py, line 351, in check_prerequisities
 error('Cannot contact: %s' % options.query_endpoint)
   File solrjmeter.py, line 66, in error
 traceback.print_stack()
 Cannot contact: http://localhost:8983/solr


 complains about URL, clicking which leads properly to the admin page...
 solr 4.3.1, 2 cores shard

 Dmitry


 On Wed, Jul 31, 2013 at 3:59 AM, Roman Chyla roman.ch...@gmail.comwrote:

 Hello,

 I have been wanting some tools for measuring performance of SOLR, similar
 to Mike McCandles' lucene benchmark.

 so yet another monitor was born, is described here:
 http://29min.wordpress.com/2013/07/31/measuring-solr-query-performance/

 I tested it on the problem of garbage collectors (see the blogs for
 details) and so far I can't conclude whether highly customized G1 is
 better
 than highly customized CMS, but I think interesting details can be seen
 there.

 Hope this helps someone, and of course, feel free to improve the tool and
 share!

 roman





Re: SolrCloud Exception

2013-07-31 Thread Shawn Heisey
On 7/31/2013 4:27 AM, Sinduja Rajendran wrote:
 I am running solr 4.0 in a cloud. We have close to 100M documents. The data
 is from a single DB table. I use dih.
 Our solrCloud has 3 zookeepers, one tomcat, 2 solr instances in same
 tomcat. We have 8 GB Ram.
 
 After indexing 14M, my indexing fails witht the below exception.
 
 solr org.apache.lucene.index.MergePolicy$MergeException:
 java.lang.OutOfMemoryError: GC overhead limit exceeded
 
 I tried increasing the GC value to the App server
 
  -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=80
 
 But after applying those options, my indexing rate went down drastically. It
 was indexing only 15k documents per 20 minutes. Earlier it was 300k
 per 20 min.

First thing to mention is that Solr 4.0 was extremely buggy, upgrading
would be advisable.  In the meantime:

An OutOfMemoryError means that Solr needs more heap memory than the JVM
is allowed to use.  The Solr Admin UI dashboard will tell you how much
memory is allocated to your JVM, which you can increase with the -Xmx
parameter.  Real RAM must be available from the system in order to
increase the heap size.
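
For Tomcat that usually means something like the following in setenv.sh or
wherever you set CATALINA_OPTS (the 4g figure is only an example; size it to
your data and leave RAM for the OS disk cache):

CATALINA_OPTS="$CATALINA_OPTS -Xms4g -Xmx4g"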

The options you have given just change the GC collector and tune one
aspect of the new collector, they don't increase anything.  Here are
some things that may help you:

http://wiki.apache.org/solr/SolrPerformanceProblems
http://wiki.apache.org/solr/ShawnHeisey#GC_Tuning

After looking over that information and making adjustments, if you are
still having trouble, we can go over your config and all your details to
see what can be done.

You said that both of your Solr instances are running in the same
tomcat.  Just FYI - because you aren't running all functions on separate
hardware, your setup is not fault tolerant.  Machine failures DO happen,
no matter how much redundancy you build into that server.  If you are
running all this on a redundant VM solution that has live migration of
running VMs, then my statement isn't accurate.

Thanks,
Shawn



SolrCloud - Replica 'down'. How to get it back as 'active'? - Solr 4.3.0

2013-07-31 Thread Jeroen Steggink

Hi,

After the following error, one of the replicas of the leader went down.
Error opening new searcher. exceeded limit of maxWarmingSearchers=2, 
try again later.

I increased the autoCommit time to 5000ms and restarted Solr.

However, the status is still set to down.
How do I get it back to active?

Regards,
Jeroen




Re: Solr PolyField

2013-07-31 Thread Erick Erickson
Nope. Solr fields are flat. Why do you want to do this? I'm
asking because this might be an XY problem and there
may be other possibilities.

Best
Erick

On Wed, Jul 31, 2013 at 5:09 AM, Luís Portela Afonso
meligalet...@gmail.com wrote:
 Hi, I'm trying to create a field with multiple fields inside, that is:

 "origin": {
   "htmlUrl": "http://www.gazzetta.it/",
   "streamId": "feed/http://www.gazzetta.it/rss/Home.xml",
   "title": "Gazzetta.it"
 },


 Get something like this. Is that possible? I'm using Solr 4.4.0.

 Thanks


Re: Sharding with a SolrCloud

2013-07-31 Thread Erick Erickson
You're in uncharted territory. I can imagine you use
a SolrCloud cluster as a separate Solr for a federated
search, but using it as a single shard just seems wrong.

If nothing else, indexing to the shards will require that
the documents be routed correctly. But having one
shard in SolrCloud and another shard managed
externally seems ripe for getting the docs indexed
to various shards you're not expecting, unless you're
using explicit routing

All in all, this _really_ sounds like something you should
not be attempting. Why are you trying to do this? Is it
possible to just set up a SolrCloud cluster and index
all the docs to it and be done with it?

'cause I think you'll end up with endless problems given
what you've described.

Best
Erick

On Wed, Jul 31, 2013 at 5:16 AM, Oliver Goldschmidt
o.goldschm...@tuhh.de wrote:
 Hi list,

 I have a Solr server, which uses sharding to make distributed search
 with another Solr server. The other Solr server now migrates to a Solr
 Cloud system. I've been trying recently to continue searching the Solr
 Cloud as a shard for my Solr server, but this is failing with mysterious
 effects. I am getting a result with a number of hits, when I perform a
 search, but the results are not displayed at all. This is the response
 header I am getting from Solr:

 {
   "responseHeader":{
     "status":0,
     "QTime":305,
     "params":{
       "facet":"true",
       "indent":"yes",
       "facet.mincount":"1",
       "facet.limit":"30",
       "qf":"title_short^750 title_full_unstemmed^600",
       "json.nl":"arrarr",
       "wt":"json",
       "rows":"20",
       "shards":"ourindex.nowhere.de/solr/index",
       "bq":"format:Book^500",
       "fl":"*,score",
       "facet.sort":"count",
       "start":"0",
       "q":"xml",
       "shards.info":"true",
       "facet.prefix":"",
       "facet.field":["publishDate"],
       "qt":"dismax"}},
   "shards.info":{
     "ourindex.nowhere.de/solr/index":{
       "numFound":10076,
       "maxScore":8.507474,
       "time":263}},
   "response":{"numFound":10056,"start":0,"maxScore":8.507474,"docs":[]
   }

 As you can see, there are no docs in the result. This result is not 100%
 reproducible: sometimes I get no results displayed, other times it works
 (with the same query URL!). As you also can see in the result, the
 number of hits in the response is a little bit less than the number of
 hits sent from the shard.

 This makes me wonder if it is not possible to use a Solr Cloud as a
 shard for another standalone Solr server?

 Any hint is appreciated!

 Best
 - Oliver

 --
 Oliver Goldschmidt
 TU Hamburg-Harburg / Universitätsbibliothek / Digitale Dienste
 Denickestr. 22
 21071 Hamburg - Harburg
 Tel.+49 (0)40 / 428 78 - 32 91
 eMail   o.goldschm...@tuhh.de
 --
 GPG/PGP-Schlüssel:
 http://www.tub.tu-harburg.de/keys/Oliver_Marahrens_pub.asc



Re: SolrCloud - Replica 'down'. How to get it back as 'active'? - Solr 4.3.0

2013-07-31 Thread Anshum Gupta
It is perhaps just replaying the transaction logs and coming up. Waiting
for it is what I'd suggest.
The admin UI as of now doesn't show replaying of transaction log as
'recovering', it does so only during peer sync.

Also, you may want to add autoSoftCommit and increase the autoCommit to a
few minutes.
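
Something along these lines in solrconfig.xml; the values here are only
illustrative, not a recommendation for your exact setup:

<autoCommit>
  <maxTime>180000</maxTime>        <!-- hard commit every 3 minutes -->
  <openSearcher>false</openSearcher>
</autoCommit>
<autoSoftCommit>
  <maxTime>5000</maxTime>          <!-- soft commit for near-real-time visibility -->
</autoSoftCommit>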


On Wed, Jul 31, 2013 at 7:55 PM, Jeroen Steggink jer...@stegg-inc.comwrote:

 Hi,

 After the following error, one of the replicas of the leader went down.
 Error opening new searcher. exceeded limit of maxWarmingSearchers=2, try
 again later.
 I increased the autoCommit time to 5000ms and restarted Solr.

 However, the status is still set to down.
 How do I get it back to active?

 Regards,
 Jeroen





-- 

Anshum Gupta
http://www.anshumgupta.net


Re: Solr PolyField

2013-07-31 Thread Luís Portela Afonso
Hi,

I'm trying to index information of RSS Feeds.

So in a more detailed explanation:

The RSS feed has something like: 
<enclosure url="http://www.engadget.com/podcasts/Engadget_Podcast_353.mp3" 
length="32642192" type="audio/mpeg"/>

With my current configuration, this is working and i get a result like that:

"enclosure": [
  "audio/mpeg",
  "http://www.engadget.com/podcasts/EngadgetHD_Podcast_359.mp3",
  37521428
],

BUT, this is not the result that I'm trying to reach. With that I'm not able to 
tell reliably whether "audio/mpeg" is the type, the url, or the length.

I want to reach something like:

"enclosure": {
  "type": "audio/mpeg",
  "url": "http://www.engadget.com/podcasts/EngadgetHD_Podcast_359.mp3",
  "length": 37521428
},


So, the way I intend this, these should be 3 fields inside another field, no?


Many Thanks for the answer and the help.


On Jul 31, 2013, at 3:34 PM, Erick Erickson erickerick...@gmail.com wrote:

 Nope. Solr fields are flat. Why do you want to do this? I'm
 asking because this might be an XY problem and there
 may be other possibilities.
 
 Best
 Erick
 
 On Wed, Jul 31, 2013 at 5:09 AM, Luís Portela Afonso
 meligalet...@gmail.com wrote:
 Hi, I'm trying to create a field with multiple fields inside, that is:
 
 origin:
 {
 
 htmlUrl: http://www.gazzetta.it/;,
 streamId: feed/http://www.gazzetta.it/rss/Home.xml;,
 title: Gazzetta.it
 
 },
 
 
 Get something like this. Is that possible? I'm using Solr 4.4.0.
 
 Thanks





Re: Autowarming last 15 days data

2013-07-31 Thread Shawn Heisey
On 7/31/2013 7:30 AM, Cool Techi wrote:
 We have a solr master slave set up with close to 30 million records. Our 
 index changes/updates very frequently and replication is set up at 60 seconds 
 delay.
 
 Now every time replication completes, the new searches take a time. How can 
 this be improved? I have come across that warming would help this scenario, I 
 our case we cannot warm some queries, but most of the users use the last 15 
 days data only. 
 
 So would it be possible to auto warm only last 15 days data?

Autowarming is generally done automatically when a new searcher is
opened, according to the cache config.  It will take the most recent N
queries in the cache (according to the autowarmCount) and re-execute
those queries against the index to populate the cache.  The document
cache cannot be warmed directly, but when the query result cache is
warmed, that will also populate the document cache.

Because you have a potentially very frequent interval for opening new
searchers (possibly replicating every 60 seconds), you will want to
avoid large autowarmCount values.  If your autowarming ends up taking
too long, the system will try to open a new searcher while the previous
one is being warmed, which can lead to problems.  I have found that the
filterCache is particularly slow to warm.
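
The autowarmCount knob lives on the cache definitions in solrconfig.xml; a
sketch with deliberately small values (sizes and counts here are illustrative,
not a recommendation):

<filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="32"/>
<queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="32"/>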

Thanks,
Shawn



Re: Solr PolyField

2013-07-31 Thread Michael Della Bitta
Luís,

Is there a reason why splitting this up into enclosure_type, enclosure_url,
and enclosure_length would not work?
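
For example, schema.xml definitions along these lines (a sketch; the field
names and types are my assumption, not from your schema):

<field name="enclosure_type" type="string" indexed="true" stored="true" multiValued="true"/>
<field name="enclosure_url" type="string" indexed="true" stored="true" multiValued="true"/>
<field name="enclosure_length" type="long" indexed="true" stored="true" multiValued="true"/>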


Michael Della Bitta

Applications Developer

o: +1 646 532 3062  | c: +1 917 477 7906

appinions inc.

“The Science of Influence Marketing”

18 East 41st Street

New York, NY 10017

t: @appinions https://twitter.com/Appinions | g+:
plus.google.com/appinions
w: appinions.com http://www.appinions.com/


On Wed, Jul 31, 2013 at 10:43 AM, Luís Portela Afonso 
meligalet...@gmail.com wrote:

 Hi,

 I'm trying to index information of RSS Feeds.

 So in a more detailed explanation:

 The RSS feed has something like:
 <enclosure url="http://www.engadget.com/podcasts/Engadget_Podcast_353.mp3"
  length="32642192" type="audio/mpeg"/>

 With my current configuration, this is working and i get a result like
 that:

 "enclosure": [
   "audio/mpeg",
   "http://www.engadget.com/podcasts/EngadgetHD_Podcast_359.mp3",
   37521428
 ],

 BUT, this is not the result that i'm trying to reach. With that i'm not
 able to tell reliably whether "audio/mpeg" is the type, the url, or the
 length.

 I want to reach something like:

 "enclosure": {
   "type": "audio/mpeg",
   "url": "http://www.engadget.com/podcasts/EngadgetHD_Podcast_359.mp3",
   "length": 37521428
 },

 So, the way I intend this, these should be 3 fields inside another field, no?


 Many Thanks for the answer and the help.


 On Jul 31, 2013, at 3:34 PM, Erick Erickson erickerick...@gmail.com
 wrote:

 Nope. Solr fields are flat. Why do you want to do this? I'm
 asking because this might be an XY problem and there
 may be other possibilities.

 Best
 Erick

 On Wed, Jul 31, 2013 at 5:09 AM, Luís Portela Afonso
 meligalet...@gmail.com wrote:

 Hi, I'm trying to create a field with multiple fields inside, that is:

 origin:
 {

 htmlUrl: http://www.gazzetta.it/;,
 streamId: feed/http://www.gazzetta.it/rss/Home.xml;,
 title: Gazzetta.it

 },


 Get something like this. Is that possible? I'm using Solr 4.4.0.

 Thanks





Re: Solr PolyField

2013-07-31 Thread Luís Portela Afonso
These fields can be multiValued.
In the RSS standard it is not correct to do that, but some sources do and I 
would like to grab it all. Is there any way to make that possible?

Once again, Many thanks :)

On Jul 31, 2013, at 3:54 PM, Michael Della Bitta 
michael.della.bi...@appinions.com wrote:

 Luís,
 
 Is there a reason why splitting this up into enclosure_type, enclosure_url,
 and enclosure_length would not work?
 
 
 Michael Della Bitta
 
 Applications Developer
 
 o: +1 646 532 3062  | c: +1 917 477 7906
 
 appinions inc.
 
 “The Science of Influence Marketing”
 
 18 East 41st Street
 
 New York, NY 10017
 
 t: @appinions https://twitter.com/Appinions | g+:
 plus.google.com/appinions
 w: appinions.com http://www.appinions.com/
 
 
 On Wed, Jul 31, 2013 at 10:43 AM, Luís Portela Afonso 
 meligalet...@gmail.com wrote:
 
 Hi,
 
 I'm trying to index information of RSS Feeds.
 
 So in a more detailed explanation:
 
 The RSS feed has something like:
 enclosure url=http://www.engadget.com/podcasts/Engadget_Podcast_353.mp3;
 length=32642192 type=audio/mpeg/
 
 *With my current configuration, this is working and i get a result like
 that:*
 
 
   - enclosure:
   [
  - audio/mpeg,
  - http://www.engadget.com/podcasts/EngadgetHD_Podcast_359.mp3;,
  - 37521428
  ],
 
 
 *BUT,* this is not the result that i'm trying to reach. With that i'm not
 able to know in a correct way, if audio/mpeg is the *type*, or the *
 url,* or the *length*.
 *
 *
 *I want to reach something like:*
 
   -
   - enclosure:
   {
  - type: a http://www.gazzetta.it/udio/mpeg,
  - url:
  http://www.engadget.com/podcasts/EngadgetHD_Podcast_359.mp3;,
  - length: 37521428
  },
 
 
 
 So, how i intend this, this should be 3 fields inside of another field, no?
 
 
 Many Thanks for the answer and the help.
 
 
 On Jul 31, 2013, at 3:34 PM, Erick Erickson erickerick...@gmail.com
 wrote:
 
 Nope. Solr fields are flat. Why do you want to do this? I'm
  asking because this might be an XY problem and there
 may be other possibilities.
 
 Best
 Erick
 
 On Wed, Jul 31, 2013 at 5:09 AM, Luís Portela Afonso
 meligalet...@gmail.com wrote:
 
 Hi, I'm trying to create a field with multiple fields inside, that is:
 
 origin:
 {
 
 htmlUrl: http://www.gazzetta.it/;,
 streamId: feed/http://www.gazzetta.it/rss/Home.xml;,
 title: Gazzetta.it
 
 },
 
 
 Get something like this. Is that possible? I'm using Solr 4.4.0.
 
 Thanks
 
 
 





Re: Sharding with a SolrCloud

2013-07-31 Thread Oliver Goldschmidt
Thank you very much for that information, Erick. That was what I was
fearing...

Well, the problem, why I am trying to do this is, that the SolrCloud is
managed by someone else. We are indexing some content to a pretty small
local index. To this index we have complete access and can do whatever
we want to do. But we also need the seperate index, which is now moving
into the cloud. Its not possible to put our local content into the
cloud, because we are not maintaining it and have no write permission to it.

But why shouldn't that work? Isn't Solr Cloud acting like one solr
server? The indices have to be maintained seperately - can't I just
continue using them as shards and get one result list from both of them
(thats how I did it before they wanted to switch to Solr Cloud)?

Though, if there is no way to use the cloud as a shard, we will have to
think about how to solve that. Of course we can split up the queries and
make two queries (one for the cloud and one for our local index). But
this might be a bit confusing for the user.

Thank you again, best
- Oliver

Am 31.07.2013 16:39, schrieb Erick Erickson:
 You're in uncharted territory. I can imagine you use
 a SolrCloud cluster as a separate Solr for a federated
 search, but using it as a single shard just seems wrong.

 If nothing else, indexing to the shards will require that
 the documents be routed correctly. But having one
 shard in SolrCloud and another shard managed
 externally seems ripe for getting the docs indexed
 to various shards you're not expecting, unless you're
 using explicit routing

 All in all, this _really_ sounds like something you should
 not be attempting. Why are you trying to do this? Is it
 possible to just set up a SolrCloud cluster and index
 all the docs to it and be done with it?

 'cause I think you'll end up with endless problems given
 what you've described.

 Best
 Erick

 On Wed, Jul 31, 2013 at 5:16 AM, Oliver Goldschmidt
 o.goldschm...@tuhh.de wrote:
 Hi list,

 I have a Solr server, which uses sharding to make distributed search
 with another Solr server. The other Solr server now migrates to a Solr
 Cloud system. I've been trying recently to continue searching the Solr
 Cloud as a shard for my Solr server, but this is failing with mysterious
 effects. I am getting a result with a number of hits, when I perform a
 search, but the results are not displayed at all. This is the resonse
 header I am getting from Solr:

 {
   responseHeader:{
 status:0,
 QTime:305,
 params:{
   facet:true,
   indent:yes,
   facet.mincount:1,
   facet.limit:30,
   qf:title_short^750 title_full_unstemmed^600,
   json.nl:arrarr,
   wt:json,
   rows:20,
   shards:ourindex.nowhere.de/solr/index,
   bq:format:Book^500,
   fl:*,score,
   facet.sort:count,
   start:0,
   q:xml,
   shards.info:true,
   facet.prefix:,
   facet.field:[publishDate],
   qt:dismax}},
   shards.info:{
 ourindex.nowhere.de/solr/index:{
   numFound:10076,
   maxScore:8.507474,
   time:263}},
   response:{numFound:10056,start:0,maxScore:8.507474,docs:[]
   }

 As you can see, there are no docs in the result. This result is not 100%
 reproducable: sometimes I get no results displayed, other times it works
 (with the same query URL!). As you also can see in the result, the
 number of hits in the response is a little bit less than the number of
 hits sent from the shard.

 This makes me wonder if it is not possible to use a Solr Cloud as a
 shard for another standalone Solr server?

 Any hint is appreciated!

 Best
 - Oliver

 --
 Oliver Goldschmidt
 TU Hamburg-Harburg / Universitätsbibliothek / Digitale Dienste
 Denickestr. 22
 21071 Hamburg - Harburg
 Tel.+49 (0)40 / 428 78 - 32 91
 eMail   o.goldschm...@tuhh.de
 --
 GPG/PGP-Schlüssel:
 http://www.tub.tu-harburg.de/keys/Oliver_Marahrens_pub.asc



-- 
Oliver Goldschmidt
TU Hamburg-Harburg / Universitätsbibliothek / Digitale Dienste
Denickestr. 22
21071 Hamburg - Harburg
Tel.+49 (0)40 / 428 78 - 32 91
eMail   o.goldschm...@tuhh.de
--
GPG/PGP-Schlüssel: 
http://www.tub.tu-harburg.de/keys/Oliver_Marahrens_pub.asc



Re: Measuring SOLR performance

2013-07-31 Thread Shawn Heisey
On 7/31/2013 12:24 AM, William Bell wrote:
 But that link does not tell me which one you are using?
 
 You are listing like 4 versions on your site.
 
 Also, what did it fix? Pause times?
 
 
 Any other words of wisdom ?

I'm not sure whether that was directed at me or Roman, but here's my
answers:

I run one copy of my index on Solr 3.5.0 and another copy on Solr 4.2.1.
 I have a completely separate (and much smaller) index using SolrCloud
on 4.2.1.

I was seeing GC pause times of 8-10 seconds on both 3.5.0 and 4.2.1 with
an untuned CMS collector.  When I switched that to G1 (also untuned), I
was seeing pause times of 12 seconds.  The average GC time did go down,
but the long stop-the-world pauses were worse.  I used the jHiccup tool
to see the problem.

I went to a CMS config much like what Roman used on his benchmarks, and
that improved things greatly, but I was still seeing occasional pauses
long enough to make my load balancer ping check (5 second timeout) think
that the index had gone down.

I later tried the CMS config that's on my wiki page.  That seems to have
fixed my load balancer problem.  I do still see pauses of up to a
second, but they are not frequent.  We have more page load delay from
our webapp than we do from Solr, so users aren't noticing when searches
occasionally take a little longer.
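
For context, the CMS-flavored tuning being discussed is built from flags like
these (illustrative only; see the GC_Tuning wiki page mentioned in this list
for my actual settings):

-XX:+UseConcMarkSweepGC
-XX:+UseParNewGC
-XX:+CMSParallelRemarkEnabled
-XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly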

Thanks,
Shawn



Re: Solr PolyField

2013-07-31 Thread Luís Portela Afonso
As a single record? Hum, no.

So an RSS feed has /rss/channel/ and then lots of /rss/channel/item, right?
Each /rss/channel/item is a new document in Solr. I started with the Solr RSS 
example, but I changed it to have more fields and to get the feed url 
from a database.

So each /rss/channel/item is a document for indexing, but each 
/rss/channel/item can have more than one enclosure tag.

Many thanks

On Jul 31, 2013, at 4:05 PM, Michael Della Bitta 
michael.della.bi...@appinions.com wrote:

 So you're trying to index a RSS feed as a single record, but you want to be
 able to search for and retrieve individual entries from within the feed? Is
 that the issue?
 
 Michael Della Bitta
 
 Applications Developer
 
 o: +1 646 532 3062  | c: +1 917 477 7906
 
 appinions inc.
 
 “The Science of Influence Marketing”
 
 18 East 41st Street
 
 New York, NY 10017
 
 t: @appinions https://twitter.com/Appinions | g+:
 plus.google.com/appinions
 w: appinions.com http://www.appinions.com/
 
 
 On Wed, Jul 31, 2013 at 10:59 AM, Luís Portela Afonso 
 meligalet...@gmail.com wrote:
 
 This fields can be multiValued.
 I the rss standart there is not correct to do that, but some sources do
 and i like to grab it all. Is there any way that make it possible?
 
 Once again, Many thanks :)
 
 On Jul 31, 2013, at 3:54 PM, Michael Della Bitta 
 michael.della.bi...@appinions.com wrote:
 
 Luís,
 
 Is there a reason why splitting this up into enclosure_type,
 enclosure_url,
 and enclosure_length would not work?
 
 
 Michael Della Bitta
 
 Applications Developer
 
 o: +1 646 532 3062  | c: +1 917 477 7906
 
 appinions inc.
 
 “The Science of Influence Marketing”
 
 18 East 41st Street
 
 New York, NY 10017
 
 t: @appinions https://twitter.com/Appinions | g+:
 plus.google.com/appinions
 
 w: appinions.com http://www.appinions.com/
 
 
 On Wed, Jul 31, 2013 at 10:43 AM, Luís Portela Afonso 
 meligalet...@gmail.com wrote:
 
 Hi,
 
 I'm trying to index information of RSS Feeds.
 
 So in a more detailed explanation:
 
 The RSS feed has something like:
 enclosure url=
 http://www.engadget.com/podcasts/Engadget_Podcast_353.mp3;
 length=32642192 type=audio/mpeg/
 
 *With my current configuration, this is working and i get a result like
 that:*
 
 
  - enclosure:
  [
 - audio/mpeg,
 - http://www.engadget.com/podcasts/EngadgetHD_Podcast_359.mp3;,
 - 37521428
 ],
 
 
 *BUT,* this is not the result that i'm trying to reach. With that i'm
 not
 able to know in a correct way, if audio/mpeg is the *type*, or the *
 url,* or the *length*.
 *
 *
 *I want to reach something like:*
 
  -
  - enclosure:
  {
 - type: a http://www.gazzetta.it/udio/mpeg,
 - url:
 http://www.engadget.com/podcasts/EngadgetHD_Podcast_359.mp3;,
 - length: 37521428
 },
 
 
 
 So, how i intend this, this should be 3 fields inside of another field,
 no?
 
 
 Many Thanks for the answer and the help.
 
 
 On Jul 31, 2013, at 3:34 PM, Erick Erickson erickerick...@gmail.com
 wrote:
 
 Nope. Solr fields are flat. Why do you want to do this? I'm
  asking because this might be an XY problem and there
 may be other possibilities.
 
 Best
 Erick
 
 On Wed, Jul 31, 2013 at 5:09 AM, Luís Portela Afonso
 meligalet...@gmail.com wrote:
 
 Hi, I'm trying to create a field with multiple fields inside, that is:
 
 origin:
 {
 
 htmlUrl: http://www.gazzetta.it/;,
 streamId: feed/http://www.gazzetta.it/rss/Home.xml;,
 title: Gazzetta.it
 
 },
 
 
 Get something like this. Is that possible? I'm using Solr 4.4.0.
 
 Thanks
 
 
 
 
 





Re: SolrCloud - Replica 'down'. How to get it back as 'active'? - Solr 4.3.0

2013-07-31 Thread Jeroen Steggink

Thanks Anshum,

autoSoftCommit was already set to 1000ms, but I changed the autoCommit to 
3 minutes.


I'll wait for it to come back. The index contains about 200.000 
documents and the last commit was 14 hours ago. So I wonder how long it 
will take.

I would have thought it would be back up already.

On 31-7-2013 16:40, Anshum Gupta wrote:

It perhaps is just replaying the transaction logs and coming up. Wait for
it is what I'd say.
The admin UI as of now doesn't show replaying of transaction log as
'recovering', it does so only during peer sync.

Also, you may want to add autoSoftCommit and increase the autoCommit to a
few minutes.


On Wed, Jul 31, 2013 at 7:55 PM, Jeroen Steggink jer...@stegg-inc.comwrote:


Hi,

After the following error, one of the replicas of the leader went down.
Error opening new searcher. exceeded limit of maxWarmingSearchers=2, try
again later.
I increased the autoCommit time to 5000ms and restarted Solr.

However, the status is still set to down.
How do I get it back to active?

Regards,
Jeroen










Re: Solr PolyField

2013-07-31 Thread Luís Portela Afonso
Hum, ok.

Is it possible to add static text to a field? Text that I write in the 
configuration and then append another field to? I saw something like 
CloneFieldProcessor, but when I start Solr, it says it could not find the 
class.
I was trying to use processors to copy one field to another.

I saw this:
<processor class="solr.FieldCopyProcessorFactory">
  <str name="source">lastname firstname</str>
  <str name="dest">fullname</str>
  <bool name="append">true</bool>
  <str name="append.delim">, </str>
</processor>

But when I try to use it, Solr says it cannot find 
solr.FieldCopyProcessorFactory. I'm using Solr 4.4.0.

Thanks ;)

On Jul 31, 2013, at 4:16 PM, Michael Della Bitta 
michael.della.bi...@appinions.com wrote:

 OK,
 
 Then I would suggest creating multiValued enclosure_type, etc. tags for
 searching, and then one string-typed field to store the JSON snippet you've
 been showing.
 
 Michael Della Bitta
 
 Applications Developer
 
 o: +1 646 532 3062  | c: +1 917 477 7906
 
 appinions inc.
 
 “The Science of Influence Marketing”
 
 18 East 41st Street
 
 New York, NY 10017
 
 t: @appinions https://twitter.com/Appinions | g+:
 plus.google.com/appinionshttps://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts
 w: appinions.com http://www.appinions.com/
 
 
 On Wed, Jul 31, 2013 at 11:11 AM, Luís Portela Afonso 
 meligalet...@gmail.com wrote:
 
 As a single record? Hum, no.
 
 So an Rss has /rss/channel/ and then lot of /rss/channel/item, right?
 Each /rss/channel/item is a new document on Solr. I start with the solr
 example rss, but i change that to has more fields, other fields and get the
 feed url from a database.
 
 So each /rss/channel/item is a document to the indexing, bue each
 /rss/channel/item can have more than on enclosure tag.
 
 Many thanks
 
 On Jul 31, 2013, at 4:05 PM, Michael Della Bitta 
 michael.della.bi...@appinions.com wrote:
 
 So you're trying to index a RSS feed as a single record, but you want to
 be
 able to search for and retrieve individual entries from within the feed?
 Is
 that the issue?
 
 Michael Della Bitta
 
 Applications Developer
 
 o: +1 646 532 3062  | c: +1 917 477 7906
 
 appinions inc.
 
 “The Science of Influence Marketing”
 
 18 East 41st Street
 
 New York, NY 10017
 
 t: @appinions https://twitter.com/Appinions | g+:
 plus.google.com/appinions
 
 w: appinions.com http://www.appinions.com/
 
 
 On Wed, Jul 31, 2013 at 10:59 AM, Luís Portela Afonso 
 meligalet...@gmail.com wrote:
 
 This fields can be multiValued.
 I the rss standart there is not correct to do that, but some sources do
 and i like to grab it all. Is there any way that make it possible?
 
 Once again, Many thanks :)
 
 On Jul 31, 2013, at 3:54 PM, Michael Della Bitta 
 michael.della.bi...@appinions.com wrote:
 
 Luís,
 
 Is there a reason why splitting this up into enclosure_type,
 enclosure_url,
 and enclosure_length would not work?
 
 
 Michael Della Bitta
 
 Applications Developer
 
 o: +1 646 532 3062  | c: +1 917 477 7906
 
 appinions inc.
 
 “The Science of Influence Marketing”
 
 18 East 41st Street
 
 New York, NY 10017
 
 t: @appinions https://twitter.com/Appinions | g+:
 plus.google.com/appinions
 
 
 w: appinions.com http://www.appinions.com/
 
 
 On Wed, Jul 31, 2013 at 10:43 AM, Luís Portela Afonso 
 meligalet...@gmail.com wrote:
 
 Hi,
 
 I'm trying to index information of RSS Feeds.
 
 So in a more detailed explanation:
 
 The RSS feed has something like:
 enclosure url=
 http://www.engadget.com/podcasts/Engadget_Podcast_353.mp3;
 length=32642192 type=audio/mpeg/
 
 *With my current configuration, this is working and i get a result
 like
 that:*
 
 
 - enclosure:
 [
- audio/mpeg,
- http://www.engadget.com/podcasts/EngadgetHD_Podcast_359.mp3;,
- 37521428
],
 
 
 *BUT,* this is not the result that i'm trying to reach. With that i'm
 not
 able to know in a correct way, if audio/mpeg is the *type*, or
 the *
 url,* or the *length*.
 *
 *
 *I want to reach something like:*
 
 -
 - enclosure:
 {
- type: a http://www.gazzetta.it/udio/mpeg,
- url:
http://www.engadget.com/podcasts/EngadgetHD_Podcast_359.mp3;,
- length: 37521428
},
 
 
 
 So, how i intend this, this should be 3 fields inside of another
 field,
 no?
 
 
 Many Thanks for the answer and the help.
 
 
 On Jul 31, 2013, at 3:34 PM, Erick Erickson erickerick...@gmail.com
 wrote:
 
 Nope. Solr fields are flat. Why do you want to do this? I'm
  asking because this might be an XY problem and there
 may be other possibilities.
 
 Best
 Erick
 
 On Wed, Jul 31, 2013 at 5:09 AM, Luís Portela Afonso
 meligalet...@gmail.com wrote:
 
 Hi, I'm trying to create a field with multiple fields inside, that is:
 
 origin:
 {
 
 htmlUrl: http://www.gazzetta.it/;,
 streamId: feed/http://www.gazzetta.it/rss/Home.xml;,
 title: Gazzetta.it
 
 

Re: Sharding with a SolrCloud

2013-07-31 Thread Erick Erickson
Well, assuming you have solved the differences
in statistics between the index you maintain and
the one in the cloud with respect to the scoring...

My comment about indexing is probably
irrelevant, you're not indexing anything to the
SolrCloud cluster.

But still doubt this will work. Here's the problem:

Internally, the round-trip looks like this:
node 1 receives request
node 1 sends requests to all the shards
node 1 receives the top N docs from each shard
node 1 collates those to the real top N
node1 then queries each shard for the docs hosted on those shards.

This last step is where I'd expect just adding a shard that
happens to be a separate SolrCloud instance to the list
to fall down: the originating node would expect to just get
the documents from the shard it knew about.

And if you list _all_ the shards in the SolrCloud instance,
then each of them will distribute the request to all shards
in the SolrCloud instance, confusing things even more.

Much of this is speculation, but I can imagine a number
of ways this scenario would go bad, it wasn't one of the
design goals as far as I know.

Best
Erick

On Wed, Jul 31, 2013 at 11:01 AM, Oliver Goldschmidt
o.goldschm...@tuhh.de wrote:
 Thank you very much for that information, Erick. That was what I was
 fearing...

 Well, the problem, why I am trying to do this is, that the SolrCloud is
 managed by someone else. We are indexing some content to a pretty small
 local index. To this index we have complete access and can do whatever
 we want to do. But we also need the seperate index, which is now moving
 into the cloud. Its not possible to put our local content into the
 cloud, because we are not maintaining it and have no write permission to it.

 But why shouldn't that work? Isn't Solr Cloud acting like one solr
 server? The indices have to be maintained seperately - can't I just
 continue using them as shards and get one result list from both of them
 (thats how I did it before they wanted to switch to Solr Cloud)?

 Though, if there is no way to use the cloud as a shard, we will have to
 think about how to solve that. Of course we can split up the queries and
 make two queries (one for the cloud and one for our local index). But
 this might be a bit confusing for the user.

 Thank you again, best
 - Oliver

 Am 31.07.2013 16:39, schrieb Erick Erickson:
 You're in uncharted territory. I can imagine you use
 a SolrCloud cluster as a separate Solr for a federated
 search, but using it as a single shard just seems wrong.

 If nothing else, indexing to the shards will require that
 the documents be routed correctly. But having one
 shard in SolrCloud and another shard managed
 externally seems ripe for getting the docs indexed
 to various shards you're not expecting, unless you're
 using explicit routing

 All in all, this _really_ sounds like something you should
 not be attempting. Why are you trying to do this? Is it
 possible to just set up a SolrCloud cluster and index
 all the docs to it and be done with it?

 'cause I think you'll end up with endless problems given
 what you've described.

 Best
 Erick

 On Wed, Jul 31, 2013 at 5:16 AM, Oliver Goldschmidt
 o.goldschm...@tuhh.de wrote:
 Hi list,

 I have a Solr server, which uses sharding to make distributed search
 with another Solr server. The other Solr server now migrates to a Solr
 Cloud system. I've been trying recently to continue searching the Solr
 Cloud as a shard for my Solr server, but this is failing with mysterious
 effects. I am getting a result with a number of hits, when I perform a
 search, but the results are not displayed at all. This is the resonse
 header I am getting from Solr:

 {
   responseHeader:{
 status:0,
 QTime:305,
 params:{
   facet:true,
   indent:yes,
   facet.mincount:1,
   facet.limit:30,
   qf:title_short^750 title_full_unstemmed^600,
   json.nl:arrarr,
   wt:json,
   rows:20,
   shards:ourindex.nowhere.de/solr/index,
   bq:format:Book^500,
   fl:*,score,
   facet.sort:count,
   start:0,
   q:xml,
   shards.info:true,
   facet.prefix:,
   facet.field:[publishDate],
   qt:dismax}},
   shards.info:{
 ourindex.nowhere.de/solr/index:{
   numFound:10076,
   maxScore:8.507474,
   time:263}},
   response:{numFound:10056,start:0,maxScore:8.507474,docs:[]
   }

 As you can see, there are no docs in the result. This result is not 100%
 reproducable: sometimes I get no results displayed, other times it works
 (with the same query URL!). As you also can see in the result, the
 number of hits in the response is a little bit less than the number of
 hits sent from the shard.

 This makes me wonder if it is not possible to use a Solr Cloud as a
 shard for another standalone Solr server?

 Any hint is appreciated!

 Best
 - Oliver

 --
 Oliver Goldschmidt
 TU Hamburg-Harburg / Universitätsbibliothek / Digitale Dienste
 Denickestr. 22
 21071 Hamburg - Harburg
 Tel.+49 (0)40 

Re: Autowarming last 15 days data

2013-07-31 Thread Shawn Heisey
On 7/31/2013 9:21 AM, Cool Techi wrote:
 Would it make sense if we open a newSearcher with the last 15 days' documents, 
 since those are the documents mostly used by the users? Also, how 
 could I do this if it is possible?

When you open a searcher, it's for the entire index.  You may want to go
distributed and keep the newest 15 days of data in a separate index from
the rest.  For my own index, I use this hot/cold shard setup.  I have a
nightly process that indexes data that needs to be moved into the cold
shards and deletes it from the hot shard.

http://wiki.apache.org/solr/DistributedSearch
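
With that kind of layout, a query spanning the hot and cold shards is just a
shards parameter listing them (host and core names here are made up):

http://host:8983/solr/hot/select?q=*:*&shards=host:8983/solr/hot,host:8983/solr/cold1,host:8983/solr/cold2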

SolrCloud is the future of distributed search, but it does not have
built-in support for a hot/cold shard setup.  You'd need to manage that
yourself with manual sharding.  A custom sharding plugin to automate
indexing would likely be very very involved, it would probably be easier
to manage it outside of SolrCloud.

Thanks,
Shawn



solr 4.4 multiple datasource connection

2013-07-31 Thread Carmine Paternoster
in my db-data-config.xml i have configured two datasource, each with his
parameter name, for example:

<dataSource name="test1"
            type="JdbcDataSource"
            driver="com.mysql.jdbc.Driver"
            url="jdbc:mysql://localhost/firstdb"
            user="username1"
            password="psw1"/>

<dataSource name="test2"
            type="JdbcDataSource"
            driver="com.mysql.jdbc.Driver"
            url="jdbc:mysql://localhost/seconddb"
            user="username2"
            password="psw2"/>

<document name="content">
  <entity name="news" datasource="test1" query="select...">
    <field column="OTYPE_ID" name="otypeID" />
    <field column="NWS_ID" name="cntID" />
  </entity>

  <entity name="news_update" datasource="test2" query="select...">
    <field column="OTYPE_ID" name="otypeID" />
    <field column="NWS_ID" name="cntID" />
  </entity>
</document>
</dataConfig>

but when I execute the second entity's query from dataimport in Solr, it
throws an exception:

Table 'firstdb.secondTable' doesn't exist

Could someone help me? Thank you in advance.


http://stackoverflow.com/questions/17974029/solr-4-4-multiple-datasource-connection


Re: solr 4.4 multiple datasource connection

2013-07-31 Thread Alexandre Rafalovitch
On Wed, Jul 31, 2013 at 11:49 AM, Carmine Paternoster
carmine...@gmail.comwrote:

 entity name=news datasource=test1 query=select...


Try changing datasource= to dataSource= in:
<entity name="news" datasource="test1" query="select...">
<entity name="news_update" datasource="test2" query="select...">
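
i.e., the attribute name is case-sensitive; corrected, the two entities would
read (only the attribute changes):

<entity name="news" dataSource="test1" query="select...">
<entity name="news_update" dataSource="test2" query="select...">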

Regards,
   Alex.
P.s. This check will be (eventually) part of SolrLint:
https://github.com/arafalov/SolrLint/issues/7

Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


Re: Solr PolyField

2013-07-31 Thread Jack Krupansky
See:
https://builds.apache.org/job/Solr-Artifacts-4.x/javadoc/solr-core/org/apache/solr/update/processor/CloneFieldUpdateProcessorFactory.html

I have more examples in my book.
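
A minimal sketch of wiring it into an update chain (the field names are 
placeholders, not from your config):

<updateRequestProcessorChain name="clone-enclosure">
  <processor class="solr.CloneFieldUpdateProcessorFactory">
    <str name="source">enclosure_url</str>
    <str name="dest">enclosure_all</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>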

-- Jack Krupansky

From: Luís Portela Afonso 
Sent: Wednesday, July 31, 2013 11:41 AM
To: solr-user@lucene.apache.org 
Subject: Re: Solr PolyField

Hum, ok. 

It's possible to add to a field, static text? Text that i write on the 
configuration and then append another field? I saw something like 
CloneFieldProcessor but when i'm starting solr, it says that could not find the 
class.
I was trying to use processors to move one field to another.

I saw this:
<processor class="solr.FieldCopyProcessorFactory">
  <str name="source">lastname firstname</str>
  <str name="dest">fullname</str>
  <bool name="append">true</bool>
  <str name="append.delim">, </str>
</processor>
But when i try to use it solr says that he cannot find the 
solr.FieldCopyProcessorFactory. I'm using solr 4.4.0

Thanks ;)

On Jul 31, 2013, at 4:16 PM, Michael Della Bitta 
michael.della.bi...@appinions.com wrote:


  OK,

  Then I would suggest creating multiValued enclosure_type, etc. tags for
  searching, and then one string-typed field to store the JSON snippet you've
  been showing.

  Michael Della Bitta

  Applications Developer

  o: +1 646 532 3062  | c: +1 917 477 7906

  appinions inc.

  “The Science of Influence Marketing”

  18 East 41st Street

  New York, NY 10017

  t: @appinions https://twitter.com/Appinions | g+:
  
plus.google.com/appinions
  w: appinions.com http://www.appinions.com/


  On Wed, Jul 31, 2013 at 11:11 AM, Luís Portela Afonso 
  meligalet...@gmail.com wrote:


As a single record? Hum, no.

So an Rss has /rss/channel/ and then lot of /rss/channel/item, right?
Each /rss/channel/item is a new document on Solr. I start with the solr
example rss, but i change that to has more fields, other fields and get the
feed url from a database.

So each /rss/channel/item is a document to the indexing, bue each
/rss/channel/item can have more than on enclosure tag.

Many thanks

On Jul 31, 2013, at 4:05 PM, Michael Della Bitta 
michael.della.bi...@appinions.com wrote:


  So you're trying to index a RSS feed as a single record, but you want to

be

  able to search for and retrieve individual entries from within the feed?

Is

  that the issue?

  Michael Della Bitta

  Applications Developer

  o: +1 646 532 3062  | c: +1 917 477 7906

  appinions inc.

  “The Science of Influence Marketing”

  18 East 41st Street

  New York, NY 10017

  t: @appinions https://twitter.com/Appinions | g+:
  plus.google.com/appinions




  w: appinions.com http://www.appinions.com/


  On Wed, Jul 31, 2013 at 10:59 AM, Luís Portela Afonso 
  meligalet...@gmail.com wrote:


This fields can be multiValued.
I the rss standart there is not correct to do that, but some sources do
and i like to grab it all. Is there any way that make it possible?

Once again, Many thanks :)

On Jul 31, 2013, at 3:54 PM, Michael Della Bitta 
michael.della.bi...@appinions.com wrote:


  Luís,

  Is there a reason why splitting this up into enclosure_type,

enclosure_url,

  and enclosure_length would not work?


  Michael Della Bitta

  Applications Developer

  o: +1 646 532 3062  | c: +1 917 477 7906

  appinions inc.

  “The Science of Influence Marketing”

  18 East 41st Street

  New York, NY 10017

  t: @appinions https://twitter.com/Appinions | g+:
  plus.google.com/appinions






  w: appinions.com http://www.appinions.com/


  On Wed, Jul 31, 2013 at 10:43 AM, Luís Portela Afonso 
  meligalet...@gmail.com wrote:


Hi,

I'm trying to index information of RSS Feeds.

So in a more detailed explanation:

The RSS feed has something like:
enclosure url=

http://www.engadget.com/podcasts/Engadget_Podcast_353.mp3;

length=32642192 type=audio/mpeg/

*With my current configuration, this is working and i get a result

like

that:*


- enclosure:
[
   - audio/mpeg,
   - http://www.engadget.com/podcasts/EngadgetHD_Podcast_359.mp3;,
   - 37521428
   ],


*BUT,* this is not the result that i'm trying to reach. With that 
i'm

not

able to know in a correct way, if audio/mpeg is the *type*, or

the *

url,* or the *length*.
*
*
*I want to reach 

Re: Solr Cloud Setup

2013-07-31 Thread AdityaR
Flavio, 

There was a problem with the solrconfig and schema files. 

One of the team members had deleted some entries in the solrconfig.xml, and I
was picking up the same Solr configuration every time. I got the latest version
of Solr and carefully edited the solrconfig and schema files, and it worked.

We have the cloud up and running, testing is in progress and it looks good. 



Thanks for all your help. 

-Aditya



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Cloud-Setup-tp4080182p4081654.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Measuring SOLR performance

2013-07-31 Thread Roman Chyla
Hi Dmitry,
probably a mistake in the readme; try calling it with -q
/home/dmitry/projects/lab/solrjmeter/queries/demo/demo.queries

as for the base_url, I was testing it on Solr 4.0, where it tries contacting
/solr/admin/system - is it different in 4.3? I guess I should make it
configurable (it already is; the endpoint is set in check_options())

thanks

roman


On Wed, Jul 31, 2013 at 10:01 AM, Dmitry Kan solrexp...@gmail.com wrote:

 Ok, got the error fixed by modifying the base solr ulr in solrjmeter.py
 (added core name after /solr part).
 Next error is:

 WARNING: no test name(s) supplied nor found in:
 ['/home/dmitry/projects/lab/solrjmeter/demo/queries/demo.queries']

 It is a 'slow start with new tool' symptom I guess.. :)


 On Wed, Jul 31, 2013 at 4:39 PM, Dmitry Kan solrexp...@gmail.com wrote:

 Hi Roman,

 What  version and config of SOLR does the tool expect?

 Tried to run, but got:

 **ERROR**
   File solrjmeter.py, line 1390, in module
 main(sys.argv)
   File solrjmeter.py, line 1296, in main
 check_prerequisities(options)
   File solrjmeter.py, line 351, in check_prerequisities
 error('Cannot contact: %s' % options.query_endpoint)
   File solrjmeter.py, line 66, in error
 traceback.print_stack()
 Cannot contact: http://localhost:8983/solr


 complains about URL, clicking which leads properly to the admin page...
 solr 4.3.1, 2 cores shard

 Dmitry


 On Wed, Jul 31, 2013 at 3:59 AM, Roman Chyla roman.ch...@gmail.comwrote:

 Hello,

 I have been wanting some tools for measuring performance of SOLR, similar
 to Mike McCandles' lucene benchmark.

 so yet another monitor was born, is described here:
 http://29min.wordpress.com/2013/07/31/measuring-solr-query-performance/

 I tested it on the problem of garbage collectors (see the blogs for
 details) and so far I can't conclude whether highly customized G1 is
 better
 than highly customized CMS, but I think interesting details can be seen
 there.

 Hope this helps someone, and of course, feel free to improve the tool and
 share!

 roman






RE: monitor jvm heap size for solrcloud

2013-07-31 Thread Joshi, Shital
Thanks for all answers. We decided to use VisualVM with multiple remote 
connections. 
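
For the record, we expose JMX to VisualVM with the standard remote flags on
each Solr JVM (the port is arbitrary; authentication is disabled here only
because the boxes sit on a closed network):

-Dcom.sun.management.jmxremote
-Dcom.sun.management.jmxremote.port=18983
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.ssl=false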

-Original Message-
From: Utkarsh Sengar [mailto:utkarsh2...@gmail.com] 
Sent: Friday, July 26, 2013 6:19 PM
To: solr-user@lucene.apache.org
Subject: Re: monitor jvm heap size for solrcloud

We have been using newrelic (they have a free plan too) and gives all
needed info like: jvm heap usage in eden space, survivor space and old gen.
Garbage collection info, detailed info about the solr requests and its
response times, error rates etc.

I highly recommend using newrelic to monitor your solr cluster:
http://blog.newrelic.com/2010/05/11/got-apache-solr-search-server-use-rpm-to-monitor-troubleshoot-and-tune-solr-operations/

Thanks,
-Utkarsh


On Fri, Jul 26, 2013 at 2:38 PM, SolrLover bbar...@gmail.com wrote:

 I have used JMX with SOLR before..

 http://docs.lucidworks.com/display/solr/Using+JMX+with+Solr



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/monitor-jvm-heap-size-for-solrcloud-tp4080713p4080725.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 
Thanks,
-Utkarsh


Re: Does solr cloud support rename or swap function for collection?

2013-07-31 Thread thzinc
This is awesome news. I had been looking for the ability to do this with
SolrCloud since 4.0.0-ALPHA. We're on 4.1.0 right now, so this is a great
reason to plan for an upgrade.

Just to be clear, CREATEALIAS both creates and updates an alias, right?
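
For anyone else finding this thread, the call I mean looks like this
(collection and alias names are just examples):

http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=catalog&collections=catalog_v2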



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Does-solr-cloud-support-rename-or-swap-function-for-collection-tp4054193p4081660.html
Sent from the Solr - User mailing list archive at Nabble.com.


upgrade from 4.3 to 4.4

2013-07-31 Thread Joshi, Shital
We have SolrCloud (4.3.0) cluster (5 shards and 2 replicas) on 10 boxes. We 
have about 450 million documents. We're planning to upgrade to Solr 4.4.0. Do 
we need to re-index already indexed documents?

Thanks!




Re: Measuring SOLR performance

2013-07-31 Thread Roman Chyla
I'll try to run it with the new parameters and let you know how it goes.
I've rechecked details for the G1 (default) garbage collector run and I can
confirm that 2 out of 3 runs were showing high max response times, in some
cases even 10secs, but the customized G1 never - so definitely the
parameters had effect because the max time for the customized G1 never went
higher than 1.5secs (and that happened for 2 query classes only). The
cms-custom and G1-custom results are similar; the G1 seems to have higher
values in the max fields, but that may be random. So, yes, I now regard the
default G1 as 'bad', and these G1 parameters, even if they don't seem
G1-specific, have a real effect.
Thanks,

roman


On Tue, Jul 30, 2013 at 11:01 PM, Shawn Heisey s...@elyograg.org wrote:

 On 7/30/2013 6:59 PM, Roman Chyla wrote:
  I have been wanting some tools for measuring performance of SOLR, similar
  to Mike McCandles' lucene benchmark.
 
  so yet another monitor was born, is described here:
  http://29min.wordpress.com/2013/07/31/measuring-solr-query-performance/
 
  I tested it on the problem of garbage collectors (see the blogs for
  details) and so far I can't conclude whether highly customized G1 is
 better
  than highly customized CMS, but I think interesting details can be seen
  there.
 
  Hope this helps someone, and of course, feel free to improve the tool and
  share!

 I have a CMS config that's even more tuned than before, and it has made
 things MUCH better.  This new config is inspired by more info that I got
 on IRC:

 http://wiki.apache.org/solr/ShawnHeisey#GC_Tuning

 The G1 customizations in your blog post don't look like they are really
 G1-specific - they may be useful with CMS as well.  This statement also
 applies to some of the CMS parameters, so I would use those with G1 as
 well for any testing.

 UseNUMA looks interesting for machines that actually are NUMA.  All the
 information that I can find says it is only for the throughput
 (parallel) collector, so it's probably not doing anything for G1.

 The pause parameters you've got for G1 are targets only.  It will *try*
 to stick within those parameters, but if a collection requires more than
 50 milliseconds or has to happen more often than once a second, the
 collector will ignore what you have told it.

 Thanks,
 Shawn




Re: SolrCloud Exception

2013-07-31 Thread Sinduja Rajendran
Thanks, Shawn, for the reply. I will upgrade to Solr 4.3 and check that.
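
For anyone else hitting this: the -Xmx increase Shawn describes below is just a
JVM startup flag, e.g. (4g is purely an illustrative size; it has to fit in the
machine's real RAM):

    java -Xmx4g -Xms4g -jar start.jar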



On Wed, Jul 31, 2013 at 4:13 PM, Shawn Heisey s...@elyograg.org wrote:

 On 7/31/2013 4:27 AM, Sinduja Rajendran wrote:
  I am running solr 4.0 in a cloud. We have close to 100M documents. The
 data
  is from a single DB table. I use dih.
  Our solrCloud has 3 zookeepers, one tomcat, 2 solr instances in same
  tomcat. We have 8 GB Ram.
 
  After indexing 14M, my indexing fails with the below exception.
 
  solr org.apache.lucene.index.MergePolicy$MergeException:
  java.lang.OutOfMemoryError: GC overhead limit exceeded
 
  I tried increasing the GC value to the App server
 
   -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=80
 
  But after adding those options, my indexing slowed drastically. It
  was indexing only 15k documents per 20 minutes. Earlier it was 300k
  per 20 min.

 First thing to mention is that Solr 4.0 was extremely buggy, upgrading
 would be advisable.  In the meantime:

 An OutOfMemoryError means that Solr needs more heap memory than the JVM
 is allowed to use.  The Solr Admin UI dashboard will tell you how much
 memory is allocated to your JVM, which you can increase with the -Xmx
 parameter.  Real RAM must be available from the system in order to
 increase the heap size.

 The options you have given just change the GC collector and tune one
 aspect of the new collector, they don't increase anything.  Here are
 some things that may help you:

 http://wiki.apache.org/solr/SolrPerformanceProblems
 http://wiki.apache.org/solr/ShawnHeisey#GC_Tuning

 After looking over that information and making adjustments, if you are
 still having trouble, we can go over your config and all your details to
 see what can be done.

 You said that both of your Solr instances are running in the same
 tomcat.  Just FYI - because you aren't running all functions on separate
 hardware, your setup is not fault tolerant.  Machine failures DO happen,
 no matter how much redundancy you build into that server.  If you are
 running all this on a redundant VM solution that has live migration of
 running VMs, then my statement isn't accurate.

 Thanks,
 Shawn




Re: upgrade from 4.3 to 4.4

2013-07-31 Thread Jack Krupansky
A dot release should never require reindexing, unless... there is some 
change in a field type analyzer or update processor that your data depends 
on.


For example, some changes occurred in the ngram filter, so whether that 
would impact your data is up to you to decide.


See:
https://issues.apache.org/jira/browse/LUCENE-4955

There were a few other changes as well - you need to review each change 
yourself.


-- Jack Krupansky

-Original Message- 
From: Joshi, Shital

Sent: Wednesday, July 31, 2013 12:31 PM
To: 'solr-user@lucene.apache.org'
Subject: upgrade from 4.3 to 4.4

We have SolrCloud (4.3.0) cluster (5 shards and 2 replicas) on 10 boxes. We 
have about 450 million documents. We're planning to upgrade to Solr 4.4.0. 
Do we need to re-index already indexed documents?


Thanks!




RE: Measuring SOLR performance

2013-07-31 Thread Markus Jelsma
Did you also test indexing speed? With default G1GC settings we're seeing a 
slightly higher latency for queries than CMS. However, G1GC allows for much 
higher throughput than CMS when indexing. I haven't got the raw numbers here 
but it is roughly 45 minutes against 60 in favour of G1GC!

Load is obviously higher with G1GC.
 
 
-Original message-
 From:Roman Chyla roman.ch...@gmail.com
 Sent: Wednesday 31st July 2013 18:32
 To: solr-user@lucene.apache.org
 Subject: Re: Measuring SOLR performance
 
 I'll try to run it with the new parameters and let you know how it goes.
 I've rechecked details for the G1 (default) garbage collector run and I can
 confirm that 2 out of 3 runs were showing high max response times, in some
 cases even 10secs, but the customized G1 never - so definitely the
 parameters had effect because the max time for the customized G1 never went
 higher than 1.5secs (and that happened for 2 query classes only). Both the
 cms-custom and G1-custom are similar, the G1 seems to have higher values in
 the max fields, but that may be random. So, yes, now I am sure what to
 think of default G1 as 'bad', and that these G1 parameters, even if they
 don't seem G1 specific, have real effect.
 Thanks,
 
 roman
 
 
 On Tue, Jul 30, 2013 at 11:01 PM, Shawn Heisey s...@elyograg.org wrote:
 
  On 7/30/2013 6:59 PM, Roman Chyla wrote:
   I have been wanting some tools for measuring performance of SOLR, similar
   to Mike McCandles' lucene benchmark.
  
   so yet another monitor was born, is described here:
   http://29min.wordpress.com/2013/07/31/measuring-solr-query-performance/
  
   I tested it on the problem of garbage collectors (see the blogs for
   details) and so far I can't conclude whether highly customized G1 is
  better
   than highly customized CMS, but I think interesting details can be seen
   there.
  
   Hope this helps someone, and of course, feel free to improve the tool and
   share!
 
  I have a CMS config that's even more tuned than before, and it has made
  things MUCH better.  This new config is inspired by more info that I got
  on IRC:
 
  http://wiki.apache.org/solr/ShawnHeisey#GC_Tuning
 
  The G1 customizations in your blog post don't look like they are really
  G1-specific - they may be useful with CMS as well.  This statement also
  applies to some of the CMS parameters, so I would use those with G1 as
  well for any testing.
 
  UseNUMA looks interesting for machines that actually are NUMA.  All the
  information that I can find says it is only for the throughput
  (parallel) collector, so it's probably not doing anything for G1.
 
  The pause parameters you've got for G1 are targets only.  It will *try*
  to stick within those parameters, but if a collection requires more than
  50 milliseconds or has to happen more often than once a second, the
  collector will ignore what you have told it.
 
  Thanks,
  Shawn
 
 
 


Re: Solr PolyField

2013-07-31 Thread Luís Portela Afonso
Ok, thanks. I will check it.
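
For anyone who finds this thread later, here is a rough, untested sketch of the
chain Jack is pointing to (field names are from my earlier attempt; using
ConcatFieldUpdateProcessorFactory for the ", " delimiter is my assumption):

    <updateRequestProcessorChain name="clone-fullname">
      <!-- copy lastname and firstname into the multiValued fullname field -->
      <processor class="solr.CloneFieldUpdateProcessorFactory">
        <str name="source">lastname</str>
        <str name="source">firstname</str>
        <str name="dest">fullname</str>
      </processor>
      <!-- then join the values with ", " -->
      <processor class="solr.ConcatFieldUpdateProcessorFactory">
        <str name="fieldName">fullname</str>
        <str name="delimiter">, </str>
      </processor>
      <processor class="solr.RunUpdateProcessorFactory"/>
    </updateRequestProcessorChain>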

On Jul 31, 2013, at 5:08 PM, Jack Krupansky j...@basetechnology.com wrote:

 See:
 https://builds.apache.org/job/Solr-Artifacts-4.x/javadoc/solr-core/org/apache/solr/update/processor/CloneFieldUpdateProcessorFactory.html
 
 I have more examples in my book.
 
 -- Jack Krupansky
 
 From: Luís Portela Afonso 
 Sent: Wednesday, July 31, 2013 11:41 AM
 To: solr-user@lucene.apache.org 
 Subject: Re: Solr PolyField
 
 Hum, ok. 
 
 Is it possible to add static text to a field? Text that I write in the 
 configuration and then append another field to? I saw something like 
 CloneFieldProcessor, but when I start Solr it says that it could not find 
 the class.
 I was trying to use processors to move one field to another.
 
 I saw this:
 <processor class="solr.FieldCopyProcessorFactory">
   <str name="source">lastname firstname</str>
   <str name="dest">fullname</str>
   <bool name="append">true</bool>
   <str name="append.delim">, </str>
 </processor>
 But when I try to use it, Solr says that it cannot find 
 solr.FieldCopyProcessorFactory. I'm using Solr 4.4.0.
 
 Thanks ;)
 
 On Jul 31, 2013, at 4:16 PM, Michael Della Bitta 
 michael.della.bi...@appinions.com wrote:
 
 
  OK,
 
  Then I would suggest creating multiValued enclosure_type, etc. tags for
  searching, and then one string-typed field to store the JSON snippet you've
  been showing.
 
  Michael Della Bitta
 
  Applications Developer
 
  o: +1 646 532 3062  | c: +1 917 477 7906
 
  appinions inc.
 
  “The Science of Influence Marketing”
 
  18 East 41st Street
 
  New York, NY 10017
 
  t: @appinions https://twitter.com/Appinions | g+:
  
 plus.google.com/appinions https://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts
  w: appinions.com http://www.appinions.com/
 
 
  On Wed, Jul 31, 2013 at 11:11 AM, Luís Portela Afonso 
  meligalet...@gmail.com wrote:
 
 
As a single record? Hum, no.
 
 So an RSS feed has /rss/channel/ and then lots of /rss/channel/item elements, right?
 Each /rss/channel/item is a new document in Solr. I started with the Solr
 RSS example, but I changed it to have more fields and to get the
 feed url from a database.
 
 So each /rss/channel/item is a document for indexing, but each
 /rss/channel/item can have more than one enclosure tag.
 
Many thanks
 
On Jul 31, 2013, at 4:05 PM, Michael Della Bitta 
michael.della.bi...@appinions.com wrote:
 
 
  So you're trying to index a RSS feed as a single record, but you want to
 
be
 
  able to search for and retrieve individual entries from within the feed?
 
Is
 
  that the issue?
 
  Michael Della Bitta
 
  Applications Developer
 
  o: +1 646 532 3062  | c: +1 917 477 7906
 
  appinions inc.
 
  “The Science of Influence Marketing”
 
  18 East 41st Street
 
  New York, NY 10017
 
  t: @appinions https://twitter.com/Appinions | g+:
  plus.google.com/appinions
 

 https://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts
 
 
  w: appinions.com http://www.appinions.com/
 
 
  On Wed, Jul 31, 2013 at 10:59 AM, Luís Portela Afonso 
  meligalet...@gmail.com wrote:
 
 
 These fields can be multiValued.
 In the RSS standard it is not correct to do that, but some sources do,
 and I'd like to grab it all. Is there any way to make that possible?
 
Once again, Many thanks :)
 
On Jul 31, 2013, at 3:54 PM, Michael Della Bitta 
michael.della.bi...@appinions.com wrote:
 
 
  Luís,
 
  Is there a reason why splitting this up into enclosure_type,
 
enclosure_url,
 
  and enclosure_length would not work?
 
 
  Michael Della Bitta
 
  Applications Developer
 
  o: +1 646 532 3062  | c: +1 917 477 7906
 
  appinions inc.
 
  “The Science of Influence Marketing”
 
  18 East 41st Street
 
  New York, NY 10017
 
  t: @appinions https://twitter.com/Appinions | g+:
  plus.google.com/appinions
 
 
 

 https://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts
 
 
  w: appinions.com http://www.appinions.com/
 
 
  On Wed, Jul 31, 2013 at 10:43 AM, Luís Portela Afonso 
  meligalet...@gmail.com wrote:
 
 
Hi,
 
I'm trying to index information of RSS Feeds.
 
So in a more detailed explanation:
 
The RSS feed has something like:
 <enclosure url="http://www.engadget.com/podcasts/Engadget_Podcast_353.mp3" length="32642192" type="audio/mpeg"/>
 
 *With my current configuration, this is working and I get a result like that:*
 
 
- enclosure:
[
   - audio/mpeg,
    - http://www.engadget.com/podcasts/EngadgetHD_Podcast_359.mp3,
   - 37521428
   ],
 
 
 *BUT,* this is not the result that I'm trying to 

RE: Ingesting geo data into Solr very slow

2013-07-31 Thread Simonian, Marta M (US SSA)
Hi guys,

Here is the reply I got from the solr group. I'll change those settings. It's 
good to know that it doesn't matter if we use the bean vs solr doc.

-Marta 

-Original Message-
From: David Smiley (@MITRE.org) [mailto:dsmi...@mitre.org] 
Sent: Tuesday, July 30, 2013 9:08 PM
To: solr-user@lucene.apache.org
Subject: Re: Ingesting geo data into Solr very slow

Hi Marta,

Presumably you are indexing polygons -- I suspect complex ones.  There isn't 
too much that you can do about this right now other than index them in 
parallel.  I see you are doing this in 2 threads; try 4, or maybe even 6. 
Also, ensure that maxDistErr is reflective of the smallest distance you need to 
distinguish between.  It may help a little but not much.  I can think of some 
internal code details that might be improved but that doesn't help you now.

There's some generic Solr things you can do to improve indexing performance too 
like increasing the indexing buffer size (100MB - 200MB) and the mergeFactor 
(10-20 albeit temporarily and/or issue optimize), both in solrconfig.xml.
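
As a rough illustration, those two knobs live under indexConfig in
solrconfig.xml (the values below are just the ranges suggested above):

    <indexConfig>
      <ramBufferSizeMB>200</ramBufferSizeMB>
      <mergeFactor>20</mergeFactor>
    </indexConfig>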

Changing the servlet engine won't help. Calling server.addBean(item) isn't a 
problem either.

~ David


Simonian, Marta M (US SSA) wrote
 Hi,
 
 We are using Solr 4.4 to ingest geo data and it's really slow. When we 
 don't index the geo it takes seconds to ingest 100,000 records, but as 
 soon as we add it, it takes 2 hours.
 
 Also we found that when changing the distErrPct from 0.025 to 0.1, 
 1000 rows are ingested in 20 sec vs 2 min. But we can't change that 
 setting as we want our search to be as accurate as possible.
 
 About the environment we are running Solr on 6 CPUs and 8GB of memory.
 We've been monitoring the VMs and they seem to be ok.
 
 We are running on Tomcat but we might switch to Jetty to see if that 
 will increase the performance.
 
 We use ConcurrentUpdateSolrServer(httpSolrServer, 5000, 2);
 
 We are saving a bean rather than a solr document (server.addBean(item)).
 I'm not sure if that could make it slow as it's going to do some 
 conversion?
 
 Can you please let me know what are the best settings for Solr? Maybe 
 some changes in the solrconfig.xml or the schema.xml?
 What are the preferred environment settings and resources?
 
 Thank you!
 Marta





-
 Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Ingesting-geo-data-into-Solr-very-slow-tp4081484p4081527.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Ingesting geo data into Solr very slow

2013-07-31 Thread Simonian, Marta M (US SSA)
Does anybody know if Solr performs better on Jetty vs Tomcat?

-Original Message-
From: David Smiley (@MITRE.org) [mailto:dsmi...@mitre.org] 
Sent: Tuesday, July 30, 2013 9:08 PM
To: solr-user@lucene.apache.org
Subject: Re: Ingesting geo data into Solr very slow

Hi Marta,

Presumably you are indexing polygons -- I suspect complex ones.  There isn't 
too much that you can do about this right now other than index them in 
parallel.  I see you are doing this in 2 threads; try 4, or maybe even 6. 
Also, ensure that maxDistErr is reflective of the smallest distance you need to 
distinguish between.  It may help a little but not much.  I can think of some 
internal code details that might be improved but that doesn't help you now.

There's some generic Solr things you can do to improve indexing performance too 
like increasing the indexing buffer size (100MB - 200MB) and the mergeFactor 
(10-20 albeit temporarily and/or issue optimize), both in solrconfig.xml.

Changing the servlet engine won't help. Calling server.addBean(item) isn't a 
problem either.

~ David


Simonian, Marta M (US SSA) wrote
 Hi,
 
 We are using Solr 4.4 to ingest geo data and it's really slow. When we 
 don't index the geo it takes seconds to ingest 100,000 records, but as 
 soon as we add it, it takes 2 hours.
 
 Also we found that when changing the distErrPct from 0.025 to 0.1, 
 1000 rows are ingested in 20 sec vs 2 min. But we can't change that 
 setting as we want our search to be as accurate as possible.
 
 About the environment we are running Solr on 6 CPUs and 8GB of memory.
 We've been monitoring the VMs and they seem to be ok.
 
 We are running on Tomcat but we might switch to Jetty to see if that 
 will increase the performance.
 
 We use ConcurrentUpdateSolrServer(httpSolrServer, 5000, 2);
 
 We are saving a bean rather than a solr document (server.addBean(item)).
 I'm not sure if that could make it slow as it's going to do some 
 conversion?
 
 Can you please let me know what are the best settings for Solr? Maybe 
 some changes in the solrconfig.xml or the schema.xml?
 What are the preferred environment settings and resources?
 
 Thank you!
 Marta





-
 Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Ingesting-geo-data-into-Solr-very-slow-tp4081484p4081527.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Ingesting geo data into Solr very slow

2013-07-31 Thread Shawn Heisey

On 7/31/2013 11:20 AM, Simonian, Marta M (US SSA) wrote:

Does anybody know if Solr performs better on Jetty vs Tomcat?


Jetty has less complexity than tomcat.  It is likely to use less memory. 
 If you went with default settings for both, jetty is likely to perform 
better, but the difference would probably be very small.


If you understand how to tune your servlet container, then there's no 
way to answer that question.  You should use whatever you are 
comfortable with.  A well-tuned tomcat server would probably perform 
better than the default example jetty - but you have to do that tuning.


The only concrete information I can give you is this:  Solr tests use 
jetty, so jetty is the only container that is fully tested with Solr. 
Bugs *have* been found with other containers, and they get fixed as fast 
as possible.


The other point worth reiterating: Unless you carefully tune your 
container, something this list can't really help you with, the container 
choice probably isn't going to affect performance much.


Thanks,
Shawn



Alternative searches

2013-07-31 Thread Mark
Can someone explain how one would go about providing alternative searches for a 
query… similar to Amazon.

For example say I search for Red Dump Truck

- 0 results for Red Dump Truck
- 500 results for  Red Truck
- 350 results for Dump Truck

Does this require multiple searches? 

Thanks

Re: Solr list all records but fq matching records first

2013-07-31 Thread Jack Krupansky
I was going to say 10, but frequently people find that they need a really 
big boost.


Normally, a boost might be 1.5 or 2 or 5, or something like that.

A fractional boost, like 0.5, 0.25, 0.1, or even 0.01 can de-emphasize 
terms.


If you add debugQuery=true to your query request and look at the explain 
section, you can see all the scores and intermediate scores to get an idea 
how big a boost a document needs to make it move as desired.
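
For instance (the field name and value here are made up), a query like

    q=*:* OR category:electronics^100&debugQuery=true

still returns all records but boosts the matching ones to the top, and the
explain section shows exactly how the ^100 factors into each score.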


-- Jack Krupansky

-Original Message- 
From: Thyagaraj

Sent: Wednesday, July 31, 2013 1:34 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr list all records but fq matching records first

Awesome Jack Krupansky-2!!!. It seems to work!.

What I didn't understand is *^100*. Could you give some explanation of ^100,
please? Could it be any number other than 100?


Thanks a lot! I was working on this for the past 3 days!



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-list-all-records-but-fq-matching-records-first-tp4081572p4081677.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: Solr list all records but fq matching records first

2013-07-31 Thread Thyagaraj
Awesome Jack Krupansky-2!!!. It seems to work!. 

What I didn't understand is *^100*. Could you give some explanation of ^100,
please? Could it be any number other than 100?


Thanks a lot! I was working on this for the past 3 days!



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-list-all-records-but-fq-matching-records-first-tp4081572p4081677.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: TrieField and FieldCache confusion

2013-07-31 Thread Chris Hostetter

: Can I expect the FieldCache of Lucene to return the correct values when
: working
: with TrieField with the precisionStep higher than 0. If not, what did I get
: wrong?

Yes -- the code for building FieldCaches from Trie fields is smart enough 
to ensure that only the real original values are used to populate the 
cache.

(See for example: FieldCache.NUMERIC_UTILS_INT_PARSER and the classes 
linked to from its javadocs...

https://lucene.apache.org/core/4_4_0/core/org/apache/lucene/search/FieldCache.html#NUMERIC_UTILS_INT_PARSER
https://lucene.apache.org/core/4_4_0/core/org/apache/lucene/util/NumericUtils.html
https://lucene.apache.org/core/4_4_0/core/org/apache/lucene/document/IntField.html

(Solr's Trie fields are backed by the various numeric fields in lucene -- 
ie: solr:TrieIntField -> lucene:IntField.  The Trie* prefix is used in 
solr because there were already classes named IntField, DoubleField, etc... 
when the Trie-based impls were added to lucene.)
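
For the record, a minimal sketch of reading such a field through the FieldCache
(Lucene 4.4 API; the field name is made up):

    import java.io.IOException;
    import org.apache.lucene.index.AtomicReader;
    import org.apache.lucene.search.FieldCache;

    // "price" is assumed to be a TrieIntField-backed field with precisionStep > 0
    static int readPrice(AtomicReader reader, int docId) throws IOException {
      FieldCache.Ints values = FieldCache.DEFAULT.getInts(
          reader, "price",
          FieldCache.NUMERIC_UTILS_INT_PARSER, // ignores the lower-precision helper terms
          false);
      return values.get(docId);
    }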


-Hoss


Re: Improper shutdown of Solr in Jetty 9

2013-07-31 Thread Chris Hostetter

: it's Windows 7. I'm starting Jetty with java -jar start.jar

Not sure if you are using cygwin, or if this is related but...

https://issues.apache.org/jira/browse/SOLR-3884
https://issues.apache.org/jira/browse/SOLR-3884?focusedCommentId=13462996page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13462996
https://issues.apache.org/jira/browse/SOLR-3884?focusedCommentId=13463332page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13463332

http://cygwin.com/ml/cygwin/2012-07/msg00250.html
http://cygwin.com/ml/cygwin/2012-05/msg00482.html


-Hoss


Re: queryResultCache showing all zeros

2013-07-31 Thread Chris Hostetter


: We just configured a new Solr cloud (5 nodes) running Solr 4.3, ran 
: about 200 000 queries taken from our production environment and measured 
: the performance of the cloud over a collection of 14M documents with the 
: default Solr settings. We are now trying to tune the different caches 
: and when I look at each node of the cloud, all of them are showing no 
: activity (see below) regarding the queryResultCache... all other caches 
: are showing some activity. Any idea what could cause this?

Can you show us some examples of the types of queries you are executing?

Do you have useFilterForSortedQuery in your solrconfig.xml ?



-Hoss


Re: FieldCollapsing issues in SolrCloud 4.4

2013-07-31 Thread Ali, Saqib
Hello Paul,

Can you please explain what you mean by:
To get the exact number of groups, you need to shard along your grouping
field

Thanks! :)


On Wed, Jul 31, 2013 at 3:08 AM, Paul Masurel paul.masu...@gmail.com wrote:

 Do you mean you get different results with group=true?
  numFound is supposed to return the number of ungrouped hits.

 To get the number of groups, you are expected to set
 set group.ngroups=true.
 Even then, the result will only give you an upperbound
 in a distributed environment.
 To get the exact number of groups, you need to shard along
 your grouping field.

 If you have many groups, you may also experience a huge performance
  hit, as the current implementation has been heavily optimized for low
 number of groups (e.g. e-commerce categories).

 Paul



 On Wed, Jul 31, 2013 at 1:59 AM, Ali, Saqib docbook@gmail.com wrote:

  Hello all,
 
  Is anyone experiencing issues with the numFound when using group=true in
  SolrCloud 4.4?
 
  Sometimes the results are off for us.
 
  I will post more details shortly.
 
  Thanks.
 



 --
 __

  Masurel Paul
  e-mail: paul.masu...@gmail.com



Re: Sending shard requests to all replicas

2013-07-31 Thread Isaac Hebsh
Thanks to Ryan Ernst, my issue is a duplicate of SOLR-4449.
I think that this proposal might be very useful (some supporting links are
attached there. worth reading..)


On Tue, Jul 30, 2013 at 11:49 PM, Isaac Hebsh isaac.he...@gmail.com wrote:

 Hi,
 I submitted a new JIRA for this:
 https://issues.apache.org/jira/browse/SOLR-5092

 A (very initial) patch is already attached. Reviews are very welcome.


 On Sun, Jul 28, 2013 at 4:50 PM, Erick Erickson 
 erickerick...@gmail.comwrote:

 You'd probably start in CloudSolrServer in SolrJ code,
 as far as I know that's where the request is sent out.

 I'd think that would be better than changing Solr itself
 since if you found that this was useful you wouldn't
 be patching your Solr release, just keeping your client
 up to date.

 Best
 Erick

 On Sat, Jul 27, 2013 at 7:28 PM, Isaac Hebsh isaac.he...@gmail.com
 wrote:
  Shawn, thank you for the tips.
  I know the significant cons of virtualization, but I don't want to move
  this thread into a virtualization pros/cons in the Solr(Cloud) case.
 
  I've just asked what is the minimal code change should be made, in
 order to
  examine whether this is a possible solution or not.. :)
 
 
  On Sun, Jul 28, 2013 at 1:06 AM, Shawn Heisey s...@elyograg.org
 wrote:
 
  On 7/27/2013 3:33 PM, Isaac Hebsh wrote:
   I have about 40 shards. repFactor=2.
   The cause of slower shards is very interesting, and this is the main
   approach we took.
   Note that in every query, it is another shard which is the slowest.
 In
  20%
   of the queries, the slowest shard takes about 4 times more than the
  average
   shard qtime.
   While continuing investigation, remember it might be the
 virtualization /
   storage-access / network / gc /..., so I thought that reducing the
 effect
   of the slow shards might be a good (temporary or permanent) solution.
 
  Virtualization is not the best approach for Solr.  Assuming you're
  dealing with your own hardware and not something based in the cloud
 like
  Amazon, you can get better results by running on bare metal and having
  multiple shards per host.
 
  Garbage collection is a very likely source of this problem.
 
  http://wiki.apache.org/solr/SolrPerformanceProblems#GC_pause_problems
 
   I thought it should be an almost trivial code change (for proving the
   concept). Isn't it?
 
  I have no idea what you're saying/asking here.  Can you clarify?
 
  It seems to me that sending requests to all replicas would just
 increase
  the overall load on the cluster, with no real benefit.
 
  Thanks,
  Shawn
 
 





RE: queryResultCache showing all zeros

2013-07-31 Thread Jean-Sebastien Vachon
Looks like the problem might not be related to Solr but to a proprietary system 
we have on top of it. 
I made some queries with facets and the cache was updated. We are looking into 
this... I should not have assumed that the problem was coming from Solr ;)

I'll let you know if there is anything

From: Chris Hostetter
Sent: Wednesday, July 31, 2013 1:58 PM
To: solr-user@lucene.apache.org
Subject: Re: queryResultCache showing all zeros

: We just configured a new Solr cloud (5 nodes) running Solr 4.3, ran
: about 200 000 queries taken from our production environment and measured
: the performance of the cloud over a collection of 14M documents with the
: default Solr settings. We are now trying to tune the different caches
: and when I look at each node of the cloud, all of them are showing no
: activity (see below) regarding the queryResultCache... all other caches
: are showing some activity. Any idea what could cause this?

Can you show us some examples of the types of queries you are executing?

Do you have useFilterForSortedQuery in your solrconfig.xml ?



-Hoss


RE: Highlighting externally stored text

2013-07-31 Thread JohnRodey
Hey Bryan, thanks for the response!  To make use of the FastVectorHighlighter
you need to enable termVectors, termPositions, and termOffsets, correct?
That takes a considerable amount of space, but it's good to know, and I may
possibly pursue this solution as well.  I'm just starting to look at the code
now; do you remember how substantial the change was?

Are there any other options?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Highlighting-externally-stored-text-tp4078387p4081719.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Alternative searches

2013-07-31 Thread Petersen, Robert
Hi Mark

Yes, it is something we implemented also.  We just try various subsets of the 
search terms when there are zero results.  To increase performance for all 
these searches we return only the first three results and no facets so we can 
simply display the result counts for the various subsets of the original search 
terms.  We only do this if the first search had zero results and then a double 
metaphone search (which is how we handle misspelled terms) also returned 
nothing.  We also apply various heuristics to the alternative searches being 
performed like no one word searches if the original search had many words etc
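
In SolrJ terms the fallback loop is roughly this (a fragment, not production
code; the URL and the hard-coded subsets are placeholders for whatever your
heuristics produce):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;

    HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr");
    String[] subsets = { "red truck", "dump truck" }; // derived from "red dump truck"
    for (String subset : subsets) {
      SolrQuery q = new SolrQuery(subset);
      q.setRows(3);      // only the first three results are displayed
      q.setFacet(false); // no facets, we mainly want the counts
      QueryResponse rsp = server.query(q);
      long hits = rsp.getResults().getNumFound();
      if (hits > 0) {
        System.out.println(hits + " results for \"" + subset + "\"");
      }
    }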

Thanks
Robi

-Original Message-
From: Mark [mailto:static.void@gmail.com] 
Sent: Wednesday, July 31, 2013 10:35 AM
To: solr-user@lucene.apache.org
Subject: Alternative searches

Can someone explain how one would go about providing alternative searches for a 
query... similar to Amazon.

For example say I search for Red Dump Truck

- 0 results for Red Dump Truck
- 500 results for  Red Truck
- 350 results for Dump Truck

Does this require multiple searches? 

Thanks



RE: queryResultCache showing all zeros

2013-07-31 Thread Jean-Sebastien Vachon
Ok I might have found a Solr issue after I fixed a problem in our system.

This the kind of query we are making:

http://10.0.5.214:8201/solr/Current/select?fq=position_refreshed_date_id:[2747%20TO%203501]&fq=position_soc_2011_8_code:41101100&fq=country_id:1&fq=position_job_type_id:4&fq=position_education_level_id:8&fq=position_salary_range_id:2&fq=is_dirty:false&fq=is_staffing:false&fq=-position_soc_2011_2_code:99&fq=-covering_source_id:(839%20OR%201145%20OR%2025%20OR%20802%20OR%20777%20OR%2085%20OR%20881%20OR%20775%20OR%201558%20OR%20743%20OR%20800%20OR%201580%20OR%201147%20OR%201690%20OR%20674%20OR%20894%20OR%20791)&q=%20(title:photographer%20OR%20ad_description:photographer%20OR%20super_alias:photographer)%20AND%20(_val_:%22sum(product(75,div(5000,sum(50,sub(3500,position_refreshed_date_id,product(0.75,job_score),product(0.75,source_score))%22)&facet=true&facet.mincount=1&f.state_id.facet.limit=10&facet.field=state_id&facet.field=position_salary_range_id&facet.field=position_job_type_id&facet.field=position_naics_6_code&facet.field=place_id&facet.field=position_education_level_id&facet.field=position_soc_2011_8_code&f.position_salary_range_id.facet.limit=10&f.position_job_type_id.facet.limit=10&f.position_naics_6_code.facet.limit=10&f.place_id.facet.limit=10&f.position_education_level_id.facet.limit=10&f.position_soc_2011_8_code.facet.limit=10&rows=10&start=0&fl=job_id,position_id,super_alias_id,advertiser,super_alias,credited_source_id,position_first_seen_date_id,position_last_seen_date_id,%20position_posted_date_id,%20position_refreshed_date_id,%20position_job_type_id,%20position_function_id,position_green_code,title_id,semi_clean_title_id,clean_title_id,position_empl_count,place_id,%20state_id,county_id,msa_id,country_id,position_id,position_job_type_mva,%20ad_activity_status_id,%20position_score,%20ad_score,position_salary,position_salary_range_id,position_salary_source,position_naics_6_code,position_education_level_id,%20is_staffing,is_bulk,is_anonymous,is_third_party,is_dirty,ref_num,tags,lat,long,position_duns_number,url,advertiser_id,%20title,%20semi_clean_title,%20ad_description,%20position_description,%20ad_bls_salary,%20position_bls_salary,%20covering_source_id,%20content_model_id,position_soc_2011_8_code,position_noc_2006_4_id&group.field=position_id&group=true&group.ngroups=true&group.main=true&sort=score%20desc

it's quite long but this request uses both faceting and grouping. If I remove 
the grouping then the cache is used. Is this normal behavior or a bug?

Thanks

From: Jean-Sebastien Vachon
Sent: Wednesday, July 31, 2013 2:38 PM
To: solr-user@lucene.apache.org
Subject: RE: queryResultCache showing all zeros

Looks like the problem might not be related to Solr but to a proprietary system 
we have on top of it.
I made some queries with facets and the cache was updated. We are looking into 
this... I should not have assumed that the problem was coming from Solr ;)

I'll let you know if there is anything

From: Chris Hostetter
Sent: Wednesday, July 31, 2013 1:58 PM
To: solr-user@lucene.apache.org
Subject: Re: queryResultCache showing all zeros

: We just configured a new Solr cloud (5 nodes) running Solr 4.3, ran
: about 200 000 queries taken from our production environment and measured
: the performance of the cloud over a collection of 14M documents with the
: default Solr settings. We are now trying to tune the different caches
: and when I look at each node of the cloud, all of them are showing no
: activity (see below) regarding the queryResultCache... all other caches
: are showing some activity. Any idea what could cause this?

Can you show us some examples of the types of queries you are executing?

Do you have useFilterForSortedQuery in your solrconfig.xml ?



-Hoss

RE: queryResultCache showing all zeros

2013-07-31 Thread Jean-Sebastien Vachon
Also we do not have any useFilterForSortedQuery in our config. So we are 
relying on the default which I guess is false.




From: Jean-Sebastien Vachon
Sent: Wednesday, July 31, 2013 3:44 PM
To: solr-user@lucene.apache.org
Subject: RE: queryResultCache showing all zeros

Ok I might have found a Solr issue after I fixed a problem in our system.

This the kind of query we are making:

http://10.0.5.214:8201/solr/Current/select?fq=position_refreshed_date_id:[2747%20TO%203501]&fq=position_soc_2011_8_code:41101100&fq=country_id:1&fq=position_job_type_id:4&fq=position_education_level_id:8&fq=position_salary_range_id:2&fq=is_dirty:false&fq=is_staffing:false&fq=-position_soc_2011_2_code:99&fq=-covering_source_id:(839%20OR%201145%20OR%2025%20OR%20802%20OR%20777%20OR%2085%20OR%20881%20OR%20775%20OR%201558%20OR%20743%20OR%20800%20OR%201580%20OR%201147%20OR%201690%20OR%20674%20OR%20894%20OR%20791)&q=%20(title:photographer%20OR%20ad_description:photographer%20OR%20super_alias:photographer)%20AND%20(_val_:%22sum(product(75,div(5000,sum(50,sub(3500,position_refreshed_date_id,product(0.75,job_score),product(0.75,source_score))%22)&facet=true&facet.mincount=1&f.state_id.facet.limit=10&facet.field=state_id&facet.field=position_salary_range_id&facet.field=position_job_type_id&facet.field=position_naics_6_code&facet.field=place_id&facet.field=position_education_level_id&facet.field=position_soc_2011_8_code&f.position_salary_range_id.facet.limit=10&f.position_job_type_id.facet.limit=10&f.position_naics_6_code.facet.limit=10&f.place_id.facet.limit=10&f.position_education_level_id.facet.limit=10&f.position_soc_2011_8_code.facet.limit=10&rows=10&start=0&fl=job_id,position_id,super_alias_id,advertiser,super_alias,credited_source_id,position_first_seen_date_id,position_last_seen_date_id,%20position_posted_date_id,%20position_refreshed_date_id,%20position_job_type_id,%20position_function_id,position_green_code,title_id,semi_clean_title_id,clean_title_id,position_empl_count,place_id,%20state_id,county_id,msa_id,country_id,position_id,position_job_type_mva,%20ad_activity_status_id,%20position_score,%20ad_score,position_salary,position_salary_range_id,position_salary_source,position_naics_6_code,position_education_level_id,%20is_staffing,is_bulk,is_anonymous,is_third_party,is_dirty,ref_num,tags,lat,long,position_duns_number,url,advertiser_id,%20title,%20semi_clean_title,%20ad_description,%20position_description,%20ad_bls_salary,%20position_bls_salary,%20covering_source_id,%20content_model_id,position_soc_2011_8_code,position_noc_2006_4_id&group.field=position_id&group=true&group.ngroups=true&group.main=true&sort=score%20desc

it's quite long but this request uses both faceting and grouping. If I remove 
the grouping then the cache is used. Is this normal behavior or a bug?

Thanks

From: Jean-Sebastien Vachon
Sent: Wednesday, July 31, 2013 2:38 PM
To: solr-user@lucene.apache.org
Subject: RE: queryResultCache showing all zeros

Looks like the problem might not be related to Solr but to a proprietary system 
we have on top of it.
I made some queries with facets and the cache was updated. We are looking into 
this... I should not have assumed that the problem was coming from Solr ;)

I'll let you know if there is anything

From: Chris Hostetter
Sent: Wednesday, July 31, 2013 1:58 PM
To: solr-user@lucene.apache.org
Subject: Re: queryResultCache showing all zeros

: We just configured a new Solr cloud (5 nodes) running Solr 4.3, ran
: about 200 000 queries taken from our production environment and measured
: the performance of the cloud over a collection of 14M documents with the
: default Solr settings. We are now trying to tune the different caches
: and when I look at each node of the cloud, all of them are showing no
: activity (see below) regarding the queryResultCache... all other caches
: are showing some activity. Any idea what could cause this?

Can you show us some examples of the types of queries you are executing?

Do you have useFilterForSortedQuery in your solrconfig.xml ?



-Hoss

RE: queryResultCache showing all zeros

2013-07-31 Thread Chris Hostetter

: it's quite long but this request uses both faceting and grouping. If I 
: remove the grouping then the cache is used. Is this a normal behavior or 
: a bug?

I believe that is expected -- i don't think grouping can take advantage of 
the queryResultCache because of how it collects documents.

there is however a group.cache.percent option that you might look into -- 
but I honestly have no idea if that toggles the use of queryResultCache or 
something else; I haven't played with it before...

https://wiki.apache.org/solr/FieldCollapsing#Request_Parameters

-Hoss


Re: Performance question on Spatial Search

2013-07-31 Thread Steven Bower
the list of IDs does change relatively frequently, but this doesn't seem to
have very much impact on the performance of the query as far as I can tell.

attached are the stacks

thanks,

steve


On Wed, Jul 31, 2013 at 6:33 AM, Mikhail Khludnev 
mkhlud...@griddynamics.com wrote:

 On Wed, Jul 31, 2013 at 1:10 AM, Steven Bower sbo...@alcyon.net wrote:

 
   not sure what you mean by good hit ratio?
 

 I mean such queries are really expensive (even on cache hit), so if the
  list of ids changes every time, it never hits the cache and hence executes
  these heavy queries every time. It's a well-known performance problem.


  Here are the stacks...
 
 they seem like hotspots, and show index reading, which is reasonable. But I
 can't see what caused these reads; to get that I need the whole stack of the
 hot thread.


 
Name Time (ms) Own Time (ms)
 
 
 org.apache.lucene.search.MultiTermQueryWrapperFilter.getDocIdSet(AtomicReaderContext,
  Bits) 300879 203478
 
 
 org.apache.lucene.codecs.lucene41.Lucene41PostingsReader$BlockDocsEnum.nextDoc()
  45539 19
 
 
 org.apache.lucene.codecs.lucene41.Lucene41PostingsReader$BlockDocsEnum.refillDocs()
  45519 40
 
 
 org.apache.lucene.codecs.lucene41.Lucene41PostingsReader.readVIntBlock(IndexInput,
  int[], int[], int, boolean) 24352 0
  org.apache.lucene.store.DataInput.readVInt() 24352 24352
  org.apache.lucene.codecs.lucene41.ForUtil.readBlock(IndexInput, byte[],
  int[]) 21126 14976
  org.apache.lucene.store.ByteBufferIndexInput.readBytes(byte[], int, int)
  6150 0  java.nio.DirectByteBuffer.get(byte[], int, int)
  6150 0
  java.nio.Bits.copyToArray(long, Object, long, long, long) 6150 6150
 
 
 org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum.docs(Bits,
  DocsEnum, int) 35342 421
 
 
 org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum$Frame.decodeMetaData()
  34920 27939
 
 
 org.apache.lucene.codecs.lucene41.Lucene41PostingsReader.nextTerm(FieldInfo,
  BlockTermState) 6980 6980
 
 
 org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum.next()
  14129 1053
 
 
 org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum$Frame.loadNextFloorBlock()
  5948 261
 
 
 org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum$Frame.loadBlock()
  5686 199
  org.apache.lucene.store.ByteBufferIndexInput.readBytes(byte[], int, int)
  3606 0  java.nio.DirectByteBuffer.get(byte[], int, int)
  3606 0
  java.nio.Bits.copyToArray(long, Object, long, long, long) 3606 3606
 
 
 org.apache.lucene.codecs.lucene41.Lucene41PostingsReader.readTermsBlock(IndexInput,
  FieldInfo, BlockTermState) 1879 80
  org.apache.lucene.store.ByteBufferIndexInput.readBytes(byte[], int, int)
  1798 0java.nio.DirectByteBuffer.get(byte[], int, int)
  1798 0
  java.nio.Bits.copyToArray(long, Object, long, long, long) 1798 1798
 
 
 org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum$Frame.next()
  4010 3324
 
 
 org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum$Frame.nextNonLeaf()
  685 685
 
 
 org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum$Frame.loadBlock()
  3117 144
  org.apache.lucene.store.ByteBufferIndexInput.readBytes(byte[], int, int)
  1861 0java.nio.DirectByteBuffer.get(byte[], int, int) 1861
  0
  java.nio.Bits.copyToArray(long, Object, long, long, long) 1861 1861
 
 
 org.apache.lucene.codecs.lucene41.Lucene41PostingsReader.readTermsBlock(IndexInput,
  FieldInfo, BlockTermState) 1090 19
  org.apache.lucene.store.ByteBufferIndexInput.readBytes(byte[], int, int)
  1070 0  java.nio.DirectByteBuffer.get(byte[], int, int)
  1070 0
  java.nio.Bits.copyToArray(long, Object, long, long, long) 1070 1070
 
 
 org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum.initIndexInput()
  20 0org.apache.lucene.store.ByteBufferIndexInput.clone()
  20 0
  org.apache.lucene.store.ByteBufferIndexInput.clone() 20 0
  org.apache.lucene.store.ByteBufferIndexInput.buildSlice(long, long) 20
  0
  org.apache.lucene.util.WeakIdentityMap.put(Object, Object) 20 0
 
 org.apache.lucene.util.WeakIdentityMap$IdentityWeakReference.init(Object,
  ReferenceQueue) 20 0
  java.lang.System.identityHashCode(Object) 20 20
  org.apache.lucene.index.FilteredTermsEnum.docs(Bits, DocsEnum, int)
  1485 527
 
 
 org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum.docs(Bits,
  DocsEnum, int) 957 0
 
 
 org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum$Frame.decodeMetaData()
  957 513
 
 
 org.apache.lucene.codecs.lucene41.Lucene41PostingsReader.nextTerm(FieldInfo,
  BlockTermState) 443 443
  org.apache.lucene.index.FilteredTermsEnum.next() 874 324
 
 
 org.apache.lucene.search.NumericRangeQuery$NumericRangeTermsEnum.accept(BytesRef)
  368 0
 
 
 org.apache.lucene.util.BytesRef$UTF8SortedAsUnicodeComparator.compare(Object,
  Object) 368 

Re: Auto Correction of Solr Query

2013-07-31 Thread Otis Gospodnetic
Hi Siva,

I think I mentioned this several days ago... DYM ReSearcher will do that:
http://sematext.com/products/dym-researcher/index.html

Otis


On Tuesday, July 30, 2013, sivaprasad wrote:

 Hi,

 Is there any way to auto correct the Solr query and get the results? For
 example, a user tries to search for beats by dre, but by mistake he typed
 beats bt dre. In this case, Solr should correct the query and return the
 results for beats by dre.

 Is there any suggestions, how we can achieve this?

 -Siva



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Auto-Correction-of-Solr-Query-tp4081220.html
 Sent from the Solr - User mailing list archive at Nabble.com.



-- 
Otis
--
Solr  ElasticSearch Support -- http://sematext.com/
Performance Monitoring -- http://sematext.com/spm


RE: Highlighting externally stored text

2013-07-31 Thread JohnRodey
Just an update.  The change was pretty straightforward (at least for my simple
test case); just a few lines in the getBestFragments method seemed to do the
trick.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Highlighting-externally-stored-text-tp4078387p4081748.html
Sent from the Solr - User mailing list archive at Nabble.com.


Inconsistent facet ranges when using distributed search in Solr 4.3

2013-07-31 Thread Jose Aguilar
Hi all,

I am seeing some inconsistent behavior with facets, specifically range facets, 
on Solr 4.3. Running the same query several times (pressing F5 on the browser) 
produces different facet ranges when doing distributed searches; sometimes 
it doesn't include some of the buckets. The results of the search are always 
correct as far as I can tell; it is just the range facets that sometimes miss 
ranges.

Has anyone seen this behavior in Solr before? Any recommendations on how to 
troubleshoot this issue?

Here are some details and an example:

As an example of what I am seeing, take this query, in which I'll be faceting 
on the docnumber field:

http://SERVER:8081/solr/shard1/myhandler?

shards=SERVER:8081/solr/shard1,SERVER:8081/solr/shard2,SERVER:8081/solr/shard3
shards.qt=myhandler
facet=true
facet.field=docnumber
f.docnumber.facet.sort=index
facet.range=docnumber
f.docnumber.facet.range.start=0
f.docnumber.facet.range.gap=100
f.docnumber.facet.range.end=10
f.docnumber.facet.limit=1000
facet.mincount=1
q=type:document
wt=xml

When I run it, I get one of the following three responses, seemingly at random 
(haven't been able to notice a pattern so far):

1. Get 859 results (correct), but nothing on the facet ranges:

...
<result name="response" numFound="859" start="0" maxScore="8.006225">
...
<lst name="facet_ranges">
  <lst name="docnumber">
    <lst name="counts"/>
    <int name="gap">100</int>
    <int name="start">0</int>
    <int name="end">10</int>
  </lst>
</lst>

2. Get 859 results (correct), and the correct number of facets come up in the 
facet ranges (118+109+119+122+134+100+100+57=859):

...
<result name="response" numFound="859" start="0" maxScore="8.006225">
...
<lst name="facet_ranges">
  <lst name="docnumber">
    <lst name="counts">
      <int name="0">118</int>
      <int name="100">109</int>
      <int name="200">119</int>
      <int name="300">122</int>
      <int name="400">134</int>
      <int name="500">100</int>
      <int name="600">100</int>
      <int name="700">57</int>
    </lst>
    <int name="gap">100</int>
    <int name="start">0</int>
    <int name="end">10</int>
  </lst>
</lst>

3. Get 859 results (correct), and only a partial number of facet ranges 
(118+109+119+122+134=602 vs. 859 results):

...
<result name="response" numFound="859" start="0" maxScore="8.006225">
...
<lst name="facet_ranges">
  <lst name="docnumber">
    <lst name="counts">
      <int name="0">118</int>
      <int name="100">109</int>
      <int name="200">119</int>
      <int name="300">122</int>
      <int name="400">134</int>
    </lst>
    <int name="gap">100</int>
    <int name="start">0</int>
    <int name="end">10</int>
  </lst>
</lst>

I am using Solr 4.3 (4.3.0 1477023), with these parameters:

Facet-related:
facet=true
facet.field=docnumber
f.docnumber.facet.sort=index
facet.range=docnumber
f.docnumber.facet.range.start=0
f.docnumber.facet.range.gap=100
f.docnumber.facet.range.end=10
f.docnumber.facet.limit=1000
facet.mincount=1

For distributed search (environment has 3 cores in the same box):

shards=SERVER:8081/solr/shard1,SERVER:8081/solr/shard2,SERVER:8081/solr/shard3
shards.qt=myhandler

And the query:
q=type:document
wt=xml

It is also worth noting that the facet field section does come up with the 
correct facets; the issue seems to be related only to the facet ranges (unless 
I am missing something). In the responses for all three examples above, the 
facet_fields list has all the values for docnumber, from 1 to 756, even if the 
facet ranges are missing buckets.

<lst name="facet_fields">
  <lst name="docnumber">
    <int name="1">1</int>
    <int name="2">2</int>
    ... (continues on from 3 to 754) ...
    <int name="755">1</int>
    <int name="756">1</int>
  </lst>
</lst>


Thanks,


Jose. 

RE: Highlighting externally stored text

2013-07-31 Thread Bryan Loofbourrow
 Hey Bryan, Thanks for the response!  To make use of the
 FastVectorHighlighter
 you need to enable termVectors, termPositions, and termOffsets correct?
 Which takes a considerable amount of space, but is good to know and I
may
 possibly pursue this solution as well.  Just starting to look at the
code
 now, do you remember how substantial the change was?

 Are there any other options?

John,

Yes, you do need to enable those, and yes, it takes a considerable amount
of space.
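
For reference, the schema.xml side of that looks roughly like this (field name
and type are placeholders):

    <field name="content" type="text_en" indexed="true" stored="true"
           termVectors="true" termPositions="true" termOffsets="true"/>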

It has been a while, but the change itself was not too bad, mostly at the
top level, isolating an interface that returns the structure you need, and
transposing that into something for Solr to return.

The only other issues are around queries. If FVH supports all the queries
you use, great. If it's just missing something simple to deal with, like
DisjunctionMaxQuery, then it's just adding another rewrite call.

But if you are using the SpanQuery hierarchy, it's much trickier. I did in
fact do an implementation for that, but it was not very satisfactory --
transposing unordered SpanNearQuery into the representation used by FVH
was an O(n!) operation, and the complexity of the implementation was quite
high, for a number of reasons including lack of FVH representation for
mixed-slop phrases.

I don't know of other options -- except for the one I finally wound up
doing, which was writing my own highlighter, which unfortunately I am not
in a position to share for reasons not my own. But the main reason for
that was the SpanNearQuery support, which may not be a problem you have.

It's possible that something similar could be done with the Postings
highlighter, but I did not look too deeply into that, because the lack of
phrase support was a blocker.

-- Bryan


Re: Measuring SOLR performance

2013-07-31 Thread Roman Chyla
No, I haven't had time for that (and likely won't have for the next few
weeks), but it is on the list - if it is 25% improvement, it would be
really worth of the change to G1.
Thanks,

roman


On Wed, Jul 31, 2013 at 1:00 PM, Markus Jelsma
markus.jel...@openindex.io wrote:

 Did you also test indexing speed? With default G1GC settings we're seeing
 a slightly higher latency for queries than CMS. However, G1GC allows for
 much higher throughput than CMS when indexing. I haven't got the raw
 numbers here but it is roughly 45 minutes against 60 in favour of G1GC!

 Load is obviously higher with G1GC.


 -Original message-
  From:Roman Chyla roman.ch...@gmail.com
  Sent: Wednesday 31st July 2013 18:32
  To: solr-user@lucene.apache.org
  Subject: Re: Measuring SOLR performance
 
  I'll try to run it with the new parameters and let you know how it goes.
  I've rechecked details for the G1 (default) garbage collector run and I
 can
  confirm that 2 out of 3 runs were showing high max response times, in
 some
  cases even 10secs, but the customized G1 never - so definitely the
  parameters had effect because the max time for the customized G1 never
 went
  higher than 1.5secs (and that happened for 2 query classes only). Both the
  cms-custom and G1-custom are similar, the G1 seems to have higher values
 in
  the max fields, but that may be random. So, yes, now I am sure what to
  think of default G1 as 'bad', and that these G1 parameters, even if they
  don't seem G1 specific, have real effect.
  Thanks,
 
  roman
 
 
  On Tue, Jul 30, 2013 at 11:01 PM, Shawn Heisey s...@elyograg.org
 wrote:
 
   On 7/30/2013 6:59 PM, Roman Chyla wrote:
I have been wanting some tools for measuring performance of SOLR,
 similar
to Mike McCandles' lucene benchmark.
   
so yet another monitor was born, is described here:
   
 http://29min.wordpress.com/2013/07/31/measuring-solr-query-performance/
   
I tested it on the problem of garbage collectors (see the blogs for
details) and so far I can't conclude whether highly customized G1 is
   better
than highly customized CMS, but I think interesting details can be
 seen
there.
   
Hope this helps someone, and of course, feel free to improve the
 tool and
share!
  
   I have a CMS config that's even more tuned than before, and it has made
   things MUCH better.  This new config is inspired by more info that I
 got
   on IRC:
  
   http://wiki.apache.org/solr/ShawnHeisey#GC_Tuning
  
   The G1 customizations in your blog post don't look like they are really
   G1-specific - they may be useful with CMS as well.  This statement also
   applies to some of the CMS parameters, so I would use those with G1 as
   well for any testing.
  
   UseNUMA looks interesting for machines that actually are NUMA.  All the
   information that I can find says it is only for the throughput
   (parallel) collector, so it's probably not doing anything for G1.
  
   The pause parameters you've got for G1 are targets only.  It will *try*
   to stick within those parameters, but if a collection requires more
 than
   50 milliseconds or has to happen more often than once a second, the
   collector will ignore what you have told it.
  
   Thanks,
   Shawn
  
  
 



Re: queryResultCache showing all zeros

2013-07-31 Thread Yonik Seeley
On Wed, Jul 31, 2013 at 3:49 PM, Chris Hostetter
hossman_luc...@fucit.org wrote:
 there is however a group.cache.percent option that you might look into --
 but I honestly have no idea if that toggles the use of queryResultCache or
 something else; I haven't played with it before...

That's only a single-request cache (caches some ids/scores within a
single request and is not reused across different requests).

-Yonik
http://lucidworks.com


Re: SolrCloud and Joins

2013-07-31 Thread David Larochelle
Thanks Walter,

Existing media sets will rarely change but new media sets will be added
relatively frequently. (There is a many-to-many relationship between media
sets and media sources.) Given the size of data, a new Media Set that only
includes 1% of the collection would include 6 million rows.

Our data is stored in a Postgresql database and imported using the
dataImportHandler. It takes around 3 days to fully import the data.
In the single shard case, the nice thing about using joins is that the
media set to source mapping data could be updated using an hourly cron job
while the sentence data could be updated using a delta query.

The obvious alternative to joins is to add the media_sets_id to the
sentence data as a multi-valued field. We'll benchmark this. But my concern
is that importing the full data will take even longer and that there will
be no easy way to automatically update each affected row when a new media
set is created. (I could write a separate one-off query for
DataImportHandler each time a new media set is added but this requires a
lot of manual interaction.)

Does SolrCloud really not have a simple way to specify which shard to put a
document on? I'm considering randomly generating document ID prefixes and
then taking their murmurhash to determine what shards they correspond to. I
could then explicitly send documents to a particular shard by specifying a
document ID prefix. However, this seems like a hackish approach. Is there a
better way?



On Mon, Jul 29, 2013 at 12:45 PM, Walter Underwood wun...@wunderwood.org wrote:

 A join may seem clean, but it will be slow and (currently) doesn't work in
 a cluster.

 You find all the sentences in a media set by searching for that set id and
 requesting only the sentence_id (yes, you need that). Then you reindex
 them. With small documents like this, it is probably fairly fast.

 If you can't estimate how often the media sets will change or the size of
 the changes, then you aren't ready to choose a design.

 wunder

 On Jul 29, 2013, at 8:41 AM, David Larochelle wrote:

  We'd like to be able to easily update the media set to source mapping.
 I'm
  concerned that if we store the media_sets_id in the sentence documents,
 it
  will be very difficult to add additional media set to source mapping. I
  imagine that adding a new media set would either require reimporting all
  600 million documents or writing complicated application logic to find
 out
  which sentences to update. Hence joins seem like a cleaner solution.
 
  --
 
  David
 
 
  On Mon, Jul 29, 2013 at 11:22 AM, Walter Underwood 
 wun...@wunderwood.org wrote:
 
  Denormalize. Add media_set_id to each sentence document. Done.
 
  wunder
 
  On Jul 29, 2013, at 7:58 AM, David Larochelle wrote:
 
  I'm setting up SolrCloud with around 600 million documents. The basic
  structure of each document is:
 
  stories_id: integer, media_id: integer, sentence: text_en
 
  We have a number of stories from different media and we treat each
  sentence
  as a separate document because we need to run sentence level analytics.
 
  We also have a concept of groups or sets of sources. We've imported
 this
  media source to media sets mapping into Solr using the following
  structure:
 
  media_id_inner: integer, media_sets_id: integer
 
  For the single node case, we're able to filter our sources by
  media_set_id
  using a join query like the following:
 
 
 
  http://localhost:8983/solr/select?q={!join+from=media_id_inner+to=media_id}media_sets_id:1
 
 
   However, this does not work correctly with SolrCloud. The problem is
   that the join query is performed separately on each of the shards and
   no shard has the complete media set to source mapping data. So
   SolrCloud returns incomplete results.
 
   Since the complete media set to source mapping data is comparatively
   small (~50,000 rows), I would like to replicate it on every shard, so
   that the results of the individual join queries on separate shards
   would be equivalent to performing the same query on a single-shard
   system.
 
   However, I can't figure out how to replicate documents on separate
   shards. The compositeID router has the ability to colocate documents
   based on a prefix in the document ID, but this isn't what I need. What
   I would like is some way to either have the media set to source data
   replicated on every shard or to be able to explicitly upload this data
   to the individual shards. (For the rest of the data I like the
   compositeID autorouting.)
 
  Any suggestions?
 
  --
 
  Thanks,
 
 
  David
 
  --
  Walter Underwood
  wun...@wunderwood.org
 
 
 
 

 --
 Walter Underwood
 wun...@wunderwood.org






Re: FieldCollapsing issues in SolrCloud 4.4

2013-07-31 Thread Paul Masurel
If your issue is that you want to retrieve the number of groups:
group.ngroups returns the sum of the per-shard group counts.

This is not the overall number of groups, because a group that is present
on more than one shard is counted once per shard.

To make sure that this does not happen, one can choose to distribute
documents so that all the documents with the same group key go to the same
shard.

(Disclaimer: before doing so, you need to make sure that your documents
will still be spread roughly evenly.)

You can check out how to do that here
https://cwiki.apache.org/confluence/display/solr/Shards+and+Indexing+Data+in+SolrCloud
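
A rough sketch of the two halves together, assuming documents were indexed
with IDs like "groupkey!docid" under the default compositeId router; the
ZooKeeper address, collection, and group_key field below are placeholders:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.CloudSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class GroupCount {
        public static void main(String[] args) throws Exception {
            CloudSolrServer server = new CloudSolrServer("localhost:2181");
            server.setDefaultCollection("collection1");

            SolrQuery q = new SolrQuery("*:*");
            q.set("group", "true");
            q.set("group.field", "group_key");
            // Only exact when no group spans more than one shard.
            q.set("group.ngroups", "true");

            QueryResponse rsp = server.query(q);
            Integer nGroups = rsp.getGroupResponse().getValues().get(0).getNGroups();
            System.out.println("ngroups: " + nGroups);
        }
    }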





On Wed, Jul 31, 2013 at 8:02 PM, Ali, Saqib docbook@gmail.com wrote:

 Hello Paul,

 Can you please explain what you mean by:
 To get the exact number of groups, you need to shard along your grouping
 field

 Thanks! :)


 On Wed, Jul 31, 2013 at 3:08 AM, Paul Masurel paul.masu...@gmail.com
 wrote:

  Do you mean you get different results with group=true?
  numFound is supposed to return the number of ungrouped hits.
 
  To get the number of groups, you are expected to set
  group.ngroups=true.
  Even then, the result will only give you an upperbound
  in a distributed environment.
  To get the exact number of groups, you need to shard along
  your grouping field.
 
  If you have many groups, you may also experience a huge performance
  hit, as the current implementation has been heavily optimized for a low
  number of groups (e.g. e-commerce categories).
 
  Paul
 
 
 
  On Wed, Jul 31, 2013 at 1:59 AM, Ali, Saqib docbook@gmail.com
 wrote:
 
   Hello all,
  
    Is anyone experiencing issues with numFound when using group=true in
    SolrCloud 4.4?
  
   Sometimes the results are off for us.
  
   I will post more details shortly.
  
   Thanks.
  
 
 
 
  --
  __
 
   Masurel Paul
   e-mail: paul.masu...@gmail.com
 




-- 
__

 Masurel Paul
 e-mail: paul.masu...@gmail.com


no servers hosting shard

2013-07-31 Thread smanad
I have set up SolrCloud, and when I try to access documents I get this error:

<lst name="error"><str name="msg">no servers hosting shard:</str><int name="code">503</int></lst>

However, if I add the shards=shard1 param, it works.





debian package for solr with jetty

2013-07-31 Thread smanad
Hi, 

I am trying to create a Debian package for Solr 4.3 (the default
installation with Jetty).
Is there anything already available?

Also, I need 3 different cores, so I plan to create a corresponding package
for each of them that creates its Solr core using the admin/cores or
Collections API.

I also want to use a SolrCloud setup with an external ZooKeeper ensemble;
what's the best way to create a Debian package for updating the ZooKeeper
config files as well?

Please suggest. Any pointers will be helpful.

Thanks, 
-Manasi
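
For the core-creation step mentioned above, a minimal SolrJ sketch of the
kind of CoreAdmin call a package's post-install script could make; the
host, core name, and instanceDir are placeholders:

    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.request.CoreAdminRequest;

    public class CreateCore {
        public static void main(String[] args) throws Exception {
            // Talks to the CoreAdmin handler under /solr/admin/cores.
            HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr");
            CoreAdminRequest.createCore("core1", "/var/lib/solr/core1", server);
            server.shutdown();
        }
    }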







Proposal/request for comments: Solr schema annotation

2013-07-31 Thread Steve Rowe
In thinking about making the entire Solr schema REST-API-addressable 
(SOLR-4898), I'd like to be able to add arbitrary metadata at both the top 
level of the schema and at each leaf node, and allow read/write access to that 
metadata via the REST API.

Some uses I've thought of for such a facility: 

1. The managed schema now drops XML comments from schema.xml upon conversion to 
managed-schema format, but it would be much better if these were somehow 
preserved, as well as round-trippable when retrieving the schema and its 
constituents via the REST API.

2. Some comments in the example schemas don't refer to just one or to all leaf 
nodes, but rather to a group of them. I'd like to be able to group nodes by 
adding same-named tags to multiple nodes, and also have a top-level 
(optional) tag description - this description could then be presented with 
tagged nodes in various output formats.

3. Some comments in the example schema are documentation about a feature, e.g. 
copyFields.  A top-level documentation annotation could take a leaf node 
element name (or maybe an XPath? probably overkill) and apply to all matching 
elements. 

4. When modifying the schema via REST API, a last-modified annotation could 
be automatically added.

5. There were a couple of user complaints recently when schema.xml parsing was 
tightened to disallow unknown attributes on field declarations (SOLR-4641): 
people were storing their own information there.  User-level metadata would 
support this in a round-trippable way - I'm thinking we could restrict it to 
flat string-typed key/value pairs, with no nested structure.

W3C XML Schema has a similar facility: 
http://www.w3.org/TR/2004/REC-xmlschema-1-20041028/structures.html#element-annotation.

Thoughts?

Some concrete examples of what I'm thinking of in schema.xml format 
(syntax/naming as yet unsettled):

<schema name="example" version="1.5">
  <annotation>
    <description element="tag" content="plain-numeric-field-types">
      Plain numeric field types store and index the text value verbatim.
    </description>
    <documentation element="copyField">
      copyField commands copy one field to another at the time a document
      is added to the index.  It's used either to index the same field differently,
      or to add multiple fields to the same field for easier/faster searching.
    </documentation>
    <last-modified>2014-03-08T12:14:02Z</last-modified>
    …
  </annotation>
  …
  <fieldType name="pint" class="solr.IntField">
    <annotation>
      <tag>plain-numeric-field-types</tag>
    </annotation>
  </fieldType>
  <fieldType name="plong" class="solr.LongField">
    <annotation>
      <tag>plain-numeric-field-types</tag>
    </annotation>
  </fieldType>
  …
  <copyField source="cat" dest="text">
    <annotation>
      <todo>Should this field really be copied to the catchall text field?</todo>
    </annotation>
  </copyField>
  …
  <field name="text" type="text_general">
    <annotation>
      <description>catchall field</description>
      <visibility>public</visibility>
    </annotation>
  </field>
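
To make the intended REST round trip concrete, here is a purely
hypothetical read of one field's annotation; the
/schema/fields/.../annotation path and the response shape are invented for
illustration and exist in no Solr release:

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.HttpURLConnection;
    import java.net.URL;

    public class ReadAnnotation {
        public static void main(String[] args) throws Exception {
            // Hypothetical endpoint for the "text" field's annotation.
            URL url = new URL(
                "http://localhost:8983/solr/collection1/schema/fields/text/annotation");
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setRequestProperty("Accept", "application/json");
            BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), "UTF-8"));
            String line;
            // Hypothetical body:
            // {"annotation":{"description":"catchall field","visibility":"public"}}
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
            in.close();
        }
    }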



Re: Proposal/request for comments: Solr schema annotation

2013-07-31 Thread Walter Underwood
An annotation field would be much better than the current anything-goes,
schema-less schema.xml.

Has anyone built an XML Schema for schema.xml? I know it is extensible, but it 
would be worth a try.

wunder

On Jul 31, 2013, at 6:21 PM, Steve Rowe wrote:

 In thinking about making the entire Solr schema REST-API-addressable 
 (SOLR-4898), I'd like to be able to add arbitrary metadata at both the top 
 level of the schema and at each leaf node, and allow read/write access to 
 that metadata via the REST API.
 
 [...]
 

--
Walter Underwood
wun...@wunderwood.org