using edismax without velocity

2013-04-06 Thread amit
I am using Solr 3.6 and trying to use the edismax handler.
The config has a /browse requestHandler, but it doesn't work because of a
missing class definition (VelocityResponseWriter) error.
<queryResponseWriter name="velocity" class="solr.VelocityResponseWriter"
startup="lazy"/>
I have copied the jars to solr/lib following the steps here, but no luck
http://wiki.apache.org/solr/VelocityResponseWriter#Using_the_VelocityResponseWriter_in_Solr_Core

I just want to search on multiple fields with different boosts. *Can I use
edismax with the /select requestHandler?* If I write a query like the one
below, does it search in both the fields name and description?
Does the query below solve my purpose?
http://localhost:8080/solr/select/?q=(coldfusion^2
cache^1)&defType=edismax&qf=name^2 description^1&fq=author:[* TO *] AND
-author:chinmoyp&start=0&rows=10&fl=author,score,id







Re: Solr restart is taking more than 1 hour

2013-04-06 Thread Shawn Heisey
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Solr-restart-is-taking-more-than-1-hour-tp4054165p4054189.html
 Sent from the Solr - User mailing list archive at Nabble.com.

Nabble says that the original message hasn't made it to the mailing list
yet, which explains why I only saw the reply come in.  Good thing nabble
sent along the URL so I could see the original question.

This is almost guaranteed to be caused by a huge updateLog - the tlog
directory added in version 4.0.  On Solr restart, all of the tlog data
that exists is replayed to ensure the index is fully up to date.  When
the tlog is huge, it takes a very long time.

A huge tlog is normally caused by one of two things: 1) only using soft
commits and never hard committing. 2) doing a very large import with the
dataimport handler and not committing until the end.

The solution is to do hard commits on intervals that are short (but not
super short) with openSearcher set to false.  A hard commit starts a new
tlog and flushes index data to disk.  With openSearcher set to false, the
hard commit will not change document visibility - deleted documents are
still searchable, and new documents are not yet searchable.  You can
still make new content searchable with a commit (hard or soft) that has
openSearcher set to true.
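
(As an editorial illustration, not from the original message: such a commit
can also be issued explicitly on a Solr 4.x update request, with host and
core names as placeholders:
http://localhost:8983/solr/collection1/update?commit=true&openSearcher=false )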

By starting a new tlog on a regular basis, it will never get very big.
Solr trims old tlogs, only keeping a few of them around.  If you have
only a few tlogs and they are small, it won't take very long to replay
them on startup.

The easiest way to do this hard commit is to have Solr do it for you
automatically with the autoCommit feature.

<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxDocs>25000</maxDocs>
    <maxTime>30</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
  <updateLog />
</updateHandler>

I've typed this often enough that I really need to just put it on the
wiki - when the question comes up, link the article. :)

Thanks,
Shawn



Does solr cloud support rename or swap function for collection?

2013-04-06 Thread bradhill99
Hi,
We are using Solr 4.1 and we created a collection named my_data with 20
shards.
Our index files are generated using the Lucene API every hour and loaded
into SolrCloud using the core admin API.
My problem is: for the data generated every hour, I need to create a new
collection name like my_data_001 and load the index files under that
collection name. Then my_data becomes useless and my_data_001 holds the
latest data. In order to keep the query URL unchanged, I need to rename
my_data_001 to my_data, but I can't see any collection API to do the rename
or swap like the core admin supports.
How can I do this?

thanks,

Brad





how to skip test while building

2013-04-06 Thread parnab kumar
Hi All,

  I am new to Solr. I am using Solr 3.4. I want to build without
building the Lucene test files and skip running the tests. Can
anyone please help with where to make the necessary changes.

Thanks,
Pom


Re: using edismax without velocity

2013-04-06 Thread DC tech
Definitely in the 4.x release. Did you try it and find a problem?



Re: using edismax without velocity

2013-04-06 Thread Jack Krupansky

Yes, qf will search in both fields and boost accordingly.

If the only reason to try velocity or even /browse was because you wanted 
edismax, don't bother.


You can just add defType to the /select request handler in solrconfig, so
that you don't need to add it to every request. Same for qf, if it has a 
common value.


And you can even copy /select and create one or more new request handlers 
with new paths, like /my-select, if you have more than one common 
combination of parameter settings that you want to avoid setting on every 
incoming query request.
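
As an editorial sketch of such a copied handler in solrconfig.xml (the
/my-select name and the qf value are illustrative placeholders, not from
this thread):

<requestHandler name="/my-select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="qf">name^2 description^1</str>
  </lst>
</requestHandler>

With defaults like these in place, requests to /my-select no longer need
defType or qf on every URL.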


-- Jack Krupansky

-Original Message- 
From: amit

Sent: Saturday, April 06, 2013 3:15 AM
To: solr-user@lucene.apache.org
Subject: using edismax without velocity

I am using solr3.6 and trying to use the edismax handler.
The config has a /browse requestHandler, but it doesn't work because of
missing class definition VelocityResponseWriter error.
<queryResponseWriter name="velocity" class="solr.VelocityResponseWriter"
startup="lazy"/>
I have copied the jars to solr/lib following the steps here, but no luck
http://wiki.apache.org/solr/VelocityResponseWriter#Using_the_VelocityResponseWriter_in_Solr_Core

I just want  to search on multiple fields with different boost. *Can I use
edismax with the /select requestHandler?* If I write a query like below,
does it search in both the fields name and description?
Does the query below solves my purpose?
http://localhost:8080/solr/select/?q=(coldfusion^2
cache^1)&defType=edismax&qf=name^2 description^1&fq=author:[* TO *] AND
-author:chinmoyp&start=0&rows=10&fl=author,score,id





--
View this message in context: 
http://lucene.472066.n3.nabble.com/using-edismax-without-velocity-tp4054190.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Boost parameter with query function - how to pass in complex params?

2013-04-06 Thread dc tech
See example below
1. Search for SUVs and boost   Honda models
q=suv&boost=query({! v='honda'},1)

2. Search for SUVs and boost   Honda OR  toyota model

a) Using OR in the query does NOT work
   q=suv&boost=query({! v='honda or toyota'},1)

b) Using two query functions and summing the boosts DOES work
Works:   q=suv&boost=sum(query({!v='honda'},1),query({!v='toyota'},1))

Any thoughts?


Use BM25Similarity for title field and default for others

2013-04-06 Thread kchellappa
We want the effect of the field length to have a lesser influence on score
for the title field (we don't want to completely disable it) -- so we get
the following behavior

Docs with more hits in the title rank higher
Docs with shorter titles rank higher if the hits are equal.

The DefaultSimilarity wasn't always giving us this (shorter titles were
preferred over longer titles with more hits).

Note -- we use edismax and search across title and other fields (like body)

In order to solve this, we use BM25Similarity with a small value of b for the
title field.  We ended up using the SchemaSimilarityFactory as the global
similarity in order to use BM25Similarity for the title field.  This
gave us the results we were looking for with respect to the title field.
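
As an editorial sketch (field and type names are illustrative assumptions,
not from the thread), the per-field setup in schema.xml looks roughly like
this:

<similarity class="solr.SchemaSimilarityFactory"/>

<fieldType name="text_title" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <similarity class="solr.BM25SimilarityFactory">
    <float name="k1">1.2</float>
    <float name="b">0.2</float>
  </similarity>
</fieldType>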


We also have keyword, tag and other metadata fields and we want them to be
mostly treated as filters and not influence the score at all.   Because of
the use of the SchemaSimilarityFactory, even though we get the
DefaultSimilarity for non-title fields, it is not the same as
DefaultSimilarityFactory, and so we have situations where the metadata fields
dominate the score (because PerFieldSimilarityWrapper uses a queryNorm of 1.0).

We are thinking that we have the following options to fix this issue

a)   Use BM25Similarity for all fields and adjust the k1, b values as
appropriate
b)   Send the metadata field clauses as part of fq instead of q (but we
might have a lot of dynamically generated clauses, and we are not sure fq is
best suited for these, as we don't want them cached since they could vary
from request to request)
   c)   Associate a boost of zero for the metadata fields in the query
   d)   Extend the SchemaSimilarityFactory and write custom code (at this
point, I am not sure what the custom class should do)


Are these correct?  Do we have any other options?  Any advice on which is the
better option?
I appreciate any inputs on this.







Solr restart is taking more than 1 hour

2013-04-06 Thread gpssolr2020
Hi ,

We have 2 cores and one shard with Solr 4.1. After some configuration changes,
when we try to reload the core or restart the Solr instance, it takes more
than one hour. The log says it is opening a searcher (maxWarmingSearchers is
2). Can anyone help us resolve this?


Thanks.







How to generate multiple tokens on same position through TokenFilter

2013-04-06 Thread Abhishek Pratap Singh
Hi All,

Objective: I want to create a filter to generate multiple tokens (mentioned
below) from the input stream, and I want to put all generated tokens at the
same position, i.e. 1.

Although there is already a tokenizer (PathHierarchyTokenizerFactory) for a
similar purpose, I also want my tokens to be stemmed, so to achieve my
objective I created a filter. Please look at the source code below (I am
not a Java expert, so the code may not be optimized):


// File: ExtendedNameFilter.java
// Purpose: To combine multiple tokens such that "apache solr foundation"
// generates the tokens "apachsolrfoundat", "solrfoundat", "foundat"

package org.apache.lucene.analysis;

import java.io.IOException;
import java.util.LinkedList;
import java.util.ArrayList;
import java.util.Iterator;

import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.util.CharacterUtils;
import org.apache.lucene.util.Version;
import
org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;
import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;

public final class ExtendedNameFilter extends TokenFilter {
  private final CharTermAttribute termAtt =
addAttribute(CharTermAttribute.class);
  private PositionIncrementAttribute posIncAttr;
  private OffsetAttribute setOffsetAttr;
  private final int extendedWordCount;

  public ExtendedNameFilter(Version matchVersion, TokenStream in, int
extendedWordCount) {
super(in);
CharacterUtils.getInstance(matchVersion);
this.extendedWordCount = extendedWordCount;
this.posIncAttr = addAttribute(PositionIncrementAttribute.class);
this.setOffsetAttr = addAttribute(OffsetAttribute.class);
  }

  LinkedList<String> list = new LinkedList<String>();
  ArrayList<Integer> startOffsetList = new ArrayList<Integer>();
  int endOffset = 0;
  int count = 0;

  @Override
  public final boolean incrementToken() throws IOException {
  Iterator<String> iterator;
  int len = 0;

  while(input.incrementToken()) {
  list.add(termAtt.toString());
  startOffsetList.add(setOffsetAttr.startOffset());
  endOffset = setOffsetAttr.endOffset();
  }

  iterator = list.iterator();
  len = list.size();

  if (len > 0 && (extendedWordCount < 0 || count < extendedWordCount)) {
  generateToken(iterator);
  return true;
  }
  else {
  return false;
  }
  }

  public void generateToken(Iterator<String> iterator) {
  termAtt.setEmpty();
  while (iterator.hasNext()){
  termAtt.append((CharSequence) iterator.next());
  }
  list.removeFirst();
  if(count == 0) {
  posIncAttr.setPositionIncrement(1);
  }
  else {
  posIncAttr.setPositionIncrement(0);
  }

  setOffsetAttr.setOffset(startOffsetList.get(count),endOffset);
  count++;
  }
}


// Code Ends



On the analysis page of Solr it worked fine. I've shared screenshots of the
analysis page on Google; anyone can see them by clicking the links below:
https://docs.google.com/file/d/0BxNUkIJt2ma3TUN0YUF1dW1Pc2s/edit?usp=sharing
https://docs.google.com/file/d/0BxNUkIJt2ma3SEE2SDBLTkpETE0/edit?usp=sharing

but while indexing documents Solr gives following exception:

Apr 6, 2013 12:05:45 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.IllegalArgumentException: first position increment must
be > 0 (got 0)
at
org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:125)
at
org.apache.lucene.index.DocFieldProcessor.processDocument(DocFieldProcessor.java:254)
at
org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:256)
at
org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:376)
at
org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1473)
at
org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:206)
at
org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:69)
at
org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)
at
org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:477)
at
org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:346)
at
org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100)
at
org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:246)
at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:173)
at
org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
at
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
at
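
A hedged guess at the cause (this digest contains no reply): the filter
buffers state in instance fields (list, startOffsetList, count, endOffset)
but never clears them, and Lucene reuses TokenStream instances across fields
via reset(). Leftover count > 0 would give the first token of the next
stream a position increment of 0, matching the exception. A minimal sketch
of a reset() override for the class above, under that assumption:

  @Override
  public void reset() throws IOException {
    super.reset();            // reset the wrapped input stream
    list.clear();             // drop tokens buffered from the previous field
    startOffsetList.clear();
    endOffset = 0;
    count = 0;                // first emitted token gets increment 1 again
  }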

Re: Please add me: FuadEfendi

2013-04-06 Thread Erick Erickson
Yeah, unfortunately we had to lock it down because of spam, but we (well,
Steve seems to be on it faster than I am) are adding people back
as fast as we get requests...

On Fri, Apr 5, 2013 at 3:34 PM, Fuad Efendi fuad.efe...@tokenizer.ca wrote:
 Hi,

 Few months ago I was able to modify Wiki; I can't do it now, probably
 because http://wiki.apache.org/solr/ContributorsGroup

 Please add me: FuadEfendi


 Thanks!


 --
 Fuad Efendi, PhD, CEO
 C: (416)993-2060
 F: (416)800-6479
 Tokenizer Inc., Canada
 http://www.tokenizer.ca






Re: Boost parameter with query function - how to pass in complex params?

2013-04-06 Thread Yonik Seeley
On Sat, Apr 6, 2013 at 9:42 AM, dc tech dctech1...@gmail.com wrote:
 See example below
 1. Search for SUVs and boost   Honda models
 q=suv&boost=query({! v='honda'},1)

 2. Search for SUVs and boost   Honda OR  toyota model

 a) Using OR in the query does NOT work
   q=suv&boost=query({! v='honda or toyota'},1)

The "or" needs to be uppercase: OR.

It might also be easier to compose and read like this:
q=suv
boost=query($boostQ)
boostQ=honda OR toyota

Of course, something simpler like this might also serve your primary goal:
q=+suv (honda OR toyota)^10


-Yonik
http://lucidworks.com


Re: how to skip test while building

2013-04-06 Thread Erick Erickson
Don't know a good way to skip compiling the tests, but there isn't
any harm in compiling them...

changing to the solr directory and just issuing
ant example dist builds pretty much everything. You don't execute
tests unless you specify ant test.

ant -p shows you all the targets. Note that you have different
targets depending on whether you're executing it in solr_home or
solr_home/solr or solr_home/lucene.

Since you mention Solr, you probably want to work in solr_home/solr to start.

Best
Erick
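
(For illustration, under the assumption of a 3.x source checkout layout, the
commands Erick mentions run like this:

  cd solr
  ant example dist   # builds the example and distribution; no tests run
  ant test           # runs the tests explicitly
  ant -p             # lists the available targets
)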

On Sat, Apr 6, 2013 at 5:36 AM, parnab kumar parnab.2...@gmail.com wrote:
 Hi All,

   I am new to Solr . I am using solr 3.4 . I want to build without
 building  lucene tests files in lucene and skip the tests to be fired . Can
 anyone please help where to make the necessary changes .

 Thanks,
 Pom


Re: using edismax without velocity

2013-04-06 Thread Erick Erickson
In fact, just remove or comment out these lines from the /browse handler
and you won't be using velocity. It might make a good place to start:

  <!-- VelocityResponseWriter settings -->
   <str name="wt">velocity</str>
   <str name="v.template">browse</str>
   <str name="v.layout">layout</str>
   <str name="title">Solritas</str>

Best
Erick

On Sat, Apr 6, 2013 at 6:55 AM, Jack Krupansky j...@basetechnology.com wrote:
 Yes, qf will search in both fields and boost accordingly.

 If the only reason to try velocity or even /browse was because you wanted
 edismax, don't bother.

 You can just add defType to the /select request handler in solrconfig, so
 that you don't need to add it to every request. Same for qf, if it has a
 common value.

 And you can even copy /select and create one or more new request handlers
 with new paths, like /my-select, if you have more than one common
 combination of parameter settings that you want to avoid setting on every
 incoming query request.

 -- Jack Krupansky

 -Original Message- From: amit
 Sent: Saturday, April 06, 2013 3:15 AM
 To: solr-user@lucene.apache.org
 Subject: using edismax without velocity


 I am using solr3.6 and trying to use the edismax handler.
 The config has a /browse requestHandler, but it doesn't work because of
 missing class definition VelocityResponseWriter error.
 <queryResponseWriter name="velocity" class="solr.VelocityResponseWriter"
 startup="lazy"/>
 I have copied the jars to solr/lib following the steps here, but no luck
 http://wiki.apache.org/solr/VelocityResponseWriter#Using_the_VelocityResponseWriter_in_Solr_Core

 I just want  to search on multiple fields with different boost. *Can I use
 edismax with the /select requestHandler?* If I write a query like below,
 does it search in both the fields name and description?
 Does the query below solves my purpose?
 http://localhost:8080/solr/select/?q=(coldfusion^2
 cache^1)&defType=edismax&qf=name^2 description^1&fq=author:[* TO *] AND
 -author:chinmoyp&start=0&rows=10&fl=author,score,id





 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/using-edismax-without-velocity-tp4054190.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Re: Need Help for schema definition

2013-04-06 Thread contact_pub...@mail-impact.com

Hello,

Would somebody be kind enough to help me, at least by giving some direction
for my research?


Regards

On 05/04/2013 15:59, contact_pub...@mail-impact.com wrote:

Hi all,

Well, I'm a total newbie with Solr, and I need some help.

OK, a raw definition of my needs:

I have a product database, with ordinary fields to describe a product:
name, reference, description, large description, product
specifications, categories, etc.


The needs :

1 - Being able to search through product name, description,
specification, reference
2 - Being able to quickly find all products from a category. For now it
gives me too many results.
3 - Being able to find, in a result set, all the facets corresponding to
the product specifications (e.g. number of products in wood, number of
products having a diameter of 20cm or in a range). I am looking for an
automatic process that tells me the 5 most common specifications in the
result set and the number of products for each.


4 - last but not least: I have a particular type of product (spare
parts), for which I need to be able to:

- find them by brand
- find them by name
- find them by reference
- compatible model: the compatible models appear in the description
field and need to be extracted with a regular expression to build a list of
the different compatible models (original text example: spare part for Pompe
HG large model, model HGS v5)

I used the DataImportHandler to retrieve the data, and it seems to
work for the first range of products; however, I need to adjust the
tokenizer and filters because they are too strict for now.


For the second set of data, I created a second entity in the
data-import-config.xml, adding the data to the same fields, but it
doesn't fit my needs as the results are mixed and I can't select a
specific entity to search in.


Thanks in advance for your help

David








Sharing index amongst multiple nodes

2013-04-06 Thread Daire Mac Mathúna
Hi. What are the thoughts on having multiple SOLR instances, i.e. multiple
SOLR war files, sharing the same index (i.e. sharing the same solr_home),
where only one SOLR instance is used for writing and the others for reading?

Is this possible?

Is it beneficial - is it more performant than having just one solr instance?

How does it affect auto-commits i.e. how would the read nodes know the
index has been changed and re-populate cache etc.?

Solr 3.6.1

Thanks.


Re: Need Help for schema definition

2013-04-06 Thread Gora Mohanty
On 6 April 2013 20:58, contact_pub...@mail-impact.com
contact_pub...@mail-impact.com wrote:
 Hello,

 Is somebody kind enough to help me, at least by giving some direction for my
 research.

Your questions are too broad, and lack sufficient detail for someone
to be able to help you without asking more questions about each area.
It would help if you provided more details, e.g., what are the relationships
between various entities, and what the various fields mean. Ideally, you
would tell us what you have tried, and what is not working for you.
Please provide details about the schema, and what queries you are
making, and what the expected results should be.

On the face of it 1-3 should be straightforward with Solr, but I am
unable to make sense out of 4.

Regards,
Gora


Re: Need Help for schema definition

2013-04-06 Thread contact_pub...@mail-impact.com

Thanks for your reply.

So on point

1 - I have been able to enter the data in the database and query it
correctly. For now the results remain too strict.

2 - OK, but not strict enough.


3 - Here is my set of data. From a DBMS I have the following schema:
Feature name = Diameter
Feature value = 20
Feature name = color
Feature value = blue
etc...

There are many feature names with different feature values.

While importing the data into Solr, I made a multivalued field features
containing feature name : feature value (I suppose this is not the right
way to proceed).

There are about 200 feature names, and they change or are added often.

And I wish to be able to request the 5 most popular feature names with
all the different feature values and, for each feature value, the number
of matches (facets).


request = cat:swimming pool & facets=true & fq=feature??
expected result = 40 products found
list of products
   Facets = - Width
            - height
            - with liner
            - color

   width =  6m (10)
            9m (15)
            12m (24)
            50m (1)
   height = ...

where the number between parentheses is the number of products.

regards
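
One standard approach, sketched editorially (field names are illustrative
assumptions, not from the thread): index each feature as its own dynamic
field and facet on those fields.

In schema.xml:

  <dynamicField name="feature_*" type="string" indexed="true"
                stored="true" multiValued="true"/>

Then a query along the lines of:

  http://localhost:8983/solr/select?q=cat:"swimming pool"&facet=true&facet.field=feature_width&facet.field=feature_height&facet.mincount=1

would return, per feature field, the distinct values with their product
counts, much like the expected output above.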









On 06/04/2013 18:26, Gora Mohanty wrote:

On 6 April 2013 20:58, contact_pub...@mail-impact.com
contact_pub...@mail-impact.com wrote:

Hello,

Is somebody kind enough to help me, at least by giving some direction for my
research.

Your questions are too broad, and lack sufficient detail for someone
to be able to help you without asking more questions about each area.
It would help if you provided more details, e.g., what are the relationships
between various entities, and what the various fields mean. Ideally, you
would tell us what you have tried, and what is not working for you.
Please provide details about the schema, and what queries you are
making, and what the expected results should be.

On the face of it 1-3 should be straightforward with Solr, but I am
unable to make sense out of 4.

Regards,
Gora





Re: Solr metrics in Codahale metrics and Graphite?

2013-04-06 Thread Walter Underwood
Wow, that really doesn't help at all, since these seem to only be reported in 
the stats page. 

I don't need another non-standard app-specific set of metrics, especially one 
that needs polling. I need metrics delivered to the common system that we use 
for all our servers.

This is also why SPM is not useful for us, sorry Otis.

Also, there is no time period on these stats. How do you graph the 95th 
percentile? I know there was a lot of work on these, but they seem really 
useless to me. I'm picky about metrics, working at Netflix does that to you.

wunder

On Apr 3, 2013, at 4:01 PM, Walter Underwood wrote:

 In the Jira, but not in the docs. 
 
 It would be nice to have VM stats like GC, too, so we can have common 
 monitoring and alerting on all our services.
 
 wunder
 
 On Apr 3, 2013, at 3:31 PM, Otis Gospodnetic wrote:
 
 It's there! :)
 http://search-lucene.com/?q=percentile&fc_project=Solr&fc_type=issue
 
 Otis
 --
 Solr & ElasticSearch Support
 http://sematext.com/
 
 On Wed, Apr 3, 2013 at 6:29 PM, Walter Underwood wun...@wunderwood.org 
 wrote:
 That sounds great. I'll check out the bug, I didn't see anything in the 
 docs about this. And if I can't find it with a search engine, it probably 
 isn't there.  --wunder
 
 On Apr 3, 2013, at 6:39 AM, Shawn Heisey wrote:
 
 On 3/29/2013 12:07 PM, Walter Underwood wrote:
 What are folks using for this?
 
 I don't know that this really answers your question, but Solr 4.1 and
 later includes a big chunk of codahale metrics internally for request
 handler statistics - see SOLR-1972.  First we tried including the jar
 and using the API, but that created thread leak problems, so the source
 code was added.
 
 Thanks,
 Shawn






Re: Sharing index amongst multiple nodes

2013-04-06 Thread Amit Nithian
I don't understand why this would be more performant... it seems like it'd be
more memory and resource intensive as you'd have multiple class-loaders and
multiple cache spaces for no good reason. Just have a single core with
sufficiently large caches to handle your response needs.

If you want to load balance reads consider having multiple physical nodes
with a master/slaves or SolrCloud.


On Sat, Apr 6, 2013 at 9:21 AM, Daire Mac Mathúna daire...@gmail.comwrote:

 Hi. Wat are the thoughts on having multiple SOLR instances i.e. multiple
 SOLR war files, sharing the same index (i.e. sharing the same solr_home)
 where only one SOLR instance is used for writing and the others for
 reading?

 Is this possible?

 Is it beneficial - is it more performant than having just one solr
 instance?

 How does it affect auto-commits i.e. how would the read nodes know the
 index has been changed and re-populate cache etc.?

 Sole 3.6.1

 Thanks.



Re: how to skip test while building

2013-04-06 Thread Amit Nithian
If you generate the Maven pom files, you can do this, I think, by running mvn
<whatever here> -DskipTests=true.


On Sat, Apr 6, 2013 at 7:25 AM, Erick Erickson erickerick...@gmail.comwrote:

 Don't know a good way to skip compiling the tests, but there isn't
 any harm in compiling them...

 changing to the solr directory and just issuing
 ant example dist builds pretty much everything. You don't execute
 tests unless you specify ant test.

 ant -p shows you all the targets. Note that you have different
 targets depending on whether you're executing it in solr_home or
 solr_home/solr or solr_home/lucene.

 Since you mention Solr, you probably want to work in solr_home/solr to
 start.

 Best
 Erick

 On Sat, Apr 6, 2013 at 5:36 AM, parnab kumar parnab.2...@gmail.com
 wrote:
  Hi All,
 
I am new to Solr . I am using solr 3.4 . I want to build without
  building  lucene tests files in lucene and skip the tests to be fired .
 Can
  anyone please help where to make the necessary changes .
 
  Thanks,
  Pom



Empty term vector component result with Solr 4.2

2013-04-06 Thread Yakov Bezrukov
Hi all,

I'm doing a test migration from Solr 3.6.2 to Solr 4.2 and cannot make the
term vector component work. I was not able to find changes to the TVC
configuration in the docs, so I used the same approach as on my 3.6 server,
where it works fine.
 
Relevant field in schema.xml is configured like this:

<field indexed="true" name="content" stored="true" termOffsets="true"
termPositions="true" termVectors="true" type="text_general_all"/>

In solrconfig.xml I have (I also tried the configuration from
example/solrconfig.xml bundled with Solr 4.2, but the result is the same):

<searchComponent name="tvComponent"
    class="org.apache.solr.handler.component.TermVectorComponent"/>
<requestHandler name="tvrh"
    class="org.apache.solr.handler.component.SearchHandler">
  <lst name="defaults">
    <bool name="tv">true</bool>
  </lst>
  <arr name="last-component">
    <str>tvComponent</str>
  </arr>
</requestHandler>

But when I perform a request to the server like this:

http://test.farm:8080/solr/TestCorpus/select/?q=content%3A*&start=0&rows=1&indent=on&qt=tvrh&tv=true&tv.tf=true&tv.df=true&tv.positions&tv.offsets=true

I'm getting a result which does not contain any TVC fields:

<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">49</int>
</lst>
<result name="response" numFound="3877" start="0">
<doc>...</doc>
</result>
</response>

On the old 3.6 server, with the same request, I'm getting:

<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">18</int>
</lst>
<result name="response" numFound="80698" start="0">...</result>
<lst name="termVectors">...</lst>
</response>

Could you please help me to find out what is wrong.

Regards,
Yakov
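
A hedged guess, not from the thread: the Solr 4.x example solrconfig.xml
ships with <requestDispatcher handleSelect="false">, so the qt=tvrh
parameter on /select is ignored and the term vector component never runs.
Under that assumption, either set handleSelect="true", or register the
handler with a leading slash and request it directly:

<requestHandler name="/tvrh"
    class="org.apache.solr.handler.component.SearchHandler">
...

http://test.farm:8080/solr/TestCorpus/tvrh?q=content%3A*&tv=true&tv.tf=true&tv.df=true&tv.positions&tv.offsets=true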

Re: Sharing index amongst multiple nodes

2013-04-06 Thread Furkan KAMACI
Hi Daire Mac Mathúna;

If there were a way of copying one Solr's indexes into another Solr instance,
that might also solve the problem. Somebody generates indexes, and some
other instances could get a copy of them. During the synchronizing process
you could eliminate some of the indexes at the reader instance, so you could
filter something to make it unsearchable. *This may not be efficient or a
good thing, and may be solved with built-in functionality somehow.* However,
I think somebody may need that mechanism.


2013/4/6 Amit Nithian anith...@gmail.com

 I don't understand why this would be more performant.. seems like it'd be
 more memory and resource intensive as you'd have multiple class-loaders and
 multiple cache spaces for no good reason. Just have a single core with
 sufficiently large caches to handle your response needs.

 If you want to load balance reads consider having multiple physical nodes
 with a master/slaves or SolrCloud.


 On Sat, Apr 6, 2013 at 9:21 AM, Daire Mac Mathúna daire...@gmail.com
 wrote:

  Hi. Wat are the thoughts on having multiple SOLR instances i.e. multiple
  SOLR war files, sharing the same index (i.e. sharing the same solr_home)
  where only one SOLR instance is used for writing and the others for
  reading?
 
  Is this possible?
 
  Is it beneficial - is it more performant than having just one solr
  instance?
 
  How does it affect auto-commits i.e. how would the read nodes know the
  index has been changed and re-populate cache etc.?
 
  Sole 3.6.1
 
  Thanks.
 



Re: Sharing index amongst multiple nodes

2013-04-06 Thread Walter Underwood
This is precisely how Solr replication works. It copies the indexes then does a 
commit.

wunder
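
As an illustrative sketch (editorial; host, core, and interval are
placeholders) of the classic solrconfig.xml replication setup Walter
describes:

<!-- on the master -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
  </lst>
</requestHandler>

<!-- on each slave -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master-host:8983/solr/core0/replication</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>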

On Apr 6, 2013, at 2:40 PM, Furkan KAMACI wrote:

 Hi Daire Mac Mathúna;
 
 If there is a way copying one Solr's indexes into another Solr instance,
 this may also solve the problem. Somebody generates indexes and some of
 other instances could get a copy of them. At synchronizing process you may
 eliminate some of indexes at reader instance. So you can filter something
 to become unsearchable. *This may not be efficient and good thing and maybe
 solved with built-in functionality somehow.* However I think somebody may
 need that mechanism.
 
 
 2013/4/6 Amit Nithian anith...@gmail.com
 
 I don't understand why this would be more performant.. seems like it'd be
 more memory and resource intensive as you'd have multiple class-loaders and
 multiple cache spaces for no good reason. Just have a single core with
 sufficiently large caches to handle your response needs.
 
 If you want to load balance reads consider having multiple physical nodes
 with a master/slaves or SolrCloud.
 
 
 On Sat, Apr 6, 2013 at 9:21 AM, Daire Mac Mathúna daire...@gmail.com
 wrote:
 
 Hi. Wat are the thoughts on having multiple SOLR instances i.e. multiple
 SOLR war files, sharing the same index (i.e. sharing the same solr_home)
 where only one SOLR instance is used for writing and the others for
 reading?
 
 Is this possible?
 
 Is it beneficial - is it more performant than having just one solr
 instance?
 
 How does it affect auto-commits i.e. how would the read nodes know the
 index has been changed and re-populate cache etc.?
 
 Sole 3.6.1
 
 Thanks.
 
 

--
Walter Underwood
wun...@wunderwood.org





Pointing to Hbase for Docuements or Directly Saving Documents at Hbase

2013-04-06 Thread Furkan KAMACI
Hi;

First of all, I should mention that I am new to Solr and doing research on
it. What I am trying to do: I will crawl some websites with Nutch and then
index them with Solr. (Nutch 2.1, Solr/SolrCloud 4.2)

I wonder about something. I have a cloud of machines that crawls websites
and stores the documents. Then I send those documents into SolrCloud. Solr
indexes the documents, generates indexes, and saves them. I know from
Information Retrieval theory that it *may* not be efficient to store
indexes in a NoSQL database (they are something like linked lists, and if
you store them in such a database you *may* have a sparse
representation - by the way, there may be some solutions for this; if you
can explain them, you are welcome to).

However, Solr stores some documents too (i.e. for highlighting), so some of
my documents will be duplicated somehow. Considering that I will have many
documents, those duplicated documents may cause a problem for me. So is
there any way to not store those documents in Solr and instead point to
them in HBase (where I save my crawled documents), or to store them
directly in HBase (is that efficient or not)?


Re: Sharing index amongst multiple nodes

2013-04-06 Thread Furkan KAMACI
Hi Walter;

I am new to Solr and digging into the code to understand it. I think that
when the indexer copies indexes, they are unsearchable before the commit.

Where exactly does that commit occur in the code, and can I say rollback
something because I don't want those indexes (the reason may be anything
else; maybe I will decline some indexes (index filtering) because of the
documents they point to)? Is it possible?



2013/4/7 Walter Underwood wun...@wunderwood.org

 This is precisely how Solr replication works. It copies the indexes then
 does a commit.

 wunder

 On Apr 6, 2013, at 2:40 PM, Furkan KAMACI wrote:

  Hi Daire Mac Mathúna;
 
  If there is a way copying one Solr's indexes into another Solr instance,
  this may also solve the problem. Somebody generates indexes and some of
  other instances could get a copy of them. At synchronizing process you
 may
  eliminate some of indexes at reader instance. So you can filter something
  to become unsearchable. *This may not be efficient and good thing and
 maybe
  solved with built-in functionality somehow.* However I think somebody may
  need that mechanism.
 
 
  2013/4/6 Amit Nithian anith...@gmail.com
 
  I don't understand why this would be more performant.. seems like it'd
 be
  more memory and resource intensive as you'd have multiple class-loaders
 and
  multiple cache spaces for no good reason. Just have a single core with
  sufficiently large caches to handle your response needs.
 
  If you want to load balance reads consider having multiple physical
 nodes
  with a master/slaves or SolrCloud.
 
 
  On Sat, Apr 6, 2013 at 9:21 AM, Daire Mac Mathúna daire...@gmail.com
  wrote:
 
  Hi. Wat are the thoughts on having multiple SOLR instances i.e.
 multiple
  SOLR war files, sharing the same index (i.e. sharing the same
 solr_home)
  where only one SOLR instance is used for writing and the others for
  reading?
 
  Is this possible?
 
  Is it beneficial - is it more performant than having just one solr
  instance?
 
  How does it affect auto-commits i.e. how would the read nodes know the
  index has been changed and re-populate cache etc.?
 
  Sole 3.6.1
 
  Thanks.
 
 

 --
 Walter Underwood
 wun...@wunderwood.org






Re: Sharing index amongst multiple nodes

2013-04-06 Thread Walter Underwood
Indexing happens on one Solr server. After a commit, the documents are 
searchable. In Solr 4, there is a soft commit, which makes the documents 
searchable, but does not create on-disk indexes.

Solr replication copies the committed indexes to another Solr server.

Solr Cloud uses a transaction log to make documents available before a hard 
commit.

Solr does not have rollback. A commit succeeds or fails. After it succeeds, 
there is no going back.

wunder
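
(As an editorial illustration, with host and core names as placeholders, the
two commit flavors Walter describes can be issued explicitly on Solr 4.x
update requests:

http://localhost:8983/solr/collection1/update?commit=true
http://localhost:8983/solr/collection1/update?softCommit=true

The first flushes segments to disk and opens a searcher; the second only
makes documents visible.)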

On Apr 6, 2013, at 3:08 PM, Furkan KAMACI wrote:

 Hi Walter;
 
 I am new to Solr and digging into code to understand it. I think that when
 indexer copies indexes, before the commit it is unsearchable.
 
 Where exactly that commit occurs at code and can I say that: rollback
 something because I don't want that indexes (reason maybe anything else,
 maybe I will decline some indexes(index filtering) because of the documents
 they points. Is it possible?
 
 
 
 2013/4/7 Walter Underwood wun...@wunderwood.org
 
 This is precisely how Solr replication works. It copies the indexes then
 does a commit.
 
 wunder
 
 On Apr 6, 2013, at 2:40 PM, Furkan KAMACI wrote:
 
 Hi Daire Mac Mathúna;
 
 If there is a way copying one Solr's indexes into another Solr instance,
 this may also solve the problem. Somebody generates indexes and some of
 other instances could get a copy of them. At synchronizing process you
 may
 eliminate some of indexes at reader instance. So you can filter something
 to become unsearchable. *This may not be efficient and good thing and
 maybe
 solved with built-in functionality somehow.* However I think somebody may
 need that mechanism.
 
 
 2013/4/6 Amit Nithian anith...@gmail.com
 
 I don't understand why this would be more performant.. seems like it'd
 be
 more memory and resource intensive as you'd have multiple class-loaders
 and
 multiple cache spaces for no good reason. Just have a single core with
 sufficiently large caches to handle your response needs.
 
 If you want to load balance reads consider having multiple physical
 nodes
 with a master/slaves or SolrCloud.
 
 
 On Sat, Apr 6, 2013 at 9:21 AM, Daire Mac Mathúna daire...@gmail.com
 wrote:
 
 Hi. Wat are the thoughts on having multiple SOLR instances i.e.
 multiple
 SOLR war files, sharing the same index (i.e. sharing the same
 solr_home)
 where only one SOLR instance is used for writing and the others for
 reading?
 
 Is this possible?
 
 Is it beneficial - is it more performant than having just one solr
 instance?
 
 How does it affect auto-commits i.e. how would the read nodes know the
 index has been changed and re-populate cache etc.?
 
 Sole 3.6.1
 
 Thanks.
 
 
 
 --
 Walter Underwood
 wun...@wunderwood.org
 
 
 
 

--
Walter Underwood
wun...@wunderwood.org





Re: Sharing index amongst multiple nodes

2013-04-06 Thread Furkan KAMACI
Hi Walter;

Thanks for your explanation. You said "Indexing happens on one Solr
server." Is that true even for SolrCloud?


2013/4/7 Walter Underwood wun...@wunderwood.org

 Indexing happens on one Solr server. After a commit, the documents are
 searchable. In Solr 4, there is a soft commit, which makes the documents
 searchable, but does not create on-disk indexes.

 Solr replication copies the committed indexes to another Solr server.

 Solr Cloud uses a transaction log to make documents available before a
 hard commit.

 Solr does not have rollback. A commit succeeds or fails. After it
 succeeds, there is no going back.

 wunder

 On Apr 6, 2013, at 3:08 PM, Furkan KAMACI wrote:

  Hi Walter;
 
  I am new to Solr and digging into code to understand it. I think that
 when
  indexer copies indexes, before the commit it is unsearchable.
 
  Where exactly that commit occurs at code and can I say that: rollback
  something because I don't want that indexes (reason maybe anything else,
  maybe I will decline some indexes(index filtering) because of the
 documents
  they points. Is it possible?
 
 
 
  2013/4/7 Walter Underwood wun...@wunderwood.org
 
  This is precisely how Solr replication works. It copies the indexes then
  does a commit.
 
  wunder
 
  On Apr 6, 2013, at 2:40 PM, Furkan KAMACI wrote:
 
  Hi Daire Mac Mathúna;
 
  If there is a way copying one Solr's indexes into another Solr
 instance,
  this may also solve the problem. Somebody generates indexes and some of
  other instances could get a copy of them. At synchronizing process you
  may
  eliminate some of indexes at reader instance. So you can filter
 something
  to become unsearchable. *This may not be efficient and good thing and
  maybe
  solved with built-in functionality somehow.* However I think somebody
 may
  need that mechanism.
 
 
  2013/4/6 Amit Nithian anith...@gmail.com
 
  I don't understand why this would be more performant.. seems like it'd
  be
  more memory and resource intensive as you'd have multiple
 class-loaders
  and
  multiple cache spaces for no good reason. Just have a single core with
  sufficiently large caches to handle your response needs.
 
  If you want to load balance reads consider having multiple physical
  nodes
  with a master/slaves or SolrCloud.
 
 
  On Sat, Apr 6, 2013 at 9:21 AM, Daire Mac Mathúna daire...@gmail.com
  wrote:
 
  Hi. Wat are the thoughts on having multiple SOLR instances i.e.
  multiple
  SOLR war files, sharing the same index (i.e. sharing the same
  solr_home)
  where only one SOLR instance is used for writing and the others for
  reading?
 
  Is this possible?
 
  Is it beneficial - is it more performant than having just one solr
  instance?
 
  How does it affect auto-commits i.e. how would the read nodes know
 the
  index has been changed and re-populate cache etc.?
 
  Sole 3.6.1
 
  Thanks.
 
 
 
  --
  Walter Underwood
  wun...@wunderwood.org
 
 
 
 

 --
 Walter Underwood
 wun...@wunderwood.org






Re: Sharing index amongst multiple nodes

2013-04-06 Thread Walter Underwood
In Solr Cloud, a document is indexed on the shard leader. The replicas in that 
shard get the document and add it to their indexes. There is some indexing that 
happens on the replicas, but that is managed by Solr.

wunder

On Apr 6, 2013, at 3:58 PM, Furkan KAMACI wrote:

 Hi Walter;
 
 Thanks for your explanation. You said Indexing happens on one Solr
 server. Is it true even for SolrCloud?
 
 
 2013/4/7 Walter Underwood wun...@wunderwood.org
 
 Indexing happens on one Solr server. After a commit, the documents are
 searchable. In Solr 4, there is a soft commit, which makes the documents
 searchable, but does not create on-disk indexes.
 
 Solr replication copies the committed indexes to another Solr server.
 
 Solr Cloud uses a transaction log to make documents available before a
 hard commit.
 
 Solr does not have rollback. A commit succeeds or fails. After it
 succeeds, there is no going back.
 
 wunder
 
 On Apr 6, 2013, at 3:08 PM, Furkan KAMACI wrote:
 
 Hi Walter;
 
 I am new to Solr and digging into code to understand it. I think that
 when
 indexer copies indexes, before the commit it is unsearchable.
 
 Where exactly that commit occurs at code and can I say that: rollback
 something because I don't want that indexes (reason maybe anything else,
 maybe I will decline some indexes(index filtering) because of the
 documents
 they points. Is it possible?
 
 
 
 2013/4/7 Walter Underwood wun...@wunderwood.org
 
 This is precisely how Solr replication works. It copies the indexes then
 does a commit.
 
 wunder
 
 On Apr 6, 2013, at 2:40 PM, Furkan KAMACI wrote:
 
 Hi Daire Mac Mathúna;
 
 If there is a way copying one Solr's indexes into another Solr
 instance,
 this may also solve the problem. Somebody generates indexes and some of
 other instances could get a copy of them. At synchronizing process you
 may
 eliminate some of indexes at reader instance. So you can filter
 something
 to become unsearchable. *This may not be efficient and good thing and
 maybe
 solved with built-in functionality somehow.* However I think somebody
 may
 need that mechanism.
 
 
 2013/4/6 Amit Nithian anith...@gmail.com
 
 I don't understand why this would be more performant.. seems like it'd
 be
 more memory and resource intensive as you'd have multiple
 class-loaders
 and
 multiple cache spaces for no good reason. Just have a single core with
 sufficiently large caches to handle your response needs.
 
 If you want to load balance reads consider having multiple physical
 nodes
 with a master/slaves or SolrCloud.
 
 
 On Sat, Apr 6, 2013 at 9:21 AM, Daire Mac Mathúna daire...@gmail.com
 wrote:
 
 Hi. Wat are the thoughts on having multiple SOLR instances i.e.
 multiple
 SOLR war files, sharing the same index (i.e. sharing the same
 solr_home)
 where only one SOLR instance is used for writing and the others for
 reading?
 
 Is this possible?
 
 Is it beneficial - is it more performant than having just one solr
 instance?
 
 How does it affect auto-commits i.e. how would the read nodes know
 the
 index has been changed and re-populate cache etc.?
 
 Sole 3.6.1
 
 Thanks.
 
 
 
 --
 Walter Underwood
 wun...@wunderwood.org
 
 
 
 
 
 --
 Walter Underwood
 wun...@wunderwood.org
 
 
 
 

--
Walter Underwood
wun...@wunderwood.org





Re: Sharing index amongst multiple nodes

2013-04-06 Thread Furkan KAMACI
My last questions:

1) If I send a document to a replica, does it pass the document to the shard
leader? And do you mean that even if I send a document to the shard leader,
it can pass that document to one of the replicas to be indexed?

2) Is it possible to copy a shard into another shard, or merge them?

By the way, thanks for your explanations.


2013/4/7 Walter Underwood wun...@wunderwood.org

 In Solr Cloud, a document is indexed on the shard leader. The replicas in
 that shard get the document and add it to their indexes. There is some
 indexing that happens on the replicas, but that is managed by Solr.

 wunder

 On Apr 6, 2013, at 3:58 PM, Furkan KAMACI wrote:

  Hi Walter;
 
  Thanks for your explanation. You said Indexing happens on one Solr
  server. Is it true even for SolrCloud?
 
 
  2013/4/7 Walter Underwood wun...@wunderwood.org
 
  Indexing happens on one Solr server. After a commit, the documents are
  searchable. In Solr 4, there is a soft commit, which makes the
 documents
  searchable, but does not create on-disk indexes.
 
  Solr replication copies the committed indexes to another Solr server.
 
  Solr Cloud uses a transaction log to make documents available before a
  hard commit.
 
  Solr does not have rollback. A commit succeeds or fails. After it
  succeeds, there is no going back.
 
  wunder
 
  On Apr 6, 2013, at 3:08 PM, Furkan KAMACI wrote:
 
  Hi Walter;
 
  I am new to Solr and digging into code to understand it. I think that
  when
  indexer copies indexes, before the commit it is unsearchable.
 
  Where exactly that commit occurs at code and can I say that: rollback
  something because I don't want that indexes (reason maybe anything
 else,
  maybe I will decline some indexes(index filtering) because of the
  documents
  they points. Is it possible?
 
 
 
  2013/4/7 Walter Underwood wun...@wunderwood.org
 
  This is precisely how Solr replication works. It copies the indexes
 then
  does a commit.
 
  wunder
 
  On Apr 6, 2013, at 2:40 PM, Furkan KAMACI wrote:
 
  Hi Daire Mac Mathúna;
 
  If there is a way copying one Solr's indexes into another Solr
  instance,
  this may also solve the problem. Somebody generates indexes and some
 of
  other instances could get a copy of them. At synchronizing process
 you
  may
  eliminate some of indexes at reader instance. So you can filter
  something
  to become unsearchable. *This may not be efficient and good thing and
  maybe
  solved with built-in functionality somehow.* However I think somebody
  may
  need that mechanism.
 
 
  2013/4/6 Amit Nithian anith...@gmail.com
 
  I don't understand why this would be more performant.. seems like
 it'd
  be
  more memory and resource intensive as you'd have multiple
  class-loaders
  and
  multiple cache spaces for no good reason. Just have a single core
 with
  sufficiently large caches to handle your response needs.
 
  If you want to load balance reads consider having multiple physical
  nodes
  with a master/slaves or SolrCloud.
 
 
  On Sat, Apr 6, 2013 at 9:21 AM, Daire Mac Mathúna 
 daire...@gmail.com
  wrote:
 
  Hi. Wat are the thoughts on having multiple SOLR instances i.e.
  multiple
  SOLR war files, sharing the same index (i.e. sharing the same
  solr_home)
  where only one SOLR instance is used for writing and the others for
  reading?
 
  Is this possible?
 
  Is it beneficial - is it more performant than having just one solr
  instance?
 
  How does it affect auto-commits i.e. how would the read nodes know
  the
  index has been changed and re-populate cache etc.?
 
  Sole 3.6.1
 
  Thanks.
 
 
 
  --
  Walter Underwood
  wun...@wunderwood.org
 
 
 
 
 
  --
  Walter Underwood
  wun...@wunderwood.org
 
 
 
 

 --
 Walter Underwood
 wun...@wunderwood.org






Re: Does solr cloud support rename or swap function for collection?

2013-04-06 Thread Mark Miller
4.2 and 4.2.1 have collection aliasing (similar to what we had with SolrCore
aliasing at one point). You can use that to have one URL and swap the
collection searched by it behind the scenes.

- Mark
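
An illustrative sketch of that aliasing call (editorial; the host is a
placeholder, collection names match the thread):

http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=my_data&collections=my_data_001

Re-issuing CREATEALIAS with the same alias name pointed at the next hourly
collection re-targets queries without changing the query URL.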

On Apr 6, 2013, at 6:28 AM, bradhill99 bradhil...@yahoo.com wrote:

 Hi,
 We are using solr 4.1 and we create a collection name my_data with 20
 shards. 
 Our index files are generated by using lucence api every one hour and load
 into solr cloud using core admin API.
 My problem is, for data generated every one hour, I need to create a new
 collection name like my_data_001 and to load the index files under that
 collection name. And my_data will be useless and my_data_001 is the latest
 data. In order to keep query url unchanged, I need to rename my_data_001 to
 my_data, but I can't see any collection API to do the rename or swap like
 core admin supports.
 How can I do this?
 
 thanks,
 
 Brad
 
 
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Does-solr-cloud-support-rename-or-swap-function-for-collection-tp4054193.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Sharing index amongst multiple nodes

2013-04-06 Thread Walter Underwood
A document sent to any Solr Cloud node will be sent to the right place.

Shard merging and splitting is not supported now. There is work on shard 
splitting: https://issues.apache.org/jira/browse/SOLR-3755

wunder

On Apr 6, 2013, at 4:15 PM, Furkan KAMACI wrote:

 My last questions.
 
 1) If I sent document to a replica does it pass document to shard leader
 and do you mean that even if I send document to shard leader does it can
 pass that document
 one of replicas to be indexed.
 
 2) Does it possible to copy a shard into another shard, or merge them?
 
 By the way thanks for your explanations.
 
 
 2013/4/7 Walter Underwood wun...@wunderwood.org
 
 In Solr Cloud, a document is indexed on the shard leader. The replicas in
 that shard get the document and add it to their indexes. There is some
 indexing that happens on the replicas, but that is managed by Solr.
 
 wunder
 
 On Apr 6, 2013, at 3:58 PM, Furkan KAMACI wrote:
 
 Hi Walter;
 
 Thanks for your explanation. You said Indexing happens on one Solr
 server. Is it true even for SolrCloud?
 
 
 2013/4/7 Walter Underwood wun...@wunderwood.org
 
 Indexing happens on one Solr server. After a commit, the documents are
 searchable. In Solr 4, there is a soft commit, which makes the
 documents
 searchable, but does not create on-disk indexes.
 
 Solr replication copies the committed indexes to another Solr server.
 
 Solr Cloud uses a transaction log to make documents available before a
 hard commit.
 
 Solr does not have rollback. A commit succeeds or fails. After it
 succeeds, there is no going back.
 
 wunder
 
 On Apr 6, 2013, at 3:08 PM, Furkan KAMACI wrote:
 
 Hi Walter;
 
 I am new to Solr and digging into code to understand it. I think that
 when
 indexer copies indexes, before the commit it is unsearchable.
 
 Where exactly that commit occurs at code and can I say that: rollback
 something because I don't want that indexes (reason maybe anything
 else,
 maybe I will decline some indexes(index filtering) because of the
 documents
 they points. Is it possible?
 
 
 
 2013/4/7 Walter Underwood wun...@wunderwood.org
 
 This is precisely how Solr replication works. It copies the indexes
 then
 does a commit.
 
 wunder
 
 On Apr 6, 2013, at 2:40 PM, Furkan KAMACI wrote:
 
 Hi Daire Mac Mathúna;
 
 If there is a way copying one Solr's indexes into another Solr
 instance,
 this may also solve the problem. Somebody generates indexes and some
 of
 other instances could get a copy of them. At synchronizing process
 you
 may
 eliminate some of indexes at reader instance. So you can filter
 something
 to become unsearchable. *This may not be efficient and good thing and
 maybe
 solved with built-in functionality somehow.* However I think somebody
 may
 need that mechanism.
 
 
 2013/4/6 Amit Nithian anith...@gmail.com
 
 I don't understand why this would be more performant.. seems like
 it'd
 be
 more memory and resource intensive as you'd have multiple
 class-loaders
 and
 multiple cache spaces for no good reason. Just have a single core
 with
 sufficiently large caches to handle your response needs.
 
 If you want to load balance reads consider having multiple physical
 nodes
 with a master/slaves or SolrCloud.
 
 
 On Sat, Apr 6, 2013 at 9:21 AM, Daire Mac Mathúna 
 daire...@gmail.com
 wrote:
 
 Hi. Wat are the thoughts on having multiple SOLR instances i.e.
 multiple
 SOLR war files, sharing the same index (i.e. sharing the same
 solr_home)
 where only one SOLR instance is used for writing and the others for
 reading?
 
 Is this possible?
 
 Is it beneficial - is it more performant than having just one solr
 instance?
 
 How does it affect auto-commits i.e. how would the read nodes know
 the
 index has been changed and re-populate cache etc.?
 
 Sole 3.6.1
 
 Thanks.
 
 
 
 --
 Walter Underwood
 wun...@wunderwood.org
 
 
 
 
 
 --
 Walter Underwood
 wun...@wunderwood.org
 
 
 
 
 
 --
 Walter Underwood
 wun...@wunderwood.org
 
 
 
 

--
Walter Underwood
wun...@wunderwood.org





Re: It seems a issue of deal with chinese synonym for solr

2013-04-06 Thread 李威
Hi Kuro Kurosaka,


Thanks for your attention.


It must be a Chinese query to reproduce this problem, because English words
are separated by spaces.

If I search 北京市 动物园, with a space inserted in the query, the query is
parsed to +(北京市 北京) +动物园, which is expected. So the Chinese query can
also work, but only if I insert spaces to separate the words.


The query parser I used is ik-analyzer: http://code.google.com/p/ik-analyzer/




Thanks,
Wei Li


-- Original --
From:  Kuro Kurosakakuro...@sonic.net;
Date:  Thu, Apr 4, 2013 02:53 AM
To:  solr-usersolr-user@lucene.apache.org; 
Cc:  李威li...@antvision.cn; 罗佳luo...@antvision.cn; 
李景泽lijin...@antvision.cn; 
Subject:  Re: It seems a issue of deal with chinese synonym for solr

 
On 3/11/13 6:15 PM, 李威 wrote:
 in org.apache.solr.parser.SolrQueryParserBase, there is a function: 
 protected Query newFieldQuery(Analyzer analyzer, String field, String 
 queryText, boolean quoted)  throws SyntaxError

 The code below can't process Chinese correctly.

   BooleanClause.Occur occur = positionCount > 1 && operator == AND_OPERATOR ?
       BooleanClause.Occur.MUST : BooleanClause.Occur.SHOULD;

 

 For example, "北京市" and "北京" are synonyms; if I search 北京市动物园, the
 expected parse result is +(北京市 北京) +动物园, but actually it is parsed to
 +北京市 +北京 +动物园.

 The code can process English, because English words are separated by spaces
 and each has only one position.

An interesting feature of this example is that the difference between the two
synonyms is the omission of one token, 市 (city). Doesn't the same problem
happen if we define "London City" and "London" as synonyms and execute a
query like "London City Zoo"?
Must a Chinese analyzer be used to reproduce this problem?

I tried to test this but I couldn't. The result of query string expansion using 
Solr 4.2's
query interface with debug output shows:

<str name="parsedquery">MultiPhraseQuery(text:"(london london) city zoo")</str>

I see no plus (+). What query parser did you use?

-- 
Kuro Kurosaka

Re: Pointing to Hbase for Docuements or Directly Saving Documents at Hbase

2013-04-06 Thread Jack Krupansky
Solr would not be storing the original source form of the documents in any 
case. Whether you use Tika or SolrCell, only the text stream of the content 
and the metadata would ever get indexed or stored in Solr.


Solr completely decouples indexing and storing of data values. If you 
don't want to store the text stream in Solr, then don't.


If you want to store the original blob of the source documents in some 
other data store, that's your choice. You can store the original URL or a 
document ID or URL for some alternate document store. That's your choice to 
make. Solr in no way forces you one way or the other. And whether that URL 
or document ID refers to HBase or a web site, doesn't matter to Solr either.
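
For illustration only (field names are editorial assumptions), a schema.xml
fragment along these lines indexes the text without storing it and keeps a
stored pointer back to an external store:

<field name="content" type="text_general" indexed="true" stored="false"/>
<field name="hbase_row_key" type="string" indexed="false" stored="true"/>

At query time Solr returns hbase_row_key, and the application fetches the
original document from HBase. (Note that Solr-side features that need the
text at response time, such as highlighting, require the field to be stored.)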


Whether or not you could more efficiently store the original document bytes 
in Lucene/Solr DocValues vs. HBase is a separate matter - I don't know one 
way or the other whether DocValues help or not. Or whether a Solr 
BinaryField might be suitable for storing the original bytes of a document
(but without indexing the bytes).


In other words, maybe you could just use two separate Solr servers, one for 
text index and metadata store, and the other for raw store of the original 
document bytes.


-- Jack Krupansky

-Original Message- 
From: Furkan KAMACI

Sent: Saturday, April 06, 2013 6:01 PM
To: solr-user@lucene.apache.org
Subject: Pointing to Hbase for Docuements or Directly Saving Documents at 
Hbase


Hi;

First of all should mention that I am new to Solr and making a research
about it. What I am trying to do that I will crawl some websites with Nutch
and then I will index them with Solr. (Nutch 2.1, Solr-SolrCloud 4.2 )

I wonder about something. I have a cloud of machines that crawls websites
and stores that documents. Then I send that documents into SolrCloud. Solr
indexes that documents and generates indexes and save them. I know that
from Information Retrieval theory: it *may* not be efficient to store
indexes at a NoSQL database (they are something like linked lists and if
you store them in such kind of database you *may* have a sparse
representation -by the way there may be some solutions for it. If you
explain them you are welcome.)

However Solr stores some documents too (i.e. highlights) So some of my
documents will be doubled somehow. If I consider that I will have many
documents, that dobuled documents may cause a problem for me. So is there
any way not storing that documents at Solr and pointing to them at
Hbase(where I save my crawled documents) or instead of pointing directly
storing them at Hbase (is it efficient or not)?