Creating Solr servers dynamically in Multicore folder

2014-09-09 Thread nishwanth
Hello,

I am using Solr 4.8.1 and I am trying to create the cores
dynamically on server startup using the following piece of code.

 HttpSolrServer s = new HttpSolrServer( url );
s.setParser(new BinaryResponseParser());
s.setRequestWriter(new BinaryRequestWriter());
SolrServer server = s;
String instanceDir = "/opt/solr/core/multicore/";
CoreAdminResponse e = new CoreAdminRequest().createCore(name,
instanceDir, server,
"/opt/solr/core/multicore/solrconfig.xml",
"/opt/solr/core/multicore/schema.xml");

I am getting the error:

org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Error
CREATEing SolrCore 'hellocore': Could not create a new core in
/opt/solr/core/multicore/ as another core is already defined there
        at org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:554)
        at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:210)
        at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:206)
        at org.apache.solr.client.solrj.request.CoreAdminRequest.process(CoreAdminRequest.java:503)
        at org.apache.solr.client.solrj.request.CoreAdminRequest.createCore(CoreAdminRequest.java:580)
        at org.apache.solr.client.solrj.request.CoreAdminRequest.createCore(CoreAdminRequest.java:560)
        at app.services.OperativeAdminScheduler.scheduleTask(OperativeAdminScheduler.java:154)
        at Global.onStart(Global.java:31)

I am still getting the above error even though the core0 and core1 folders
in multicore are deleted and the corresponding entries are commented out in
/opt/solr/core/multicore/solrconfig.xml. I also enabled persistent=true in
the solrconfig.xml.





SOLR tuning

2014-09-09 Thread van...@bluewin.ch
Hi,

Newbie testing on a laptop here. I've got two cores (shards) in my datastore.
When searching, I get the error below if the result is above approx. 200,000
records; below 200,000 it returns fine. I thought it was simply a case of
upping the Java heap size, but no luck. I do not want to use start/cursorMark
in this case. Does anyone know what exactly to tune in order to get more than
200,000 records back? Running in JBoss.

Thanks!

09:25:53,812 ERROR [SolrCore] java.lang.NullPointerException
        at java.io.StringReader.<init>(StringReader.java:33)
        at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:203)
        at org.apache.solr.search.LuceneQParser.parse(LuceneQParserPlugin.java:80)
        at org.apache.solr.search.QParser.getQuery(QParser.java:142)
        at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:101)
        at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:173)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1372)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
        at com.incentage.ipc.index.InitializerDispatchFilter.doFilter(InitializerDispatchFilter.java:94)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at org.jboss.web.tomcat.filters.ReplyHeaderFilter.doFilter(ReplyHeaderFilter.java:96)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:235)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
        at org.jboss.web.tomcat.security.SecurityAssociationValve.invoke(SecurityAssociationValve.java:190)
        at org.jboss.web.tomcat.security.JaccContextValve.invoke(JaccContextValve.java:92)
        at org.jboss.web.tomcat.security.SecurityContextEstablishmentValve.process(SecurityContextEstablishmentValve.java:126)
        at org.jboss.web.tomcat.security.SecurityContextEstablishmentValve.invoke(SecurityContextEstablishmentValve.java:70)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
        at org.jboss.web.tomcat.service.jca.CachedConnectionValve.invoke(CachedConnectionValve.java:158)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:330)
        at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:829)
        at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:598)
        at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
        at java.lang.Thread.run(Thread.java:619)

09:25:53,812 INFO  [SolrCore] [index.part.201409] webapp=/index path=/select
params={} status=500 QTime=0
09:25:53,812 ERROR [SolrDispatchFilter] java.lang.NullPointerException
        at java.io.StringReader.<init>(StringReader.java:33)
        at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:203)
        at org.apache.solr.search.LuceneQParser.parse(LuceneQParserPlugin.java:80)
        at org.apache.solr.search.QParser.getQuery(QParser.java:142)
        at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:101)
        at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:173)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1372)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
        at com.incentage.ipc.index.InitializerDispatchFilter.doFilter(InitializerDispatchFilter.java:94)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at org.jboss.web.tomcat.filters.ReplyHeaderFilter.doFilter(ReplyHeaderFilter.java:96)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)

Re: SOLR tuning

2014-09-09 Thread J'roo
BTW - there are no HTTP timeouts active, so this part is OK





Solr WARN Log

2014-09-09 Thread Joseph V J
Hi,

I'm trying to upgrade Solr from version 4.2 to 4.9, and since then I've been
receiving the following warning in the Solr log. It would be great if anyone
could throw some light on it.

Level Logger Message
WARN ManagedResource *No registered observers for /rest/managed*

OS Used : Debian GNU/Linux 7

~Thanks
Joe


OpenNLP integration with Solr

2014-09-09 Thread Ankur Dulwani
I am using Solr 4.9 and want to integrate OpenNLP with it. I applied the patch
LUCENE-2899 (https://issues.apache.org/jira/browse/LUCENE-2899) successfully
and made the corresponding changes in schema.xml. But no proper outcomes can
be seen: it is not recognizing named entities like person, organization, etc.;
instead it puts all of the text in the person field. What am I doing wrong?
Please help.




Re: Language detection for multivalued field

2014-09-09 Thread lsanchez
Hi all,
I don't know if this can help somebody: I've changed the process method of
the class LanguageIdentifierUpdateProcessor in order to support multivalued
fields, and it works pretty well.


protected SolrInputDocument process(SolrInputDocument doc) {
    String docLang = null;
    HashSet<String> docLangs = new HashSet<String>();
    String fallbackLang = getFallbackLang(doc, fallbackFields, fallbackValue);

    if (langField == null || !doc.containsKey(langField) ||
        (doc.containsKey(langField) && overwrite)) {
      String allText = concatFields(doc, inputFields);
      List<DetectedLanguage> languagelist = detectLanguage(allText);
      docLang = resolveLanguage(languagelist, fallbackLang);
      docLangs.add(docLang);
      log.debug("Detected main document language from fields " +
          inputFields.toString() + ": " + docLang);

      if (doc.containsKey(langField) && overwrite) {
        log.debug("Overwritten old value " + doc.getFieldValue(langField));
      }
      if (langField != null && langField.length() != 0) {
        doc.setField(langField, docLang);
      }
    } else {
      // langField is set, we sanity check it against whitelist and fallback
      docLang = resolveLanguage((String) doc.getFieldValue(langField), fallbackLang);
      docLangs.add(docLang);
      log.debug("Field " + langField + " already contained value " + docLang +
          ", not overwriting.");
    }

    if (enableMapping) {
      for (String fieldName : allMapFieldsSet) {
        if (doc.containsKey(fieldName)) {
          String fieldLang = "";
          if (mapIndividual && mapIndividualFieldsSet.contains(fieldName)) {
            Collection c = doc.getFieldValues(fieldName);
            for (Object o : c) {
              if (o instanceof String) {
                List<DetectedLanguage> languagelist = detectLanguage((String) o);
                fieldLang = resolveLanguage(languagelist, docLang);
                docLangs.add(fieldLang);
                log.debug("Mapping multivalued field " + fieldName +
                    " using individually detected language " + fieldLang);
                String mappedOutputField = getMappedField(fieldName, fieldLang);
                if (mappedOutputField != null) {
                  log.debug("Mapping multivalued field {} to {}",
                      doc.getFieldValue(docIdField), fieldLang);
                  SolrInputField inField = new SolrInputField(fieldName);
                  Collection currentContent = doc.getFieldValues(mappedOutputField);
                  if (currentContent != null && currentContent.size() > 0) {
                    doc.addField(mappedOutputField, o);
                  } else {
                    inField.setValue(o, doc.getField(fieldName).getBoost());
                    doc.setField(mappedOutputField, inField.getValue(), inField.getBoost());
                  }

                  if (!mapKeepOrig) {
                    log.debug("Removing old field {}", fieldName);
                    doc.removeField(fieldName);
                  }
                } else {
                  throw new SolrException(SolrException.ErrorCode.BAD_REQUEST,
                      "Invalid output field mapping for " + fieldName +
                      " field and language: " + fieldLang);
                }
              }
            }
          } else {
            fieldLang = docLang;
            log.debug("Mapping field " + fieldName +
                " using document global language " + fieldLang);
            String mappedOutputField = getMappedField(fieldName, fieldLang);

            if (mappedOutputField != null) {
              log.debug("Mapping field {} to {}",
                  doc.getFieldValue(docIdField), fieldLang);
              SolrInputField inField = doc.getField(fieldName);
              doc.setField(mappedOutputField, inField.getValue(), inField.getBoost());
              if (!mapKeepOrig) {
                log.debug("Removing old field {}", fieldName);
                doc.removeField(fieldName);
              }
            } else {
              throw new SolrException(SolrException.ErrorCode.BAD_REQUEST,
                  "Invalid output field mapping for " + fieldName +
                  " field and language: " + fieldLang);
            }
          }
        }
      }
    }

    // Set the languages field to an array of all detected languages
    if (langsField != null && langsField.length() != 0) {
      doc.setField(langsField, docLangs.toArray());
    }

    return doc;
  }





Using def function in fl criteria,

2014-09-09 Thread Pigeyre Romain
Hi

I'm trying to use a query with
fl=name_UK,name_FRA,itemDesc:def(name_UK,name_FRA)
As you can see, the itemDesc field (built by Solr) is truncated:

{
    "name_UK": "MEN S SUIT\n",
    "name_FRA": "24 RELAX 2 BTS ST GERMAIN TOILE FLAMMEE LIN ET SOIE",
    "itemDesc": "suit"
}

Do you have any idea how to change this?

Thanks.

Regards,

Romain


Re: Master - Master / Upgrading a slave to master

2014-09-09 Thread Shawn Heisey
On 9/8/2014 9:54 PM, Salman Akram wrote:
 We have a redundant data center in case the primary goes down. Currently we
 have 1 master and multiple slaves on primary data center. This master also
 replicates to a slave in secondary data center. So if the primary goes down
 at least the read only part works. However, now we want writes to work on
 secondary data center too when primary goes down.
 
 - Is it possible in SOLR to have Master - Master?
 - If not then what's the best strategy to upgrade a slave to master?
 - Naturally there would be some latency due to data centers being in
 different geographical locations so what are the normal data issues and
 best practices in case primary goes down? We would also like to shift back
 to primary as soon as its back.

SolrCloud would work, but only if you have *three* datacenters.  Two of
them would need to remain fully operational.  SolrCloud is a true
cluster -- there is no master.  Each of the shards in a collection has
one or more replicas.  One of the replicas gets elected to be leader,
but the leader designation can change.

The reason that you need three is because of zookeeper, which is the
software that actually maintains the cluster and handles leader
elections.  A majority of zookeeper nodes (more than half of them) must
be operational for zookeeper to maintain quorum.  That means that the
minimum number of zookeepers is three, and in a three-node system, one
can go down without disrupting operation.
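For reference, a minimal zoo.cfg for such a three-node ensemble would look
something like this (hostnames and paths are just examples):

    tickTime=2000
    initLimit=10
    syncLimit=5
    dataDir=/var/lib/zookeeper
    clientPort=2181
    server.1=zk1.example.com:2888:3888
    server.2=zk2.example.com:2888:3888
    server.3=zk3.example.com:2888:3888

The same three server.N lines go on every node; each node's myid file
selects which entry it is.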

One thing that SolrCloud doesn't yet have is rack/datacenter awareness.
 Requests get load balanced across the entire cluster, regardless of
where they are located.  It's something that will eventually come, but I
don't have any kind of estimate for when.

Thanks,
Shawn



Re: Solr WARN Log

2014-09-09 Thread Shawn Heisey
On 9/9/2014 2:56 AM, Joseph V J wrote:
 I'm trying to upgrade Solr from version 4.2 to 4.9, since then I'm
 receiving the following warning from solr log. It would be great if anyone
 could throw some light into it.
 
 Level Logger Message
 WARN ManagedResource *No registered observers for /rest/managed*
 
 OS Used : Debian GNU/Linux 7

This message comes from the new Schema REST API.  Basically it means you
haven't configured it.  You can ignore this message.  To get it to go
away, you would need to configure the new feature.

https://cwiki.apache.org/confluence/display/solr/Schema+API
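For example, a quick sanity check against the API (the core name here is the
example collection1; adjust for your setup):

    curl http://localhost:8983/solr/collection1/schema/fields

That returns the fields the schema knows about, as JSON.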

Thanks,
Shawn



Chronological partitioning of data - what does Solr offer in this area?

2014-09-09 Thread Gili Nachum
Hello!

*Does Solr support any sort of chronological ordering of data?*
I would like to divide my data into daily, weekly, monthly, and yearly parts,
for performance's sake.
Has anyone done something like this over SolrCloud?

More thoughts:
While indexing: I'm soft committing every 2 seconds, so I would rather do
that on the daily index only (open-reader and cache-invalidation effort), as
the total index shard size is 200GB.
While searching: I would rather search the daily and weekly parts first,
moving to older data only if the results are not satisfying.

I guess there's a challenge in moving the daily data to the weekly data at
the end of a day, and so on.
Is there anything built-in that goes in this direction? If not, any example
of some custom collection/sharding configuration?

Thanks.
Gili.


Sorting docs by Hamming distance

2014-09-09 Thread michael.boom
Hi,

Did anybody try to embed into Solr a sort based on the Hamming distance
(http://en.wikipedia.org/wiki/Hamming_distance) of a certain field?
E.g. having a document doc1 with a field doc_hash:12345678 and doc2 with
doc_hash:12345699, when searching for doc_hash:123456780 the sort order
should be doc1, doc2.

What would be the best way to achieve this kind of behaviour? Writing a
plugin, or maybe a custom function query?

Thanks!




-
Thanks,
Michael


Re: Chronological partitioning of data - what does Solr offer in this area?

2014-09-09 Thread Alexandre Rafalovitch
Have you looked at collection aliasing already?
http://www.anshumgupta.net/2013/10/collection-aliasing-in-solrcloud.html
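The idea is to create a collection per period and use the Collections API to
keep stable alias names pointing at the right ones, e.g. (collection names
here are just an example):

    http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=recent&collections=coll_20140909,coll_week37

Searches against "recent" then hit only the collections behind the alias,
and a nightly job can re-point it.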

Regards,
   Alex
Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853


On Tue, Sep 9, 2014 at 9:23 AM, Gili Nachum gilinac...@gmail.com wrote:
 [...]


Send nested doc with solrJ

2014-09-09 Thread Ali Nazemian
Dear all,
Hi,
I was wondering how I can use SolrJ to send nested documents to Solr.
Unfortunately I did not find any tutorial for this purpose. I would really
appreciate it if you could guide me through that. Thank you very much.
Best regards.

-- 
A.Nazemian


Re: Using a RequestHandler to expand query parameter

2014-09-09 Thread jimtronic
Never got a response on this ... Just looking for the best way to handle it?







Re: Using a RequestHandler to expand query parameter

2014-09-09 Thread jmlucjav
This is easily doable with a custom (Java code) request handler. If you want
to avoid writing any Java code, you should investigate
https://issues.apache.org/jira/browse/SOLR-5005 (I am myself going to have
a look at this interesting feature).
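A minimal sketch of the custom-handler route (the class name and expansion
logic are hypothetical, just to show the shape):

    import org.apache.solr.common.params.CommonParams;
    import org.apache.solr.common.params.ModifiableSolrParams;
    import org.apache.solr.handler.component.SearchHandler;
    import org.apache.solr.request.SolrQueryRequest;
    import org.apache.solr.response.SolrQueryResponse;

    // Rewrites q before delegating to the normal search flow
    public class ExpandingSearchHandler extends SearchHandler {
      @Override
      public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp)
          throws Exception {
        ModifiableSolrParams params = new ModifiableSolrParams(req.getParams());
        String q = params.get(CommonParams.Q);
        if (q != null) {
          // e.g. kids books -> kids books "kids books"
          params.set(CommonParams.Q, q + " \"" + q + "\"");
        }
        req.setParams(params);
        super.handleRequestBody(req, rsp);
      }
    }

Register it in solrconfig.xml like any other handler; it behaves as a normal
select handler otherwise.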

On Tue, Sep 9, 2014 at 4:33 PM, jimtronic jimtro...@gmail.com wrote:

 Never got a response on this ... Just looking for the best way to handle
 it?








Re: Master - Master / Upgrading a slave to master

2014-09-09 Thread Salman Akram
You mean 3 'data centers' or 'nodes'? I am thinking: if we have 2 nodes in
the primary and 1 in the secondary, and we normally keep the secondary down,
would that work? Basically the secondary network is just for redundancy and
won't be as fast, so normally we wouldn't want to shift traffic there.

So can we have nodes just for redundancy and NOT load balancing, i.e. it has
3 nodes but updates go to only one of them? Similarly, for the slave
replicas, can we limit searches to a certain slave, or will they be
auto-balanced?

Also, apart from SolrCloud, is it possible to have multiple masters in Solr,
or is there a good guide to promoting a slave to master?

Thanks

On Tue, Sep 9, 2014 at 5:40 PM, Shawn Heisey s...@elyograg.org wrote:

 [...]



-- 
Regards,

Salman Akram


Re: Using a RequestHandler to expand query parameter

2014-09-09 Thread Shawn Heisey
On 8/28/2014 7:43 AM, jimtronic wrote:
 I would like to send only one query to my custom request handler and have the
 request handler expand that query into a more complicated query.

 Example:

 /myHandler?q=kids+books

 ... would turn into a more complicated edismax query of:

 kids books "kids books"

 Is this achievable via a Request Handler definition in solrconfig.xml?

As someone else already said, you can write a custom request handler and
reference it in a handler definition in your solrconfig.xml file.  The
sky's the limit for that -- if you can write the code, Solr will use it.

This *specific* example that you've given is something that the edismax
parser will give you out of the box, when you define the qf and pf
parameters.  It will automatically search the individual terms you give
on the fields in the qf parameter, *and* do a phrase search for all
those terms on the fields in the pf parameter.

https://cwiki.apache.org/confluence/display/solr/The+Extended+DisMax+Query+Parser
http://wiki.apache.org/solr/ExtendedDisMax
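A minimal sketch of such a handler definition in solrconfig.xml (the field
names are examples; use your own):

    <requestHandler name="/myHandler" class="solr.SearchHandler">
      <lst name="defaults">
        <str name="defType">edismax</str>
        <str name="qf">title^2 description</str>
        <str name="pf">title^10 description^5</str>
      </lst>
    </requestHandler>

With that in place, q=kids+books searches the qf fields term by term and
boosts documents where the pf fields contain the whole phrase.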

Thanks,
Shawn



Re: Master - Master / Upgrading a slave to master

2014-09-09 Thread Shawn Heisey
On 9/9/2014 8:46 AM, Salman Akram wrote:
 You mean 3 'data centers' or 'nodes'? I am thinking if we have 2 nodes on
 primary and 1 in secondary and we normally keep the secondary down would
 that work? Basically secondary network is just for redundancy and won't be
 as fast so normally we won't like to shift traffic there.

 So can we just have nodes for redundancy and NOT load balancing i.e. it has
 3 nodes but update is only on one of them? Similarly for the slave replicas
 can we limit the searches to a certain slave or it will be auto balanced?

 Also apart from SOLR cloud is it possible to have multiple master in SOLR
 or a good guide to upgrade a slave to master?

You must have three zookeeper nodes for a redundant setup.  If you only
have two data centers, then you must put at least two of those nodes in
one data center.  If the data center with two zookeeper nodes goes down,
zookeeper cannot function, which means SolrCloud will not work
correctly.  There is no way to maintain SolrCloud redundancy with only
two data centers.  You might think to add a fourth ZK node and split
them between the data centers ... except that in that situation, at
least three nodes must be functional.  Two out of four nodes is not enough.

A minimal fault-tolerant SolrCloud install is three physical machines. 
Two of them run ZK and Solr, one of them runs ZK only.

If you don't use SolrCloud, then you have two choices to switch masters:

1) Change the replication config to redefine the master and reload the
core or restart Solr.
2) Write scripts that manually use the replication HTTP API to do all
your replication, rather than let Solr handle it automatically.  You can
choose the master for every replication with HTTP calls.

https://wiki.apache.org/solr/SolrReplication#HTTP_API
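For example (hostnames and core name are placeholders), a promotion script
can stop a slave's automatic polling and then pull from whichever master you
choose:

    http://slave1:8983/solr/core1/replication?command=disablepoll
    http://slave1:8983/solr/core1/replication?command=fetchindex&masterUrl=http://master1:8983/solr/core1/replication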

Thanks,
Shawn



Re: Master - Master / Upgrading a slave to master

2014-09-09 Thread Salman Akram
So realistically speaking, you cannot have SolrCloud work across 2 data
centers as a redundant solution, because no matter how many nodes you add
you would still need at least 1 node in the 2nd center working too.

So that just leaves the non-SolrCloud solutions.

1) Change the replication config to redefine the master and reload the core
or restart Solr.

That of course is a simple way, but the real question is about the possible
pitfalls and good practices. E.g. normally the scenario would be that the
primary data center goes down for a few hours, and in the meantime we
promote one of the slaves in the secondary to a master. Now:

- IF there is no lag, there won't be any issue in the secondary at least.
But what if there is lag and one of the files is not completely replicated?
Would that file be discarded, or is there a possibility that the whole index
becomes unusable?

- Once the primary comes back, how would we copy the delta from the
secondary? Make it a slave of the secondary first, replicate the delta, and
then set it as a master again?

In other words, is there a good guide out there for this, with possible
issues and solutions? Before SolrCloud people must have been doing this, and
even now SolrCloud doesn't seem practical in quite a few situations.

Thanks again!!

On Tue, Sep 9, 2014 at 8:02 PM, Shawn Heisey s...@elyograg.org wrote:

 [...]




-- 
Regards,

Salman Akram


[Announce] Apache Solr 4.10 with RankingAlgorithm 1.5.4 available now with complex-lsa algorithm (simulates human language acquisition and recognition)

2014-09-09 Thread nnagarajayya
Hi!

I am very excited to announce the availability of Apache Solr 4.10 with
RankingAlgorithm 1.5.4. 

Solr 4.10.0 with RankingAlgorithm 1.5.4 includes support for complex-lsa.
complex-lsa simulates human language acquisition and recognition (see demo
http://solr-ra.tgels.org/rankingsearchlsa.jsp ) and can retrieve
semantically related/hidden relationships between terms, sentences,
paragraphs, chapters, books, images, etc. Three new similarities,
TERM_SIMILARITY, DOCUMENT_SIMILARITY, TERM_DOCUMENT_SIMILARITY enable these
with improved precision.  A query for “holy AND ghost” returns jesus/christ
as the top results for the bible corpus with no effort to introduce this
relationship (see demo http://solr-ra.tgels.org/rankingsearchlsa.jsp ).

 

This version adds support for multiple linear algebra libraries. complex-lsa
does a large number of these calculations, so speeding them up should speed
up retrieval, etc. EJML is the fastest if you are using complex-lsa for a
smaller set of documents, while MTJ is faster as your document collection
becomes bigger. MTJ can also use BLAS/LAPACK, etc. installed on your system
to further improve performance with native execution. The performance is
similar to a C/C++ application. It can also make use of GPUs or Intel's MKL
library if you have access to it.

RankingAlgorithm 1.5.4 with complex-lsa supports the entire Lucene query
syntax: +/- and/or boolean/dismax/glob/regular
expression/wildcard/fuzzy/prefix/suffix queries with boosting, etc. This
version increases performance, with increased accuracy and relevance for
document similarity, and fixes problems with phrase queries, Boolean
queries, etc.


You can get more information about complex-lsa and realtime-search
performance from here: 
http://solr-ra.tgels.org/wiki/en/Complex-lsa-demo

You can download Solr 4.10 with RankingAlgorithm 1.5.4 from here: 
http://solr-ra.tgels.org

Please download and give the new version a try.

Regards, 

Nagendra Nagarajayya 
http://solr-ra.tgels.org 
http://elasticsearch-ra.tgels.org 
http://rankingalgorithm.tgels.org 

Note: 
1. Apache Solr 4.10 with RankingAlgorithm 1.5.4 is an external project. 





Re: Using a RequestHandler to expand query parameter

2014-09-09 Thread jimtronic
So, the problem I found that's driving this is that I have several phrase
synonyms set up, for example "ipod mini" into "ipad mini". The synonym is
only applied if you submit it as a phrase in quotes.

So the pf param doesn't help, because it's not the right phrase in the first
place.

I can fix this by sending in the query as (ipod mini "ipod mini").
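For context, a phrase synonym like the one above would be a synonyms.txt
mapping along these lines (a sketch of the entry, not copied from my actual
file):

    ipod mini => ipad mini

Multi-word entries like this only match when the analyzer sees the tokens
together, which is why the quoted phrase triggers it and the bare terms
don't.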







Re: Solr Sharding Help

2014-09-09 Thread Ethan
Thanks Jeff. I had a different idea of how replicationFactor worked. I was
able to create the setup with that command.

Now, as I import data into the cluster, how can I determine that it's being
sharded?

On Mon, Sep 8, 2014 at 1:52 PM, Jeff Wartes jwar...@whitepages.com wrote:


 You need to specify a replication factor of 2 if you want two copies of
 each shard. Solr doesn't "auto fill" available capacity, contrary to the
 misleading examples on the http://wiki.apache.org/solr/SolrCloud page.
 Those examples only have that behavior because they ask you to copy the
 examples directory, which brings some on-disk configuration with it.
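 As a sketch, the CREATE call from below with an explicit replicationFactor
 (same hosts; 2 shards x 2 copies spread over the 4 nodes) would look like:

 http://serv001:5258/solr/admin/collections?action=CREATE&name=Main&numShards=2&replicationFactor=2&maxShardsPerNode=1&createNodeSet=serv001:5258_solr,serv002:5258_solr,serv003:5258_solr,serv004:5258_solr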



 On 9/8/14, 1:33 PM, Ethan eh198...@gmail.com wrote:

 Thanks Erick.  That cleared my confusion.
 
 I have a follow up question -  If I run the CREATE command with 4 nodes in
 createNodeSet, I thought 2 leaders and 2 followers will be created
 automatically. Thats not the case, however.
 
 
  http://serv001:5258/solr/admin/collections?action=CREATE&name=Main&numShards=2&maxShardsPerNode=1&createNodeSet=serv001:5258_solr,serv002:5258_solr,serv003:5258_solr,serv004:5258_solr
 
 I still get the same response. I see 2 leaders being created, but I do not
 see the other 2 nodes show up as followers on the cloud page in the Solr
 Admin UI. It looks like the collection was not created on those 2 nodes at
 all.
 
 Is there additional step involved to add them?
 
 On Mon, Sep 8, 2014 at 12:11 PM, Erick Erickson erickerick...@gmail.com
 wrote:
 
  Ahhh, this is a continual source of confusion. I've started a one-man
  campaign to talk about leaders and followers when relevant...
 
  _Every_ node is a replica. This is because a node can be a leader or
  follower, and the role can change.
 
  So your case is entirely normal. These nodes are probably the leaders
  too, and will remain so while you add more replicas/followers.
 
  Best,
  Erick
 
  On Mon, Sep 8, 2014 at 11:20 AM, Ethan eh198...@gmail.com wrote:
   I am trying to setup 2 shard cluster with 2 replicas with dedicated
 nodes
   for replicas.  I have 4 node SolrCloud setup that I am trying to shard
   using collections api .. (Like
  
 
 
  https://wiki.apache.org/solr/SolrCloud#Example_C:_Two_shard_cluster_with_shard_replicas_and_zookeeper_ensemble
   )
  
   I ran this command -
  
  
 
 
  http://serv001:5258/solr/admin/collections?action=CREATE&name=Main&numShards=2&maxShardsPerNode=1&createNodeSet=serv001:5258_solr,serv002:5258_solr
  
   Response -
  
    <response>
      <lst name="responseHeader">
        <int name="status">0</int>
        <int name="QTime">3932</int>
      </lst>
      <lst name="success">
        <lst>
          <lst name="responseHeader">
            <int name="status">0</int>
            <int name="QTime">2982</int>
          </lst>
          <str name="core">Main_shard2_replica1</str>
        </lst>
        <lst>
          <lst name="responseHeader">
            <int name="status">0</int>
            <int name="QTime">3005</int>
          </lst>
          <str name="core">Main_shard1_replica1</str>
        </lst>
      </lst>
    </response>
  
    I want to know what *_replica1 or *_replica2 means. Are they actually
    replicas and not the shards? I intended to add 2 more nodes as dedicated
    replication nodes. How do I accomplish that?
  
   Would appreciate any pointers.
  
   -E
 




Re: [Announce] Apache Solr 4.10 with RankingAlgorithm 1.5.4 available now with complex-lsa algorithm (simulates human language acquisition and recognition)

2014-09-09 Thread Diego Fernandez
Interesting.  Does anyone know how that compares to this 
http://www.searchbox.com/products/searchbox-plugins/solr-sense/?

Diego Fernandez - 爱国
Software Engineer
US GSS Supportability - Diagnostics


- Original Message -
 [...]


Re: ExtractingRequestHandler indexing zip files

2014-09-09 Thread keeblerh
I am also having the issue where my zip contents (or kmz contents) are not
being processed; only the file names are processed. It seems to recognize
the kmz extension and open the file, it just doesn't recurse into processing
the contents.
The patch you mention has been around for a while. I am running Solr 4.8.1,
and it looks like the Tika jar is 1.5, so I would think the patch would
already be included. Do I need additional configuration? My config is as
follows:
<dataConfig>
  <dataSource type="BinFileDataSource" />
  <document>
    <entity name="kmlfiles" dataSource="null" rootEntity="false"
            baseDir="mydirectory" fileName=".*\.kmz$" onError="skip"
            processor="FileListEntityProcessor" recursive="false">
      field defs
      <entity name="kmlImport" processor="TikaEntityProcessor"
              datasource="kmlfiles" htmlMapper="identity"
              transformer="TemplateTransformer"
              url="${kmlfiles.fileAbsolutePath}">
        more field defs
      </entity>
    </entity>
  </document>
</dataConfig>

I am using the dataImport option from the admin page. Thanks for any
assistance; I'm on a closed network and getting patches onto it is not
trivial.






Re: Creating Solr servers dynamically in Multicore folder

2014-09-09 Thread Erick Erickson
Well, you already have a core.properties file defined in that
location. I presume you're operating in core discovery mode. Your
cores would all be very confused if new cores were defined on top of
old cores.

It is a little clumsy at this point in that you have to have a conf
directory in place but _not_ a core.properties file to create a core
like this. Config sets will eventually fix this.
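In other words, a layout that CREATE will accept looks roughly like this
(a sketch; directory names follow the original post):

    /opt/solr/core/multicore/
        hellocore/
            conf/
                solrconfig.xml
                schema.xml

with no core.properties under hellocore/; the CoreAdmin CREATE call writes
one there when it succeeds.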

Best,
Erick

On Mon, Sep 8, 2014 at 11:00 PM, nishwanth nishwanth.vupp...@gmail.com wrote:
 [...]


Re: Using def function in fl criteria,

2014-09-09 Thread Erick Erickson
I'm really confused about what you're trying to do here. What do you
intend the syntax
itemDesc:def(name_UK,name_FRA)
to do?

It's also really difficult to say much of anything unless we see the
schema definition for itemDesc and sample input.

Likely you're somehow applying an analysis chain that is truncating
the input. Or it's also possible that you aren't indexing quite what
you think you are.

Best,
Erick

On Tue, Sep 9, 2014 at 4:36 AM, Pigeyre Romain romain.pige...@sopra.com wrote:
 [...]


field specified edismax

2014-09-09 Thread Jae Joo
Is there any way to apply different edismax parameters field by field?
For example:
q=keywords:(lung cancer) AND title:chemotherapy

I would like to apply a different qf per field, for keywords and title:
f.keywords.qf=keywords^40 subkeywords^20
f.title.qf=title^80 subtitle^20

I know it can be done with field aliasing, but I'd rather not use field
aliasing.

Thanks,

Jae


Reading files in default Conf dir

2014-09-09 Thread Ramana OpenSource
Hi,

I am trying to load one of the files in the conf directory in Solr, using
the code below:

return new HashSet<String>(new
SolrResourceLoader(null).getLines("stopwords.txt"));

The stopwords.txt file is available in the location
solr\example\solr\collection1\conf.

When I debugged the SolrResourceLoader API, it is looking at the below
locations to load the file:

...solr\example\solr\conf\stopwords.txt
...solr\example\stopwords.txt

But as the file was not there in either of those locations, it failed.

How do I load files from the default conf directory using the
SolrResourceLoader API?

I am a newbie to Solr. Any help would be appreciated.

Thanks,
Ramana.


Re: [Announce] Apache Solr 4.10 with RankingAlgorithm 1.5.4 available now with complex-lsa algorithm (simulates human language acquisition and recognition)

2014-09-09 Thread Alexandre Rafalovitch
On Tue, Sep 9, 2014 at 1:38 PM, Diego Fernandez difer...@redhat.com wrote:
 Interesting.  Does anyone know how that compares to this 
 http://www.searchbox.com/products/searchbox-plugins/solr-sense/?

Well, for one, the Solr-sense pricing seems to be so sense-tive that
you have to contact the sales team to find it out. The version
announced here is free for public and commercial use AFAIK.

I have not tested either one yet.

Regards,
   Alex.

Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853


Re: Using def function in fl criteria,

2014-09-09 Thread Pigeyre Romain
I want to return:

- the field name_UK (if it exists)

- otherwise the name_FRA field

... into an alias field (itemDesc, created at query time).

There is no schema definition for itemDesc because it is only a virtual
field declared in the fl= criteria. I don't understand why a filter is being
applied to this field.

On Tue, Sep 9, 2014, Erick Erickson erickerick...@gmail.com wrote:

 [...]




Re: ExtractingRequestHandler indexing zip files

2014-09-09 Thread marotosg
Hi keeblerh,

The patch has to be applied to the source code, and then Solr recompiled
into solr.war. If you do that, it works: it extracts the content of the
documents.

Regards,
Sergio





Czech stemmer

2014-09-09 Thread Shamik Bandopadhyay
Hi,

  I'm facing stemming issues with Czech-language search. Solr/Lucene
currently provides CzechStemFilterFactory as the sole option; Snowball
Porter doesn't seem to be available for Czech. Here's the issue.

I'm trying to search for posunout (means move in English), which returns
results, but the search fails if I use posunulo (means moved in English). I
used the following text as the field content for the search.

Pomocí multifunkčních uzlů je možné odkazy mnoha způsoby upravovat. Můžete
přidat a odstranit odkazy, přidat a odstranit vrcholy, prodloužit nebo
přesunout prodloužení čáry nebo přesunout text odkazu. Přístup k požadované
možnosti získáte po přesunutí ukazatele myši na uzel. Z uzlu prodloužení
čáry můžete zvolit tyto možnosti: Protáhnout: Umožňuje posunout prodloužení
odkazové čáry. Délka prodloužení čáry: Umožňuje prodloužit prodloužení
čáry. Přidat odkaz: Umožňuje přidat jednu nebo více odkazových čar. Z uzlu
koncového bodu odkazu můžete zvolit tyto možnosti: Protáhnout: Umožňuje
posunout koncový bod odkazové čáry. Přidat vrchol: Umožňuje přidat vrchol k
odkazové čáře. Odstranit odkaz: Umožňuje odstranit vybranou odkazovou čáru.
Z uzlu vrcholu odkazu můžete zvolit tyto možnosti: Protáhnout: Umožňuje
posunout vrchol. Přidat vrchol: Umožňuje přidat vrchol na odkazovou čáru.
Odstranit vrchol: Umožňuje odstranit vrchol. 

Just wondering if there's a different stemmer available or a way to address
this.

Schema :

<fieldType name="text_csy" class="solr.TextField"
           positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="lang/stopwords_cz.txt"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms_csy.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.CzechStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="lang/stopwords_cz.txt"/>
    <filter class="solr.CzechStemFilterFactory"/>
  </analyzer>
</fieldType>

Any pointers will be appreciated.

- Thanks,
Shamik


Wildcard in FL parameter not working with Solr 4.10.0

2014-09-09 Thread Mike Hugo
Hello,

With Solr 4.7 we had some queries that return dynamic fields by passing in
an fl=*_exact parameter; this is not working for us after upgrading to Solr
4.10.0. It appears to be a problem only when requesting wildcarded fields
via SolrJ.

With Solr 4.10.0 - I downloaded the binary and set up the example:

cd example
java -jar start.jar
java -jar post.jar solr.xml monitor.xml

In a browser, if I request

http://localhost:8983/solr/collection1/select?q=*:*&wt=json&indent=true&fl=*d

All is well with the world:

{"responseHeader":{"status":0,"QTime":1,"params":{"fl":"*d","indent":"true","q":"*:*","wt":"json"}},"response":{"numFound":2,"start":0,"docs":[{"id":"SOLR1000"},{"id":"3007WFP"}]}}

However if I do the same query with SolrJ (groovy script)


@Grab(group = 'org.apache.solr', module = 'solr-solrj', version = '4.10.0')

import org.apache.solr.client.solrj.SolrQuery
import org.apache.solr.client.solrj.impl.HttpSolrServer

HttpSolrServer solrServer = new HttpSolrServer(
    "http://localhost:8983/solr/collection1")
SolrQuery q = new SolrQuery("*:*")
q.setFields("*d")
println solrServer.query(q)


No fields are returned:

{responseHeader={status=0,QTime=0,params={fl=*d,q=*:*,wt=javabin,version=2}},response={numFound=2,start=0,docs=[SolrDocument{}, SolrDocument{}]}}



Any ideas as to why when using SolrJ wildcarded fl fields are not returned?

Thanks,

Mike


Re: How to implement multilingual word components fields schema?

2014-09-09 Thread Paul Libbrecht
Ilia,

One aspect you surely lose with a single-field approach is the
differentiation of semantic fields in different languages for words that
sound the same. The words "sitting" and "directions" are easy examples that
have fully different semantics in French and English, at least:
"directions" would appear together with, say, teacher advice in English, but
not in French.

I disagree that storage should be an issue in your case; most Solr
installations do not suffer from that, as far as I can read the list.
Generally, you do not need all these stemmed fields to be stored; they're
just indexed, and that is a pretty tiny amount of storage.

Using separate fields also has advantages in terms of IDF, I think.

I do not understand the last question to Tom; he provides URLs to at least
one of the papers.

Also, if you can put a hand on it, the book of Peters, Braschler, and Clough
is probably relevant: http://link.springer.com/book/10.1007%2F978-3-642-23008-0
but, as the first article referenced by Tom says, the CLIR approach there
relies on parallel corpora, e.g. created by automatic translations.


Paul




On 8 sept. 2014, at 07:33, Ilia Sretenskii sreten...@multivi.ru wrote:

 Thank you for the replies, guys!
 
 Using a field-per-language approach for multilingual content is the last
 thing I would try, since my actual task is to implement search
 functionality with relatively the same capabilities for
 every known world language.
 The closest references are the popular web search engines; they seem to
 serve worldwide users with their different languages, and even
 cross-language queries as well.
 Thus, a field-per-language approach would be a sure waste of storage
 resources due to the high number of duplicates, since there are over 200
 known languages.
 I really would like to keep a single field for cross-language searchable
 text content, without splitting it into specific language fields or
 specific language cores.
 
 So my current choice will be to stay with just the ICUTokenizer and
 ICUFoldingFilter as they are without any language specific
 stemmers/lemmatizers yet at all.
 
 Probably I will put the most popular languages' stop word filters and
 stemmers into the same searchable text field, to give it a try and see
 if it works correctly in a stack.
 Does stacking language-specific filters work correctly in one field?
 
 Further development will most likely involve some advanced custom analyzers
 like the SimplePolyGlotStemmingTokenFilter to utilize the ICU generated
 ScriptAttribute.
 http://comments.gmane.org/gmane.comp.jakarta.lucene.solr.user/100236
 https://github.com/whateverdood/cross-lingual-search/blob/master/src/main/java/org/apache/lucene/sandbox/analysis/polyglot/SimplePolyGlotStemmingTokenFilter.java
 
 So I would like to know more about those academic papers on this issue of
 how best to deal with mixed language/mixed script queries and documents.
 Tom, could you please share them?



Re: Using def function in fl criteria,

2014-09-09 Thread Chris Hostetter

: I'm trying to use a query with 
fl=name_UK,name_FRA,itemDesc:def(name_UK,name_FRA)
: As you can see, the itemDesc field (builded by solr) is truncated :

Functions get their values from the FieldCache (or DocValues if you've
enabled them) so that they can be efficient across a lot of docs.

Based on what you are getting back from the def() function, you almost
certainly have a fieldType for name_UK that uses an analyzer that
tokenizes the field, so you're getting back one of the indexed terms.

You could theoretically index these fields again using something like
StrField or KeywordTokenizerFactory and use that via the def() function --
but honestly that's going to be a lot less efficient than just letting
your client pick between the two values, or writing your own
DocTransformer to conditionally rename/remove the stored field values you
don't want...

https://lucene.apache.org/solr/4_10_0/solr-core/org/apache/solr/response/transform/TransformerFactory.html
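The re-index route would look roughly like this (a sketch; the *_str field
names are made up for the example):

    <field name="name_UK_str" type="string" indexed="true" stored="false"/>
    <field name="name_FRA_str" type="string" indexed="true" stored="false"/>
    <copyField source="name_UK" dest="name_UK_str"/>
    <copyField source="name_FRA" dest="name_FRA_str"/>

and then query with fl=name_UK,name_FRA,itemDesc:def(name_UK_str,name_FRA_str)
so def() reads the untokenized copies.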



-Hoss
http://www.lucidworks.com/


Solr multiple sources configuration

2014-09-09 Thread vineet yadav
Hi,
I am using Solr to store data from multiple sources like social media,
news, journals, etc., so I am using a crawler, multiple scrapers, and APIs
to gather the data. I want to know the best way to configure Solr so that I
can store data which comes from multiple sources.

Thanks
Vineet Yadav


Re: Solr multiple sources configuration

2014-09-09 Thread Jack Krupansky
It is mostly a matter of how you expect to query that data: do you need
different queries for different sources, or do you have a common conceptual
model that covers all sources with a common set of queries?
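If a common model fits, one simple approach (a sketch; the field name is
just an example) is a shared schema plus a discriminator field, so one set
of queries can still be narrowed per source:

    <field name="source" type="string" indexed="true" stored="true"/>

    q=some+query&fq=source:news

Separate collections per source sit at the other end of the spectrum, for
when the sources really don't share a conceptual model.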


-- Jack Krupansky

-Original Message- 
From: vineet yadav

Sent: Tuesday, September 9, 2014 6:40 PM
To: solr-user@lucene.apache.org
Subject: Solr multiple sources configuration

 [...]



Re: Create collection dynamically in my program

2014-09-09 Thread xinwu
Hi, Jürgen:
Thanks for your reply.
"What is the result of the call? Any status or error message?"
——The call ended normally, and there was no error message.
"Did you actually feed data into the collection?"
——Yes, I feed data into the daily collection every day.

Thanks!
-Xinwu





Re: Solr WARN Log

2014-09-09 Thread Joseph V J
Thank you for the update Shawn.

~Regards
Joe

On Tue, Sep 9, 2014 at 6:14 PM, Shawn Heisey s...@elyograg.org wrote:

 [...]




Re: Creating Solr servers dynamically in Multicore folder

2014-09-09 Thread nishwanth
Hello Erick,

Thanks for the response. My cores got created after removing the
core.properties file in that location and the existing core folders.

I also commented out the core-related information in solr.xml. Are there
going to be any further problems with the approach I followed?

For the new cores I created, I could see the conf and data directories and
the core.properties file getting created.

Thanks..





