Re: Multiword synonym and query expansion

2012-10-17 Thread Bernd Fehling
Have a look at the report about EuroVoc integration into Solr
which gives you an idea about the problems and solutions with
multiword synonyms and query expansion.

http://www.ub.uni-bielefeld.de/~befehl/base/solr/eurovoc.html

Regards
Bernd Fehling


On 18.10.2012 02:36, Nicholas Ding wrote:
> Hi guys,
> 
> I'm trying to get query expansion and multiword synonyms working at query
> time. I spent the whole day digging into the source code of Lucene and
> Solr and writing a custom tokenizer, filter, and even a query parser in
> order to make it work. Now I'm a bit confused.
> 
> Requirement
> Searching "chinese cuisine", I want expand it to "chinese", "cuisine",
> "cuisine chinese" and "chinese cuisine". And I have synonyms like "chinese
> cuisines, chinese food, chinese dish".
> 
> My Plan
> 
> [fieldType definition stripped by the mail archive; it declared an analyzer
> whose tokenizer was the ExpandableKeywordTokenzierFactory described below]
> 
> ExpandableKeywordTokenzierFactory uses a customized Tokenizer that can
> permute the words inside a token.
> For example:
> Input Token "A B" => Output Tokens "A", "B", "A B", "B A"
> 
> It works fine even in the Solr admin analysis page, see attachment. But when
> I perform the search, like q=Keyword:"chinese cuisine", the debug output
> shows an unexpected result:
> parsedquery_toString: Keyword:\"chinese cuisine chinese cuisine cuisine
> chinese\"
> Somehow, the tokens from the tokenizer are concatenated.
> 
> Ideally, if the Tokenizer works and does produce tokens, I can pass them to
> SynonymFilterFactory to apply synonyms.
> 
> I think I can write a QParserPlugin to solve this problem by expanding the
> query before it goes into the fieldType, but if this can be solved in the
> Tokenizer, that would be great.
> 
> Thanks
> Nicholas
> 
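
A minimal sketch of the kind of expanded query such a QParserPlugin could
build, assuming Lucene 4.0 APIs; the field name "Keyword" and the hard-coded
permutation list are taken from the thread for illustration only:

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.BooleanClause.Occur;
    import org.apache.lucene.search.BooleanQuery;
    import org.apache.lucene.search.PhraseQuery;
    import org.apache.lucene.search.Query;

    // Build OR'd phrase queries over the permutations of the input keyword;
    // a real QParserPlugin would compute the variants from the query string.
    static Query expand(String field, String[] variants) {
      BooleanQuery expanded = new BooleanQuery();
      for (String variant : variants) {
        PhraseQuery phrase = new PhraseQuery();
        int pos = 0;
        for (String word : variant.split(" ")) {
          phrase.add(new Term(field, word), pos++);
        }
        expanded.add(phrase, Occur.SHOULD);
      }
      return expanded;
    }

    // e.g. expand("Keyword", new String[] {"chinese", "cuisine",
    //                                      "chinese cuisine", "cuisine chinese"})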


Replication didn't work immediately after leader's SolrCore reload.

2012-10-17 Thread Minoru Osuka
Hi,

I am facing a replication problem.
I added a shard replica after the leader's core had been reloaded. I
expected index replication to start, but it didn't.
Please give me some workaround advice.

My operation commands are as follows.

ZooKeeper localhost:2181
Solr Shard1 Leader localhost:8983
Solr Shard1 Replica localhost:7574

1. Start Leader. (localhost:8983)
   $ java -Djetty.port=8983 -DzkHost=localhost:2181
-Dbootstrap_confdir=./solr/collection1/conf
-Dcollection.configName=configuration1 -DnumShards=1 -jar start.jar

2. Update the index. (localhost:8983)
   $ curl "http://localhost:8983/solr/update/?commit=true"; -H
"Content-Type: text/xml" --data-binary @./example/exampledocs/hd.xml

3. Reload Leader. (localhost:8983)
   $ curl "
http://localhost:8983/solr/admin/cores?action=RELOAD&core=collection1";

4. Start Replica. (localhost:7574)  ## I expected index replication to start
here, but it didn't.
   $ java -Djetty.port=7574 -DzkHost=localhost:2181 -jar start.jar


Regards,
Minoru


-- 
---
Minoru Osuka
minoru.os...@gmail.com


Re: Query related to data source

2012-10-17 Thread Chris Hostetter

: I have installed lucidworks enterprise v2.1. In that, I want to create XML 
data source.
: But on the data source page I am unable to find the Solr XML in the dropdown 
list.
: Could you help me in this..??

Leena, this appears to be exactly the same as the question you posted 
yesterday, which Erik already answered...

https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201210.mbox/%3c127a8876-f483-42a3-a12f-ce01c06d0...@gmail.com%3E


-Hoss


Re: 404 error for http://host:port/solr1-newmaster/dataimport

2012-10-17 Thread Shawn Heisey

On 10/17/2012 1:24 PM, srinalluri wrote:
Hi I am new to Solr 4.0, I am familiar with solr 3.6. I have set solr 
4.0. I am getting 'There are no SolrCores running. ' message for 
http://host:port/solr1-newmaster/ URL. What does it mean, what should 
I do? I am getting 404 error for this : /solr1-newmaster/dataimport 
Does it mean it is not recognizing conf files? My conf files are at 
http://host:port/newmaster/collection1/conf/


Check your log.  Exactly where the log resides is going to depend on 
which servlet container you are using and how your system is 
configured.  With the example jetty in its default configuration, the 
log is sent to stderr.


I ran into this error myself when migrating my config from 3.5 to 4.0.  
It happened when none of my cores were able to start up because of 
configuration errors or missing dependent jars.


Thanks,
Shawn



Re: What does _version_ field used for?

2012-10-17 Thread Shawn Heisey

On 10/17/2012 11:37 AM, Nicholas Ding wrote:

I have the same problem, does it mean I have to put _version_ field in
every schema.xml?


You would only *need* the field if updateLog is turned on in your 
updateHandler.  If you have the field in your schema but updateLog is 
turned off, I would bet that you'd never notice any difference in index 
size or performance.
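
A sketch of how the pairing appears in the stock 4.0 example configs
(schema.xml and solrconfig.xml respectively; the updateLog element sits
inside updateHandler, and the ulog dir property is the example's default):

    <field name="_version_" type="long" indexed="true" stored="true"/>

    <updateLog>
      <str name="dir">${solr.ulog.dir:}</str>
    </updateLog>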


Thanks,
Shawn



Query related to data source

2012-10-17 Thread Leena Jawale
Hi,

I have installed LucidWorks Enterprise v2.1. In it, I want to create an XML data 
source.
But on the data source page I am unable to find Solr XML in the dropdown 
list.
Could you help me with this?

Thanks & regards,
Leena Jawale





Re: Solr reports: "Can not read response from server" when running import

2012-10-17 Thread Romita Saha
Hi Shawn,

Thanks a lot for your guidance. I can import the database successfully 
now. Thanks once again.

Regards,
Romita Saha



From:   Shawn Heisey 
To: solr-user@lucene.apache.org, 
Date:   10/17/2012 09:49 PM
Subject:Re: Solr reports: "Can not read response from server" when 
running import



On 10/17/2012 12:29 AM, Romita Saha wrote:
> Hi Dave,
>
> I followed your guidance and loaded my database in MySQL. Presently the
> url reads like this:
>
> url = "jdbc:mysql://localhost:8983/var/lib/mysql/camerasys"
>
> The bind-address in my.cnf file is:
> bind-address = 127.0.0.1
>
> However the issue still persists.  Kindly help me find out the issue. 
The
> error log is stated below.
>
> Caused by: com.mysql.jdbc.CommunicationsException: Communications link
> failure due to underlying exception:

Typically MySQL listens on port 3306, and if you haven't changed it from 
the default, you shouldn't even need to include it.  Your URL then needs 
to have the name of the database (schema), not the path to where MySQL 
is storing it.  Port 8983 is Solr's port if you run under the included 
jetty container.

It looks like you probably can use this, assuming you named the database 
camerasys:

url="jdbc:mysql://localhost/camerasys"

Here's how my dataimport source is defined.  I pass the database host 
and schema in via the dataimport request URL, and I include the port 
number even though I don't have to:

[dataSource element stripped by the mail archive]
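
A sketch of what such a dataSource element can look like, reconstructed from
the description above; the dbHost/dbSchema parameter names and credentials
are illustrative (DIH exposes request parameters as ${dataimporter.request.*}):

    <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
        url="jdbc:mysql://${dataimporter.request.dbHost}:3306/${dataimporter.request.dbSchema}"
        user="myuser" password="mypassword"/>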

Thanks,
Shawn




Re: DIH scheduling

2012-10-17 Thread Erick Erickson
Not quite. That patch has been submitted, but not committed to the code.

In the meantime you can get by just by having whatever task scheduler
you have access to submit the HTTP request for DIH whenever you
need to: cron on *nix systems; I know Windows systems have something
that might work as well.
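
For example, a minimal crontab sketch (host, port, core name, and schedule
are illustrative; adjust them to your setup):

    # kick off a DIH full-import every night at 2am
    0 2 * * * curl -s "http://localhost:8983/solr/collection1/dataimport?command=full-import" > /dev/null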

I guess you could also get the 4.0 code, apply the patch and work it that way...


Best
Erick

On Wed, Oct 17, 2012 at 6:07 PM, Kiran J  wrote:
> Hi everyone,
>
> Does Solr have out of the box data import handler scheduling ? This link
> looks like I need to run an additional JAR.
>
> http://wiki.apache.org/solr/DataImportHandler?highlight=%28%28DataImportHandler%29%29#Scheduling
>
> I need to invoke the import from .Net environment, so I'd like to avoid any
> non-Solr code. Any help is much appreciated.
>
> Thanks
> Kiran


Re: Sorl 4.0: ClassNotFoundException DataImportHandler

2012-10-17 Thread Chris Hostetter

: [lib directives stripped by the mail archive]
: And I have all the dist jar files in dist folder. I restarted the tomcat,
: why I am still getting this error:
: 
: java.lang.ClassNotFoundException:
: org.apache.solr.handler.dataimport.DataImportHandler

is the path "../../dist/" correct relative to the instanceDir for your 
solr core?
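
As a sketch, one way to rule out path confusion is an absolute path in the
lib directive (the regex here is illustrative):

    <lib dir="/full/path/to/solr/dist/" regex="apache-solr-dataimporthandler-.*\.jar" />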

you should see log messages when the SolrCore is loaded listing every 
plugin jar being added, what do those look like?

For example, when running the DIH example from 4.0...

hossman@frisbee:~/lucene/branch_4_0/solr/example$ java 
-Dsolr.solr.home=example-DIH/solr/ -jar start.jar 
...
Oct 17, 2012 6:05:42 PM org.apache.solr.core.CoreContainer create
INFO: Creating SolrCore 'db' using instanceDir: example-DIH/solr/db
Oct 17, 2012 6:05:42 PM org.apache.solr.core.SolrResourceLoader <init>
INFO: new SolrResourceLoader for directory: 'example-DIH/solr/db/'
Oct 17, 2012 6:05:42 PM org.apache.solr.core.SolrResourceLoader 
replaceClassLoader
INFO: Adding 
'file:/home/hossman/lucene/branch_4_0/solr/example/example-DIH/solr/db/lib/.svn/'
 to classloader
Oct 17, 2012 6:05:42 PM org.apache.solr.core.SolrResourceLoader 
replaceClassLoader
INFO: Adding 
'file:/home/hossman/lucene/branch_4_0/solr/example/example-DIH/solr/db/lib/hsqldb-1.8.0.10.jar'
 to classloader
Oct 17, 2012 6:05:42 PM org.apache.solr.core.SolrConfig initLibs
INFO: Adding specified lib dirs to ClassLoader
Oct 17, 2012 6:05:42 PM org.apache.solr.core.SolrResourceLoader 
replaceClassLoader
INFO: Adding 
'file:/home/hossman/lucene/branch_4_0/solr/dist/apache-solr-dataimporthandler-extras-4.0-SNAPSHOT.jar'
 to classloader
Oct 17, 2012 6:05:42 PM org.apache.solr.core.SolrResourceLoader 
replaceClassLoader
INFO: Adding 
'file:/home/hossman/lucene/branch_4_0/solr/dist/apache-solr-dataimporthandler-4.0-SNAPSHOT.jar'
 to classloader
Oct 17, 2012 6:05:42 PM org.apache.solr.core.SolrConfig <init>
INFO: Using Lucene MatchVersion: LUCENE_40

-Hoss


Re: Re: how solr4.0 and zookeeper run on weblogic

2012-10-17 Thread rayvicky
I made it work on WebLogic,
but when I add or update the index, it errors:

  
<2012-10-17 03:47:03 PM CST> 
<2012-10-17 03:47:03 PM CST>
<[weblogic.servlet.internal.WebAppServletContext@425eab87 - appName: 'solr', 
name: 'solr', context-path: '/solr', spec-version: '2.5'] Servlet failed with 
Exception
java.lang.IllegalStateException: Failed to retrieve session: Cannot parse POST 
parameters of request: '/solr/collection1/update'
at 
weblogic.servlet.security.internal.SecurityModule.getUserSession(SecurityModule.java:486)
at 
weblogic.servlet.security.internal.ServletSecurityManager.checkAccess(ServletSecurityManager.java:81)
at 
weblogic.servlet.internal.WebAppServletContext.securedExecute(WebAppServletContext.java:2116)
at 
weblogic.servlet.internal.WebAppServletContext.execute(WebAppServletContext.java:2086)
at 
weblogic.servlet.internal.ServletRequestImpl.run(ServletRequestImpl.java:1406)
Truncated. see log file for complete stacktrace
> 
<2012-10-17 03:47:03 PM CST> 

How do I handle it?

thanks,
ray.


2012-10-18 



zongweilei 



From: Jan Høydahl / Cominvent [via Lucene] 
Sent: 2012-10-17 23:13:10 
To: rayvicky 
Cc: 
Subject: Re: how solr4.0 and zookeeper run on weblogic 
 
Did it work for you? You probably also have to set -Djetty.port=8080 in order 
for local ZK not to be started on port 9983. It's confusing, but you can also 
edit solr.xml to achieve the same. 

-- 
Jan Høydahl, search solution architect 
Cominvent AS - www.cominvent.com 
Solr Training - www.solrtraining.com 

On 17 Oct 2012 at 10:06, rayvicky <[hidden email]> wrote: 

> thanks 
> 
> 
> 
> -- 
> View this message in context: 
> http://lucene.472066.n3.nabble.com/how-solr4-0-and-zookeeper-run-on-weblogic-tp4013882p4014167.html
> Sent from the Solr - User mailing list archive at Nabble.com. 











Re: Disable Caching

2012-10-17 Thread Chris Hostetter

: If you are not searching against your master, and you shouldn't (and
: it sounds like you aren't), then you don't have to worry about
: disabling caches - they will just remain empty.  You could comment
: them out, but I think that won't actually disable them.

FWIW: what i generally advocate is that even if you have a machine that 
*should* never get queries (ie: master, repeater, backup, ... whatever) 
it's still a good idea to leave small caching enabled on that machine.

the reason being that *IF* some rogue client mistakenly starts sending 
requests to this server they are not supposed to be sending requests to, 
then i would rather those requests get cached to help reduce the likelihood 
of the machine rolling over and dying under load -- but i advise 
disabling all the autowarming and any explicit newSearcher warming you 
might normally configure, so that if there is a blip and this situation does 
happen, newSearcher events aren't delayed dealing with cache 
warming over and over again even long after the rogue client gets stopped.

Related suggestion: disable all the requestHandler names your clients 
normally query in your "master" solrconfig.xml, and only expose one using a 
really bizarre, unlikely name so you have some way to query that index for 
debugging ... that way if clients that normally query 
"http://slave:8983/solr/products/search?q=..." get misconfigured to hit 
"http://master:8983/solr/products/search?q=..." they'll get a 404, but 
you can still use 
"http://master:8983/solr/products/secret-debug-search?q=..." as needed.


-Hoss


CheckIndex question

2012-10-17 Thread Jie Sun
Hi -

with a corrupted core, 

1. If I run CheckIndex with -fix, it will drop the reference to the corrupted
segment, but the segment files are still there. When we have a lot of
corrupted segments, we have to manually pick them out and remove them. Is
there a way the tool can suffix or prefix them so they are easier to clean
out?

2. We know the doc count in the corrupted segment; is it easy to also output
the doc IDs of those docs?
thanks
Jie





Re: How to do exact match?

2012-10-17 Thread Chris Hostetter

: I can put "chinese cuisines" and "chinese cuisine" as two different tokens.
: But I was wondering if there is a better way to do it, like tweaking a
: different FilterFactory.
: 
: My problem is, if it's not an exact match, when I search "cuisine", that would
: match both, and I don't want that to happen.

this is where lots of specifics matter...

it sounds like what *you* mean by "exact match" is...

 * i want to be able to use analyzers that do stemming (and maybe 
lowercasing, and maybe stopwords, and maybe synonyms)
 * i want queries to match documents only if all of the query words are in 
the doc field in order
 * i only want documents to match if there are no other words in the 
doc field besides the words in the query.

does that sound about right?

If so, then KeywordTokenizer isn't going to help you.

i think the simplest way to do what you want is what you alluded to about 
inserting marker tokens at the beginning and end of your field values when 
indexing, and then doing "phrase queries" that include those marker tokens.

you could probably use something like PatternReplaceCharFilter to inject 
the start/end tokens fairly easily in both your index & query analyzers -- 
just make sure you don't pick something that would get removed by another 
tokenfilter later.
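
a sketch of such a fieldType (the marker strings and the rest of the chain
are illustrative, not a definitive recipe):

    <fieldType name="text_exactish" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <!-- wrap the whole value in begin/end markers before tokenizing -->
        <charFilter class="solr.PatternReplaceCharFilterFactory"
                    pattern="^(.*)$" replacement="xxbeginxx $1 xxendxx"/>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>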

once you have that in place, using something like the "FieldQParser" may 
be the easiest way to generate the phrase queries...

   q={!field f=name}chinese cuisine&debugQuery=true


-Hoss


Semantics of group.ngroups for distributed queries (BUG or weak doc?)

2012-10-17 Thread Jack Krupansky
(Yonik?)

I want to do a distributed grouping query over multiple shards, using 
group.ngroups to find the total number of groups. It seems to be giving me the 
sum of ngroups for each shard, rather than the count of the union of the groups 
from each shard. Is this a bug or the “expected” behavior? I read the wiki 
carefully and didn’t see any disclaimer about ngroups for distributed search, 
so that suggests that it is a “bug”, but wikis tend to be unreliable 
“contracts.” There have been several recent Jira issues in this area 
(SOLR-3109, SOLR-3316, SOLR-3436), but none seemed specific to my scenario.

In my test case, my first shard has 4 groups and the second shard has 5 groups, 
with some groups overlapping. The total number of groups is 6, but Solr reports 
an ngroups value of 9.

Over-simplifying, my first shard has c1, c2, c3, c5 and my second shard has c1, 
c2, c3, c4, c6.

Note: The actual groups returned by the query are correct and as expected, 6 of 
them: c1, c2, c3, c5, c4, c6 when the query is sent to the first node and c1, 
c2, c3, c4, c6, c5 when the query is sent to the second node.

I did find SOLR-2066 (Search Grouping: support distributed search) which has 
this comment: “It is important that all documents of one group are in the same 
shard. Otherwise the groupCount will be incorrect”, which seems to describe 
what I am seeing. But, a random comment in a Jira does not constitute a 
contract.

See:
https://issues.apache.org/jira/browse/SOLR-2066

I’ll file a Jira (bug), but only if nobody can convince me that 9 is the 
correct answer for my ngroups scenario. But if the comment in 2066 is 
“correct”, maybe it will be an “improvement” issue.

-- Jack Krupansky

Re: Error: _version_field must exist in schema

2012-10-17 Thread Rafał Kuć
Hello!

You can find some information about the requirements of SolrCloud at
http://wiki.apache.org/solr/SolrCloud . I don't know if _version_ is
mentioned elsewhere.

As for Websolr - I'm afraid I can't say anything about the cause of
those errors without seeing the exception. 


-- 
Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

> On Thu, Oct 18, 2012 at 12:09 AM, Rafał Kuć  wrote:
>> Hello!
>>
>> The _version_ field is needed by some of Solr 4.0 functionality like
>> transaction log or partial documents update. If you want to use them,
>> just update your schema.xml and put the _version_ field definition
>> there.
>>
>> However if you don't want those, you can remove the transaction log
>> configuration in your solrconfig.xml. However please remember that
>> when using SolrCloud you'll need that field.
>>

> Thanks. Where is that bit documented? I don't see it on the Solr wiki:
> http://wiki.apache.org/solr/SchemaXml

> I do have a Solr 4 Beta index running on Websolr that does not have
> such a field. It works, but throws many "Service Unavailable" and
> "Communication Error" errors. Might the lack of the _version_ field be
> the reason?

> Thanks.



Re: Error: _version_field must exist in schema

2012-10-17 Thread Dotan Cohen
On Thu, Oct 18, 2012 at 12:09 AM, Rafał Kuć  wrote:
> Hello!
>
> The _version_ field is needed by some of Solr 4.0 functionality like
> transaction log or partial documents update. If you want to use them,
> just update your schema.xml and put the _version_ field definition
> there.
>
> However if you don't want those, you can remove the transaction log
> configuration in your solrconfig.xml. However please remember that
> when using SolrCloud you'll need that field.
>

Thanks. Where is that bit documented? I don't see it on the Solr wiki:
http://wiki.apache.org/solr/SchemaXml

I do have a Solr 4 Beta index running on Websolr that does not have
such a field. It works, but throws many "Service Unavailable" and
"Communication Error" errors. Might the lack of the _version_ field be
the reason?

Thanks.

-- 
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com


Re: Error: _version_field must exist in schema

2012-10-17 Thread Rafał Kuć
Hello!

The _version_ field is needed by some of Solr 4.0 functionality like
transaction log or partial documents update. If you want to use them,
just update your schema.xml and put the _version_ field definition
there.

However if you don't want those, you can remove the transaction log
configuration in your solrconfig.xml. However please remember that
when using SolrCloud you'll need that field. 

-- 
Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

> On a stock Solr 4 install, I can run the server fine if I don't change
> any config files. When I replace the example
> solr/collection1/conf/schema.xml I get the following error:
> collection1:
> org.apache.solr.common.SolrException:org.apache.solr.common.SolrException:
> Unable to use updateLog: _version_field must exist in schema, using
> indexed="true" stored="true" and multiValued="false" (_version_ does
> not exist)

> Here is the replacement schema (markup stripped by the mail archive; the
> visible fragments show two field declarations, one with multiValued="false"
> indexed="true" required="true" and one with multiValued="false"
> indexed="true", a defaultSearchField of "text", and a uniqueKey of "id",
> with no _version_ field):


> I see that the original schema.xml had a _version_ field, but how does
> Solr know that? I did remove the original index thinking that maybe
> the extant documents were crying for the _version_ field, but removing
> that index did not help. Grepping for _version_ doesn't show it
> existing anywhere else in the collection1 index:

> ~/apache-solr-4.0.0/example$ grep -ir "_version_" *
> multicore/core1/conf/schema.xml: <field name="_version_" type="long" indexed="true" stored="true"/>
> multicore/core0/conf/schema.xml: <field name="_version_" type="long" indexed="true" stored="true"/>
> solr/collection1/conf/schema.xml.ORIGINAL:  trailing underscores
> (e.g. _version_) are reserved.
> solr/collection1/conf/schema.xml.ORIGINAL: <field name="_version_" type="long" indexed="true" stored="true"/>

> Where might the problem lie? And other than the index and schema.xml
> (and possibly solrconfig.xml) what else should I purge to get a
> "clean" index?

> Thanks.



DIH scheduling

2012-10-17 Thread Kiran J
Hi everyone,

Does Solr have out of the box data import handler scheduling ? This link
looks like I need to run an additional JAR.

http://wiki.apache.org/solr/DataImportHandler?highlight=%28%28DataImportHandler%29%29#Scheduling

I need to invoke the import from .Net environment, so I'd like to avoid any
non-Solr code. Any help is much appreciated.

Thanks
Kiran


Re: How to do exact match?

2012-10-17 Thread Jack Krupansky

You can use a "stemmer" to match very similar words, such as plurals.

See:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#Stemming

Look at the "text_en" field type in the example schema.

-- Jack Krupansky

-Original Message- 
From: Nicholas Ding

Sent: Wednesday, October 17, 2012 1:25 PM
To: solr-user@lucene.apache.org
Subject: Re: How to do exact match?

Hi,

I have several tokens in a field containing more than one keyword, like
"chinese food", "indian food", and I want to do exact matching.
And even more, if the token is "chinese cuisine" but the query is
"chinese cuisines", I still want the query to match the token.

Thanks
Nicholas

On Wed, Oct 17, 2012 at 11:46 AM, Jack Krupansky 
wrote:



The answer is "Yes." Solr and Lucene are quite flexible.

You neglected to offer any details about your specific use case which
might bias the answer one way or the other. What is some sample data and
some sample queries?

-- Jack Krupansky

-Original Message- From: Nicholas Ding
Sent: Wednesday, October 17, 2012 11:24 AM
To: solr-user@lucene.apache.org
Subject: How to do exact match?


Hello,

I want to do exact match on Solr. I found two ways on Internet, one is to
put PREFIX and SUFFIX around the text, another is to use KeywordTokenizer.

I was wondering which one is the better approach for doing exact match?

Thanks
Nicholas





Sorl 4.0: ClassNotFoundException DataImportHandler

2012-10-17 Thread srinalluri
I have the following line in solrconfig.xml.

[lib directive stripped by the mail archive; per Hoss's reply, it referenced dir="../../dist/"]

And I have all the dist jar files in the dist folder. I restarted Tomcat;
why am I still getting this error:

java.lang.ClassNotFoundException:
org.apache.solr.handler.dataimport.DataImportHandler

thanks
Srini





Re: Disable Caching

2012-10-17 Thread Anderson vasconcelos
Thanks for the replies.



2012/10/17 Otis Gospodnetic 

> Hi,
>
> If you are not searching against your master, and you shouldn't (and
> it sounds like you aren't), then you don't have to worry about
> disabling caches - they will just remain empty.  You could comment
> them out, but I think that won't actually disable them.
>
> Warmup queries you can just comment out in solrconfig.xml.
>
> Otis
> --
> Search Analytics - http://sematext.com/search-analytics/index.html
> Performance Monitoring - http://sematext.com/spm/index.html
>
>
> On Wed, Oct 17, 2012 at 12:25 PM, Anderson vasconcelos
>  wrote:
> > Hi
> >
> > I have a server that just index data and sincronize this data to others
> > slaves. In my arquitecture, i have a one master server that only receive
> > index requests and n slaves that receives only search requests.
> >
> > I wanna to disable the cache of the master server, because they not
> receive
> > a search request, this is the best way? I can do this?
> >
> > Wat about warmingSearch, i must disable this too?
> >
> > I'm using solr 3.6.0
> >
> > Thanks
>


Re: Flushing RAM to disk

2012-10-17 Thread Lance Norskog
I do not know how to load an index from disk into a RAMDirectory in Solr.

- Original Message -
| From: "deniz" 
| To: solr-user@lucene.apache.org
| Sent: Wednesday, October 17, 2012 12:15:52 AM
| Subject: Re: Flushing RAM to disk
| 
| I heard about MMapDirectory - actually my test env is using that- ,
| but the
| question was just an idea... and how about using SolrCloud? I mean
| can we
| set shards to use ram and replicas to use MMapDirectory? is this
| possible?
| 
| 
| 
| -
| Smart but not working... Would do it if he worked... (Turkish signature, translated)
| --
| View this message in context:
| http://lucene.472066.n3.nabble.com/Flushing-RAM-to-disk-tp4014128p4014155.html
| Sent from the Solr - User mailing list archive at Nabble.com.
| 


Re: Datefaceting on multiple value in solr

2012-10-17 Thread Chris Hostetter

: currently i have to do 3 request for all three values of name, with passing
: "name" as fq parameter and given facet date range.

Got it, the example of what you are looking for makes perfect sense.

In general, faceting on the same field in different ways isn't supported 
(SOLR-1351 is an open issue for it, but it needs a lot of tests/work), 
but in your specific use case, where all you want to change is the "base 
set" of documents (but keep all of the other facet options the same), 
what you are looking for should be possible -- the key is to take 
advantage of the "tagging and excluding" option for ignoring certain fq 
values when faceting...

https://wiki.apache.org/solr/SimpleFacetParameters#Tagging_and_excluding_Filters

Here is an example of what a request with tagging and exclusions might look 
like for date faceting, so that you get 3 different sets of date constraint 
counts for the same field/ranges, using different filters...

http://localhost:8983/solr/select?q=*:*&fq={!tag=fqa}inStock:true&fq={!tag=fqb}inStock:false&fq={!tag=fqc}popularity:6&facet=true&facet.range={!key=a%20ex=fqb,fqc}manufacturedate_dt&facet.range={!key=b%20ex=fqa,fqc}manufacturedate_dt&facet.range={!key=c%20ex=fqa,fqb}manufacturedate_dt&facet.range.gap=%2B1YEAR&facet.range.start=NOW/YEAR-10YEARS&facet.range.end=NOW/YEAR%2B1YEAR

...the caveat being that your main result set still has all of the filters 
applied (but it sounds like that doesn't matter for your usecase)

One final note...

: currently i have to do 3 request for all three values of name, with passing
: "name" as fq parameter and given facet date range.

...even if your real usecase is more complicated than the example you 
described, and can't be satisfied using tagging & excluding filter queries, 
doing multiple requests to get this kind of info isn't as bad as it may 
sound: using HTTP keep-alive will eliminate the overhead of the multiple 
GET requests, and the filter cache will ensure that Solr doesn't do 
unnecessary work computing the same document sets again in the second & 
third requests.


-Hoss


Re: WordDelimiterFilter and the dot character

2012-10-17 Thread Farkas István
Just for the archives: removing "preserveOriginal" from the "query"
analyzer solved my problem.


Thank you, Jack!

You need to have separate "index" and "query" analyzers for that field 
type. The "query" analyzer would not have preserveOriginal="1", which 
would generate an extra term that would not match the exact term 
sequence that was indexed.


A query of "123 2012" would not split any terms and hence not generate 
the extra "preserved" term.


But a query of "123/2012" would actually query "123/2012 123 2012", 
which is not a term sequence that was indexed.
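
A sketch of the resulting fieldType with separate index/query analyzers,
reconstructed from the attributes visible in the (archive-mangled) config
below; the tokenizer choice and generateWordParts are assumptions:

    <fieldType name="code" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
                generateNumberParts="1" splitOnNumerics="1" preserveOriginal="1"/>
      </analyzer>
      <analyzer type="query">
        <!-- no preserveOriginal, so "123/2012" queries only "123 2012" -->
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
                generateNumberParts="1" splitOnNumerics="1"/>
      </analyzer>
    </fieldType>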


-- Jack Krupansky

-Original Message- From: Farkas István
Sent: Wednesday, October 17, 2012 8:58 AM
To: solr-user@lucene.apache.org
Subject: WordDelimiterFilter and the dot character

Hello,

I've run into an interesting problem. I am using Solr 3.5 on an Ubuntu
server.

I have some data with a code field, which contains some identifiers
(mostly) in the following format: E.123/2012.

I've set up a fieldType for this code field:

[fieldType definition stripped by the mail archive; the visible fragments
show positionIncrementGap="100" and a word-delimiter filter with
generateNumberParts="1" splitOnNumerics="1" preserveOriginal="1"]

If I search for the exact code ("E.123/2012."), I will get the expected
result. If I search for "123 2012", I also get the expected results. If
I search for the "123/2012" string, the result set is empty. Tried it
with catenateNumbers and catenateWords enabled, with the same results.

The interesting thing here is that using the field analysis tool, the
123/2012 gives a match if I select the "highlight matches" option. But
the same query yields nothing when I try to use it in the query debug
tool in the Solr admin. The query works if I use a wildcard search
(*123/2012*), but I would like to avoid that. What am I missing here?

Regards,
  Istvan




404 error for http://host:port/solr1-newmaster/dataimport

2012-10-17 Thread srinalluri
Hi, I am new to Solr 4.0; I am familiar with Solr 3.6.

I have set up Solr 4.0.

I am getting a 'There are no SolrCores running.' message for the
http://host:port/solr1-newmaster/ URL. What does it mean, and what should I do?

I am getting a 404 error for this: /solr1-newmaster/dataimport
Does it mean it is not recognizing the conf files? My conf files are at 
http://host:port/newmaster/collection1/conf/

thanks
Srini





Re: Any filter to map mutiple tokens into one ?

2012-10-17 Thread T. Kuro Kurosaka

Jack,
Now I have direct evidence.  I ran Solr under Eclipse and caught 
NGramTokenizer being called from the edismax query parser with "*:*" as 
input text.

I have filed a bug with jira:
https://issues.apache.org/jira/browse/SOLR-3962

On 10/15/12 11:32 AM, T. Kuro Kurosaka wrote:

On 10/15/12 10:35 AM, Jack Krupansky wrote:
And you're absolutely certain you see "*:*" being passed to your 
analyzer in the final release of Solr 4.0???
I don't have a direct evidence. This is the only theory I have that 
explains why changing FieldType causes the sub-optimal scores.
If you know of a way to tell if a tokenizer is really invoked, let me 
know.





Re: solr4.0 LimitTokenCountFilterFactory NumberFormatException

2012-10-17 Thread Chris Hostetter

: Anybody want to guess what's wrong with this code:

Grrr... 

that's an abomination.

https://issues.apache.org/jira/browse/SOLR-3961


-Hoss


Re: solr4.0 LimitTokenCountFilterFactory NumberFormatException

2012-10-17 Thread Dirk Högemann
:-) great solution...will look funny in our production system.
On 17.10.2012 16:12, "Jack Krupansky" wrote:

> Anybody want to guess what's wrong with this code:
>
> String maxTokenCountArg = args.get("maxTokenCount");
> if (maxTokenCountArg == null) {
>  throw new IllegalArgumentException("maxTokenCount is mandatory.");
> }
> maxTokenCount = Integer.parseInt(args.get(maxTokenCountArg));
>
> Hmmm... try this "workaround":
>
> <filter class="solr.LimitTokenCountFilterFactory" maxTokenCount="foo" foo="1"/>
>
> -- Jack Krupansky
>
> -Original Message- From: Dirk Högemann
> Sent: Wednesday, October 17, 2012 11:50 AM
> To: solr-user@lucene.apache.org
> Subject: solr4.0 LimitTokenCountFilterFactory NumberFormatException
>
> Hi,
>
> I am trying to upgrade from Solr 3.5 to Solr 4.0.
> I read the following in the example solrconfig:
>
> [comment stripped by the mail archive]
>
> I tried that as follows:
>
> ...
> [fieldType "textgen" definition stripped by the mail archive; the visible
> fragments show positionIncrementGap="100" and an analyzer chaining a
> LimitTokenCountFilterFactory with maxTokenCount="10", a word-delimiter
> filter with generateWordParts="1" generateNumberParts="1" catenateWords="0"
> catenateNumbers="0" catenateAll="0" splitOnCaseChange="0", a stemmer with
> language="German", and a stop filter with words="stopwords.txt"
> enablePositionIncrements="true"]
> ...
>
> The LimitTokenCountFilterFactory configured like that crashes the startup
> of the corresponding core with the following exception (without the Factory
> the core startup works):
>
> 17.10.2012 17:44:19 org.apache.solr.common.SolrException log
> SCHWERWIEGEND: null:org.apache.solr.common.SolrException: Plugin init
> failure for [schema.xml] fieldType "textgen": Plugin init failure for
> [schema.xml] analyzer/filter: null
>    at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:177)
>    at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:369)
>    at org.apache.solr.schema.IndexSchema.<init>(IndexSchema.java:113)
>    at org.apache.solr.core.CoreContainer.create(CoreContainer.java:846)
>    at org.apache.solr.core.CoreContainer.load(CoreContainer.java:534)
>    at org.apache.solr.core.CoreContainer.load(CoreContainer.java:356)
>    at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:308)
>    at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:107)
>    at org.apache.catalina.core.ApplicationFilterConfig.initFilter(ApplicationFilterConfig.java:277)
>    at org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:258)
>    at org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:382)
>    at org.apache.catalina.core.ApplicationFilterConfig.<init>(ApplicationFilterConfig.java:103)
>    at org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:4638)
>    at org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5294)
>    at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150)
>    at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:895)
>    at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:871)
>    at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:615)
>    at org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:649)
>    at org.apache.catalina.startup.HostConfig$DeployDescriptor.run(HostConfig.java:1581)
>    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>    at java.lang.Thread.run(Thread.java:662)
> Caused by: org.apache.solr.common.SolrException: Plugin init failure for
> [schema.xml] analyzer/filter: null
>    at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:177)
>    at org.apache.solr.schema.FieldTypePluginLoader.readAnalyzer(FieldTypePluginLoader.java:377)
>    at org.apache.solr.schema.FieldTypePluginLoader.create(FieldTypePluginLoader.java:95)
>    at org.apache.solr.schema.FieldTypePluginLoader.create(FieldTypePluginLoader.java:43)
>    at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:151)
>    ... 25 more
> Caused by: java.lang.NumberFormatException: null
>    at java.lang.Integer.parseInt(Integer.java:417)
>    at java.lang.Integer.parseInt(Integer.java:499)
>    at org.apache.lucene.analys

Re: What does _version_ field used for?

2012-10-17 Thread Nicholas Ding
I have the same problem. Does it mean I have to put the _version_ field in
every schema.xml?

Thanks
Nicholas

On Wed, Oct 17, 2012 at 3:44 AM, Jun Wang  wrote:

> Ok, I got it, thanks
>
> 2012/10/17 Alexandre Rafalovitch 
>
> > Yes, just make sure you have it in the scheme. Solr handles the rest.
> >
> > Regards,
> >Alex.
> > Personal blog: http://blog.outerthoughts.com/
> > LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> > - Time is the quality of nature that keeps events from happening all
> > at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
> > book)
> >
> >
> > On Wed, Oct 17, 2012 at 12:57 PM, Jun Wang  wrote:
> > > Is that to say we just need to add this field, and there is no more work?
> > >
> > > 2012/10/17 Rafał Kuć 
> > >
> > >> Hello!
> > >>
> > >> It is used internally by Solr, for example by features like partial
> > >> update functionality and update log.
> > >>
> > >> --
> > >> Regards,
> > >>  Rafał Kuć
> > >>  Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch -
> > ElasticSearch
> > >>
> > >> > I am moving to solr4.0 from the beta version. There is an exception
> > >> thrown:
> > >>
> > >> > Caused by: org.apache.solr.common.SolrException: _version_field must
> > >> exist
> > >> > in schema, using indexed="true" stored="true" and
> multiValued="false"
> > >> > (_version_ does not exist)
> > >> > at
> > >> >
> > >>
> >
> org.apache.solr.update.VersionInfo.getAndCheckVersionField(VersionInfo.java:57)
> > >> > at org.apache.solr.core.SolrCore.<init>(SolrCore.java:606)
> > >> > ... 26 more
> > >>
> > >> > It seems there needs to be a field like
> > >> > <field name="_version_" type="long" indexed="true" stored="true" multiValued="false"/>
> > >> > in schema.xml. I wonder what this is used for?
> > >>
> > >>
> > >
> > >
> > > --
> > > from Jun Wang
> >
>
>
>
> --
> from Jun Wang
>


Re: SolrCloud is not returning all results

2012-10-17 Thread Scott Carlson

I was adding the docs via command line, with a commit

curl http://host:8983/solr/update/?commit=true -H "Content-Type: 
text/xml" --data-binary '<add><doc><field name="id">9235647339</field><field name="name">8983</field></doc></add>'


In some more testing... it does look like issuing another commit after 
this seems to "really" commit.

curl http://victor:8983/solr/update/?commit=true



On 10/17/2012 12:06 PM, Mark Miller wrote:

How are you issuing the commit that makes the docs visible?

On Wed, Oct 17, 2012 at 12:15 PM, Scott Carlson
  wrote:

I'm trying a very simple setup, and I'm not getting the results I would
expect.

Starting from : https://wiki.apache.org/solr/SolrCloud  example C.   I have
4 instances running locally.
I verify the cloud setup in the UI, that there are two servers per shard,
and they are all green.

I added a document to each of the URLs (four in total).  I then use one of
the server UIs to query for all the docs.   I would expect that executing a
"*:*" query should ALWAYS return 4 hits.  Instead it seems to randomly
return 4, 3, or even 2.

Any ideas?






Re: How to do exact match?

2012-10-17 Thread Nicholas Ding
I can put "chinese cuisines" and "chinese cuisine" as two different tokens.
But I was wondering if there is better way to do it, like tweaking
different FilterFactory.

My problem is, if it's not exact match, when I search "cuisine", that would
match both, I don't want that happen.

Thanks
Shunjia

On Wed, Oct 17, 2012 at 1:28 PM, Otis Gospodnetic <
otis.gospodne...@gmail.com> wrote:

> Hi,
>
> Aren't you contradicting yourself a bit here?
> You say you want exact matching, but then you say you want a query for
> "chinese cuisines" to match "chinese cuisine", which is not exact.
>
> Otis
> --
> Search Analytics - http://sematext.com/search-analytics/index.html
> Performance Monitoring - http://sematext.com/spm/index.html
>
>
> On Wed, Oct 17, 2012 at 1:25 PM, Nicholas Ding 
> wrote:
> > Hi,
> >
> > I have several tokens in a field containing more than one keyword, like
> > "chinese food", "indian food", I want to do exact match.
> > And even more, if the token is "chinese cuisine", but the query is
> > ""chinese cuisines", I still want the query to match to token.
> >
> > Thanks
> > Nicholas
> >
> > On Wed, Oct 17, 2012 at 11:46 AM, Jack Krupansky <
> j...@basetechnology.com>wrote:
> >
> >> The answer is "Yes." Solr and Lucene are quite flexible.
> >>
> >> You neglected to offer any details about your specific use case which
> >> might bias the answer one way or the other. What is some sample data and
> >> some sample queries?
> >>
> >> -- Jack Krupansky
> >>
> >> -Original Message- From: Nicholas Ding
> >> Sent: Wednesday, October 17, 2012 11:24 AM
> >> To: solr-user@lucene.apache.org
> >> Subject: How to do exact match?
> >>
> >>
> >> Hello,
> >>
> >> I want to do exact match on Solr. I found two ways on Internet, one is
> to
> >> put PREFIX and SUFFIX around the text, another is to use
> KeywordTokenizer.
> >>
> >> I was wondering which one is the better approach for doing exact match?
> >>
> >> Thanks
> >> Nicholas
> >>
>


Re: How to do exact match?

2012-10-17 Thread Otis Gospodnetic
Hi,

Aren't you contradicting yourself a bit here?
You say you want exact matching, but then you say you want a query for
"chinese cuisines" to match "chinese cuisine", which is not exact.

Otis
--
Search Analytics - http://sematext.com/search-analytics/index.html
Performance Monitoring - http://sematext.com/spm/index.html


On Wed, Oct 17, 2012 at 1:25 PM, Nicholas Ding  wrote:
> Hi,
>
> I have several tokens in a field containing more than one keyword, like
> "chinese food", "indian food", I want to do exact match.
> And even more, if the token is "chinese cuisine", but the query is
> ""chinese cuisines", I still want the query to match to token.
>
> Thanks
> Nicholas
>
> On Wed, Oct 17, 2012 at 11:46 AM, Jack Krupansky 
> wrote:
>
>> The answer is "Yes." Solr and Lucene are quite flexible.
>>
>> You neglected to offer any details about your specific use case which
>> might bias the answer one way or the other. What is some sample data and
>> some sample queries?
>>
>> -- Jack Krupansky
>>
>> -Original Message- From: Nicholas Ding
>> Sent: Wednesday, October 17, 2012 11:24 AM
>> To: solr-user@lucene.apache.org
>> Subject: How to do exact match?
>>
>>
>> Hello,
>>
>> I want to do exact match on Solr. I found two ways on Internet, one is to
>> put PREFIX and SUFFIX around the text, another is to use KeywordTokenizer.
>>
>> I was wondering which one is the better approach for doing exact match?
>>
>> Thanks
>> Nicholas
>>


Re: How to do exact match?

2012-10-17 Thread Nicholas Ding
Hi,

I have several tokens in a field containing more than one keyword, like
"chinese food", "indian food", and I want to do exact matching.
And even more, if the token is "chinese cuisine" but the query is
"chinese cuisines", I still want the query to match the token.

Thanks
Nicholas

On Wed, Oct 17, 2012 at 11:46 AM, Jack Krupansky wrote:

> The answer is "Yes." Solr and Lucene are quite flexible.
>
> You neglected to offer any details about your specific use case which
> might bias the answer one way or the other. What is some sample data and
> some sample queries?
>
> -- Jack Krupansky
>
> -Original Message- From: Nicholas Ding
> Sent: Wednesday, October 17, 2012 11:24 AM
> To: solr-user@lucene.apache.org
> Subject: How to do exact match?
>
>
> Hello,
>
> I want to do exact match on Solr. I found two ways on Internet, one is to
> put PREFIX and SUFFIX around the text, another is to use KeywordTokenizer.
>
> I was wondering which one is the better approach for doing exact match?
>
> Thanks
> Nicholas
>


Re: SolrCloud is not returning all results

2012-10-17 Thread Mark Miller
How are you issuing the commit that makes the docs visible?

On Wed, Oct 17, 2012 at 12:15 PM, Scott Carlson
 wrote:
> I'm trying a very simple setup, and I'm not getting the results I would
> expect.
>
> Starting from : https://wiki.apache.org/solr/SolrCloud  example C.   I have
> 4 instances running locally.
> I verify the cloud setup in the UI, that there are two servers per shard,
> and they are all green.
>
> I added a document to each of the URLs (four in total).  I then use one of
> the server UIs to query for all the docs.   I would expect that executing a
> "*:*" query should ALWAYS return 4 hits.  Instead it seems to randomly
> return 4, 3, or even 2.
>
> Any ideas?
>



-- 
- Mark


Re: Disable Caching

2012-10-17 Thread Otis Gospodnetic
Hi,

If you are not searching against your master, and you shouldn't (and
it sounds like you aren't), then you don't have to worry about
disabling caches - they will just remain empty.  You could comment
them out, but I think that won't actually disable them.

Warmup queries you can just comment out in solrconfig.xml.

Otis
--
Search Analytics - http://sematext.com/search-analytics/index.html
Performance Monitoring - http://sematext.com/spm/index.html


On Wed, Oct 17, 2012 at 12:25 PM, Anderson vasconcelos
 wrote:
> Hi
>
> I have a server that just index data and sincronize this data to others
> slaves. In my arquitecture, i have a one master server that only receive
> index requests and n slaves that receives only search requests.
>
> I wanna to disable the cache of the master server, because they not receive
> a search request, this is the best way? I can do this?
>
> Wat about warmingSearch, i must disable this too?
>
> I'm using solr 3.6.0
>
> Thanks


RE: Disable Caching

2012-10-17 Thread Harshvardhan Ojha
Yes Anderson, you don't need caches on the master, nor warming.

-Original Message-
From: Anderson vasconcelos [mailto:anderson.v...@gmail.com] 
Sent: Wednesday, October 17, 2012 9:55 PM
To: solr-user
Subject: Disable Caching

Hi

I have a server that just indexes data and synchronizes this data to other slaves. 
In my architecture, I have one master server that only receives index requests 
and n slaves that receive only search requests.

I want to disable the cache of the master server, because it does not receive 
search requests. Is this the best way? Can I do this?

What about searcher warming, must I disable this too?

I'm using solr 3.6.0

Thanks


SolrCloud is not returning all results

2012-10-17 Thread Scott Carlson
I'm trying a very simple setup, and I'm not getting the results I would 
expect.


Starting from : https://wiki.apache.org/solr/SolrCloud  example C.   I 
have 4 instances running locally.
I verify the cloud setup in the UI, that there are two servers per 
shard, and they are all green.


I added a document to each of the URLs (four in total).  I then use one 
of the server UIs to query for all the docs.   I would expect that 
executing a "*:*" query should ALWAYS return 4 hits.  Instead it seems 
to randomly return 4, 3, or even 2.


Any ideas?



Re: solr4.0 LimitTokenCountFilterFactory NumberFormatException

2012-10-17 Thread Scott Carlson

I'd think the last line should be this instead:

maxTokenCount = Integer.parseInt(maxTokenCountArg);


On 10/17/2012 11:11 AM, Jack Krupansky wrote:

Anybody want to guess what's wrong with this code:

String maxTokenCountArg = args.get("maxTokenCount");
if (maxTokenCountArg == null) {
 throw new IllegalArgumentException("maxTokenCount is mandatory.");
}
maxTokenCount = Integer.parseInt(args.get(maxTokenCountArg));

Hmmm... try this "workaround":

foo="1"/>


-- Jack Krupansky

-Original Message- From: Dirk Högemann
Sent: Wednesday, October 17, 2012 11:50 AM
To: solr-user@lucene.apache.org
Subject: solr4.0 LimitTokenCountFilterFactory NumberFormatException

Hi,

I am trying to upgrade from Solr 3.5 to Solr 4.0.
I read the following in the example solrconfig:

[comment stripped by the mail archive]

I tried that as follows:

...
[fieldType "textgen" definition stripped by the mail archive; per the
exception below, it included a LimitTokenCountFilterFactory]
...

The LimitTokenCountFilterFactory configured like that crashes the startup
of the corresponding core with the following exception (without the 
Factory

the core startup works):


17.10.2012 17:44:19 org.apache.solr.common.SolrException log
SCHWERWIEGEND: null:org.apache.solr.common.SolrException: Plugin init
failure for [schema.xml] fieldType "textgen": Plugin init failure for
[schema.xml] analyzer/filter: null
   at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:177)
   at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:369)
   at org.apache.solr.schema.IndexSchema.<init>(IndexSchema.java:113)
   at org.apache.solr.core.CoreContainer.create(CoreContainer.java:846)
   at org.apache.solr.core.CoreContainer.load(CoreContainer.java:534)
   at org.apache.solr.core.CoreContainer.load(CoreContainer.java:356)
   at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:308)
   at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:107)
   at org.apache.catalina.core.ApplicationFilterConfig.initFilter(ApplicationFilterConfig.java:277)
   at org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:258)
   at org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:382)
   at org.apache.catalina.core.ApplicationFilterConfig.<init>(ApplicationFilterConfig.java:103)
   at org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:4638)
   at org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5294)
   at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150)
   at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:895)
   at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:871)
   at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:615)
   at org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:649)
   at org.apache.catalina.startup.HostConfig$DeployDescriptor.run(HostConfig.java:1581)
   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
   at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
   at java.lang.Thread.run(Thread.java:662)
Caused by: org.apache.solr.common.SolrException: Plugin init failure for
[schema.xml] analyzer/filter: null
   at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:177)
   at org.apache.solr.schema.FieldTypePluginLoader.readAnalyzer(FieldTypePluginLoader.java:377)
   at org.apache.solr.schema.FieldTypePluginLoader.create(FieldTypePluginLoader.java:95)
   at org.apache.solr.schema.FieldTypePluginLoader.create(FieldTypePluginLoader.java:43)
   at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:151)
   ... 25 more
Caused by: java.lang.NumberFormatException: null
   at java.lang.Integer.parseInt(Integer.java:417)
   at java.lang.Integer.parseInt(Integer.java:499)
   at org.apache.lucene.analysis.miscellaneous.LimitTokenCountFilterFactory.init(LimitTokenCountFilterFactory.java:48)
   at org.apache.solr.schema.FieldTypePluginLoader$3.init(FieldTypePluginLoader.java:367)
   at org.apache.solr.schema.FieldTypePluginLoader$3.init(FieldTypePluginLoader.java:358)
   at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:159)
   ... 29 more

Any ideas?

Best
Dirk


Re: solr4.0 LimitTokenCountFilterFactory NumberFormatException

2012-10-17 Thread Jack Krupansky

Anybody want to guess what's wrong with this code:

String maxTokenCountArg = args.get("maxTokenCount");
if (maxTokenCountArg == null) {
 throw new IllegalArgumentException("maxTokenCount is mandatory.");
}
maxTokenCount = Integer.parseInt(args.get(maxTokenCountArg));

Hmmm... try this "workaround":

foo="1"/>


-- Jack Krupansky

-Original Message- 
From: Dirk Högemann

Sent: Wednesday, October 17, 2012 11:50 AM
To: solr-user@lucene.apache.org
Subject: solr4.0 LimitTokenCountFilterFactory NumberFormatException

Hi,

I am trying to upgrade from Solr 3.5 to Solr 4.0.
I read the following in the example solrconfig:

[comment stripped by the mail archive]

I tried that as follows:

...
[fieldType "textgen" definition stripped by the mail archive; per the
exception below, it included a LimitTokenCountFilterFactory]
...

The LimitTokenCountFilterFactory configured like that crashes the startup
of the corresponding core with the following exception (without the Factory
the core startup works):


17.10.2012 17:44:19 org.apache.solr.common.SolrException log
SCHWERWIEGEND: null:org.apache.solr.common.SolrException: Plugin init
failure for [schema.xml] fieldType "textgen": Plugin init failure for
[schema.xml] analyze
r/filter: null
   at
org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:177)
   at
org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:369)
   at org.apache.solr.schema.IndexSchema.<init>(IndexSchema.java:113)
   at org.apache.solr.core.CoreContainer.create(CoreContainer.java:846)
   at org.apache.solr.core.CoreContainer.load(CoreContainer.java:534)
   at org.apache.solr.core.CoreContainer.load(CoreContainer.java:356)
   at
org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:308)
   at
org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:107)
   at
org.apache.catalina.core.ApplicationFilterConfig.initFilter(ApplicationFilterConfig.java:277)
   at
org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:258)
   at
org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:382)
   at
org.apache.catalina.core.ApplicationFilterConfig.<init>(ApplicationFilterConfig.java:103)
   at
org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:4638)
   at
org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5294)
   at
org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150)
   at
org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:895)
   at
org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:871)
   at
org.apache.catalina.core.StandardHost.addChild(StandardHost.java:615)
   at
org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:649)
   at
org.apache.catalina.startup.HostConfig$DeployDescriptor.run(HostConfig.java:1581)
   at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
   at
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
   at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
   at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
   at java.lang.Thread.run(Thread.java:662)
Caused by: org.apache.solr.common.SolrException: Plugin init failure for
[schema.xml] analyzer/filter: null
   at
org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:177)
   at
org.apache.solr.schema.FieldTypePluginLoader.readAnalyzer(FieldTypePluginLoader.java:377)
   at
org.apache.solr.schema.FieldTypePluginLoader.create(FieldTypePluginLoader.java:95)
   at
org.apache.solr.schema.FieldTypePluginLoader.create(FieldTypePluginLoader.java:43)
   at
org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:151)
   ... 25 more
Caused by: java.lang.NumberFormatException: null
   at java.lang.Integer.parseInt(Integer.java:417)
   at java.lang.Integer.parseInt(Integer.java:499)
   at
org.apache.lucene.analysis.miscellaneous.LimitTokenCountFilterFactory.init(LimitTokenCountFilterFactory.java:48)
   at
org.apache.solr.schema.FieldTypePluginLoader$3.init(FieldTypePluginLoader.java:367)
   at
org.apache.solr.schema.FieldTypePluginLoader$3.init(FieldTypePluginLoader.java:358)
   at
org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:159)
   ... 29 more

Any ideas?

Best
Dirk 



Re: ICUTokenizer ArrayIndexOutOfBounds

2012-10-17 Thread Robert Muir
calling reset() is a mandatory part of the consumer lifecycle before
calling incrementToken(), see:

https://lucene.apache.org/core/4_0_0/core/org/apache/lucene/analysis/TokenStream.html

A lot of people don't consume these correctly; that's why these
tokenizers now try to throw exceptions if you do it wrong, rather than
silently returning wrong results.

If you really want to test that your consumer code (queryparser,
whatever) is doing this correctly, test your code with
MockTokenizer/MockAnalyzer in the test-framework package. This has a
little state machine with a lot more checks.

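Applied to the snippet quoted below, the complete consumer lifecycle looks
roughly like this (reset() before the loop, end()/close() after it):

ICUTokenizer tokenizer = new ICUTokenizer(myReader);
CharTermAttribute termAtt = tokenizer.getAttribute(CharTermAttribute.class);

tokenizer.reset();   // mandatory before the first incrementToken()
while (tokenizer.incrementToken())
{
    System.out.println(termAtt.toString());
}
tokenizer.end();     // records the final offset state
tokenizer.close();   // releases the reader
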
On Wed, Oct 17, 2012 at 6:56 AM, Shane Perry  wrote:
> Hi,
>
> I've been playing around with using the ICUTokenizer from 4.0.0.
> Using the code below, I was receiving an ArrayIndexOutOfBounds
> exception on the call to tokenizer.incrementToken().  Looking at the
> ICUTokenizer source, I can see why this is occurring (usableLength
> defaults to -1).
>
> ICUTokenizer tokenizer = new ICUTokenizer(myReader);
> CharTermAttribute termAtt = 
> tokenizer.getAttribute(CharTermAttribute.class);
>
> while(tokenizer.incrementToken())
> {
> System.out.println(termAtt.toString());
> }
>
> After poking around a little more, I found that I can just call
> tokenizer.reset() (initializes usableLength to 0) right after
> constructing the object
> (org.apache.lucene.analysis.icu.segmentation.TestICUTokenizer does a
> similar step in its super class).  I was wondering if someone could
> explain why I need to call tokenizer.reset() prior to using the
> tokenizer for the first time.
>
> Thanks in advance,
>
> Shane


solr4.0 LimitTokenCountFilterFactory NumberFormatException

2012-10-17 Thread Dirk Högemann
Hi,

I am trying to upgrade from Solr 3.5 to Solr 4.0.
I read the following in the example solrconfig:

<!-- maxFieldLength was removed in 4.0. To get similar behavior, include a
     LimitTokenCountFilterFactory in your fieldType definition. E.g.
 <filter class="solr.LimitTokenCountFilterFactory" maxTokenCount="10000"/>
-->

I tried that as follows:

...
<fieldType name="textgen" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="..."/>
    ...
    <filter class="solr.LimitTokenCountFilterFactory"/>
    ...
  </analyzer>
</fieldType>
...

The LimitTokenCountFilterFactory configured like that crashes the startup
of the corresponding core with the following exception (without the Factory
the core startup works):


17.10.2012 17:44:19 org.apache.solr.common.SolrException log
SCHWERWIEGEND: null:org.apache.solr.common.SolrException: Plugin init
failure for [schema.xml] fieldType "textgen": Plugin init failure for
[schema.xml] analyze
r/filter: null
at
org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:177)
at
org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:369)
at org.apache.solr.schema.IndexSchema.(IndexSchema.java:113)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:846)
at org.apache.solr.core.CoreContainer.load(CoreContainer.java:534)
at org.apache.solr.core.CoreContainer.load(CoreContainer.java:356)
at
org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:308)
at
org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:107)
at
org.apache.catalina.core.ApplicationFilterConfig.initFilter(ApplicationFilterConfig.java:277)
at
org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:258)
at
org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:382)
at
org.apache.catalina.core.ApplicationFilterConfig.(ApplicationFilterConfig.java:103)
at
org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:4638)
at
org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5294)
at
org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150)
at
org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:895)
at
org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:871)
at
org.apache.catalina.core.StandardHost.addChild(StandardHost.java:615)
at
org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:649)
at
org.apache.catalina.startup.HostConfig$DeployDescriptor.run(HostConfig.java:1581)
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Caused by: org.apache.solr.common.SolrException: Plugin init failure for
[schema.xml] analyzer/filter: null
at
org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:177)
at
org.apache.solr.schema.FieldTypePluginLoader.readAnalyzer(FieldTypePluginLoader.java:377)
at
org.apache.solr.schema.FieldTypePluginLoader.create(FieldTypePluginLoader.java:95)
at
org.apache.solr.schema.FieldTypePluginLoader.create(FieldTypePluginLoader.java:43)
at
org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:151)
... 25 more
Caused by: java.lang.NumberFormatException: null
at java.lang.Integer.parseInt(Integer.java:417)
at java.lang.Integer.parseInt(Integer.java:499)
at
org.apache.lucene.analysis.miscellaneous.LimitTokenCountFilterFactory.init(LimitTokenCountFilterFactory.java:48)
at
org.apache.solr.schema.FieldTypePluginLoader$3.init(FieldTypePluginLoader.java:367)
at
org.apache.solr.schema.FieldTypePluginLoader$3.init(FieldTypePluginLoader.java:358)
at
org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:159)
... 29 more

Any ideas?

Best
Dirk
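
For comparison, a declaration that initializes cleanly: the factory's init()
parses a maxTokenCount attribute (hence the Integer.parseInt frames in the
trace above), so the attribute is required, e.g.

  <filter class="solr.LimitTokenCountFilterFactory" maxTokenCount="10000"/>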


Re: How to do exact match?

2012-10-17 Thread Jack Krupansky

The answer is "Yes." Solr and Lucene are quite flexible.

You neglected to offer any details about your specific use case which might 
bias the answer one way or the other. What is some sample data and some 
sample queries?


-- Jack Krupansky

-Original Message- 
From: Nicholas Ding

Sent: Wednesday, October 17, 2012 11:24 AM
To: solr-user@lucene.apache.org
Subject: How to do exact match?

Hello,

I want to do exact match on Solr. I found two ways on Internet, one is to
put PREFIX and SUFFIX around the text, another is to use KeywordTokenizer.

I was wondering which one is the better approach for doing exact match?

Thanks
Nicholas 
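
For reference, a minimal sketch of the KeywordTokenizer approach (type name
illustrative): the whole field value becomes a single token, so only exact,
case-normalized values match.

<fieldType name="string_exact" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>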



Re: Issue using SpatialRecursivePrefixTreeFieldType

2012-10-17 Thread David Smiley (@MITRE.org)
Nice!

On Oct 17, 2012, at 10:50 AM, Eric Khoury [via Lucene] wrote:


I'm using the X axis for time availability start and end (total minutes since 
Jan 2012); each asset can have multiple rectangles (multiple avail start and 
end).  My original design had a bounding rect of 20 years (0 - 10,000,000 
minutes), with certain assets available for the whole time.  Since I'm certain 
that all my data gets reindexed at least once a month, I changed the design to 
simply generate availability for this month + next month, so rectangles are now 
(0 - 45,000 minutes).  And for assets that are available for the complete 
month, which will be the case for a large percentage of assets, I just mark 
them with a flag, which avoids creating a rect for that entry altogether.  Eric.
 > Date: Tue, 16 Oct 2012 13:00:45 -0700

> From: [hidden 
> email]
> To: [hidden email]
> Subject: Re: Issue using SpatialRecursivePrefixTreeFieldType
>
> Eric,
>   Can you please elaborate on your workaround?  I'm not sure I get your drift.
> ~ David
> On Oct 16, 2012, at 12:54 PM, Eric Khoury [via Lucene] wrote:
>
> >
> > Thanks for the help David, makes sense.  I found a workaround, creating 
> > much smaller rectangles and updating them more often. Glad to have this 
> > functionality, thanks again! Eric.
>
>
>
>
>
> -
>  Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Solr-4-0-Join-performance-tp3998827p4014070.html
> Sent from the Solr - User mailing list archive at 
> Nabble.com.








-
 Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-4-0-Join-performance-tp3998827p4014265.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: how solr4.0 and zookeeper run on weblogic

2012-10-17 Thread Jan Høydahl
Did it work for you? You probably also have to set -Djetty.port=8080 in order 
for local ZK not to be started on port 9983. It's confusing, but you can also 
edit solr.xml to achieve the same.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

17. okt. 2012 kl. 10:06 skrev rayvicky :

> thanks 
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/how-solr4-0-and-zookeeper-run-on-weblogic-tp4013882p4014167.html
> Sent from the Solr - User mailing list archive at Nabble.com.



RE: Issue using SpatialRecursivePrefixTreeFieldType

2012-10-17 Thread Eric Khoury

I'm using the X axis for time availability start and end (total minutes since 
Jan 2012); each asset can have multiple rectangles (multiple avail start and 
end).  My original design had a bounding rect of 20 years (0 - 10,000,000 
minutes), with certain assets available for the whole time.  Since I'm certain 
that all my data gets reindexed at least once a month, I changed the design to 
simply generate availability for this month + next month, so rectangles are now 
(0 - 45,000 minutes).  And for assets that are available for the complete 
month, which will be the case for a large percentage of assets, I just mark 
them with a flag, which avoids creating a rect for that entry altogether.  Eric.
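
A sketch of what one such rectangle can look like in the 4.0 RPT field, using
the "minX minY maxX maxY" shape syntax (field name illustrative; Y is unused
here, so it is pinned to 0):

  <field name="availability">0 0 45000 0</field>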
 > Date: Tue, 16 Oct 2012 13:00:45 -0700
> From: dsmi...@mitre.org
> To: solr-user@lucene.apache.org
> Subject: Re: Issue using SpatialRecursivePrefixTreeFieldType
> 
> Eric,
>   Can you please elaborate on your workaround?  I'm not sure I get your drift.
> ~ David
> On Oct 16, 2012, at 12:54 PM, Eric Khoury [via Lucene] wrote:
> 
> > 
> > Thanks for the help David, makes sense.  I found a workaround, creating 
> > much smaller rectangles and updating them more often. Glad to have this 
> > functionality, thanks again! Eric. 
> 
> 
> 
> 
> 
> -
>  Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Solr-4-0-Join-performance-tp3998827p4014070.html
> Sent from the Solr - User mailing list archive at Nabble.com.
  

Re: solrcloud: what if ZK instances are evanescent?

2012-10-17 Thread Erick Erickson
As far as I can tell (and please someone correct me if I'm wrong), currently
it's best to restart your Solr servers (probably on a rolling basis) pointing
to the new ZK machine and not pointing at the old one. I believe there's
some work afoot both with ZK and Solr to be more robust in this situation,
but the Solr work depends on the ZK work I think.

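In practice that means relaunching each node with the new ensemble in
-DzkHost (hosts illustrative):

  java -DzkHost=zknew1:2181,zknew2:2181,zknew3:2181 -jar start.jar
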
Although I wouldn't expect the ZK machines to be that fragile, or are you
thinking about things like AWS or Azure etc?

Erick

On Mon, Oct 15, 2012 at 1:39 PM, John Brinnand  wrote:
> Hi Folks,
>
> I have been looking at solrcloud to solve some of our problems with solr in
> a distributed environment. As you know, in such an environment, every
> instance of solr or zookeeper can come into existence and go out of
> existence - at any time. So what happens if instances of ZK disappear and
> re-appear with different hostnames and DNS entries? How would solr know
> about these instances and how would it re-sync with these instances?
>
> In essence my question is: what if the hostname and port of the ZK instance
> no longer exists - how will solrcloud discover the new instance(s)?
>
> Thanks,
>
> John
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/solrcloud-what-if-ZK-instances-are-evanescent-tp4013740.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Several indexes

2012-10-17 Thread Erick Erickson
You've got it


On Wed, Oct 17, 2012 at 10:04 AM, blopez  wrote:
> Thank you both.
>
> At the end I decided to implement the multi-core approach. I think it's the
> fastest and easiest solution, and now it's working fine with two cores.
>
> By the way, to check if it's implemented properly... each 'core folder' (in
> my case core0, core1, ...) needs its 'bin', 'conf' and 'data' folders,
> right?
>
> Regards,
> Borja.
>
>
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Several-indexes-tp4014181p4014244.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Several indexes

2012-10-17 Thread blopez
Thank you both.

At the end I decided to implement the multi-core approach. I think it's the
fastest and easiest solution, and now it's working fine with two cores.

By the way, to check if it's implemented properly... each 'core folder' (in
my case core0, core1, ...) needs its 'bin', 'conf' and 'data' folders,
right?

Regards,
Borja.





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Several-indexes-tp4014181p4014244.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: how to boost query term after tokenizer

2012-10-17 Thread dirk
Hi,

I think that's not the way to do it, because you cannot give your analyzer a
hint at runtime about which text fragment is more relevant than another.
There is no marker, so a filter cannot know which terms to boost. You could
write your own filter that reads a file of important terms and compares each
token against your query terms, but I don't think that would be a good way.

If you have a way to split the search query text into relevant terms, the
first step is done; that gives you the right terms to search with at query
time. On the index side, you can try to pre-process your data and save the
most important keywords in separate search fields. Then you boost those
fields at query time.
Hope I could help, Dirk
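
With (e)dismax, that per-field boost is just a query parameter (field names
illustrative):

  q=some query text&defType=edismax&qf=keywords^5 text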



-
erste Erfahrungen mit SOLR u. Vufind 
--
View this message in context: 
http://lucene.472066.n3.nabble.com/how-to-boost-query-term-after-tokenizer-tp4010889p4014245.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: WordDelimiterFilter and the dot character

2012-10-17 Thread Shawn Heisey

On 10/17/2012 7:24 AM, dirk wrote:

Hi,

I had a very similar Problem while searching in a bibliographic field called
"signatur". I could solve it by the help of additional Filterclasses. At the
moment I use the following Filters. Then it works for me:

...
<charFilter class="solr.MappingCharFilterFactory" mapping="..."/>
<tokenizer class="..."/>
...
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.ReversedWildcardFilterFactory"/>
...
The MappingCharFilterFactory I have added in order to have a better support
of german "Umlaute". Concerning the Wildcards:
It is important that you use the ReversedWildcardFilterFactory only at index
time. All other Filters I also use at query time.


This is unrelated to the original question, but I hope it's helpful.  
You can replace both MappingCharFilterFactory and LowerCaseFilterFactory 
with the following, at the current position of the lowercase filter.  It 
might produce better results, but you would have to reindex:

<filter class="solr.ICUFoldingFilterFactory"/>

In order to do this, you must place the icu4j and lucene-icu 
(lucene-analyzers-icu in 4.x) jars in a lib folder accessed by Solr.  If 
you're using solr.xml (multicore) the best place is typically the 
sharedLib defined there.  If you're not using multicore, the lib folder 
would need to be defined in your solrconfig.xml.


http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ICUFoldingFilterFactory

Thanks,
Shawn



Re: solr 4 tika config

2012-10-17 Thread Jan Høydahl
Hi,

Try the new post.jar in version 4.0.0

It will allow you to say
java -Dauto -Drecursive -Dfiletypes=pdf -jar post.jar "d:\myfiles" 

You can inspect your Solr log file to see what ExtractingRequestHandler URLs 
are actually called for each file.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

16. okt. 2012 kl. 13:14 skrev cmd.ares :

> I want to index all pdf files in "d:\myfiles\*.*" 
> file fullname as the field id
> file content as the field txt
> the index should be like this:
> 
> -id---txt--
> d:\myfiles\0.pdfa
> d:\myfiles\subfolder1\1.pdf b
> d:\myfiles\subfolder2\2.pdf c
> 
> how to config dih?
> thanks.
> 
> 
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/solr-4-tika-config-tp4013947.html
> Sent from the Solr - User mailing list archive at Nabble.com.



ICUTokenizer ArrayIndexOutOfBounds

2012-10-17 Thread Shane Perry
Hi,

I've been playing around with using the ICUTokenizer from 4.0.0.
Using the code below, I was receiving an ArrayIndexOutOfBounds
exception on the call to tokenizer.incrementToken().  Looking at the
ICUTokenizer source, I can see why this is occurring (usableLength
defaults to -1).

ICUTokenizer tokenizer = new ICUTokenizer(myReader);
CharTermAttribute termAtt = 
tokenizer.getAttribute(CharTermAttribute.class);

while(tokenizer.incrementToken())
{
System.out.println(termAtt.toString());
}

After poking around a little more, I found that I can just call
tokenizer.reset() (initializes usableLength to 0) right after
constructing the object
(org.apache.lucene.analysis.icu.segmentation.TestICUTokenizer does a
similar step in its super class).  I was wondering if someone could
explain why I need to call tokenizer.reset() prior to using the
tokenizer for the first time.

Thanks in advance,

Shane


Re: Solr reports: "Can not read response from server" when running import

2012-10-17 Thread Shawn Heisey

On 10/17/2012 12:29 AM, Romita Saha wrote:

Hi Dave,

I followed your guidance and loaded my database in MySQL. Presently the
url reads like this:

url = "jdbc:mysql://localhost:8983/var/lib/mysql/camerasys"

The bind address in the my.cnf file is:
bind-address = 127.0.0.1

However the issue still persists.  Kindly help me find out the issue. The
error log is stated below.

Caused by: com.mysql.jdbc.CommunicationsException: Communications link
failure due to underlying exception:


Typically MySQL listens on port 3306, and if you haven't changed it from 
the default, you shouldn't even need to include it.  Your URL then needs 
to have the name of the database (schema), not the path to where MySQL 
is storing it.  Port 8983 is Solr's port if you run under the included 
jetty container.


It looks like you probably can use this, assuming you named the database 
camerasys:


url="jdbc:mysql://localhost/camerasys"

Here's how my dataimport source is defined.  I pass the database host 
and schema in via the dataimport request URL, and I include the port 
number even though I don't have to:

<dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
    url="jdbc:mysql://${dataimporter.request.dbHost}:3306/${dataimporter.request.dbSchema}"/>
  <!-- dbHost/dbSchema are passed in on the dataimport request URL -->

Thanks,
Shawn



Re: WordDelimiterFilter and the dot character

2012-10-17 Thread Jack Krupansky
Oh, and I forgot to mention that you should try your field type and query 
terms in the Solr Admin analyzer page. There you can see what sequence is 
generated for the query.


-- Jack Krupansky

-Original Message- 
From: Farkas István

Sent: Wednesday, October 17, 2012 9:33 AM
To: solr-user@lucene.apache.org
Subject: Re: WordDelimiterFilter and the dot character

Hm, that makes sense, thank you, I will try this one.

Regards,
  Istvan
You need to have separate "index" and "query" analyzers for that field 
type. The "query" analyzer would not have preserveOriginal="1", which 
would generate an extra term that would not match the exact term sequence 
that was indexed.


A query of "123 2012" would not split any terms and hence not generate the 
extra "preserved" term.


But a query of "123/2012" would actually query "123/2012 123 2012", which 
is not a term sequence that was indexed.


-- Jack Krupansky

-Original Message- From: Farkas István
Sent: Wednesday, October 17, 2012 8:58 AM
To: solr-user@lucene.apache.org
Subject: WordDelimiterFilter and the dot character

Hello,

I've run into an interesting problem. I am using Solr 3.5 on an Ubuntu
server.

I have some data with a code field, which contains some identifiers
(mostly) in the following format: E.123/2012.

I've set up a fieldType for this code field:

<fieldType name="..." class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="..."/>
    <filter class="solr.WordDelimiterFilterFactory" ...
        generateNumberParts="1" splitOnNumerics="1" preserveOriginal="1" />
    ...
  </analyzer>
</fieldType>

If I search for the exact code ("E.123/2012."), I will get the expected
result. If I search for "123 2012", I also get the expected results. If
I search for the "123/2012" string, the result set is empty. Tried it
with catenateNumbers and catenateWords enabled, with the same results.

The interesting thing here is that using the field analysis tool, the
123/2012 gives a match if I select the "highlight matches" option. But
the same query yields nothing when I try to use it in the query debug
tool in the Solr admin. The query works if I use a wildcard search
(*123/2012*), but I would like to avoid that. What do I miss here?

Regards,
  Istvan 




Re: differences of LockFactory between solr 3.6.1 and 4.0.0?

2012-10-17 Thread Yonik Seeley
On Wed, Oct 17, 2012 at 9:33 AM, Bernd Fehling
 wrote:
> Hi list,
>
> while checking the runtime behavior of solr 4.0.0 I noticed that the
> handling of write.lock seems to be different.
>
> With solr 3.6.1, after calling optimize the index is optimized and write.lock
> is removed.
> This tells me everything is flushed to disk and it's safe to copy the index.
>
> With solr 4.0.0 after calling optimize the index is optimized but the 
> write.lock remains.
>
> They both use NativeFSLockFactory.
>
> What could be the cause that write.lock remains with solr 4.0.0?

The IndexWriter is left open now on optimizes / commits to enable
better NRT and better concurrency with adds (commits no longer block
adds).

-Yonik
http://lucidworks.com


differences of LockFactory between solr 3.6.1 and 4.0.0?

2012-10-17 Thread Bernd Fehling
Hi list,

while checking the runtime behavior of solr 4.0.0 I noticed that the handling
of write.lock seems to be different.

With solr 3.6.1, after calling optimize the index is optimized and write.lock is
removed.
This tells me everything is flushed to disk and it's safe to copy the index.

With solr 4.0.0 after calling optimize the index is optimized but the 
write.lock remains.

They both use NativeFSLockFactory.

What could be the cause that write.lock remains with solr 4.0.0?

Any new options for optimize to force flush to disk and remove write.lock?

Regards
Bernd


Re: WordDelimiterFilter and the dot character

2012-10-17 Thread Farkas István

Hm, that makes sense, thank you, I will try this one.

Regards,
  Istvan
You need to have separate "index" and "query" analyzers for that field 
type. The "query" analyzer would not have preserveOriginal="1", which 
would generate an extra term that would not match the exact term 
sequence that was indexed.


A query of "123 2012" would not split any terms and hence not generate 
the extra "preserved" term.


But a query of "123/2012" would actually query "123/2012 123 2012", 
which is not a term sequence that was indexed.


-- Jack Krupansky

-Original Message- From: Farkas István
Sent: Wednesday, October 17, 2012 8:58 AM
To: solr-user@lucene.apache.org
Subject: WordDelimiterFilter and the dot character

Hello,

I've run into an interesting problem. I am using Solr 3.5 on an Ubuntu
server.

I have some data with a code field, which contains some identifiers
(mostly) in the following format: E.123/2012.

I've set up a fieldType for this code field:

<fieldType name="..." class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="..."/>
    <filter class="solr.WordDelimiterFilterFactory" ...
        generateNumberParts="1" splitOnNumerics="1" preserveOriginal="1" />
    ...
  </analyzer>
</fieldType>

If I search for the exact code ("E.123/2012."), I will get the expected
result. If I search for "123 2012", I also get the expected results. If
I search for the "123/2012" string, the result set is empty. Tried it
with catenateNumbers and catenateWords enabled, with the same results.

The interesting thing here is that using the field analysis tool, the
123/2012 gives a match if I select the "highlight matches" option. But
the same query yields nothing when I try to use it in the query debug
tool in the Solr admin. The query works if I use a wildcard search
(*123/2012*), but I would like to avoid that. What do I miss here?

Regards,
  Istvan




Re: WordDelimiterFilter and the dot character

2012-10-17 Thread Jack Krupansky
You need to have separate "index" and "query" analyzers for that field type. 
The "query" analyzer would not have preserveOriginal="1", which would 
generate an extra term that would not match the exact term sequence that was 
indexed.


A query of "123 2012" would not split any terms and hence not generate the 
extra "preserved" term.


But a query of "123/2012" would actually query "123/2012 123 2012", which is 
not a term sequence that was indexed.

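A sketch of that split (tokenizer and attribute list assumed; only
preserveOriginal differs between the two analyzers):

<fieldType name="code" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateNumberParts="1"
            splitOnNumerics="1" preserveOriginal="1"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateNumberParts="1"
            splitOnNumerics="1" preserveOriginal="0"/>
  </analyzer>
</fieldType>
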

-- Jack Krupansky

-Original Message- 
From: Farkas István

Sent: Wednesday, October 17, 2012 8:58 AM
To: solr-user@lucene.apache.org
Subject: WordDelimiterFilter and the dot character

Hello,

I've run into an interesting problem. I am using Solr 3.5 on an Ubuntu
server.

I have some data with a code field, which contains some identifiers
(mostly) in the following format: E.123/2012.

I've set up a fieldType for this code field:

<fieldType name="..." class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="..."/>
    <filter class="solr.WordDelimiterFilterFactory" ...
        generateNumberParts="1" splitOnNumerics="1" preserveOriginal="1" />
    ...
  </analyzer>
</fieldType>

If I search for the exact code ("E.123/2012."), I will get the expected
result. If I search for "123 2012", I also get the expected results. If
I search for the "123/2012" string, the result set is empty. Tried it
with catenateNumbers and catenateWords enabled, with the same results.

The interesting thing here is that using the field analysis tool, the
123/2012 gives a match if I select the "highlight matches" option. But
the same query yields nothing when I try to use it in the query debug
tool in the Solr admin. The query works if I use a wildcard search
(*123/2012*), but I would like to avoid that. What do I miss here?

Regards,
  Istvan 



Re: WordDelimiterFilter and the dot character

2012-10-17 Thread dirk
Hi,

I had a very similar Problem while searching in a bibliographic field called
"signatur". I could solve it by the help of additional Filterclasses. At the
moment I use the following Filters. Then it works for me:

...
<charFilter class="solr.MappingCharFilterFactory" mapping="..."/>
<tokenizer class="..."/>
...
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.ReversedWildcardFilterFactory"/>
...
The MappingCharFilterFactory I have added in order to have a better support
of german "Umlaute". Concerning the Wildcards: 
It is important that you use the ReversedWildcardFilterFactory only at index
time. All other Filters I also use at query time.
Perhaps it could help.
Dirk



-
erste Erfahrungen mit SOLR u. Vufind 
--
View this message in context: 
http://lucene.472066.n3.nabble.com/WordDelimiterFilter-and-the-dot-character-tp4014220p4014225.html
Sent from the Solr - User mailing list archive at Nabble.com.


WordDelimiterFilter and the dot character

2012-10-17 Thread Farkas István

Hello,

I've run into an interesting problem. I am using Solr 3.5 on an Ubuntu 
server.


I have some data with a code field, which contains some identifiers 
(mostly) in the following format: E.123/2012.


I've set up a fieldType for this code field:

<fieldType name="..." class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="..."/>
    <filter class="solr.WordDelimiterFilterFactory" ...
        generateNumberParts="1" splitOnNumerics="1" preserveOriginal="1" />
    ...
  </analyzer>
</fieldType>

If I search for the exact code ("E.123/2012."), I will get the expected 
result. If I search for "123 2012", I also get the expected results. If 
I search for the "123/2012" string, the result set is empty. Tried it 
with catenateNumbers and catenateWords enabled, with the same results.


The interesting thing here is that using the field analysis tool, the 
123/2012 gives a match if I select the "highlight matches" option. But 
the same query yields nothing when I try to use it in the query debug 
tool in the Solr admin. The query works if I use a wildcard search 
(*123/2012*), but I would like to avoid that. What do I miss here?


Regards,
  Istvan



Re: Flushing RAM to disk

2012-10-17 Thread Erick Erickson
Deniz:

Unless you have a _proven_ bottleneck that this will address, I'd just
let Solr/Lucene do their thing without interference. Explicitly using a
RAM directory is one of those things that _seems_ like it would be a
Good Thing, but outside of very specialized situations it often
isn't worth the effort.

Measure first. Then fix.
Best
Erick

On Wed, Oct 17, 2012 at 3:15 AM, deniz  wrote:
> I heard about MMapDirectory - actually my test env is using that - but the
> question was just an idea... and how about using SolrCloud? I mean, can we
> set shards to use RAM and replicas to use MMapDirectory? Is this possible?
>
>
>
> -
> Zeki ama calismiyor... Calissa yapar...
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Flushing-RAM-to-disk-tp4014128p4014155.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: results with no default sort order...

2012-10-17 Thread Erick Erickson
You probably have to do that at the application layer, there's
no good way that I know of to do this in Solr. That is, get the
response (which contains the original query) and re-sort the
results list before display.

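A client-side sketch with SolrJ (field name and values illustrative):

  import java.util.*;
  import org.apache.solr.common.SolrDocument;
  import org.apache.solr.common.SolrDocumentList;

  // 'response' is the QueryResponse for the query above
  SolrDocumentList docs = response.getResults();
  final List<String> wanted = Arrays.asList("id3", "id1", "id2"); // the fq order
  Collections.sort(docs, new Comparator<SolrDocument>() {
      public int compare(SolrDocument a, SolrDocument b) {
          return wanted.indexOf((String) a.getFieldValue("fld"))
               - wanted.indexOf((String) b.getFieldValue("fld"));
      }
  });
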
Best
Erick

On Wed, Oct 17, 2012 at 12:13 AM, trilok2000  wrote:
> Hi there,
>
> I'm searching with the following query:
> /select?q=*:*&fq=fld:<id1> OR fld:<id2> OR fld:<id3>
> where the field fld is a String type and the uniqueKey.
>
> I'm getting results as:
>
> [result XML stripped by the mail archive: the docs come back ordered by fld]
>
> Looks like the results are sorted by fld. But I want the results to be
> "un-sorted"... meaning, I want them in the order I gave in the fq
> condition. That is, I want the results as follows:
>
> [result XML stripped by the mail archive: the same docs, in the fq order]
>
> How do we do that? Thanks in advance!
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/results-with-no-default-sort-order-tp4014125.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Search multiple tags within multiple categories

2012-10-17 Thread Otis Gospodnetic
Hi,

It sounds like you want to use Solr as a classifier, which is doable. Take a
bunch of text for some category and index it as one doc. Make sure it has a
field with category name. Do that for each category.  Then use tags to
search against this index and return the field with category name.

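The query side is then an ordinary search over the category docs (field
names illustrative):

  /select?q=plane airport pilot&df=category_text&fl=category_name&rows=3
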
Otis
--
Performance Monitoring - http://sematext.com/spm
On Oct 16, 2012 7:29 AM, "Eneko chan"  wrote:

> Hello,
>
> Maybe I explained myself wrong, or I still don't know what OpenNLP is
> capable of.
>
> I don't need to automatically create the relations between tags and
> categories. That would be done (at first) manually. I need somehow to send
> a query with the tags [plane, airport, pilot, parachuting, base jump] and
> receive as response [Aviation, Extreme Sports] for example. But if I send
> [plane, wood, cycling, jogging] receive as response [Bricolage, Sports].
>
> Wouldn't that be a matter of searching and not classifying?
>
> By the way I wanted to say "Apache Jena" before and not "Apache Jade"...
>
> Thank you very much.
>
> On Tue, Oct 16, 2012 at 12:28 PM, Otis Gospodnetic <
> otis.gospodne...@gmail.com> wrote:
>
> > Hello,
> >
> > Sounds like you want to look into classification tools like OpenNLP.
> There
> > is a wiki page on Solr wiki, too...
> >
> > Otis
> > --
> > Performance Monitoring - http://sematext.com/spm
> > On Oct 16, 2012 5:45 AM, "Eneko chan"  wrote:
> >
> > > Hi,
> > >
> > > I'm new to Solr and I'm feeling quite lost. I'm not sure if Solr could
> > > be
> > > the answer to the problem I'm dealing with. I'll try to explain it as
> > > simple as I can:
> > >
> > > The users of my website have a profile (just as usual with name, email,
> > > twitter account, etc.). Every user can write down some tags that
> describe
> > > themselves. For example I could write "programming", "PHP", "Java" and
> > "web
> > > development" for example. Then the user chooses one (and only one)
> > > category, in my case it could be "Computer Science".
> > >
> > > The new feature we want to add is this: the user would write down only
> > the
> > > tags as ussual but depending on those tags *one or more categories*
> would
> > > be selected for him *automatically*. Depending on the amount and
> > relevancy
> > > of the tags some or other categories would be choosen. The idea is to
> > look
> > > for tag words using Nutch (in newspaper web sites, twitter, etc,) and
> > then
> > > relate those to categories.  But one of the problems is that some words
> > can
> > > belong at the same time to different categories depending of the other
> > > tags. For example "plane" with "airport" would be in "Aviation"
> category
> > > but "plane" (meaning the tool) with "wood" would be in "Bricolage"
> > > category.
> > >
> > > Is Solr capable of doing what I want? If not, does any body know a
> > > framework or library (preferably open source) that could do this? I've
> > read
> > > about Apache Jade for Semantic Web applications but I'm not sure.
> > >
> > >
> > > Thank you!
> > >
> >
>


Re: Charfilter keep "dates" but skip "number"

2012-10-17 Thread darul
Thank you Erick,

Probably right to put this business rule upstream of the processing, or in an
updateRequestProcessorChain.

Thanks again, I love this forum, so efficient.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Charfilter-keep-dates-but-skip-number-tp4014049p4014211.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Query related to data source in lucid imagination

2012-10-17 Thread Erik Hatcher
Leena -

First, best to deal with LucidWorks topics at our support forum at 
http://support.lucidworks.com

For this particular issue, I believe the cause is likely that you've installed 
LucidWorks on a Windows server and used spaces in the installation path; that 
is mentioned in the Release Notes.  To correct it, simply reinstall in 
a directory without spaces.

Erik

On Oct 17, 2012, at 07:38 , Leena Jawale wrote:

> Hi,
> 
> I have installed lucidworks enterprise v2.1. In that, I want to create XML 
> data source.
> But on the data source page I am unable to find the Solr XML in the dropdown 
> list.
> Could you help me with this?
> 
> Thanks & regards,
> Leena Jawale
> 
> 
> The contents of this e-mail and any attachment(s) may contain confidential or 
> privileged information for the intended recipient(s). Unintended recipients 
> are prohibited from taking action on the basis of information in this e-mail 
> and using or disseminating the information, and must notify the sender and 
> delete it from their system. L&T Infotech will not accept responsibility or 
> liability for the accuracy or completeness of, or the presence of any virus 
> or disabling code in this e-mail"



Re: Several indexes

2012-10-17 Thread Erick Erickson
Jochen's suggestion is a good one. Alternately you could just
index all the fields into a single schema with, say, a "type" field
to use in a filter query to separate the searches.

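E.g. everything lives in one core, and each search just carries a filter
(field and value illustrative):

  q=...&fq=type:typeA
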
Which you choose is largely a matter of taste. Unless there are
a LOT of documents the penalty for having unused fields is
pretty small.

Best
Erick

On Wed, Oct 17, 2012 at 8:22 AM, Jochen Just  wrote:
> You probably should try a multi core installation:
> http://wiki.apache.org/solr/CoreAdmin should get you started.
> Am 17.10.2012 12:21, schrieb blopez:
>> Hi all,
>>
>> I'm facing a problem that is probably easier to solve than I think.
>>
>> Overview: I have an application working on Solr which manages
>> indexing and retrieval operations. Everything's working fine, I can
>> index some docs (for example schema with attributes A, B and C) in
>> a Solr index and then perform query operations on it.
>>
>> The problem is that I want to implement another process in the
>> same application to retrieve information, but with a different
>> schema. For example, docs with attributes X and Y.
>>
>> I tried to set two different schemas in the schema.xml file, but it
>> crashes the Solr instance. Moreover, I've been thinking about a
>> workaround but it's not clear for me. Another point could be
>> creating a new instance of Solr, so that there are two Solr
>> instances open... but I think it's not a real solution.
>>
>> Regards, Borja.
>>
>>
>>
>> -- View this message in context:
>> http://lucene.472066.n3.nabble.com/Several-indexes-tp4014181.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>
>
> --
> Jochen Just   Fon:   (++49) 711/28 07 57-193
> avono AG  Mobil: (++49) 172/73 85 387
> Breite Straße 2   Mail:  jochen.j...@avono.de
> 70173 Stuttgart   WWW:   http://www.avono.de


Re: Several indexes

2012-10-17 Thread Jochen Just
You probably should try a multi core installation:
http://wiki.apache.org/solr/CoreAdmin should get you started.
Am 17.10.2012 12:21, schrieb blopez:
> Hi all,
> 
> I'm facing a problem that is probably easier to solve than I think.
> 
> Overview: I have an application working on Solr which manages
> indexing and retrieval operations. Everything's working fine, I can
> index some docs (for example schema with attributes A, B and C) in
> a Solr index and then perform query operations on it.
> 
> The problem is that I want to implement another process in the
> same application to retrieve information, but with a different
> schema. For example, docs with attributes X and Y.
> 
> I tried to set two different schemas in the schema.xml file, but it
> crashes the Solr instance. Moreover, I've been thinking about a
> workaround but it's not clear for me. Another point could be
> creating a new instance of Solr, so that there are two Solr
> instances open... but I think it's not a real solution.
> 
> Regards, Borja.
> 
> 
> 
> -- View this message in context:
> http://lucene.472066.n3.nabble.com/Several-indexes-tp4014181.html 
> Sent from the Solr - User mailing list archive at Nabble.com.
> 


-- 
Jochen Just   Fon:   (++49) 711/28 07 57-193
avono AG  Mobil: (++49) 172/73 85 387
Breite Straße 2   Mail:  jochen.j...@avono.de
70173 Stuttgart   WWW:   http://www.avono.de


Re: Charfilter keep "dates" but skip "number"

2012-10-17 Thread Erick Erickson
This kind of thing doesn't lend itself to OOB filters; I'd recommend
one of two approaches:
1> put the logic in your indexing process, easy if you use SolrJ
2> add a custom element to an updateRequestProcessorChain
 that modifies the document (see the sketch below). This seems daunting,
 but it's actually quite easy, and the solrconfig.xml has some hints. The
 key here is that you just have a map with all the field:value pairs
 and you can do whatever you want with it.
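
A minimal sketch of approach 2>; the class name, field name, and the
skip-numbers rule are all illustrative:

import java.io.IOException;
import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.SolrQueryResponse;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;
import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

public class SkipNumbersProcessorFactory extends UpdateRequestProcessorFactory {
  @Override
  public UpdateRequestProcessor getInstance(SolrQueryRequest req,
      SolrQueryResponse rsp, UpdateRequestProcessor next) {
    return new UpdateRequestProcessor(next) {
      @Override
      public void processAdd(AddUpdateCommand cmd) throws IOException {
        SolrInputDocument doc = cmd.getSolrInputDocument();
        Object v = doc.getFieldValue("content"); // illustrative field name
        if (v != null) {
          // drop standalone numbers; leave digits that are part of dates
          // (12.10.2012) or percentages (50%) alone -- rule is illustrative
          doc.setField("content",
              v.toString().replaceAll("(?<![./%\\d])\\d+(?![./%\\d])", " "));
        }
        super.processAdd(cmd);
      }
    };
  }
}

The factory is then registered in an updateRequestProcessorChain in
solrconfig.xml and selected per request via the update.chain parameter.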

Best
Erick

On Tue, Oct 16, 2012 at 12:37 PM, darul  wrote:
> Hello all,A long time I have not posted, but do not worry I am still using
> Solr everyday and enjoy it.Here the details of my requirement:According to a
> specific content with "dates", "number", (maybe number%), we would like to
> *skip number* and *keep dates (+number%)* in indexation process. Do you see
> a common way to achieve this with provided analysers or charfilter
> (PatternReplaceCharFilterFactory...). I have used
> PatternReplaceCharFilterFactory to skip number, but results are not relevant
> for what we are looking for.Example (stupid one ;)):After processing, may
> be:
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Charfilter-keep-dates-but-skeep-number-tp4014049.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Datefaceting on multiple value in solr

2012-10-17 Thread Sagar Joshi1304
Thanks Chris,

but I want something like below:

docid:1  name=test  date=oct12    docid:7   name=test1  date=oct13    docid:13  name=test3  date=oct12
docid:2  name=test  date=oct12    docid:8   name=test1  date=oct13    docid:14  name=test3  date=oct13
docid:3  name=test  date=oct13    docid:9   name=test1  date=oct14    docid:15  name=test3  date=oct14
docid:4  name=test  date=oct14    docid:10  name=test1  date=oct15    docid:16  name=test3  date=oct15
docid:5  name=test  date=oct14    docid:11  name=test1  date=oct16    docid:17  name=test3  date=oct16
docid:6  name=test  date=oct14    docid:12  name=test1  date=oct16    docid:18  name=test3  date=oct16

Now I want something like below, in one request:

name=test:   oct12=2  oct13=1  oct14=3  oct15=0  oct16=0
name=test1:  oct12=0  oct13=2  oct14=1  oct15=1  oct16=2
name=test3:  oct12=1  oct13=1  oct14=1  oct15=1  oct16=2

Currently I have to make 3 requests, one for each value of name, passing
"name" as an fq parameter with the given facet date range.

Thanks.
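
Each of those three requests looks roughly like this (range bounds
illustrative):

  /select?q=*:*&fq=name:test&facet=true&facet.range=date
      &facet.range.start=2012-10-12T00:00:00Z
      &facet.range.end=2012-10-17T00:00:00Z&facet.range.gap=%2B1DAY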



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Datefaceting-on-multiple-value-in-solr-tp4014021p4014196.html
Sent from the Solr - User mailing list archive at Nabble.com.


Query related to data source in lucid imagination

2012-10-17 Thread Leena Jawale
Hi,

I have installed lucidworks enterprise v2.1. In that, I want to create XML data 
source.
But on the data source page I am unable to find the Solr XML in the dropdown 
list.
Could you help me with this?

Thanks & regards,
Leena Jawale


The contents of this e-mail and any attachment(s) may contain confidential or 
privileged information for the intended recipient(s). Unintended recipients are 
prohibited from taking action on the basis of information in this e-mail and 
using or disseminating the information, and must notify the sender and delete 
it from their system. L&T Infotech will not accept responsibility or liability 
for the accuracy or completeness of, or the presence of any virus or disabling 
code in this e-mail"


Several indexes

2012-10-17 Thread blopez
Hi all,

I'm facing a problem that is probably easier to solve than I think. 

Overview: I have an application working on Solr which manages indexing and
retrieval operations. Everything's working fine, I can index some docs (for
example schema with attributes A, B and C) in a Solr index and then perform
query operations on it.

The problem is that I want to implement another process in the same
application to retrieve information, but with a different schema. For
example, docs with attributes X and Y.

I tried to set two different schemas in the schema.xml file, but it crashes
the Solr instance. Moreover, I've been thinking about a workaround but it's
not clear for me. Another point could be creating a new instance of Solr, so
that there are two Solr instances open... but I think it's not a real
solution.

Regards,
Borja.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Several-indexes-tp4014181.html
Sent from the Solr - User mailing list archive at Nabble.com.


DIH throws NullPointerException when using dataimporter.functions.escapeSql with parent entities

2012-10-17 Thread Dominik Siebel
Hi folks,

I am currently migrating our Solr servers from a 4.0.0 nightly build
(approx. November 2011, which worked very well) to the newly released
4.0.0 and am running into some issues concerning the existing
DataImportHandler configurations. Maybe you have an idea where I am
going wrong here.

The following lines are a highly simplified excerpt from one of the
problematic imports:

<entity name="path" query="...">
  <entity name="child"
          query="SELECT ... WHERE name = '${dataimporter.functions.escapeSql(path.name)}'"/>
</entity>

While this configuration worked without any problem for over half a
year now, when upgrading to 4.0.0-BETA and 4.0.0 the import throws the
following stack trace and exits:

 SEVERE: Exception while processing: path document :
null:org.apache.solr.handler.dataimport.DataImportHandlerException:
java.lang.NullPointerException

which is caused by

Caused by: java.lang.NullPointerException
at 
org.apache.solr.handler.dataimport.EvaluatorBag$1.evaluate(EvaluatorBag.java:79)

In other words: The EvaluatorBag doesn't seem to resolve the given
path.name variable properly and returns null.

Does anyone have any idea?
Appreciate your input!

Regards
Dom


Re: Having trouble getting boosting queries to work with multiple terms

2012-10-17 Thread Asfand Qazi
Several ideas there, thanks guys.  I'll re-evaluate how I build and use 
the index from now on.


Asfand Qazi

On 16/10/12 17:35, Tomás Fernández Löbbe wrote:

I think sorting should work too, as I suggested before. In this case
(because by coincidence you need alphabetic sort on the type) "sort=type
asc, score desc" should work.

If you need to add other types, maybe add an int field that represents how
you would like those to be sorted; within each type, the regular score will
then be used for sorting.

Tomás

On Tue, Oct 16, 2012 at 1:18 PM, Walter Underwood wrote:


Here is an approach that avoids the IDF problem.

Add another field, perhaps named "priority". In that field, put a boost
value, like 100 for allele docs, 10 for mi_attempt docs, and so on. In the
boost part of the query, use the value of that field boost=priority.

If you cannot change the index, you may be able to do the same thing with if
statements in the function query, see
http://wiki.apache.org/solr/FunctionQuery

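A sketch of both variants with edismax ("priority" as suggested above, or an
inline function if the index can't change; field names assumed):

  q=mouse&defType=edismax&boost=priority
  q=mouse&defType=edismax&boost=if(termfreq(type,'allele'),100,if(termfreq(type,'mi_attempt'),10,1))
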
This is a common design request, to show all results of type A before all
results of type B, and it has a common and severe problem. If your query
term is common, the user will see 10,000 hits of type A, all the way to the
least relevant, before they see the highly-relevant first hit of type B.
So, the search is broken for all common query terms and there is nothing
the user can do to fix it.

Instead, use a smaller boost, maybe a bit more than a tiebreaker, but not
enough to force a total ordering. You may also want to use facets or fixed
filters, so that users can select only alleles or only my_attempts.

wunder

On Oct 16, 2012, at 8:21 AM, Asfand Qazi wrote:


On 16/10/12 16:15, Walter Underwood wrote:

Why do you want that ordering? That isn't what Solr is designed to do.

It is designed for relevance. I expect that idf (the rarity of the terms)
is being used in the ordering. "mi_attempt" is probably much more rare than
"allele".


If you want that strict ordering, I recommend doing three queries and

concatenating the three result sets.


wunder


I want that ordering because alleles are more 'important' to a biologist

than an mi_attempt, which is 'more important' than a phenotype_attempt.


If Solr isn't designed for this kind of stuff, then I will do the

sorting manually after I have received all the documents.  I could give
huge boost values to each term, but then I guess I'm just using a
sledgehammer to crack a nut.


Thanks

--
Regards,
  Asfand Yar Qazi
  Team 87 - High Throughput Gene Targeting
  Wellcome Trust Sanger Institute



--
The Wellcome Trust Sanger Institute is operated by Genome Research

Limited, a charity registered in England with number 1021457 and a company
registered in England with number 2742969, whose registered office is 215
Euston Road, London, NW1 2BE.

--
Walter Underwood
wun...@wunderwood.org









--
Regards,
  Asfand Yar Qazi
  Team 87 - High Throughput Gene Targeting
  Wellcome Trust Sanger Institute



--
The Wellcome Trust Sanger Institute is operated by Genome Research 
Limited, a charity registered in England with number 1021457 and a 
company registered in England with number 2742969, whose registered 
office is 215 Euston Road, London, NW1 2BE. 


Re: java.io.IOException: Malformed chunk who can help me?

2012-10-17 Thread Alexandre Rafalovitch
At least one of those messages seems to imply that perhaps you secured
your servlet and the security manager can't log you in. If that was
not the intention, maybe you need to create an exclusion public zone
for that specific URL.

Regards,
   Alex.
Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)


On Wed, Oct 17, 2012 at 1:39 PM, rayvicky  wrote:
> 
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/java-io-IOException-Malformed-chunk-who-can-help-me-tp4014168.html
> Sent from the Solr - User mailing list archive at Nabble.com.


DIH: Variables and pk-attribute in database.xml

2012-10-17 Thread Stefan Burkard
Hi all

In the DIH config file "database.xml" I defined an entity like this:

<entity name="entityName" pk="ID"
        query="..."
        deltaQuery="SELECT ID FROM ... WHERE updated > '${dataimporter.last_index_time}'"
        deltaImportQuery="SELECT ... WHERE ID = '${dataimporter.delta.ID}'">
  <field column="..." template="${entityName.ID}_${entityName.OTHER_ID}"/>
  ...
</entity>

With this definition the full-load runs fine, but as soon as the
delta-loads run, I get a huge number (I guess one for every
delta-record) of warnings that say "WARNING: Unable to resolve
variable: entityName.OTHER_ID while parsing expression:
${entityName.ID}_${entityName.OTHER_ID}".
Actually this warning appears for every field used as a template
variable, except the ID field.

Therefore I have two questions:
1. What is the entity-attribute "pk" exactly for?
2. Why do I get these warnings? Is it because the "deltaQuery" does
not select "OTHER_ID"? I thought that "deltaQuery" only needs to
select the IDs of the records to be loaded with the
"deltaImportQuery".

Thanks for any help
Stefan


Re: Can we retrieve deleted records before optimized

2012-10-17 Thread Dmitry Kan
You could split your index. There are some tools available, e.g. this one
on github https://github.com/HON-Khresmoi/hash-based-index-splitter does
hash based index splitting. You could probably start with the tool and
modify it for your needs (like moving docs with certain timestamp criteria
to another Lucene index). After you have partitioned your index, you should
__in_principle__ be able to start up another SOLR instance with each of the
index parts.

Another approach is the one you have mentioned, that is, querying Solr and
e.g. storing the output into a csv file, then loading that csv into another
Solr instance. This one is probably the fastest solution in terms of coding
effort, but it might put a strain on your (production?) Solr instance, so it
should be done during non-business hours. It could be done incrementally as
well: as soon as you know the data has become old enough to be moved, move it
in smaller pieces.

Be aware that with both approaches you will only be able to retrieve the
stored fields of each document.

Regards,

Dmitry

On Wed, Oct 17, 2012 at 6:12 AM, Zeng Lames  wrote:

> Thanks Kan for your prompt help. It is really a great solution to recover
> those deleted records.
>
> Another question is about a Solr historical-data housekeeping problem. The
> scenario is as below:
>
> we have a solr core to store biz records, at such a large volume that the
> index files grow to more than 50GB in one month. Due to disk space
> limitation, we need to delete records older than one month from solr. But
> on the other hand, we need to keep that data on a cheaper disk for analysis.
> The problem is how to move the data older than one month onto the cheaper
> disk quickly. One simple but very slow solution is to search out those
> records and add them into the solr on the cheaper disk.
>
> I want to know whether there is any other solution for this kind of
> problem, e.g. moving the index files directly?
>
> thanks a lot!
>
> On Wed, Oct 17, 2012 at 12:31 AM, Dmitry Kan  wrote:
>
> > Hello,
> >
> > One approach (not a solrish one, but still) would be to use Lucene API
> and
> > set up an IndexReader onto the solr index in question. You can then do:
> >
> > [code]
> > Directory indexDir = FSDirectory.open(new File(pathToDir));
> > IndexReader input = IndexReader.open(indexDir, true);
> >
> > int maxDoc = input.maxDoc();
> > for (int i = 0; i < maxDoc; i++) {
> >     if (input.isDeleted(i)) {
> >         // deleted document found, retrieve it (with all stored fields)
> >         Document document = input.document(i);
> >         // analyze its field values here...
> >     }
> > }
> > [/code]
> >
> > I haven't compiled this code myself, you'll need to experiment with it.
> >
> > Dmitry
> >
> > On Tue, Oct 16, 2012 at 11:06 AM, Zeng Lames 
> wrote:
> >
> > > Hi,
> > >
> > > as we know, when we delete a document from solr, it adds a .del file to
> > > the related segment index files, and then deletes them from disk after
> > > an optimize. Now, the question is: before the optimize, can we retrieve
> > > those deleted records? If yes, how?
> > >
> > > thanks a lot!
> > >
> > > Best Wishes
> > > Lames
> > >
> >
> >
> >
> > --
> > Regards,
> >
> > Dmitry Kan
> >
>



-- 
Regards,

Dmitry Kan


Re: What does _version_ field used for?

2012-10-17 Thread Jun Wang
Ok, I got it, thanks

2012/10/17 Alexandre Rafalovitch 

> Yes, just make sure you have it in the scheme. Solr handles the rest.
>
> Regards,
>Alex.
> Personal blog: http://blog.outerthoughts.com/
> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> - Time is the quality of nature that keeps events from happening all
> at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
> book)
>
>
> On Wed, Oct 17, 2012 at 12:57 PM, Jun Wang  wrote:
> > Is that said we just need to add this filed, and there is no more work?
> >
> > 2012/10/17 Rafał Kuć 
> >
> >> Hello!
> >>
> >> It is used internally by Solr, for example by features like partial
> >> update functionality and update log.
> >>
> >> --
> >> Regards,
> >>  Rafał Kuć
> >>  Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch -
> ElasticSearch
> >>
> >> > I am moving to solr4.0 from the beta version. An exception was
> >> thrown:
> >>
> >> > Caused by: org.apache.solr.common.SolrException: _version_field must
> >> exist
> >> > in schema, using indexed="true" stored="true" and multiValued="false"
> >> > (_version_ does not exist)
> >> > at
> >> >
> >>
> org.apache.solr.update.VersionInfo.getAndCheckVersionField(VersionInfo.java:57)
> >> > at org.apache.solr.core.SolrCore.(SolrCore.java:606)
> >> > ... 26 more
> >> > 2
> >>
> >> > It seems that there needs to be a field like
> >> >  <field name="_version_" type="long" indexed="true" stored="true" multiValued="false"/>
> >> > in schema.xml. I wonder what this is used for?
> >>
> >>
> >
> >
> > --
> > from Jun Wang
>



-- 
from Jun Wang


Re: What does _version_ field used for?

2012-10-17 Thread Alexandre Rafalovitch
Yes, just make sure you have it in the scheme. Solr handles the rest.

Regards,
   Alex.
Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)


On Wed, Oct 17, 2012 at 12:57 PM, Jun Wang  wrote:
> Is that to say we just need to add this field, and there is no more work?
>
> 2012/10/17 Rafał Kuć 
>
>> Hello!
>>
>> It is used internally by Solr, for example by features like partial
>> update functionality and update log.
>>
>> --
>> Regards,
>>  Rafał Kuć
>>  Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch
>>
>> > I am moving to solr4.0 from the beta version. An exception was
>> thrown:
>>
>> > Caused by: org.apache.solr.common.SolrException: _version_field must
>> exist
>> > in schema, using indexed="true" stored="true" and multiValued="false"
>> > (_version_ does not exist)
>> > at
>> >
>> org.apache.solr.update.VersionInfo.getAndCheckVersionField(VersionInfo.java:57)
>> > at org.apache.solr.core.SolrCore.(SolrCore.java:606)
>> > ... 26 more
>> > 2
>>
>> > It seems that there needs to be a field like
>> >  <field name="_version_" type="long" indexed="true" stored="true" multiValued="false"/>
>> > in schema.xml. I wonder what this is used for?
>>
>>
>
>
> --
> from Jun Wang


Re: What does _version_ field used for?

2012-10-17 Thread Jun Wang
Is that to say we just need to add this field, and there is no more work?

2012/10/17 Rafał Kuć 

> Hello!
>
> It is used internally by Solr, for example by features like partial
> update functionality and update log.
>
> --
> Regards,
>  Rafał Kuć
>  Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch
>
> > I am moving to solr4.0 from the beta version. An exception was
> thrown:
>
> > Caused by: org.apache.solr.common.SolrException: _version_field must
> exist
> > in schema, using indexed="true" stored="true" and multiValued="false"
> > (_version_ does not exist)
> > at
> >
> org.apache.solr.update.VersionInfo.getAndCheckVersionField(VersionInfo.java:57)
> > at org.apache.solr.core.SolrCore.(SolrCore.java:606)
> > ... 26 more
> > 2
>
> > It seems that there needs to be a field like
> >  <field name="_version_" type="long" indexed="true" stored="true" multiValued="false"/>
> > in schema.xml. I wonder what this is used for?
>
>


-- 
from Jun Wang


Re: What does _version_ field used for?

2012-10-17 Thread Rafał Kuć
Hello!

It is used internally by Solr, for example by features like partial
update functionality and update log.

-- 
Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

> I am moving to solr4.0 from the beta version. An exception was thrown:

> Caused by: org.apache.solr.common.SolrException: _version_field must exist
> in schema, using indexed="true" stored="true" and multiValued="false"
> (_version_ does not exist)
> at
> org.apache.solr.update.VersionInfo.getAndCheckVersionField(VersionInfo.java:57)
> at org.apache.solr.core.SolrCore.(SolrCore.java:606)
> ... 26 more
> 2

> It seems that there needs to be a field like
>  <field name="_version_" type="long" indexed="true" stored="true" multiValued="false"/>
> in schema.xml. I wonder what this is used for?



What does _version_ field used for?

2012-10-17 Thread Jun Wang
I am moving to solr4.0 from the beta version. An exception was thrown:

Caused by: org.apache.solr.common.SolrException: _version_field must exist
in schema, using indexed="true" stored="true" and multiValued="false"
(_version_ does not exist)
at
org.apache.solr.update.VersionInfo.getAndCheckVersionField(VersionInfo.java:57)
at org.apache.solr.core.SolrCore.(SolrCore.java:606)
... 26 more
2

It seems that there needs to be a field like
 <field name="_version_" type="long" indexed="true" stored="true" multiValued="false"/>
in schema.xml. I wonder what this is used for?
-- 
from Jun Wang