Re: Solr 4.2.1 + Distribution scripts (rsync) Issue
Hi Hoss,

Thanks for your reply. Please find answers to your questions below.

*Well, for starters -- have you considered at least looking into using the java based ReplicationHandler instead of the rsync scripts?*

- There was an attempt to implement Java-based replication, but it was very slow, so that option was discarded and rsync was used instead. This was done a couple of years ago, and until February of this year we were using Solr 1.4. I upgraded Solr to 4.0 with rsync; however, due to time and resource constraints, an rsync alternative was not evaluated, and it can't be done even today - only in the next release will we move to SolrCloud.

My setup looks like below - this was working correctly with Solr 1.4 and Solr 4.0:

1) Index feeder applications feed index updates to the indexer boxes.
2) A cron job that runs every minute on the indexer boxes (committer) commits the indexes (commit) and invokes snapshooter to create a snapshot. An rsync daemon runs on the indexer boxes.
3) Another cron job runs on the search boxes every minute, which pulls the snapshot (using snappuller) and installs it on the search boxes (snapinstaller), which also notifies search to open a new searcher (commit).

Additionally, there is a cron job that runs every morning at 4 am on the indexer boxes, which optimizes the index (optimize) and cleans up snapshots older than a day (snapcleaner). This is as per http://wiki.apache.org/solr/SolrCollectionDistributionScripts

*Which config is this, your indexer or your searcher? (I'm assuming it's the searcher since I don't see any postCommit commands to exec snapshooter, but I wanted to sanity check that wasn't a simple explanation for your problem)*

- Because of this setup, I do not have any postCommit setup in solrconfig.xml.
- This solrconfig.xml is used for both indexer and searcher boxes.

I can see that after my upgrade to Solr 4.2.1, all these scripts behave normally; it's just that I do not see the updates getting refreshed on the search boxes unless I restart.

*What exactly does your manual commit command look like?*

- This is by using the commit script under the bin directory (commit -h localhost -p 8983).
- I have also tried a URL-based commit as you had mentioned, but no luck.

*Are you doing this on the indexer box or the searcher boxes?*

- I executed the manual commit on the searcher boxes; the indexer boxes do show the commit and updates correctly.

*What is the HTTP response from this command? What do the logs show when you do this?*

- I have attached the logs; please note that I have enabled openSearcher for testing.

Thanks, please let me know if I'm missing something. I remember people not getting their deletes, and the workaround was to add a _version_ field in the schema, which I had done, but no luck. I know it might be unrelated, but I am just trying all my options.

Thanks again,
Sandeep

On 5 June 2013 00:41, Chris Hostetter hossman_luc...@fucit.org wrote:

: However, we haven't yet implemented SolrCloud and still relying on
: distribution scripts - rsync, indexpuller mechanism.

Well, for starters -- have you considered at least looking into using the java based ReplicationHandler instead of the rsync scripts? Script based replication has not been actively maintained since java replication was added back in Solr 1.4!

: I see that the indexes are getting created on indexer boxes, snapshots
: being created and then pulled across to search boxes. The snapshots are
: getting installed on search boxes as well. There are no errors in the
: scripts logs and this process works well.
: However, when I check the update in solr console (on search boxes), I do
: not see the updated result. The updates do not appear in search boxes even
: after manual commit. Only after a *restart* of the search application
: (deployed in tomcat) I can see the updated results.

What exactly does your manual commit command look like? Are you doing this on the indexer box or the searcher boxes? What is the HTTP response from this command? What do the logs show when you do this?

It's possible that some internal changes in Solr relating to NRT improvements may have optimized away re-opening on commit if Solr doesn't think the index has changed -- but I doubt it, because I just tried a simple test using the 4.3.0 example where I manually simulated snapinstaller replacing the index files with a newer index and issued http://localhost:8983/solr/update?commit=true and Solr loaded up that new index and started searching it -- so I suspect the devil is in the details of your setup. You're sure each of the snapshooter, snappuller, and snapinstaller scripts is executing properly?

: I have done minimal changes for the upgrade in solrconfig.xml and is pasted
: below. Please can someone take a look and let me know what the issue is.
: The same config was working fine on Solr 4.0 (as well as Solr 1.4.1).

Which config is this, your indexer or your searcher? (I'm assuming it's the searcher since I don't see any postCommit commands to exec snapshooter, but I wanted to sanity check that wasn't a simple explanation for your problem.)
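(For reference, the postCommit hook Hoss is checking for is the classic example from the script-based replication wiki page - a hedged sketch, not Sandeep's actual config; the exe path would need adjusting to the local install:)

    <listener event="postCommit" class="solr.RunExecutableListener">
      <str name="exe">solr/bin/snapshooter</str>
      <str name="dir">.</str>
      <bool name="wait">true</bool>
    </listener>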
Re: Setting up Solr
On 6/4/2013 11:48 PM, Aaron Greenspan wrote:

I thought I'd document my process of getting set up with Solr 4.3.0 on a Linux server in case it's of use to anyone. I'm a moderately experienced Linux system administrator, so without passing judgment (at least for now), let me just say that I found getting Solr to work to be extremely difficult--more difficult than just about any other package I've ever dealt with, including ones I've built from source.

Thank you for your feedback. Solr has always had a high learning curve, and you've pointed out a lot of places we can improve things. We have a number of Jira issues that specifically deal with something called Developer Curb Appeal. I think it's pretty clear that we need to tackle a bunch of things we could call Newcomer Curb Appeal. I can work on filing some issues, some of which will address code, some of which will address the docs included with Solr and the wiki pages referenced there.

I realize that the software is at version 4.3, but the UI isn't - it is brand new. The old UI in 3.x and earlier versions was a place to go for information, but you couldn't actually DO anything that would make changes. Historically, this is the reason for the admin UI - making test queries, watching statistics, and gathering information. The ability to make changes is very recent. The UI you've seen first appeared in 4.0.0, released last October. It was a complete rewrite in an entirely new language. The old one was JSP; the new one is JavaScript.

On requiring a username/password: Solr doesn't include any security mechanisms. We leave that to other software written by people who do security really well. It can be handled by the servlet container, or a proxy. Solr should not be directly reachable by users. The intended usage is to have your website process user-entered text to turn it into a query and make sure it's clean before sending it to your Solr server(s), which should be reachable only from behind the firewall. Even if you use the servlet container's security features to really lock things down - block access to the admin UI, the update handler, and anything else that might get you into trouble - if someone can get directly to the query interface, it's relatively easy to send denial-of-service queries. Most attempts to detect and block DoS would also block legitimate queries that just happen to be slow.

Solr is a Java servlet. Servlet containers have historically used XML config files, so as a natural consequence, Solr uses XML config files. XML does allow for very precise and multi-layered configuration, but it can be very confusing. Version 4.4 will take the first tentative steps towards moving away from XML. The central config is still XML, but the individual cores won't be:

http://wiki.apache.org/solr/Core%20Discovery%20%284.4%20and%20beyond%29
http://wiki.apache.org/solr/Solr.xml%204.4%20and%20beyond

There is only one specific problem that I will attempt to address in this reply. At this point, any advice I might give is probably too little, too late. If I'm wrong and you do want some additional specific help, let me know. When you duplicate collection1 to make a new core, it is enough to simply duplicate the main directory and the conf subdirectory. I am aware as I write this that there is probably no documentation that states this clearly.

Thanks,
Shawn
Re: Two instances of solr - the same datadir?
Hi,

We use this very same scenario to great effect - 2 instances using the same dataDir with many cores - 1 is a writer (no caching), the other is a searcher (lots of caching).

To get the searcher to see the index changes from the writer, you need the searcher to do an empty commit - i.e. you invoke a commit with 0 documents. This will refresh the caches (including autowarming), [re]build the relevant searchers etc. and make any index changes visible to the RO instance. Also, make sure to use <lockType>native</lockType> in solrconfig.xml to ensure the two instances don't try to commit at the same time.

There are several ways to trigger a commit:
- Call commit() periodically within your own code.
- Use autoCommit in solrconfig.xml.
- Use an RPC/IPC mechanism between the 2 instance processes to tell the searcher the index has changed, then commit when called (more complex coding, but good if the index changes on an ad-hoc basis).

Note, doing things this way isn't really suitable for an NRT environment.

HTH,
Peter

On Tue, Jun 4, 2013 at 11:23 PM, Roman Chyla roman.ch...@gmail.com wrote:

Replication is fine, I am going to use it, but I wanted it for instances *distributed* across several (physical) machines - but here I have one physical machine, and it has many cores. I want to run 2 instances of solr because I think it has these benefits:

1) I can give less RAM to the writer (4GB), and use more RAM for the searcher (28GB)
2) I can deactivate warming for the writer and keep it for the searcher (this considerably speeds up indexing - each time we commit, the server is rebuilding a citation network of 80M edges)
3) saving disk space and better OS caching (the OS should be able to use more RAM for caching, which should result in faster operations - the two processes are accessing the same index)

Maybe I should just forget it and go with replication, but it doesn't 'feel right' IFF it is on the same physical machine. And Lucene specifically has a method for discovering changes and re-opening the index (DirectoryReader.openIfChanged). Am I not seeing something?

roman

On Tue, Jun 4, 2013 at 5:30 PM, Jason Hellman jhell...@innoventsolutions.com wrote:

Roman,

Could you be more specific as to why replication doesn't meet your requirements? It was geared explicitly for this purpose, including the automatic discovery of changes to the data on the index master.

Jason

On Jun 4, 2013, at 1:50 PM, Roman Chyla roman.ch...@gmail.com wrote:

OK, so I have verified the two instances can run alongside, sharing the same datadir.

All update handlers are made inaccessible in the read-only master:

    <updateHandler class="solr.DirectUpdateHandler2" enable="${solr.can.write:true}">

    java -Dsolr.can.write=false ...

And I can reload the index manually:

    curl "http://localhost:5005/solr/admin/cores?wt=json&action=RELOAD&core=collection1"

But this is not an ideal solution; I'd like the read-only server to discover index changes on its own. Any pointers?

Thanks,

roman

On Tue, Jun 4, 2013 at 2:01 PM, Roman Chyla roman.ch...@gmail.com wrote:

Hello,

I need your expert advice. I am thinking about running two instances of solr that share the same data directory. The *reason* being: the indexing instance is constantly rebuilding its cache after every commit (we have a big cache) and this slows it down. But indexing doesn't need much RAM; only the search does (and the server has lots of CPUs).

So, it is like having two solr instances:

1. solr-indexing-master
2. solr-read-only-master

In the solrconfig.xml I can disable update components. It should be fine.
However, I don't know how to 'trigger' index re-opening on (2) after the commit happens on (1). Ideally, the second instance could monitor the disk and re-open the index after new files appear there. Do I have to implement a custom IndexReaderFactory? Or something else?

Please note: I know about replication; this use case is IMHO slightly different - in fact, the write-only master (1) is also a replication master.

Googling turned up only this http://comments.gmane.org/gmane.comp.jakarta.lucene.solr.user/71912 - no pointers there. But if I am approaching the problem wrongly, please don't hesitate to 're-educate' me :)

Thanks!

roman
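(For reference, the 'empty commit' Peter describes at the top of the thread can be issued over plain HTTP - a minimal sketch, assuming the read-only instance is the one on port 5005 from Roman's earlier message; running this from cron would approximate the automatic discovery Roman is after:)

    curl "http://localhost:5005/solr/collection1/update?commit=true"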
Indexing Heavy dataset
Hi,

I am trying to index a heavy dataset with 1 particular field that is really too heavy. However, as I start, I get a memory warning and rollback (OutOfMemoryError). So, I have learned that we can use the -Xmx1024m option with the java command to start Solr and allocate more memory to the heap.

My question is: since this could also become insufficient later, is the issue related to caching? Here is my cache block in solrconfig:

    <filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="0"/>
    <queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
    <documentCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>

I am thinking that maybe I need to turn off the cache for documentCache. Anyone got a better idea? Or perhaps there is another issue here?

Just to let you know, until I added that very heavy db field for indexing, everything was just fine...

--
Regards,
Raheel Hasan
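(For reference, with the stock Solr 4.x example distribution the heap flag mentioned above goes on the command that launches the bundled Jetty - a minimal sketch; adjust the size to taste:)

    cd example
    java -Xmx1024m -jar start.jar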
Heap space problem with mlt query
Hi,

I have a solr index of 80GB with 1 million documents, each document approx. 500KB. I have a machine with 16GB RAM. I am running mlt queries on 3-5 fields of these documents. I am getting a solr out of memory problem:

    Exception in thread "main" java.lang.OutOfMemoryError: Java heap space

My Solr config is:

    <ramBufferSizeMB>128</ramBufferSizeMB>
    <maxMergeDocs>100</maxMergeDocs>
    <maxFieldLength>1</maxFieldLength>
    <writeLockTimeout>1000</writeLockTimeout>
    <commitLockTimeout>1</commitLockTimeout>

I also checked with a ramBuffer size of 256MB.

Please provide me suggestions regarding this.

Thanks
Varsha

-- View this message in context: http://lucene.472066.n3.nabble.com/Heap-space-problem-with-mlt-query-tp4068278.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Heap space problem with mlt query
and I just asked a similar question just 1 sec ago

On Wed, Jun 5, 2013 at 2:07 PM, Varsha Rani varsha.ya...@orkash.com wrote:

Hi,

I have a solr index of 80GB with 1 million documents, each document approx. 500KB. I have a machine with 16GB RAM. I am running mlt queries on 3-5 fields of these documents. I am getting a solr out of memory problem:

    Exception in thread "main" java.lang.OutOfMemoryError: Java heap space

My Solr config is:

    <ramBufferSizeMB>128</ramBufferSizeMB>
    <maxMergeDocs>100</maxMergeDocs>
    <maxFieldLength>1</maxFieldLength>
    <writeLockTimeout>1000</writeLockTimeout>
    <commitLockTimeout>1</commitLockTimeout>

I also checked with a ramBuffer size of 256MB.

Please provide me suggestions regarding this.

Thanks
Varsha

-- View this message in context: http://lucene.472066.n3.nabble.com/Heap-space-problem-with-mlt-query-tp4068278.html Sent from the Solr - User mailing list archive at Nabble.com.

--
Regards,
Raheel Hasan
Re: Heap space problem with mlt query
Varsha,

Unless I'm mistaken, the ramBufferSizeMB param is used to buffer documents before writing them to disk. Can you post the cache config that you have in solrconfig.xml? What version are you using?

--
Yago Riveiro
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)

On Wednesday, June 5, 2013 at 10:09 AM, Raheel Hasan wrote:

and I just asked a similar question just 1 sec ago

On Wed, Jun 5, 2013 at 2:07 PM, Varsha Rani varsha.ya...@orkash.com wrote:

Hi,

I have a solr index of 80GB with 1 million documents, each document approx. 500KB. I have a machine with 16GB RAM. I am running mlt queries on 3-5 fields of these documents. I am getting a solr out of memory problem:

    Exception in thread "main" java.lang.OutOfMemoryError: Java heap space

My Solr config is:

    <ramBufferSizeMB>128</ramBufferSizeMB>
    <maxMergeDocs>100</maxMergeDocs>
    <maxFieldLength>1</maxFieldLength>
    <writeLockTimeout>1000</writeLockTimeout>
    <commitLockTimeout>1</commitLockTimeout>

I also checked with a ramBuffer size of 256MB.

Please provide me suggestions regarding this.

Thanks
Varsha

-- View this message in context: http://lucene.472066.n3.nabble.com/Heap-space-problem-with-mlt-query-tp4068278.html Sent from the Solr - User mailing list archive at Nabble.com.

--
Regards,
Raheel Hasan
Re: Heap space problem with mlt query
Hi yriveiro,

I am using Solr version 3.6. My cache config is below:

    <filterCache class="solr.FastLRUCache" size="131072" initialSize="4096" autowarmCount="2048" cleanupThread="true"/>
    <queryResultCache class="solr.FastLRUCache" size="131072" initialSize="4096" autowarmCount="2048" cleanupThread="true"/>
    <documentCache class="solr.FastLRUCache" size="131072" initialSize="4096" autowarmCount="2048" cleanupThread="true"/>

-- View this message in context: http://lucene.472066.n3.nabble.com/Heap-space-problem-with-mlt-query-tp4068278p4068282.html Sent from the Solr - User mailing list archive at Nabble.com.
different Solr Logging for CONSOLE and FILE
Hi,

I have a small question about solr logging. In resources/log4j.properties, we have:

    log4j.rootLogger=INFO, file, CONSOLE

However, what I want is:

    log4j.rootLogger=INFO, file

and

    log4j.rootLogger=WARN, CONSOLE

(both simultaneously). Is it possible?

--
Regards,
Raheel Hasan
Re: Heap space problem with mlt query
Varsha,

How big is your JVM heap?

The other issue is the documentCache. The documentCache holds document objects fetched from disk (http://wiki.apache.org/solr/SolrCaching#documentCache). If each document is approx. 500KB and you configure a cache of size 131072, you are caching 131072 * (document object size), and that can be a lot of RAM. Try decreasing the documentCache size.

--
Yago Riveiro
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)

On Wednesday, June 5, 2013 at 10:28 AM, Varsha Rani wrote:

Hi yriveiro,

I am using Solr version 3.6. My cache config is below:

    <filterCache class="solr.FastLRUCache" size="131072" initialSize="4096" autowarmCount="2048" cleanupThread="true"/>
    <queryResultCache class="solr.FastLRUCache" size="131072" initialSize="4096" autowarmCount="2048" cleanupThread="true"/>
    <documentCache class="solr.FastLRUCache" size="131072" initialSize="4096" autowarmCount="2048" cleanupThread="true"/>

-- View this message in context: http://lucene.472066.n3.nabble.com/Heap-space-problem-with-mlt-query-tp4068278p4068282.html Sent from the Solr - User mailing list archive at Nabble.com.
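(A rough back-of-the-envelope illustration of Yago's point, assuming every cached entry really holds a ~500KB document:)

    131072 entries * 500 KB/entry = 65,536,000 KB ≈ 62.5 GB

That is roughly four times what the whole 16GB machine has, before counting the other caches or the index itself.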
Re: different Solr Logging for CONSOLE and FILE
Am 05.06.2013 11:28, schrieb Raheel Hasan:

Hi, I have a small question about solr logging. In resources/log4j.properties, we have *log4j.rootLogger=INFO, file, CONSOLE*. However, what I want is *log4j.rootLogger=INFO, file* and *log4j.rootLogger=WARN, CONSOLE* (both simultaneously). Is it possible?

You can use:

    log4j.rootLogger=INFO, file, CONSOLE
    log4j.appender.CONSOLE=org.apache.log4j.ConsoleAppender
    log4j.appender.CONSOLE.Threshold=WARN
Re: different Solr Logging for CONSOLE and FILE
OK thanks... it works... :D

Also I found that we could put both of them and it will also work:

    log4j.rootLogger=INFO, file
    log4j.rootLogger=WARN, CONSOLE

On Wed, Jun 5, 2013 at 2:42 PM, Bernd Fehling bernd.fehl...@uni-bielefeld.de wrote:

Am 05.06.2013 11:28, schrieb Raheel Hasan:

Hi, I have a small question about solr logging. In resources/log4j.properties, we have *log4j.rootLogger=INFO, file, CONSOLE*. However, what I want is *log4j.rootLogger=INFO, file* and *log4j.rootLogger=WARN, CONSOLE* (both simultaneously). Is it possible?

You can use:

    log4j.rootLogger=INFO, file, CONSOLE
    log4j.appender.CONSOLE=org.apache.log4j.ConsoleAppender
    log4j.appender.CONSOLE.Threshold=WARN

--
Regards,
Raheel Hasan
Files included from the default SolrConfig
Hi,

I am trying to optimize solr. The default solrconfig that comes with solr/collection1 has a lot of libs included that I don't really need. Perhaps someone could help me identify their purpose. (I only import from DIH.) Please tell me what's in these:

    contrib/extraction/lib (solr-cell-*)
    contrib/clustering/lib (solr-clustering-*)
    contrib/langid/lib (solr-langid-*)

--
Regards,
Raheel Hasan
Solr instance state is down in cloud mode
Hi,

When I start a core in solr-cloud, I'm getting the below message in the log. I have set up zookeeper separately and uploaded the config files. When I start the solr instance in cloud mode, the state is down.

    INFO: Update state numShards=null message={
      "operation":"state",
      "numShards":null,
      "shard":"shard1",
      "roles":null,
      "state":"down",
      "core":"core1",
      "collection":"core1",
      "node_name":"x:9980_solr",
      "base_url":"http://x:9980/solr"}
    Jun 5, 2013 6:10:48 AM org.apache.solr.common.cloud.ZkStateReader$2 process
    INFO: A cluster state change: WatchedEvent state:SyncConnected type:NodeDataChanged path:/clusterstate.json, has occurred - updating... (live nodes size: 1)

When I hit the URL, I get the left pane of the solr admin, and the right side keeps on loading. Any help?

Thanks,
Sathish

-- View this message in context: http://lucene.472066.n3.nabble.com/Sole-instance-state-is-down-in-cloud-mode-tp4068298.html Sent from the Solr - User mailing list archive at Nabble.com.
data-import problem
Hello Solr-Friends,

I have a problem with my current solr configuration. I want to import two tables into solr. I got it to work for the first table, but the second table doesn't get imported (no error message, 0 rows skipped).

I have two tables called name and title, and I want to load their fields: id and name, and id and title (two id columns that have nothing to do with each other).

This is in my data-config.xml:

    <document>
      <entity name="name" query="SELECT id, name FROM name"></entity>
    </document>
    <document>
      <entity name="title" query="SELECT id AS titleid, title FROM name"></entity>
    </document>

and this is in my schema.xml:

    <field name="id" type="string" indexed="true" stored="true" />
    <field name="name" type="text_general" indexed="true" stored="true" />
    <field name="titleid" type="string" indexed="true" stored="true" />
    <field name="title" type="text_general" indexed="true" stored="true" />
    <dynamicField name="*" type="ignored" multiValued="true" />
    </fields>
    <uniqueKey>id</uniqueKey>
    </schema>

I chose that unique key only because solr asked for it. In my SolrAdmin Schema Browser I can see three fields (id, name and title), but titleid is missing and title itself is empty with no entries. I don't know how to get it to work to index two separate lists. I hope someone can help, thank you!
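(For comparison, DIH data-config files normally nest all entities under a single <document> element inside <dataConfig> - a hedged sketch of that shape, reusing the queries above verbatim; the dataSource line is a placeholder, and whether this alone fixes the import here is untested:)

    <dataConfig>
      <dataSource type="JdbcDataSource" ... />
      <document>
        <entity name="name" query="SELECT id, name FROM name"/>
        <entity name="title" query="SELECT id AS titleid, title FROM name"/>
      </document>
    </dataConfig>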
Re: Query Elevation Component
davers wrote:

I want to elevate certain documents differently depending on a certain fq parameter in the request. I've read of somebody coding solr to do this, but no code was shared. Where would I start looking to implement this feature myself?

Davers, I am also looking into this feature. Care to tell where you saw this discussed? I could not find anything. Also, did you manage to implement it somehow?

thanks

-- View this message in context: http://lucene.472066.n3.nabble.com/Query-Elevation-Component-tp4056856p4068308.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Heap space problem with mlt query
Hi yriveiro,

When I was using a documentCache size of 131072, I got the exception after 5000-6000 mlt queries. But once I set the documentCache size to 16384, I got the same problem after 1500-2000 mlt queries.

-- View this message in context: http://lucene.472066.n3.nabble.com/Heap-space-problem-with-mlt-query-tp4068278p4068313.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Files included from the default SolrConfig
1. SolrCell (ExtractingRequestHandler) - extracts and indexes content from rich documents such as PDF, Office docs, and HTML (uses Tika).
2. Clustering - for result clustering.
3. Language identification (two update processors) - analyzes the text of fields to determine a language code.

None of those is mandatory, which is why they have separate libs.

-- Jack Krupansky

-Original Message- From: Raheel Hasan Sent: Wednesday, June 05, 2013 5:57 AM To: solr-user@lucene.apache.org Subject: Files included from the default SolrConfig

Hi,

I am trying to optimize solr. The default solrconfig that comes with solr/collection1 has a lot of libs included that I don't really need. Perhaps someone could help me identify their purpose. (I only import from DIH.) Please tell me what's in these:

    contrib/extraction/lib (solr-cell-*)
    contrib/clustering/lib (solr-clustering-*)
    contrib/langid/lib (solr-langid-*)

--
Regards,
Raheel Hasan
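(For reference, these contribs are pulled in by <lib/> directives near the top of the default solrconfig.xml - roughly like the lines below, which can be removed or commented out if unused; the exact relative paths vary by release:)

    <lib dir="../../../contrib/extraction/lib" regex=".*\.jar" />
    <lib dir="../../../dist/" regex="solr-cell-\d.*\.jar" />
    <lib dir="../../../contrib/clustering/lib/" regex=".*\.jar" />
    <lib dir="../../../dist/" regex="solr-clustering-\d.*\.jar" />
    <lib dir="../../../contrib/langid/lib/" regex=".*\.jar" />
    <lib dir="../../../dist/" regex="solr-langid-\d.*\.jar" />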
Receiving unexpected Faceting results.
Consider the following Solr query:

    select?q=*:*&fq=tags:dotan-*&facet=true&facet.field=tags&rows=0

The 'tags' field is a multivalued field. I would expect the previous query to return only tags that begin with the string 'dotan-' such as:

dotan-home
dotan-work

...but not strings which do not begin with (or even contain) the string in question. However, I am getting these results:

    <lst name="discoapi_tags">
      <int name="dotan-home">14</int>
      <int name="dotan-work">13</int>
      <int name="beer">0</int>
      <int name="beatles">0</int>
    </lst>

It _may_ be that the 'beer' and 'beatles' tags were once attached to the same documents as the 'dotan-home' and/or 'dotan-work' tags. I've done a bit of experimenting on this Solr install, so I cannot be sure. However, considering that there are in fact 0 results for those two, I would not expect them to show up at all, even if they were ever attached to (i.e. once a value in the multivalued field of) any of the results that match the filter query.

So, the questions are:

1) How can I check whether the multivalued field for a particular document (given its uniqueKey id) ever contained a specific value? Alternatively, how can I see all the values that the document ever had for the field? I don't expect this to actually be possible, but I ask if it is, i.e. by examining certain aspects of the Solr index with a text editor.
2) If those spurious results are appearing, does that necessarily mean that those values were in fact once in the multivalued field for documents matching the filter query? If so, the answer to the previous question would be to simply run a query for the id of the document in question, and facet on the multivalued field with a large limit.
3) How can I have Solr return only those facet values for the field that in fact begin with 'dotan-', even if a document has other tags such as 'beatles'?
4) How can I have Solr return only those facet values whose counts are larger than 0?

Thank you!

--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
Re: Receiving unexpected Faceting results.
3) Use the parameter facet.prefix, e.g., facet.prefix=dotan-. Note: this particular case will not work if the field you're faceting on is tokenised (with - being used as a token separator).

4) Use the parameter facet.mincount - it looks like you want to set it to 1, instead of the default, which is 0.
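(Putting both parameters together against the query from the original post - a minimal sketch:)

    select?q=*:*&facet=true&facet.field=tags&facet.prefix=dotan-&facet.mincount=1&rows=0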
Re: Receiving unexpected Faceting results.
Hi Dotan,

I think all you need to do is add facet.mincount=1, i.e.:

    select?q=*:*&fq=tags:dotan-*&facet=true&facet.field=tags&rows=0&facet.mincount=1

Note that you can do it per field as well:

    select?q=*:*&fq=tags:dotan-*&facet=true&facet.field=tags&rows=0&f.tags.facet.mincount=1

http://wiki.apache.org/solr/SimpleFacetParameters#facet.mincount

On Wed, Jun 5, 2013 at 8:27 AM, Dotan Cohen dotanco...@gmail.com wrote:

Consider the following Solr query:

    select?q=*:*&fq=tags:dotan-*&facet=true&facet.field=tags&rows=0

The 'tags' field is a multivalued field. I would expect the previous query to return only tags that begin with the string 'dotan-' such as:

dotan-home
dotan-work

...but not strings which do not begin with (or even contain) the string in question. However, I am getting these results:

    <lst name="discoapi_tags">
      <int name="dotan-home">14</int>
      <int name="dotan-work">13</int>
      <int name="beer">0</int>
      <int name="beatles">0</int>
    </lst>

It _may_ be that the 'beer' and 'beatles' tags were once attached to the same documents as the 'dotan-home' and/or 'dotan-work' tags. I've done a bit of experimenting on this Solr install, so I cannot be sure. However, considering that there are in fact 0 results for those two, I would not expect them to show up at all, even if they were ever attached to any of the results that match the filter query.

So, the questions are:

1) How can I check whether the multivalued field for a particular document (given its uniqueKey id) ever contained a specific value? Alternatively, how can I see all the values that the document ever had for the field?
2) If those spurious results are appearing, does that necessarily mean that those values were in fact once in the multivalued field for documents matching the filter query?
3) How can I have Solr return only those facet values for the field that in fact begin with 'dotan-', even if a document has other tags such as 'beatles'?
4) How can I have Solr return only those facet values whose counts are larger than 0?

Thank you!

--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com

--
Brendan Grainger
www.kuripai.com
Search for misspelled words in corpus
Hi,

I have a problem where the text corpus we need to search contains many misspelled words. The same word could also be misspelled in several different ways, and the corpus could also contain documents with correct spellings. However, the search term given in the query will always be the correct spelling.

Now when we search on a term, we would like to get all the documents that contain both correct and misspelled forms of the search term. We tried fuzzy search, but it doesn't work as per our expectations. It returns any close match, not specifically misspelled words. For example, if I'm searching for a word like fight, I would like to return the documents that have words like figth and feight, not documents with words like sight and light.

Is there any suggested approach for doing this?

regards,
Kamesh
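(For context on why fuzzy search behaves that way - this worked example is mine, not from Kamesh's mail: a Lucene fuzzy query such as text:fight~1 matches every indexed term within edit distance 1 of fight, and all four of the words above sit at exactly 1 edit, assuming transpositions count as a single edit as in Damerau-Levenshtein distance:)

    figth  = fight with 'h' and 't' transposed   (1 edit)
    feight = fight with 'e' inserted             (1 edit)
    sight  = fight with 'f' -> 's' substituted   (1 edit)
    light  = fight with 'f' -> 'l' substituted   (1 edit)

So edit distance alone cannot separate misspellings of the query term from legitimately different words.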
Re: Receiving unexpected Faceting results.
On Wed, Jun 5, 2013 at 3:38 PM, Raymond Wiker rwi...@gmail.com wrote:

3) Use the parameter facet.prefix, e.g., facet.prefix=dotan-. Note: this particular case will not work if the field you're faceting on is tokenised (with - being used as a token separator).

4) Use the parameter facet.mincount - it looks like you want to set it to 1, instead of the default, which is 0.

Perfect, thank you Raymond!

--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
Re: Heap space problem with mlt query
Did you try reducing the filter and query caches? They are fairly large too, unless you really need them to be cached for your use case. Do you have that many distinct filter queries hitting solr, given the size you have defined for filterCache?

Are you doing any sorting? That will chew up a lot of memory because of Lucene's internal field cache.

-- View this message in context: http://lucene.472066.n3.nabble.com/Heap-space-problem-with-mlt-query-tp4068278p4068326.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Receiving unexpected Faceting results.
On Wed, Jun 5, 2013 at 3:41 PM, Brendan Grainger brendan.grain...@gmail.com wrote:

Hi Dotan,

I think all you need to do is add facet.mincount=1, i.e.:

    select?q=*:*&fq=tags:dotan-*&facet=true&facet.field=tags&rows=0&facet.mincount=1

Note that you can do it per field as well:

    select?q=*:*&fq=tags:dotan-*&facet=true&facet.field=tags&rows=0&f.tags.facet.mincount=1

http://wiki.apache.org/solr/SimpleFacetParameters#facet.mincount

Thanks, Brendan. I will review the available Facet Parameters, which I really should have thought to do before posting, as it is already bookmarked!
Re: different Solr Logging for CONSOLE and FILE
On 6/5/2013 3:46 AM, Raheel Hasan wrote:

OK thanks... it works... :D

Also I found that we could put both of them and it will also work:

    log4j.rootLogger=INFO, file
    log4j.rootLogger=WARN, CONSOLE

If this completely separates INFO from WARN and ERROR, then you would want to rethink and probably use what Bernd suggested. I don't know if this is what happens.

It's easier to understand a logfile if you can see errors, warnings, and informational messages together in context. If the more severe messages are only logged to CONSOLE, then you lose them. Even if you then redirect the console to a file outside of Solr, you would need to try to piece the full log together based on timestamps from two files, and sometimes things happen too fast for that, even if you're logging with millisecond accuracy.

Thanks,
Shawn
Re: Indexing Heavy dataset
On 6/5/2013 3:08 AM, Raheel Hasan wrote:

Hi,

I am trying to index a heavy dataset with 1 particular field that is really too heavy. However, as I start, I get a memory warning and rollback (OutOfMemoryError). So, I have learned that we can use the -Xmx1024m option with the java command to start Solr and allocate more memory to the heap.

My question is: since this could also become insufficient later, is the issue related to caching? Here is my cache block in solrconfig:

    <filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="0"/>
    <queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
    <documentCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>

I am thinking that maybe I need to turn off the cache for documentCache. Anyone got a better idea? Or perhaps there is another issue here?

Exactly how big is this field? Do you need this giant field returned with your results, or is it just there for searching?

Caches of size 512, especially with autowarm disabled, are probably not a major cause for concern, unless the big field is big enough that 512 of them is really, really huge. If that's the case, I would reduce the size of your documentCache, not turn it off.

The value of ramBufferSizeMB elsewhere in your config is more likely to affect how much RAM gets used during indexing. The default for this setting as of Solr 4.1.0 is 100. Most people can reduce this value.

I'm writing a reply to another thread where you are participating, with info that will likely be useful for you too. Look for that.

Thanks,
Shawn
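(A hedged sketch of the two knobs Shawn mentions, with illustrative values rather than recommendations:)

    <ramBufferSizeMB>32</ramBufferSizeMB>
    <documentCache class="solr.LRUCache" size="64" initialSize="64" autowarmCount="0"/>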
Re: Heap space problem with mlt query
On 6/5/2013 3:07 AM, Varsha Rani wrote:

Hi,

I have a solr index of 80GB with 1 million documents, each document approx. 500KB. I have a machine with 16GB RAM. I am running mlt queries on 3-5 fields of these documents. I am getting a solr out of memory problem:

    Exception in thread "main" java.lang.OutOfMemoryError: Java heap space

This wiki page has relevant info for your situation. As you are reading it, it might not seem relevant, but I'll try to point things out.

http://wiki.apache.org/solr/SolrPerformanceProblems

The memory that is getting exhausted here is heap memory. You probably need a larger java heap. The settings that your other replies have talked about do affect how much heap gets used, but they do not increase it. That is a java commandline option that must be applied to the command that starts the servlet container which runs Solr.

For 500KB documents, you probably want a ramBufferSizeMB of 64-128. You probably want to greatly reduce the size of your documentCache, and possibly the other caches as well. Your autowarm counts are very high - you'll want to reduce those so that your cache warming time is low when you commit and open a new searcher.

With an index size of 80GB, you'll probably need a heap size of 8GB. Depending on how you use Solr, you might need more. If you read the wiki page carefully, you'll also realize that in addition to this heap memory, you need additional memory to cache your index - between 40 and 80GB of additional memory. The absolute minimum server size you want here is 48GB, and 128GB would be *much* better.

Reducing your index size might be a critical step. Do you need to store all fields? Most people don't need all the fields in order to display the top N search results. When showing a detail page to the user, most people can get the bulk of their data from another data store by using an ID value retrieved from Solr.

The performance problems that come from your disk cache being too small can carry over into OutOfMemory exceptions that you wouldn't otherwise get, because it makes indexing and queries take too long. When they take too long, you can end up doing too many of them at the same time, chewing up additional memory.

Thanks,
Shawn
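(For example, if the container is Tomcat, the heap flag usually goes in an environment variable that the startup script picks up - a hedged sketch assuming a stock Tomcat, where catalina.sh reads bin/setenv.sh; the file name and variable are Tomcat conventions, not anything Varsha has described:)

    # bin/setenv.sh (created if it doesn't exist)
    export CATALINA_OPTS="$CATALINA_OPTS -Xmx8g"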
Re: Setting up Solr
On Wed, Jun 5, 2013 at 1:48 AM, Aaron Greenspan aar...@thinkcomputer.com wrote:

I say this not because I enjoy starting flame wars or because I have the time to participate in them--I don't. I realize that there's a long history to Solr and I am the new kid who doesn't get it. Nonetheless, that doesn't change the way it works, and many users will be just like me. So just know that I'd just like to see Solr improve--frankly, I need it to--and if these issues were not already glaringly obvious, they should be now.

This!

Seriously, I think this feedback is valuable, and I have recently gone through a similar experience. This is why I have written a book specifically targeting people who basically got their first (example) collection running and are now stuck on how to get the second one (the first 'real' one) to do what they want.

The book is available for pre-orders at: http://www.packtpub.com/apache-solr-for-indexing-data/book (out in a couple more days) and a bunch of sample configurations that go with it are at: https://github.com/arafalov/solr-indexing-book

On specific points, I do agree that we need to make the Admin WebUI pre-select the first/only core. If nobody has created a JIRA for this yet, I will. And, I think, perhaps we need an absolutely minimal solr configuration shipping in the Solr distribution, with a single '*' field and so on.

Regards,
Alex.

Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)
Re: Indexing Heavy dataset
ok thanks for the reply.

The field has values of around 60KB each.

Furthermore, I have realized that the issue is with MySQL, as it's not processing this table when a WHERE clause is applied. Secondly, I have turned this field to *stored=false* and now the *select* is fast and working again.

On Wed, Jun 5, 2013 at 6:56 PM, Shawn Heisey s...@elyograg.org wrote:

On 6/5/2013 3:08 AM, Raheel Hasan wrote:

Hi,

I am trying to index a heavy dataset with 1 particular field that is really too heavy. However, as I start, I get a memory warning and rollback (OutOfMemoryError). So, I have learned that we can use the -Xmx1024m option with the java command to start Solr and allocate more memory to the heap.

My question is: since this could also become insufficient later, is the issue related to caching? Here is my cache block in solrconfig:

    <filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="0"/>
    <queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
    <documentCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>

I am thinking that maybe I need to turn off the cache for documentCache. Anyone got a better idea? Or perhaps there is another issue here?

Exactly how big is this field? Do you need this giant field returned with your results, or is it just there for searching?

Caches of size 512, especially with autowarm disabled, are probably not a major cause for concern, unless the big field is big enough that 512 of them is really, really huge. If that's the case, I would reduce the size of your documentCache, not turn it off.

The value of ramBufferSizeMB elsewhere in your config is more likely to affect how much RAM gets used during indexing. The default for this setting as of Solr 4.1.0 is 100. Most people can reduce this value.

I'm writing a reply to another thread where you are participating, with info that will likely be useful for you too. Look for that.

Thanks,
Shawn

--
Regards,
Raheel Hasan
Re: Indexing Heavy dataset
Some values in the field are up to 1MB as well.

On Wed, Jun 5, 2013 at 7:27 PM, Raheel Hasan raheelhasan@gmail.com wrote:

ok thanks for the reply.

The field has values of around 60KB each.

Furthermore, I have realized that the issue is with MySQL, as it's not processing this table when a WHERE clause is applied. Secondly, I have turned this field to *stored=false* and now the *select* is fast and working again.

On Wed, Jun 5, 2013 at 6:56 PM, Shawn Heisey s...@elyograg.org wrote:

On 6/5/2013 3:08 AM, Raheel Hasan wrote:

[I am trying to index a heavy dataset with 1 particular field that is really too heavy... Anyone got a better idea? Or perhaps there is another issue here?]

Exactly how big is this field? Do you need this giant field returned with your results, or is it just there for searching?

Caches of size 512, especially with autowarm disabled, are probably not a major cause for concern, unless the big field is big enough that 512 of them is really, really huge. If that's the case, I would reduce the size of your documentCache, not turn it off.

The value of ramBufferSizeMB elsewhere in your config is more likely to affect how much RAM gets used during indexing. The default for this setting as of Solr 4.1.0 is 100. Most people can reduce this value.

I'm writing a reply to another thread where you are participating, with info that will likely be useful for you too. Look for that.

Thanks,
Shawn

--
Regards,
Raheel Hasan

--
Regards,
Raheel Hasan
Re: Setting up Solr
If we look at the UI of other cloud-based software like Couchbase or Riak, they are more intuitive than Solr's UI. Of course the UI is brand new and needs a lot of improvements - for example, the possibility of selecting an existing config from zookeeper when using the wizard to create a collection. Even better, a section to upload a config from the UI without using the cryptic zkCli script.

Regards,

--
Yago Riveiro
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)

On Wednesday, June 5, 2013 at 3:21 PM, Alexandre Rafalovitch wrote:

On Wed, Jun 5, 2013 at 1:48 AM, Aaron Greenspan aar...@thinkcomputer.com wrote:

I say this not because I enjoy starting flame wars or because I have the time to participate in them--I don't. I realize that there's a long history to Solr and I am the new kid who doesn't get it. Nonetheless, that doesn't change the way it works, and many users will be just like me. So just know that I'd just like to see Solr improve--frankly, I need it to--and if these issues were not already glaringly obvious, they should be now.

This!

Seriously, I think this feedback is valuable, and I have recently gone through a similar experience. This is why I have written a book specifically targeting people who basically got their first (example) collection running and are now stuck on how to get the second one (the first 'real' one) to do what they want.

The book is available for pre-orders at: http://www.packtpub.com/apache-solr-for-indexing-data/book (out in a couple more days) and a bunch of sample configurations that go with it are at: https://github.com/arafalov/solr-indexing-book

On specific points, I do agree that we need to make the Admin WebUI pre-select the first/only core. If nobody has created a JIRA for this yet, I will. And, I think, perhaps we need an absolutely minimal solr configuration shipping in the Solr distribution, with a single '*' field and so on.

Regards,
Alex.

Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)
copyField generates multiple values encountered for non multiValued field
I have the exact same problem as the guy here:

http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201105.mbox/%3C3A2B3E42FCAA4BF496AE625426C5C6E4@Wurstsemmel%3E

AFAICS he did not get an answer. Is this a known issue? What can I do, other than doing what copyField should do in my application? I am using solr 4.0.0.

Thanks,
Robert
data-import problem
Hello Solr-Friends,

I have a problem with my current solr configuration. I want to import two tables into solr. I got it to work for the first table, but the second table doesn't get imported (no error message, 0 rows skipped).

I have two tables called name and title, and I want to load their fields: id and name, and id and title (two id columns that have nothing to do with each other).

This is in my data-config.xml:

    <document>
      <entity name="name" query="SELECT id, name FROM name"></entity>
    </document>
    <document>
      <entity name="title" query="SELECT id AS titleid, title FROM name"></entity>
    </document>

and this is in my schema.xml:

    <field name="id" type="string" indexed="true" stored="true" />
    <field name="name" type="text_general" indexed="true" stored="true" />
    <field name="titleid" type="string" indexed="true" stored="true" />
    <field name="title" type="text_general" indexed="true" stored="true" />
    <dynamicField name="*" type="ignored" multiValued="true" />
    </fields>
    <uniqueKey>id</uniqueKey>
    </schema>

I chose that unique key only because solr asked for it. In my SolrAdmin Schema Browser I can see three fields (id, name and title), but titleid is missing and title itself is empty with no entries. I don't know how to get it to work to index two separate lists. I hope someone can help, thank you!

PS: I am sorry if this mail reached you twice. I sent it the first time when I was not registered yet and don't know if the mail was received. Sending now again after registration to the mailing list.
Re: copyField generates multiple values encountered for non multiValued field
I think the suggestion I have seen is that a copyField destination should be index-only and - therefore - will not be returned. It is primarily there to make searching easier by aggregating fields, or to provide an alternative analyzer pipeline. Can you make your copyField destination not stored?

Regards,
Alex.

Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)

On Wed, Jun 5, 2013 at 10:37 AM, Robert Krüger krue...@lesspain.de wrote:

I have the exact same problem as the guy here:

http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201105.mbox/%3C3A2B3E42FCAA4BF496AE625426C5C6E4@Wurstsemmel%3E

AFAICS he did not get an answer. Is this a known issue? What can I do, other than doing what copyField should do in my application? I am using solr 4.0.0.

Thanks,
Robert
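(A minimal sketch of the arrangement Alexandre suggests above - the field and type names here are hypothetical, not taken from Robert's schema; the destination is multiValued so it can accept several sources, and stored="false" so it is indexed but never returned:)

    <field name="title" type="text_general" indexed="true" stored="true"/>
    <field name="body" type="text_general" indexed="true" stored="true"/>
    <field name="text_all" type="text_general" indexed="true" stored="false" multiValued="true"/>
    <copyField source="title" dest="text_all"/>
    <copyField source="body" dest="text_all"/>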
Re: Setting up Solr
We have a number of Jira issues that specifically deal with something called Developer Curb Appeal. I think it's pretty clear that we need to tackle a bunch of things we could call Newcomer Curb Appeal. I can work on filing some issues, some of which will address code, some of which will address the docs included with Solr and the wiki pages referenced there.

I have filed the master issue. I will file some linked issues over the next few days. All ideas and patches welcome.

https://issues.apache.org/jira/browse/SOLR-4901

The wiki is our primary documentation. Updates are appreciated. In order to edit the wiki, you must create an account and ask on this mailing list for it to be added to the contributors group.

Thanks,
Shawn
Re: Solr - ORM like layer
Sorry for opening a new thread. As I sent the first message without subscribing to the mailing list, I couldn't find a way to reply to the original thread. The message stream is attached below.

Actually the requirement came up from such a scenario: we collect some xml documents from some external resources and need to parse those xml docs and index some parts of them. But those xml docs have different roots and attributes. So we generate all possible classes for each root type via JAXB. As each document has different informative values, each of them should be indexed into a separate solr instance.

The module we wrote simply generates a solr schema template with respect to all aggregated objects in the root object (recursively), except those annotated with @SolrIndexIgnore. We are also able to generate a SolrDocument from a given object and index it to a specified solr instance. While retrieving results from solr, we generate a list of these objects from the SolrDocument instances.

The Hibernate configuration for Lucene indexing is a bit different, I think, as we are able to generate the solr schema from a given object.

Best.

-Original Message- From: Tuğcem Oral Sent: Tuesday, June 04, 2013 8:57 AM To: solr-user@lucene.apache.org Subject: Solr - ORM like layer

Hi folks,

I wonder whether there exists an ORM-like layer for solr such that it generates the solr schema from a given complex object type and indexes a given list of corresponding objects. I wrote a simple module for that need in one of my projects and am happily ready to generalize it and contribute it to solr, if no such module exists or is in progress.

Thanks all.

--
TO

Solr doesn't support complex objects directly - you must flatten and otherwise denormalize them. If you do want to store something like a graph in Solr, make each node a separate document (and try to avoid the temptation to play games with dynamic and multivalued fields). But if you have a tool to automatically flatten and denormalize complex objects and graphs and database joins, great. Please describe what it actually does in a little more (but not excessive) detail.

-- Jack Krupansky

-Original Message- From: Tuğcem Oral Sent: Tuesday, June 04, 2013 8:57 AM To: solr-user@lucene.apache.org Subject: Solr - ORM like layer

Hi folks,

I wonder whether there exists an ORM-like layer for solr such that it generates the solr schema from a given complex object type and indexes a given list of corresponding objects. I wrote a simple module for that need in one of my projects and am happily ready to generalize it and contribute it to solr, if no such module exists or is in progress.

Thanks all.

--
TO

If by ORM you mean Object Relational Mapping, Hibernate has annotations for Lucene, and if my memory doesn't betray me, I think you can configure a Solr server in the Hibernate config. I have successfully mapped POJOs to Lucene and done text search; it all happens like magic once your annotations and configuration are right.

Hope that helps,

Guido.

On 04/06/13 13:57, Tuğcem Oral wrote:

Hi folks,

I wonder whether there exists an ORM-like layer for solr such that it generates the solr schema from a given complex object type and indexes a given list of corresponding objects. I wrote a simple module for that need in one of my projects and am happily ready to generalize it and contribute it to solr, if no such module exists or is in progress.

Thanks all.

--
TO
Phrase matching with set union as opposed to set intersection on query terms
How would one write a query which should perform set union on the search terms (term1 OR term2 OR term3), and yet also perform phrase matching if both terms are found? I tried a few variants of the following, but in every case I am getting set intersection on the search terms:

    select?q={!q.op=OR}text:"term1 term2"~10

Thus, if term1 matches 10 documents and term2 matches 20 documents, then SET UNION would include all of the documents that have either term1 and/or term2. That means that between 20-30 results should be returned. Conversely, SET INTERSECTION would return only results with _both_ term1 _and_ term2, which could be between 0-10 documents.

Note that in the application, users will be searching for any arbitrary number of terms; in fact, they will be entering phrases. I can limit these phrases to 140 characters if needed.

Thank you in advance!

--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
Re: Phrase matching with set union as opposed to set intersection on query terms
On 6/5/2013 9:03 AM, Dotan Cohen wrote:

How would one write a query which should perform set union on the search terms (term1 OR term2 OR term3), and yet also perform phrase matching if both terms are found? I tried a few variants of the following, but in every case I am getting set intersection on the search terms:

    select?q={!q.op=OR}text:"term1 term2"~10

A phrase search by definition requires all terms to be present. Even though it is multiple terms, conceptually it is treated as a single term.

It sounds like what you are after is what edismax can do. If you define the pf parameter in addition to the qf parameter, Solr will do something pretty amazing - it will automatically construct a phrase query from a non-phrase query and search with it against multiple fields. Done correctly, this means that an exact match will be listed first in the results.

http://wiki.apache.org/solr/ExtendedDisMax#pf_.28Phrase_Fields.29

Thanks,
Shawn
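(A minimal sketch of that edismax setup, assuming a single searchable field named text - the parameter values are illustrative; mm=1 gives the union behavior by requiring only one clause to match, while pf boosts documents containing the whole phrase:)

    select?defType=edismax&q=term1 term2&mm=1&qf=text&pf=text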
Re: SpatialRecursivePrefixTreeFieldType Spatial Searching
Everything is working great now. Thanks David

On Wed, Jun 5, 2013 at 12:07 AM, David Smiley (@MITRE.org) dsmi...@mitre.org wrote:

maxDistErr should be like 0.3 based on earlier parts of this discussion, since your data is precise to particular hours of the day, not whole days. If it was whole days, you would use 1. Changing this requires a re-index. So does changing worldBounds, if you do so.

distErrPct should be 0. Changing it does not require a re-index because you are indexing points, not other shapes. This only affects other shapes.

Speaking of that slight buffer to the query shape I mentioned in my last email: it should be half of maxDistErr, whatever you set that to. So use like 0.1.

~ David

Chris Atkinson wrote:

Hi David, Thanks for your continued help. I think you have hit the nail on the head for me. I'm 100% sure that I had previously tried that query without success. I'm not sure if perhaps I had the wrong distErrPct or maxDistErr values... It's getting late, so I'm going to call it a night (I'm on GMT), but I'll put your example into practice tomorrow and get confirmation that it's working as expected. I'll keep playing around with the distErrPct values as well. Do I need to do a reindex if I change these values? (I think yes?)

On Tue, Jun 4, 2013 at 10:44 PM, Smiley, David W. <dsmiley@...> wrote:

So availability is the absence of any other document's indexed time duration overlapping with your availability query duration. So I think you should negate an overlaps query. The overlaps query looks like: Intersects(-Inf start end Inf). And remember the slight buffering needed as described on the wiki. You'd add a small fraction to the start time and subtract a small fraction from the end time, so that you don't accidentally match a document that is adjacent.

    -availability_spatial:Intersects( 0 30.5 114.5 3650 )

Does that work against your data? If it doesn't, can you conjecture why it doesn't work based on a sample point in a document that it matched, or a document that should have matched but didn't?

~ David

On 6/4/13 3:31 PM, Chris Atkinson <chrisacky@...> wrote:

Here is an example I have tried. So let's assume that I want to check in on the 30th day, and leave on the 115th day. My query would be:

    -availability_spatial:Intersects( 30 0 3650 115 )

However, that wouldn't match anything. Here is an example document below so you can see. (I've not negated the spatial field in the filter query, so you can see the field coordinates.) In case the formatting is bad, see here: http://pastie.org/pastes/8006249/text

    <?xml version="1.0" encoding="UTF-8"?>
    <response>
      <lst name="responseHeader">
        <int name="status">0</int>
        <int name="QTime">1</int>
        <lst name="params">
          <str name="fl">availability_spatial</str>
          <str name="indent">true</str>
          <str name="q">id:38197</str>
          <str name="_">1370374172298</str>
          <str name="wt">xml</str>
          <str name="fq">availability_spatial:Intersects( 30 0 3650 115 )</str>
        </lst>
      </lst>
      <result name="response" numFound="1" start="0">
        <doc>
          <arr name="availability_spatial">
            <str>147.6 163.4</str>
            <str>164.6 178.4</str>
            <str>192.6 220.4</str>
            <str>241.6 264.4</str>
          </arr>
        </doc>
      </result>
    </response>

On Tue, Jun 4, 2013 at 8:14 PM, Chris Atkinson <chrisacky@...> wrote:

Thanks David. Query times are really quick and my index is only 20MB now, which is about what I would expect. I'm having some problems figuring out what type of query I want to find *available* properties with this new points system. I'm storing bookings against each document. So I have X Y coordinates, where X will be the check-in of a previous booking, and Y will be the departure.
So, for illustrative purposes, a week's booking from the 10th of January to the 17th would be X Y = 10 17:

<field name="booking">10 17</field>
<field name="booking">22 27</field>

I might have several bookings. Now, I want to find available properties with my search, but I'm just not sure on the ordering of the end/start in the polygon Intersects. I've looked at this document very carefully and tried to draw it all out on paper: https://people.apache.org/~hossman/spatial-for-non-spatial-meetup-20130117/

Here are the suggestions:

q=fieldX:Intersects(-Inf end start Inf)
q=fieldX:Intersects(-Inf start end Inf)
q=fieldX:Intersects(start -Inf Inf end)

All of these are great for finding the existence of a field coordinate, but I need to make sure that the property is available. So I thought I could use one of these three queries in the negative by using -fieldX:Intersects(...), but none of those work. Can you shine some light on what I might be missing?
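Pulling the pieces of this thread together, a minimal sketch of what the field type and the availability filter might look like (the type name, world bounds, and exact buffer are assumptions drawn from the discussion, not a verified config):

<fieldType name="day_duration" class="solr.SpatialRecursivePrefixTreeFieldType"
           geo="false" worldBounds="0 0 3650 3650"
           maxDistErr="0.3" distErrPct="0" units="degrees"/>
<field name="availability_spatial" type="day_duration" multiValued="true"
       indexed="true" stored="true"/>

With maxDistErr=0.3, the buffer is half of that, 0.15, so a stay from day 30 to day 115 would be filtered as:

fq=-availability_spatial:"Intersects(0 30.15 114.85 3650)"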
Re: copyField generates multiple values encountered for non multiValued field
Try describing your own symptom in your own words - because his issue related to Solr 1.4. I mean, where exactly are you setting allowDuplicates=false?? And why do you think it has anything to do with adding documents to Solr? Solr 1.4 did not have atomic update, so sending the exact same document twice would not result in a change in the index (unless you had a date field with a value of NOW.) Copy field only uses values from the current document. -- Jack Krupansky -Original Message- From: Robert Krüger Sent: Wednesday, June 05, 2013 10:37 AM To: solr-user@lucene.apache.org Subject: copyField generates multiple values encountered for non multiValued field I have the exact same problem as the guy here: http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201105.mbox/%3C3A2B3E42FCAA4BF496AE625426C5C6E4@Wurstsemmel%3E AFAICS he did not get an answer. Is this a known issue? What can I do other than doing what copyField should do in my application? I am using solr 4.0.0. Thanks, Robert
Re: Phrase matching with set union as opposed to set intersection on query terms
term1 OR term2 OR "term1 term2"^2

term1 OR term2 OR "term1 term2"~10^2

The latter would rank documents with the terms nearby higher, and the adjacent terms highest.

term1 OR term2 OR "term1 term2"~10^2 OR "term1 term2"^20 OR "term2 term1"^20

To further boost adjacent terms. But the edismax pf/pf2/pf3 options might be good enough for you.

-- Jack Krupansky

-Original Message- From: Shawn Heisey Sent: Wednesday, June 05, 2013 11:10 AM To: solr-user@lucene.apache.org Subject: Re: Phrase matching with set union as opposed to set intersection on query terms

On 6/5/2013 9:03 AM, Dotan Cohen wrote: How would one write a query which should perform set union on the search terms (term1 OR term2 OR term3), and yet also perform phrase matching if both terms are found? I tried a few variants of the following, but in every case I am getting set intersection on the search terms: select?q={!q.op=OR}text:"term1 term2"~10

A phrase search by definition will require all terms to be present. Even though it is multiple terms, conceptually it is treated as a single term. It sounds like what you are after is what edismax can do. If you define the pf field in addition to the qf field, Solr will do something pretty amazing - it will automatically construct a phrase query from a non-phrase query and search with it against multiple fields. Done correctly, this means that an exact match will be listed first in the results. http://wiki.apache.org/solr/ExtendedDisMax#pf_.28Phrase_Fields.29

Thanks, Shawn
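For reference, the edismax route Jack mentions would look something like this as a request (the field name text is an assumption):

select?defType=edismax&q=term1 term2&qf=text&pf=text^20&pf2=text^10&ps=1

pf turns the whole query into one boosted phrase, pf2 builds boosted phrases from adjacent word pairs, and ps sets the slop allowed in those implicit phrases.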
Solr 4.3 with Internationalization.
Guys, I am going to use Solr 4.3 in my shopping cart project, so I need to support my website in two languages (English and French). I would like some guidance on implementing internationalization with Solr 4.3. Please guide me with some sample configuration to support the French language with Solr 4.3. Thanks in advance. Guru.
Re: Solr 4.2.1 higher memory footprint vs Solr 3.5
Shawn: You're right, I thought I'd seen it as a field option but I think I was confusing really old Solr. Thanks for catching it; having gotten it wrong once, I'm sure I'll remember it better for next time! Erick

On Tue, Jun 4, 2013 at 1:57 PM, SandeepM skmi...@hotmail.com wrote:

Thanks Erick and Shawn, Your explanations help me understand where Solr may be spending its time. Sounds like compression can be a CPU and heap hog. (I'll try to confirm this with the heap dumps.) Initially we tried to keep the JVM heap sizes the same on both Solr 3.5 and 4.2.1, which was around 3 GB, which 3.5 handled well even with a 200 QPS load. Moving to 4.2.1 with the same heap size instantly killed the server. Doubling the JVM heap to 6 GB did not help either. We were seeing higher CPU and higher heap usage. We later changed cache settings so as to reduce their sizes, increased the JVM heap to 8 GB, and we see an improvement. But over time, we do see that the heap utilization slowly climbs as the 200 QPS test is allowed to run, and sometimes leads to the max heap being exceeded, as seen from JConsole. So we see the jagged-edge waveform which keeps climbing (GC cycles don't completely collect memory over time). Our test has a short capture from real traffic and we are replaying that via solrmeter. Thanks. Regards, -- Sandeep
Configuring lucene to suggest the indexed string for all the searches of the substring of the indexed string
Hi, Is it possible to configure Solr to suggest the full indexed string for any search on a substring of that string? Thanks, Prathik
No files added to classloader from lib
Hi, I downloaded Solr 4.3 and I am attempting to run and configure a separate Solr instance under Jetty. I copied the Solr dist directory contents to a directory called solrDist under the single core db that I was running. I then attempted to get the DataImportHandler using the following in my solrconfig.xml:

<lib dir="solrDist/" regex="apache-solr-dataimporthandler-.*\.jar" />

In the log file, I see a lot of messages that the jar files in solrDist were added to the classloader. E.g.

...
534 [coreLoadExecutor-3-thread-1] INFO org.apache.solr.core.SolrResourceLoader - Adding 'file:/C:/Users/MyUsername/Documents/Jetty/Jetty9/solr/db/lib/solr-clustering-4.3.0.jar' to classloader
534 [coreLoadExecutor-3-thread-1] INFO org.apache.solr.core.SolrResourceLoader - Adding 'file:/C:/Users/MyUsername/Documents/Jetty/Jetty9/solr/db/lib/solr-core-4.3.0.jar' to classloader
535 [coreLoadExecutor-3-thread-1] INFO org.apache.solr.core.SolrResourceLoader - Adding 'file:/C:/Users/MyUsername/Documents/Jetty/Jetty9/solr/db/lib/solr-dataimporthandler-4.3.0.jar' to classloader
535 [coreLoadExecutor-3-thread-1] INFO org.apache.solr.core.SolrResourceLoader - Adding 'file:/C:/Users/MyUsername/Documents/Jetty/Jetty9/solr/db/lib/solr-dataimporthandler-extras-4.3.0.jar' to classloader
535 [coreLoadExecutor-3-thread-1] INFO org.apache.solr.core.SolrResourceLoader - Adding 'file:/C:/Users/MyUsername/Documents/Jetty/Jetty9/solr/db/lib/solr-langid-4.3.0.jar' to classloader
535 [coreLoadExecutor-3-thread-1] INFO org.apache.solr.core.SolrResourceLoader - Adding 'file:/C:/Users/MyUsername/Documents/Jetty/Jetty9/solr/db/lib/solr-solrj-4.3.0.jar' to classloader
...

However, in the end I get the following warning:

570 [coreLoadExecutor-3-thread-1] WARN org.apache.solr.core.SolrResourceLoader - No files added to classloader from lib: solrDist/ (resolved as: C:\Users\MyUsername\Documents\Jetty\Jetty9\solr\db\solrDist).

Why is this? I thought the jar files were added to the classloader, but the last message seems to say that none were added. I know that this is a warning, but I am just curious. I'd be grateful to anyone who has an idea regarding this. Thank you, O. O.
Re: problem with zkcli.sh linkconfig
Sounds like a bug - we probably don't have a test that updates a link - if you can make a JIRA issue, I'll be happy to look into it soon. - Mark On Jun 4, 2013, at 8:16 AM, Shawn Heisey s...@elyograg.org wrote: I've got Solr 4.2.1 running SolrCloud. I need to change the config set associated with a collection. I'm having a problem with this. Here's the command that I'm running, domain name redacted: /opt/mbsolr4/cloud-scripts/zkcli.sh -cp /opt/mbsolr4/lib/ext/slf4j-api-1.7.2.jar:/opt/mbsolr4/lib/ext/slf4j-log4j12-1.7.2.jar -z mbzoo1.REDACTED.com:2181,mbzoo2.REDACTED.com:2181,mbzoo3.REDACTED.com:2181/mbsolr1 -collection twotest -confname mbtestcfg -cmd linkconfig Here's part of the resulting log: Jun 04, 2013 9:08:44 AM org.apache.solr.cloud.ZkController linkConfSet INFO: Load collection config from:/collections/p Jun 04, 2013 9:08:44 AM org.apache.solr.common.cloud.SolrZkClient makePath INFO: makePath: /collections/p It partially creates a new collection named p, which is not referenced on my commandline. This partial collection IS linked to the config set that I referenced. The same thing happens if I use -c and -n instead of -collection and -confname. Am I doing something wrong, or is this a bug? Will I need to recreate the collection as a workaround? Thanks, Shawn
Re: Phrase matching with set union as opposed to set intersection on query terms
On Wed, Jun 5, 2013 at 6:10 PM, Shawn Heisey s...@elyograg.org wrote:

On 6/5/2013 9:03 AM, Dotan Cohen wrote: How would one write a query which should perform set union on the search terms (term1 OR term2 OR term3), and yet also perform phrase matching if both terms are found? I tried a few variants of the following, but in every case I am getting set intersection on the search terms: select?q={!q.op=OR}text:"term1 term2"~10

A phrase search by definition will require all terms to be present. Even though it is multiple terms, conceptually it is treated as a single term. It sounds like what you are after is what edismax can do. If you define the pf field in addition to the qf field, Solr will do something pretty amazing - it will automatically construct a phrase query from a non-phrase query and search with it against multiple fields. Done correctly, this means that an exact match will be listed first in the results. http://wiki.apache.org/solr/ExtendedDisMax#pf_.28Phrase_Fields.29

Thanks, Shawn

Thank you Shawn, this pretty much does what I need it to do:

select?defType=edismax&q={!q.op=OR}search_field:term1 term2&pf=search_field

I'm reviewing the Edismax page now. Is there any other documentation that I should review? I have found the Edismax page at the wonderful lucidworks site, but if there is any other documentation that I should review to squeeze the most out of Edismax then I would love to know about it. http://docs.lucidworks.com/display/solr/The+Extended+DisMax+Query+Parser

Thank you very much! -- Dotan Cohen http://gibberish.co.il http://what-is-what.com
Re: Two instances of solr - the same datadir?
Hi Peter, Thank you, I am glad to read that this use case is not alien. I'd like to make the second instance (searcher) completely read-only, so I have disabled all the components that can write (being lazy ;)). I'll probably use http://wiki.apache.org/solr/CollectionDistribution to call the curl after commit, or write some IndexReaderFactory that checks for changes. The problem with calling the 'core reload' is that it seems like lots of work for just opening a new searcher; eeekkk... somewhere I read that it is cheap to reload a core, but re-opening the index searcher must be definitely cheaper... roman

On Wed, Jun 5, 2013 at 4:03 AM, Peter Sturge peter.stu...@gmail.com wrote:

Hi, We use this very same scenario to great effect - 2 instances using the same dataDir with many cores - 1 is a writer (no caching), the other is a searcher (lots of caching). To get the searcher to see the index changes from the writer, you need the searcher to do an empty commit - i.e. you invoke a commit with 0 documents. This will refresh the caches (including autowarming), [re]build the relevant searchers etc. and make any index changes visible to the RO instance. Also, make sure to use <lockType>native</lockType> in solrconfig.xml to ensure the two instances don't try to commit at the same time. There are several ways to trigger a commit:

- Call commit() periodically within your own code.
- Use autoCommit in solrconfig.xml.
- Use an RPC/IPC mechanism between the 2 instance processes to tell the searcher the index has changed, then call commit when called (more complex coding, but good if the index changes on an ad-hoc basis).

Note, doing things this way isn't really suitable for an NRT environment. HTH, Peter

On Tue, Jun 4, 2013 at 11:23 PM, Roman Chyla roman.ch...@gmail.com wrote:

Replication is fine, I am going to use it, but I wanted it for instances *distributed* across several (physical) machines - but here I have one physical machine, it has many cores. I want to run 2 instances of solr because I think it has these benefits:

1) I can give less RAM to the writer (4GB), and use more RAM for the searcher (28GB)
2) I can deactivate warming for the writer and keep it for the searcher (this considerably speeds up indexing - each time we commit, the server is rebuilding a citation network of 80M edges)
3) saving disk space and better OS caching (the OS should be able to use more RAM for caching, which should result in faster operations - the two processes are accessing the same index)

Maybe I should just forget it and go with the replication, but it doesn't 'feel right' IFF it is on the same physical machine. And Lucene specifically has a method for discovering changes and re-opening the index (DirectoryReader.openIfChanged). Am I not seeing something? roman

On Tue, Jun 4, 2013 at 5:30 PM, Jason Hellman jhell...@innoventsolutions.com wrote:

Roman, Could you be more specific as to why replication doesn't meet your requirements? It was geared explicitly for this purpose, including the automatic discovery of changes to the data on the index master. Jason

On Jun 4, 2013, at 1:50 PM, Roman Chyla roman.ch...@gmail.com wrote:

OK, so I have verified the two instances can run alongside, sharing the same datadir. All update handlers are inaccessible in the read-only master:

<updateHandler class="solr.DirectUpdateHandler2" enable="${solr.can.write:true}">

java -Dsolr.can.write=false .
And I can reload the index manually:

curl "http://localhost:5005/solr/admin/cores?wt=json&action=RELOAD&core=collection1"

But this is not an ideal solution; I'd like the read-only server to discover index changes on its own. Any pointers? Thanks, roman

On Tue, Jun 4, 2013 at 2:01 PM, Roman Chyla roman.ch...@gmail.com wrote:

Hello, I need your expert advice. I am thinking about running two instances of solr that share the same data directory. The *reason* being: the indexing instance is constantly building its cache after every commit (we have a big cache) and this slows it down. But indexing doesn't need much RAM; only the search does (and the server has lots of CPUs). So, it is like having two solr instances:

1. solr-indexing-master
2. solr-read-only-master

In the solrconfig.xml I can disable update components; it should be fine. However, I don't know how to 'trigger' index re-opening on (2) after the commit happens on (1). Ideally, the second instance could monitor the disk and re-open the index after new files appear there. Do I have to implement a custom IndexReaderFactory? Or something else? Please note: I know about the replication; this use case is IMHO slightly different - in fact, write-only-master (1) is also a replication
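For reference, the empty commit Peter describes can be triggered with a plain update request against the searcher instance (port and core name assumed from the thread):

curl "http://localhost:5005/solr/collection1/update?commit=true"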
Re: No files added to classloader from lib
apache-solr-dataimporthandler-.*\.jar - note that the apache- prefix has been removed from Solr jar files.

-- Jack Krupansky

-Original Message- From: O. Olson Sent: Wednesday, June 05, 2013 12:01 PM To: solr-user@lucene.apache.org Subject: No files added to classloader from lib
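In other words, given the 4.3 jar names shown in the log above, the directive would presumably need to be:

<lib dir="solrDist/" regex="solr-dataimporthandler-.*\.jar" />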
Re: Phrase matching with set union as opposed to set intersection on query terms
On Wed, Jun 5, 2013 at 6:23 PM, Jack Krupansky j...@basetechnology.com wrote:

term1 OR term2 OR "term1 term2"^2

term1 OR term2 OR "term1 term2"~10^2

The latter would rank documents with the terms nearby higher, and the adjacent terms highest.

term1 OR term2 OR "term1 term2"~10^2 OR "term1 term2"^20 OR "term2 term1"^20

To further boost adjacent terms. But the edismax pf/pf2/pf3 options might be good enough for you.

Thank you Jack. I suppose that I could write a script in PHP to create such a query string from an arbitrary-length phrase, but it wouldn't be pretty! Edismax does in fact meet my need, though. Thanks! -- Dotan Cohen http://gibberish.co.il http://what-is-what.com
Re: Configuring lucene to suggest the indexed string for all the searches of the substring of the indexed string
ngrams? See: http://lucene.apache.org/core/4_3_0/analyzers-common/org/apache/lucene/analysis/ngram/NGramFilterFactory.html -- Jack Krupansky -Original Message- From: Prathik Puthran Sent: Wednesday, June 05, 2013 11:59 AM To: solr-user@lucene.apache.org Subject: Configuring lucene to suggest the indexed string for all the searches of the substring of the indexed string Hi, Is it possible to configure solr to suggest the indexed string for all the searches of the substring of the string? Thanks, Prathik
Re: Solr 4.2.1 higher memory footprint vs Solr 3.5
So we see the jagged-edge waveform which keeps climbing (GC cycles don't completely collect memory over time). Our test has a short capture from real traffic and we are replaying that via solrmeter.

Any idea why the memory climbs over time? The GC should clean up after data is shipped back. Could there be a memory leak in Solr? Appreciate any help. Thanks. -- Sandeep
Re: Phrase matching with set union as opposed to set intersection on query terms
Is there any other documentation that I should review?

It's in the works! Within a week or two.

-- Jack Krupansky

-Original Message- From: Dotan Cohen Sent: Wednesday, June 05, 2013 12:06 PM To: solr-user@lucene.apache.org Subject: Re: Phrase matching with set union as opposed to set intersection on query terms
Re: Create index on few unrelated table in Solr
Yes, my ID field is the uniqueKey. How can I keep the entities from overriding each other?
Re: data-import problem
Maybe the problem is the two document declarations in data-config.xml. Try changing it to this:

<document>
  <entity name="name" query="SELECT id, name FROM name"/>
  <entity name="title" query="SELECT id AS titleid, title FROM name"/>
</document>
Re: Multitable import - uniqueKey
Hehe. Yes, all my tables' ID field names are different. For example, I have 5 tables. These names are 'admin, account, group, checklist':

admin = id (unique key)
account = account_id (unique key)
group = group_id (unique key)
checklist = id (unique key)

Also, I thought the last entity overwrites the other entities. I'm sorry, I don't understand your example. Now I try to use the below config.

### data-config.xml

<entity name="admin" query="select * from admin" dataSource="ds-1">
  <field column="id" name="id"/>
</entity>
<entity name="checklist" query="select * from checklist" dataSource="ds-1">
  <field column="id" name="id"/>
</entity>
<entity name="groups" query="select * from groups" dataSource="ds-1">
  <field column="group_id" name="id"/>
</entity>
<entity name="account" query="select * from accounts" dataSource="ds-1">
  <field column="account_id" name="id"/>
</entity>

Then my schema.xml:

<field name="id" stored="true" type="string" multiValued="false" indexed="true"/>
<uniqueKey>id</uniqueKey>

How can I keep the entities from overwriting each other? Please assist me with this example.
Re: Configuring lucene to suggest the indexed string for all the searches of the substring of the indexed string
ngrams won't work here. If I index all the ngrams of the string, then when I search for some string it would suggest all the ngrams as well. E.g.: the dictionary contains the word "Jason Bourne" and you index all the ngrams of that word. When I search for "Jason", Solr suggests all the ngrams containing the word "Jason". Instead of just suggesting "Jason Bourne", it suggests "Jason B", "Jason Bo", "Jason Bou", "Jason Bour", "Jason Bourn", "Jason Bourne". What should I do so that I only get "Jason Bourne" as the suggestion when the user searches for any substring of it ("Bour", "Bourne", etc.)?

On Wed, Jun 5, 2013 at 9:39 PM, Jack Krupansky j...@basetechnology.com wrote:

ngrams? See: http://lucene.apache.org/core/4_3_0/analyzers-common/org/apache/lucene/analysis/ngram/NGramFilterFactory.html

-- Jack Krupansky

-Original Message- From: Prathik Puthran Sent: Wednesday, June 05, 2013 11:59 AM To: solr-user@lucene.apache.org Subject: Configuring lucene to suggest the indexed string for all the searches of the substring of the indexed string

Hi, Is it possible to configure solr to suggest the indexed string for all the searches of the substring of the string? Thanks, Prathik
Re: Multitable import - uniqueKey
: How can I don't overwrite other entities?
: Please assist me on this example.

I'm confused, you sent this in direct reply to my last message, which contained the following...

1) a paragraph describing the general approach to solving this type of problem...

You can use TemplateTransformer to create a synthetic ID for each entity using some constant value combined with the auto-increment value from your DB, for example...

2) a link to an article i wrote a while back discussing how to solve the exact problem you are having...

http://searchhub.org/2011/02/12/solr-powered-isfdb-part-4/

3) links to specific commits in a github repo where there is a working example of using DIH to index multiple types of documents from different tables in a single Solr index. The commits i linked to show *exactly* which changes are needed to go from indexing a single entity to indexing two entities w/o conflicting ids...

https://github.com/lucidimagination/isfdb-solr/commit/85d7caf19746399755f6f1c39f48a654da3c5b11
https://github.com/lucidimagination/isfdb-solr/commit/26e945747404125ce5b835e2157c6e2612ff2387

...did you look at any of this? did you try it? do you have any specific questions about this approach?

-Hoss
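As a sketch of that TemplateTransformer approach, using the entity and column names from earlier in this thread (assumed, not tested):

<entity name="admin" query="select * from admin" dataSource="ds-1"
        transformer="TemplateTransformer">
  <field column="id" template="admin-${admin.id}"/>
</entity>
<entity name="account" query="select * from accounts" dataSource="ds-1"
        transformer="TemplateTransformer">
  <field column="id" template="account-${account.account_id}"/>
</entity>

Each document then gets an id like admin-1 or account-1, so rows from different tables can never collide on the uniqueKey.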
Configuring seperate db-data-config.xml per shard
Hi, We have a setup where we have 3 shards in a collection, and each shard in the collection needs to load a different set of data. That is:

Shard1 - will contain data only for Entity1
Shard2 - will contain data for Entity2
Shard3 - will contain data for Entity3

So in this case the db-data-config.xml can't be the same for the three shards, so it can't be uploaded to ZooKeeper. Is there any way we can maintain a db-data-config.xml inside each shard's folder and make our shards refer to this db-data-config.xml (during data import), rather than looking for this file in ZooKeeper's repository? Thanks in advance, Radha
Re: Create index on few unrelated table in Solr
Please don't create new threads re-asking the same questions -- especially when the existing thread is only a day old, and still actively getting responses. it just increases the overall noise of the list, and results in multiple people wasting their time providing you with the same answers...

http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201306.mbox/%3ccaldws-wknmwuralhhmmmtth+7noy1ewu0z-shtmwcoaxzes...@mail.gmail.com%3E
http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201306.mbox/%3Calpine.DEB.2.02.1306041534070.2959@frisbee%3E

: Date: Tue, 4 Jun 2013 02:10:52 -0700 (PDT)
: From: sodoo first...@yahoo.com
: Reply-To: solr-user@lucene.apache.org
: To: solr-user@lucene.apache.org
: Subject: Create index on few unrelated table in Solr
:
: I want to create index few tables. All tables not related.
:
: In data-config.xml, that I created to create index
:
: dataConfig
: dataSource type=JdbcDataSource
: name=ds-1
: driver=com.mysql.jdbc.Driver
: url=jdbc:mysql://localhost/testdb
: user=root
: password=***/
: document
: entity name=admin query=select *from admin dataSource=ds-1
: field column=id name=id /
: field column=name name=name /
: field column=mail name=mail /
: /entity
:
: entity name=checklist query=select *from checklist
: dataSource=ds-1
: field column=id name=id /
: field column=title name=title /
: field column=connect name=connect /
: /entity
:
: entity name=account query=select *from account dataSource=ds-1
: field column=id name=id /
: field column=name name=name /
: field column=code name=code /
: /entity
: /document
:
: And I have register schema.xml these fields.
: I tried to make full import but unfortunately only the last entity is
: indexed. Other entities are not index.
:
: What should I do to import all the entities?

-Hoss
Re: Phrase matching with set union as opposed to set intersection on query terms
select?defType=edismax&q={!q.op=OR}search_field:term1 term2&pf=search_field

Is there any way to perform a fuzzy search with this method? I have tried appending ~1 to every term in the search like so:

select?defType=edismax&q={!q.op=OR}search_field:term1~1%20term2~1&pf=search_field

However, two issues:
1) It doesn't work! The results are identical to the results given when not appending ~1 to every term (or ~3).
2) If at all possible, I would rather define the 'fuzziness' elsewhere. Right now I would have to mangle the user input in order to add the ~1 to the end of each term.

Note that the ExtendedDisMax page does in fact mention that fuzziness is supported: http://wiki.apache.org/solr/ExtendedDisMax#Query_Syntax

-- Dotan Cohen http://gibberish.co.il http://what-is-what.com
Re: Query Elevation Component
I have not implemented it yet, and I forget the exact webpage where I found it. But there was a person on that page discussing the same problem who said it was easy to implement a solution for it; he did not share his solution, though. If you figure it out, let me know.
Re: facet.missing=true returns null records with zero count also
Hoss, We rely heavily on facet.mincount because once a user has selected a facet, it doesn't make sense for us to show that facet field to him and let him filter again with the same facet. Also, when a facet has only one value, it doesn't make sense to show it to the user, since searching with that facet is just going to give the same result set again. So when facet.missing does not work with facet.mincount, it is a bit of a hassle for us. We will work on handling it in our program. Thank you for the clarification - Rahul

On Wed, Jun 5, 2013 at 12:32 AM, Chris Hostetter hossman_luc...@fucit.org wrote:

: that facet value and see all documents. I thought facet.missing=true was
: the answer.
...
: facquery.setFacetMinCount(1);

Hmm, yeah -- it looks like facet.missing doesn't take facet.mincount into consideration. I don't remember if that was intentional or not, but as a special-case one-off count it seems like a toss-up as to whether it would be more or less surprising to hide it if it's below the mincount. (it's very similar to doing a one-off facet.query for example, and those are always included in the response and don't consider the facet.mincount either)

In general, this seems like a low-impact thing though, correct? i mean: the main advantage of facet.mincount is to reduce what could be a very large amount of useless data from being streamed from the server to the client, particularly in the case of using facet.sort where you really need the constraints eliminated server-side in order to get the sort and limit applied correctly. but with the facet.missing value, it's just a single value per field that can easily be ignored by the client if it's not desired because of the mincount. or to put it another way: the amount of work needed to ignore this on the client is less than the amount of work to make it configurable to ignore it on the server.

-Hoss
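For reference, the facet.missing count comes back as a final unnamed entry after the named constraints, so it is easy to skip client-side; with wt=json and hypothetical counts it would look something like:

"facet_fields": { "category": [ "books", 10, "music", 3, null, 7 ] }

where the trailing null/7 pair is the missing-value count.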
Re: copyField generates multiple values encountered for non multiValued field
OK, I have two fields defined as follows:

<field name="name" type="string" indexed="true" stored="true" multiValued="false"/>
<field name="name2" type="string_ci" indexed="true" stored="true" multiValued="false"/>

and this copyField directive:

<copyField source="name" dest="name2"/>

I updated the index using SolrJ and got the exact same error message that is in the subject. However, while waiting for feedback I built a workaround at the application level, and now, reconstructing the original state to be able to answer you, I see different behaviour. What happens now is that the field name2 is populated with multiple values although it is not defined as multiValued (see above). Although this is strange, it is consistent with the earlier problem in that copyField does not seem to overwrite the existing field values. I may be using it incorrectly (it's the first time I am using copyField) but the docs in the wiki did not say anything about an overwrite option. Cheers, Robert
Re: java.lang.NumberFormatException when adding latitude,longitude using DIH
Thanks a lot for your response Hoss. I thought about using a ScriptTransformer too, but just thought of checking if there is any other way to do it. Btw, for some reason the values are getting overridden even though it's a multivalued field. Not sure where I am going wrong!

For latlong values 33.7209548950195,34.474838 -117.176193237305,-117.573463 the below value is getting indexed:

<arr name="geo"><str>34.474838,-117.573463</str></arr>

Script transformer:
Re: Two instances of solr - the same datadir?
So here it is, for the record, how I am solving it right now:

Write-master is started with:
-Dmontysolr.warming.enabled=false -Dmontysolr.write.master=true -Dmontysolr.read.master=http://localhost:5005

Read-master is started with:
-Dmontysolr.warming.enabled=true -Dmontysolr.write.master=false

solrconfig.xml changes:

1. all index-changing components have this bit, enable="${montysolr.master:true}" - i.e.

<updateHandler class="solr.DirectUpdateHandler2" enable="${montysolr.master:true}">

2. for cache warming de/activation:

<listener event="newSearcher" class="solr.QuerySenderListener" enable="${montysolr.enable.warming:true}">...

3. to trigger refresh of the read-only master (from the write-master):

<listener event="postCommit" class="solr.RunExecutableListener" enable="${montysolr.master:true}">
  <str name="exe">curl</str>
  <str name="dir">.</str>
  <bool name="wait">false</bool>
  <arr name="args"><str>${montysolr.read.master:http://localhost}/solr/admin/cores?wt=json&amp;action=RELOAD&amp;core=collection1</str></arr>
</listener>

This works, I still don't like the reload of the whole core, but it seems like the easiest thing to do now.

-- roman
Re: Create index on few unrelated table in Solr
Okay, I'm so sorry. I won't post the same question in a separate topic next time.
RE: Solr instance state is down in cloud mode
Are you using IE? If so, you might want to try using Firefox.

-Original Message- From: sathish_ix [mailto:skandhasw...@inautix.co.in] Sent: Wednesday, June 05, 2013 6:16 AM To: solr-user@lucene.apache.org Subject: Solr instance state is down in cloud mode

Hi, When I start a core in solr-cloud I'm getting the below message in the log. I have set up ZooKeeper separately and uploaded the config files. When I start the Solr instance in cloud mode, the state is down.

INFO: Update state numShards=null message={ operation:state, numShards:null, shard:shard1, roles:null, state:down, core:core1, collection:core1, node_name:x:9980_solr, base_url:http://x:9980/solr}
Jun 5, 2013 6:10:48 AM org.apache.solr.common.cloud.ZkStateReader$2 process
INFO: A cluster state change: WatchedEvent state:SyncConnected type:NodeDataChanged path:/clusterstate.json, has occurred - updating... (live nodes size: 1)

When I hit the URL, I get the left pane of the Solr admin and the right side keeps on loading. Any help? Thanks, Sathish
Re: Phrase matching with set union as opposed to set intersection on query terms
There is also http://wiki.apache.org/solr/SolrRelevancyCookbook with nice examples.
Re: java.lang.NumberFormatException when adding latitude,longitude using DIH
That was a very silly mistake. I forgot to add the values to an array before putting them inside the row. The below code works. Thanks a lot...
Re: Phrase matching with set union as opposed to set intersection on query terms
On Wed, Jun 5, 2013 at 9:04 PM, Eustache Felenc eustache.fel...@idilia.com wrote: There is also http://wiki.apache.org/solr/SolrRelevancyCookbook with nice examples. Thank you. -- Dotan Cohen http://gibberish.co.il http://what-is-what.com
Re: Pivot Facets refining datetime, bleh
This may be more suitable on the dev-list, but distributed pivot facets is a very powerful feature. The Jira issue for this is SOLR-2894 ( https://issues.apache.org/jira/browse/SOLR-2894). I have done some testing of the last patch for this issue, and it is as Andrew says: Everything but datetime fields works just fine. There are no error messages for datetime fields when used in a SolrCloud setup, the expected values are just not there. Best, Stein J. Gran On Thu, May 30, 2013 at 5:49 PM, Andrew Muldowney andrew.muldowne...@gmail.com wrote: I've been trying to get into how distributed field facets do their work but I haven't been able to uncover how they deal with this issue. Currently distrib pivot facets does a getTermCounts(first_field) to populate a list at the level its working on. When putting together the data structure we set up a BytesRef, fill it in with the value using the FieldType.ReadableToIndexed call and then add the FieldType.ToObject of that bytesRef and associated field. --From getTermCounts comes fieldValue-- termval = new BytesRef(); ftype.readableToIndexed(fieldValue, termval); pivot.add( value, ftype.toObject(sfield, termval) ); This works great for everything but datetime, as datetime's .ToObject turns it into a human readable string that is unconvertable -at least in my investigation. I've tried to use the FieldType.ToInternal but that also fails on the human readable datetime format. My original idea was to skip the aformentioned block of code and just straight add the fieldValue to the data structure. This caused some pivot facet tests to return wonky results, I'm not sure if I should go down the path of trying to figure out those problems or if there is a different approach I should be taking. Any general guidance on how distributed field facets deals with this would be much appreciated.
Re: data-import problem
Thanks so far. This change makes Solr work over the title entries too, yay! Unfortunately they don't get processed (skipped rows). In my log it says "missing required field id" for every entry. I checked my schema.xml; in there, id is not set as a required field. Removing the uniqueKey property also leads to no improvement. Any further ideas?
Re: data-import problem
On Jun 5, 2013, at 20:39, Stavros Delisavas stav...@delisavas.de wrote: Thanks so far. This change makes Solr work over the title entries too, yay! Unfortunately they don't get processed (skipped rows). In my log it says "missing required field id" for every entry. I checked my schema.xml. In there id is not set as a required field. Removing the uniqueKey property also leads to no improvement. Any further ideas?

You need a field to hold a unique identifier for the document, and your data-import setup must ensure that that specific field gets a unique identifier. Unique here means unique across all documents, no matter where they come from.
Re: data-import problem
On 6 June 2013 00:09, Stavros Delisavas stav...@delisavas.de wrote: Thanks so far. This change makes Solr work over the title entries too, yay! Unfortunately they don't get processed (skipped rows). In my log it says "missing required field id" for every entry. I checked my schema.xml. In there id is not set as a required field. Removing the uniqueKey property also leads to no improvement. [...]

There are several things wrong with your problem statement. You say that you have two tables, but both SELECTs seem to use the same table. I am going to assume that you really have two different tables.

Unless you have changed the default schema.xml, id should be defined as the uniqueKey for the document. You probably do not want to remove that, and even if you just remove the uniqueKey property, the field id remains defined as a required field. The issue is with your SELECT for the second entity:

<entity name="title" query="SELECT id AS titleid, title FROM name"/>

This renames id to titleid, and hence the required field id in schema.xml is missing. You will need something like:

<document>
  <entity name="name" query="SELECT id, name FROM name1"/>
  <entity name="title" query="SELECT id, title FROM name2"/>
</document>

However, you will need to ensure that the ids are unique in the two tables, else entries from the second entity will overwrite matching ids from the first. Also, do you have field definitions within the entities? Please share the complete schema.xml and the DIH configuration file with us, rather than snippets: Use pastebin.com if they are large. Regards, Gora
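One common way to guarantee uniqueness across the two tables (a sketch, assuming MySQL and the table names above) is to build a prefixed synthetic id directly in the SELECT:

<document>
  <entity name="name" query="SELECT CONCAT('name-', id) AS id, name FROM name1"/>
  <entity name="title" query="SELECT CONCAT('title-', id) AS id, title FROM name2"/>
</document>

Rows from the two tables then get ids like name-1 and title-1, which can never collide on the uniqueKey.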
Re: copyField generates multiple values encountered for non multiValued field
Look in the Solr log - the error message should tell you what the multiple values are. For example,

95484 [qtp2998209-11] ERROR org.apache.solr.core.SolrCore - org.apache.solr.common.SolrException: ERROR: [doc=doc-1] multiple values encountered for non multiValued field content_s: [def, abc]

One of the values should be the value of the field that is the source of the copyField. Maybe the other value will give you a clue as to where it came from. Check your SolrJ code - maybe you actually do try to initialize a value in the field that is the copyField target.

-- Jack Krupansky
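A minimal SolrJ sketch of the pitfall Jack describes, using the name/name2 fields defined earlier in the thread (the id, value, and server URL are hypothetical):

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class CopyFieldPitfall {
    public static void main(String[] args) throws Exception {
        SolrServer server = new HttpSolrServer("http://localhost:8983/solr");
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "doc-1");
        doc.addField("name", "Some Name");      // copyField copies this into name2
        // doc.addField("name2", "Some Name");  // if the client also fills the copyField
        //                                      // target, name2 receives two values and the
        //                                      // non-multiValued error appears
        server.add(doc);
        server.commit();
    }
}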
Re: No files added to classloader from lib
Good call Jack. I totally missed that. I am curious how the DataImportHandler worked before, if I made a mistake in the specification and it did not get the jar. Anyway, it works now. Thanks again. O.O.

apache-solr-dataimporthandler-.*\.jar - note that the apache- prefix has been removed from Solr jar files.
-- Jack Krupansky
Re: data-import problem
Thanks for the hints. I am not sure how to solve this issue. I previously made a typo; there are definitely two different tables. Here is my real configuration: http://pastebin.com/JUDzaMk0

For testing purposes I added LIMIT 10 to the SQL statements because my tables are very large and tests would take too long (about 5 GB, 6.5 million rows). I included my whole data-config and what I have changed from the default schema.xml.

I don't know how to solve the "all ids have to be unique" problem. I can not believe that Solr does not offer any solution at all to handle multiple data sources with their own individual ids. Maybe it's possible to have Solr create its own ids while importing the data? Actually there is no direct relation between my name table and my title table. All I want is to be able to do fast text search in those two tables in order to find the ids belonging to these entries. Let me know if you need more information. Thank you!
Entire query is stopwords
Hi, I am using the standard edismax parser and my example query is as follows: {!edismax qf='object_description ' rows=10 start=0 mm=-40% v='object'} In this case, 'object' happens to be a stopword in the StopWordsFilter in my datatype 'object_description'. Now, since 'object' is not indexed at all, the query does not return any results. In an ideal case, I would want documents containing the term 'object' to be returned. What is the best practice to achieve this? Index stop-words and re-query with 'stopwords=false'. Or can this be done without re-querying? Thanks, Vardhan
Re: Solr 4.3 with Internationalization.
Check out this http://stackoverflow.com/questions/5549880/using-solr-for-indexing-multiple-languages http://wiki.apache.org/solr/LanguageAnalysis#French French stop words file (sample): http://trac.foswiki.org/browser/trunk/SolrPlugin/solr/multicore/conf/stopwords-fr.txt Solr includes three stemmers for French: one via solr.SnowballPorterFilterFactory, an alternative stemmer (since Solr 3.1) via solr.FrenchLightStemFilterFactory, and an even less aggressive approach (also since Solr 3.1) via solr.FrenchMinimalStemFilterFactory. Solr can also remove elisions via solr.ElisionFilterFactory, and Lucene includes an example stopword list.

  ...
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.ElisionFilterFactory"/>
  <filter class="solr.SnowballPorterFilterFactory" language="French"/>
  ...
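Putting those pieces together, a full fieldType might look like the following; this is a sketch, assuming a stopwords-fr.txt like the one linked above sits in the core's conf directory:

  <fieldType name="text_fr" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <!-- strip French elisions such as l', d', j' before lowercasing -->
      <filter class="solr.ElisionFilterFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords-fr.txt"/>
      <filter class="solr.SnowballPorterFilterFactory" language="French"/>
    </analyzer>
  </fieldType>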
Re: problem with zkcli.sh linkconfig
On 6/5/2013 10:05 AM, Mark Miller wrote: Sounds like a bug - we probably don't have a test that updates a link - if you can make a JIRA issue, I'll be happy to look into it soon. I will go ahead and create an issue so that a test can be built, but I have some more info: It works perfectly when running the script from the 4.3.1 example, and from the 4.2.1 example. I am using slf4j 1.7.2 and log4j 1.2.17 in my production 4.2.1 lib/ext. That is the only difference I can think of at the moment. Thanks, Shawn
Solrj Stats encoding problem
Hi, I've tested a query using the Solr admin web interface and it works fine. But when I'm trying to execute the same search using SolrJ, it doesn't include Stats information. I've figured out that it's because my query is encoded. The original query is like q=eventTimestamp:[2013-06-01T12:00:00.000Z TO 2013-06-30T11:59:59.999Z]&stats=true&stats.field=numberOfBytes&stats.facet=eventType The query in Java is like q=eventTimestamp%3A%5B2013-06-01T12%3A00%3A00.000Z+TO+2013-06-30T11%3A59%3A59.999Z%5D%26stats%3Dtrue%26stats.field%3DnumberOfBytes%26stats.facet%3DeventType If I copy this query to the browser address bar, it doesn't work, but it does if I replace the encoded characters (':', '=', '&') with the original values. What should I do to make it work through Java? The code is like the following:

  SolrQuery solrQuery = new SolrQuery();
  solrQuery.setQuery(queryBuilder.toString());
  QueryResponse query = getSolrServer().query(solrQuery);
Re: Configuring lucene to suggest the indexed string for all the searches of the substring of the indexed string
Please excuse my misunderstanding, but I always wonder why this index-time processing is usually suggested. From my POV this is a case for query-time processing, i.e. PrefixQuery aka the wildcard query Jason*. Ultra-fast term retrieval is also provided by the TermsComponent. On Wed, Jun 5, 2013 at 8:09 PM, Jack Krupansky j...@basetechnology.com wrote: ngrams? See: http://lucene.apache.org/core/4_3_0/analyzers-common/org/apache/lucene/analysis/ngram/NGramFilterFactory.html -- Jack Krupansky -Original Message- From: Prathik Puthran Sent: Wednesday, June 05, 2013 11:59 AM To: solr-user@lucene.apache.org Subject: Configuring lucene to suggest the indexed string for all the searches of the substring of the indexed string Hi, Is it possible to configure Solr to suggest the indexed string for all searches of a substring of the string? Thanks, Prathik -- Sincerely yours Mikhail Khludnev Principal Engineer, Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
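To make Jack's ngram suggestion concrete, a fieldType sketch (the type name and gram sizes are illustrative, not from the thread): at index time the NGramFilter emits every substring of the term, so a plain term query on any substring, e.g. "ath" against "Prathik", matches, and the stored original string can be returned as the suggestion.

  <fieldType name="text_substring" class="solr.TextField">
    <analyzer type="index">
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <!-- emit every 2..25 character substring of the original term -->
      <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="25"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

The index-time approach trades index size for query speed, which is why it is usually suggested over wildcard queries.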
Re: Solrj Stats encoding problem
Sounds like the Solr Admin UI is too aggressively encoding the query part of the URL for display. Each query parameter value needs to be encoded, not the entire URL query string as a whole. -- Jack Krupansky -Original Message- From: ethereal Sent: Wednesday, June 05, 2013 4:11 PM To: solr-user@lucene.apache.org Subject: Solrj Stats encoding problem [...]
Re: Heap space problem with mlt query
To add some numbers to adityab's comment. Each entry in your filter cache will probably consist of maxDocs/8 bytes plus some overhead, or about 16G in total. This will only grow as you fire queries at Solr, so it's no surprise you're running out of memory as you process queries. Your documentCache is probably also a problem, although I'm extrapolating based on an 80G index with only 1M docs. The result cache is also very big, but it's usually much smaller. Still, I'd set it back to the defaults. Why did you change these from the defaults? The very first thing I'd do is change them back. Your autowarm counts are also a problem at 2,048. Again, take the filterCache. It's essentially a map where each entry's key is the fq clause and the value is the set of documents that match the query, often stored as a bit set (thus the maxDocs/8 above). Whenever a new searcher is opened in your setup, the most recent 2,048 fq clauses will be re-executed. That will really kill your searcher open times. Try something reasonable like 16-32. These are caches that are intended to age out the oldest entries, not hold all the entries you ever send at Solr. Best Erick On Wed, Jun 5, 2013 at 9:35 AM, adityab aditya_ba...@yahoo.com wrote: Did you try reducing the filter and query caches? They are fairly large too, unless you really need them to be cached for your use case. Do you have that many distinct filter queries hitting Solr for the size you have defined for filterCache? Are you doing any sorting? That will chew up a lot of memory because of Lucene's internal field cache.
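For reference, a solrconfig.xml sketch along the lines Erick describes, with default-ish sizes and small autowarm counts (the exact numbers are illustrative, not from the thread):

  <filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="32"/>
  <queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="32"/>
  <!-- documentCache entries are never autowarmed; internal doc ids change between searchers -->
  <documentCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>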
Re: Indexing Heavy dataset
Note that stored=true/false is irrelevant to the raw search time. What it _is_ relevant to is the time it takes to assemble the doc for return, if (and only if) you return that field. I claim your search time would be fast if you went ahead and stored the field, and specified an fl clause that did NOT contain the big field. Oh, and you'd have to have lazy field loading enabled too. FWIW, Erick On Wed, Jun 5, 2013 at 10:29 AM, Raheel Hasan raheelhasan@gmail.com wrote: Some values in the field are up to 1 MB as well. On Wed, Jun 5, 2013 at 7:27 PM, Raheel Hasan raheelhasan@gmail.com wrote: OK, thanks for the reply. The field has values of around 60 KB each. Furthermore, I have realized that the issue is with MySQL, as it's not processing this table when a WHERE clause is applied. Secondly, I have turned this field to stored=false and now the select is fast again. On Wed, Jun 5, 2013 at 6:56 PM, Shawn Heisey s...@elyograg.org wrote: On 6/5/2013 3:08 AM, Raheel Hasan wrote: Hi, I am trying to index a heavy dataset with 1 particular field really too heavy... However, as I start, I get a memory warning and rollback (OutOfMemoryError). So, I have learned that we can use the -Xmx1024m option with the java command to start Solr and allocate more memory to the heap. My question is, since this could also become insufficient later, is the issue related to caching? Here is my cache block in solrconfig:

  <filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="0"/>
  <queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
  <documentCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>

I am thinking that maybe I need to turn off the documentCache. Anyone got a better idea? Or perhaps there is another issue here? Exactly how big is this field? Do you need this giant field returned with your results, or is it just there for searching? Caches of size 512, especially with autowarm disabled, are probably not a major cause for concern, unless the big field is big enough so that 512 of them is really, really huge. If that's the case, I would reduce the size of your documentCache, not turn it off. The value of ramBufferSizeMB elsewhere in your config is more likely to affect how much RAM gets used during indexing. The default for this setting as of Solr 4.1.0 is 100. Most people can reduce this value. I'm writing a reply to another thread where you are participating, with info that will likely be useful for you too. Look for that. Thanks, Shawn -- Regards, Raheel Hasan -- Regards, Raheel Hasan
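A sketch of Erick's suggestion (the field name big_body is a placeholder, not from the thread): keep the big field stored, enable lazy field loading, and exclude it via fl so search stays fast while the field remains retrievable on demand.

  <!-- solrconfig.xml, inside the <query> section:
       only load stored fields that are actually requested -->
  <enableLazyFieldLoading>true</enableLazyFieldLoading>

A query that skips the big field entirely:

  /select?q=*:*&fl=id,title

Fetching the big field only for a single known document:

  /select?q=id:42&fl=id,big_body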
Re: Solrj Stats encoding problem
: I've tested a query using solr admin web interface and it works fine. : But when I'm trying to execute the same search using solrj, it doesn't : include Stats information. : I've figured out that it's because my query is encoded. I don't think you are understanding how to use SolrJ and the SolrQuery object : Original query is like q=eventTimestamp:[2013-06-01T12:00:00.000Z TO : 2013-06-30T11:59:59.999Z]&stats=true&stats.field=numberOfBytes&stats.facet=eventType : The query in java is like : q=eventTimestamp%3A%5B2013-06-01T12%3A00%3A00.000Z+TO+2013-06-30T11%3A59%3A59.999Z%5D%26stats%3Dtrue%26stats.field%3DnumberOfBytes%26stats.facet%3DeventType ... : SolrQuery solrQuery = new SolrQuery(); : solrQuery.setQuery(queryBuilder.toString()); : QueryResponse query = getSolrServer().query(solrQuery); it looks like you are passing the setQuery method an entire URL-encoded set of params from a request you made in your browser. the setQuery method is syntactic sugar for specifying just the q param containing the query string, and it should not already be escaped (ie: eventTimestamp:[2013-06-01T12:00:00.000Z TO 2013-06-30T11:59:59.999Z]). Other methods exist on the SolrQuery object to provide syntactic sugar for other things (ie: specifying facet fields, enabling highlighting, etc...) If you want to provide a list of params using explicit names (q, stats, stats.field, etc...) you can ignore the helper methods on SolrQuery and just directly use the low-level methods it inherits from ModifiableSolrParams, like setParam...

  SolrQuery query = new SolrQuery();
  query.setParam("q", "eventTimestamp:[2013-06-01T12:00:00.000Z TO 2013-06-30T11:59:59.999Z]");
  query.setParam("stats", "true");
  query.setParam("stats.field", "numberOfBytes", "eventType");
  QueryResponse response = getSolrServer().query(query);

-Hoss
Re: data-import problem
My usual admonishment is that Solr isn't a database, and when you try to use it like one you're just _asking_ for problems. That said... Consider two options:

1) use a different core for each table.
2) in schema.xml, remove the id field (required="true" _might_ be specified) and remove the uniqueKey definition.

You'll have to re-index of course. But do note that while Solr does not _require_ a uniqueKey definition, almost all Solr installations have one. Best Erick On Wed, Jun 5, 2013 at 3:19 PM, Stavros Delisavas stav...@delisavas.de wrote: [...]
Re: Entire query is stopwords
Your problem statement is fairly odd. You say you've defined object as a stopword, but then you want your query to return documents that contain object. By definition stopwords are something that is considered irrelevant for searching and are ignored. So why not just take object out of your stopwords file? Perhaps a separate stopwords file for that particular field? Or just not use stopwords at all for that field? Best Erick On Wed, Jun 5, 2013 at 3:36 PM, Vardhan Dharnidharka vardhan1...@hotmail.com wrote: [...]
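One way to realize the "no stopwords for that field" option without giving up stopwords elsewhere, as a sketch (field and type names are placeholders): copyField the text into a parallel field whose analyzer has no StopFilter, and add that field to qf so stopword-only queries still match.

  <!-- schema.xml: a parallel field type with no StopFilterFactory -->
  <fieldType name="text_nostop" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>
  <field name="object_description_nostop" type="text_nostop" indexed="true" stored="false"/>
  <copyField source="object_description" dest="object_description_nostop"/>

The query would then use qf='object_description object_description_nostop', so no re-querying is needed.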
search for docs where location not present
I have a location-type field in my schema where I store lat/lon of a document when this data is available. In around half of my documents this info is not available and I just don't store anything. I am trying to find the documents where the location is not set, but nothing is working. I tried q=location_field:* and get back no results. I tried q=-location_field:[* TO *] but got back an error. I even tried something like: q=*:*&fq={!geofilt sfield=location_field}&pt=34.02093,-118.210755&d=25000 (distance set to a very large number) but it returned documents even if they had no location_field set. Can anyone think of a way to do this? Thanks in advance!
Re: data-import problem
A Solr index does not need a unique key, but almost all indexes use one. http://wiki.apache.org/solr/UniqueKey Try the query below, passing id as id instead of titleid:

  <document>
    <entity name="title" query="SELECT id, title FROM name"></entity>
  </document>

A proper dataimport config will look like:

  <entity name="relationship_entity" query="select id,name,value from table">
    <field column="id" name="idSchemaFieldName"/>
    <field column="name" name="nameSchemaFieldName"/>
    <field column="value" name="valueSchemaFieldName"/>
  </entity>
Re: Solrj Stats encoding problem
On 6/5/2013 2:11 PM, ethereal wrote: [...] SolrQuery solrQuery = new SolrQuery(); solrQuery.setQuery(queryBuilder.toString()); QueryResponse query = getSolrServer().query(solrQuery); The only QueryBuilder objects I can find are in the Lucene API, so I have no idea what that part of your code is doing. Here's how I would duplicate the query you reference in SolrJ. The query string is broken apart so that the lines won't wrap awkwardly:

  String url = "http://localhost:8983/solr/collection1";
  SolrServer server = new HttpSolrServer(url);
  String qs = "eventTimestamp:"
      + "[2013-06-01T12:00:00.000Z TO 2013-06-30T11:59:59.999Z]";
  SolrQuery query = new SolrQuery();
  query.setQuery(qs);
  query.set("stats", true);
  query.set("stats.field", "numberOfBytes");
  query.set("stats.facet", "eventType");
  QueryResponse rsp = server.query(query);

Thanks, Shawn
Re: facet.missing=true returns null records with zero count also
: filter again with the same facet. Also, when a facet has only one value, it : doesn't make sense to show it to the user, since searching with that facet : is just going to give the same result set again. So when facet.missing does : not work with facet.mincount, it is a bit of a hassle for us Will work : on handling it in our program. Thank you for the clarification yeah .. i totally understand where you are coming from, i'm just not certain that it's clear cut that we should change the current behavior, since: 1) it's trivial to work around client side; 2) some other users might be depending on the current behavior and might think that conceptually it doesn't make sense for facet.missing to consider facet.mincount. i should have said before but: feel free to open an issue about this and propose a patch, i'm just not sure it's a slam dunk unless we make an easy way to configure it to continue working the current way as well. -Hoss
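The client-side workaround Hoss calls trivial could look like this SolrJ sketch (the field name "category" and the mincount value are illustrative): in SolrJ, the facet.missing bucket comes back as a facet count whose name is null, so the client can simply drop it when it falls below its own mincount.

  import org.apache.solr.client.solrj.response.FacetField;
  import org.apache.solr.client.solrj.response.QueryResponse;

  // a sketch: apply facet.mincount to the "missing" bucket on the client,
  // since facet.missing=true reports the bucket regardless of mincount
  static long missingCount(QueryResponse rsp, String field) {
      FacetField ff = rsp.getFacetField(field);
      if (ff == null) return 0;
      for (FacetField.Count c : ff.getValues()) {
          if (c.getName() == null) {   // null name marks the missing bucket
              return c.getCount();
          }
      }
      return 0;
  }

  // usage: only show a "(no value)" entry when it meets your own mincount
  //   long missing = missingCount(rsp, "category");
  //   if (missing >= 1) { /* render the missing bucket */ }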
Re: search for docs where location not present
select?q=-location_field:* worked for me
Re: search for docs where location not present
Either have your update client explicitly set a boolean field that indicates whether location is present, or use an update processor to set an explicit boolean field that means "no location present":

  <updateRequestProcessorChain name="location-present">
    <processor class="solr.CloneFieldUpdateProcessorFactory">
      <str name="source">location_field</str>
      <str name="dest">has_location_b</str>
    </processor>
    <processor class="solr.RegexReplaceProcessorFactory">
      <str name="fieldName">has_location_b</str>
      <str name="pattern">[^\s]+</str>
      <str name="replacement">true</str>
    </processor>
    <processor class="solr.DefaultValueUpdateProcessorFactory">
      <str name="fieldName">has_location_b</str>
      <bool name="value">false</bool>
    </processor>
    <processor class="solr.LogUpdateProcessorFactory"/>
    <processor class="solr.RunUpdateProcessorFactory"/>
  </updateRequestProcessorChain>

-- Jack Krupansky -Original Message- From: kevinlieb Sent: Wednesday, June 05, 2013 5:43 PM To: solr-user@lucene.apache.org Subject: search for docs where location not present [...]
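To use the chain it would also need to be wired into the update handler; a sketch, assuming Solr 4.x (where these processor factories and solr.UpdateRequestHandler exist) and the field names above:

  <!-- solrconfig.xml: route all updates through the chain -->
  <requestHandler name="/update" class="solr.UpdateRequestHandler">
    <lst name="defaults">
      <str name="update.chain">location-present</str>
    </lst>
  </requestHandler>

After re-indexing, finding documents with no location becomes a plain boolean query:

  /select?q=has_location_b:false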
Re: copyField generates multiple values encountered for non multiValued field
: I updated the Index using SolrJ and got the exact same error message there aren't a lot of specifics provided in this thread, so this may not be applicable, but if you mean you are actually using the atomic updates feature to update an existing document, then the problem is that you still have the existing value in your name2 field, as well as another copy of the name field evaluated by copyField after the updates are applied... http://wiki.apache.org/solr/Atomic_Updates#Stored_Values -Hoss
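Sketched concretely, under the assumption that name is copyFielded into name2: an atomic update re-reads the document's stored values, so the stored name2 value plus the freshly re-copied name yields two values in a non-multiValued field. Keeping the copyField destination unstored avoids re-reading the old copy:

  <!-- schema.xml sketch: an unstored copyField destination is rebuilt
       from scratch on each atomic update instead of being duplicated -->
  <field name="name"  type="text_general" indexed="true" stored="true"/>
  <field name="name2" type="string"       indexed="true" stored="false"/>
  <copyField source="name" dest="name2"/>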
Re: Indexing Heavy dataset
: Furthermore, I have realized that the issue is with MySQL as it's not : processing this table when a WHERE clause is applied http://wiki.apache.org/solr/DataImportHandlerFaq#I.27m_using_DataImportHandler_with_a_MySQL_database._My_table_is_huge_and_DataImportHandler_is_going_out_of_memory._Why_does_DataImportHandler_bring_everything_to_memory.3F -Hoss
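The fix that FAQ entry describes, sketched here (the connection details are placeholders): set batchSize="-1" on the JdbcDataSource, which DIH translates to Integer.MIN_VALUE so the MySQL driver streams rows instead of buffering the whole table in memory.

  <dataSource type="JdbcDataSource"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/mydb"
              user="user" password="password"
              batchSize="-1"/>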
Re: search for docs where location not present
Thanks for the replies. I found that -location_field:* returns documents that both have and don't have the field set. I should clarify that I am using Solr 3.4 and the location type is set to solr.LatLonType. Although I could add a boolean field that is true if location is set, I'd rather not have redundant data in the db (harkens back to my normalized-SQL days).