Re: Solr Deployment Question

2010-05-13 Thread findbestopensource
Please explain how you have handled two indexes in a single VM. Is it
multi-core?

To measure memory consumption, you need to calculate the used memory before
and after loading the indexes; in general, calculate the used memory before
and after any checkpoint you want to analyse. The difference between the two
readings gives you the actual memory consumption.
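The before/after measurement described above can be sketched with the standard Runtime API; the allocation below is only a stand-in for loading an index:

```java
public class MemCheck {
    // Used heap = memory the JVM has allocated minus the free portion of it.
    static long usedMemory() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        long before = usedMemory();
        // ... load the index, or reach whatever checkpoint you want to analyse ...
        byte[] simulatedLoad = new byte[4 * 1024 * 1024]; // stand-in for real work
        long after = usedMemory();
        System.out.println("Approximate bytes consumed: " + (after - before)
                + " (still holding " + simulatedLoad.length + " bytes)");
    }
}
```

Readings are approximate; calling System.gc() before each reading reduces, but does not eliminate, the noise from garbage collection.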

Regards
Aditya
http://www.findbestopensource.com


On Fri, May 14, 2010 at 11:14 AM, Maduranga Kannangara <
mkannang...@infomedia.com.au> wrote:

> But even when we used a single index, we were running out of memory.
> What do you mean by "active"? There are no queries on the masters.
> Only one index is being processed/optimized.
>
> Also, if I may add to my question: how can I find the
> amount of memory that an index would use, theoretically?
> i.e., is there a formula?
>
> Thanks
> Madu
>
>
>
> -Original Message-
> From: findbestopensource [mailto:findbestopensou...@gmail.com]
> Sent: Friday, 14 May 2010 3:34 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr Deployment Question
>
> You may use only one index at a time, but both indexes are active and have
> loaded all their terms into memory, so memory consumption will certainly be higher.
>
> Regards
> Aditya
> http://www.findbestopensource.com
>
> On Fri, May 14, 2010 at 10:28 AM, Maduranga Kannangara <
> mkannang...@infomedia.com.au> wrote:
>
> > Hi
> >
> > We use separate JVMs to Index and Query.
> > (Client applications will query only slaves,
> > while master does only indexing)
> >
> > Recently we moved two master indexes to
> > a single JVM. The memory allocations for
> > the two indexes were 512 MB and 1 GB.
> >
> > Once we moved both indexes to a single VM,
> > we thought it would still index using 1 GB, since we
> > use only one index at a time. But to our surprise
> > it needed more than that (1.2 GB), even though
> > only one index was used at a time.
> >
> > Can you tell me why, or how I can find
> > out why this is?
> >
> > Solr 1.4
> > Java 1.6.0_20
> >
> > We use a VPS for deployment.
> >
> > Thanks in advance
> > Madu
> >
> >
> >
>


RE: Solr Deployment Question

2010-05-13 Thread Maduranga Kannangara
But even when we used a single index, we were running out of memory.
What do you mean by "active"? There are no queries on the masters.
Only one index is being processed/optimized.

Also, if I may add to my question: how can I find the
amount of memory that an index would use, theoretically?
i.e., is there a formula?

Thanks 
Madu



-Original Message-
From: findbestopensource [mailto:findbestopensou...@gmail.com] 
Sent: Friday, 14 May 2010 3:34 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr Deployment Question

You may use only one index at a time, but both indexes are active and have
loaded all their terms into memory, so memory consumption will certainly be higher.

Regards
Aditya
http://www.findbestopensource.com

On Fri, May 14, 2010 at 10:28 AM, Maduranga Kannangara <
mkannang...@infomedia.com.au> wrote:

> Hi
>
> We use separate JVMs to Index and Query.
> (Client applications will query only slaves,
> while master does only indexing)
>
> Recently we moved two master indexes to
> a single JVM. The memory allocations for
> the two indexes were 512 MB and 1 GB.
>
> Once we moved both indexes to a single VM,
> we thought it would still index using 1 GB, since we
> use only one index at a time. But to our surprise
> it needed more than that (1.2 GB), even though
> only one index was used at a time.
>
> Can you tell me why, or how I can find
> out why this is?
>
> Solr 1.4
> Java 1.6.0_20
>
> We use a VPS for deployment.
>
> Thanks in advance
> Madu
>
>
>


Re: Solr Deployment Question

2010-05-13 Thread findbestopensource
You may use only one index at a time, but both indexes are active and have
loaded all their terms into memory, so memory consumption will certainly be higher.

Regards
Aditya
http://www.findbestopensource.com

On Fri, May 14, 2010 at 10:28 AM, Maduranga Kannangara <
mkannang...@infomedia.com.au> wrote:

> Hi
>
> We use separate JVMs to Index and Query.
> (Client applications will query only slaves,
> while master does only indexing)
>
> Recently we moved two master indexes to
> a single JVM. The memory allocations for
> the two indexes were 512 MB and 1 GB.
>
> Once we moved both indexes to a single VM,
> we thought it would still index using 1 GB, since we
> use only one index at a time. But to our surprise
> it needed more than that (1.2 GB), even though
> only one index was used at a time.
>
> Can you tell me why, or how I can find
> out why this is?
>
> Solr 1.4
> Java 1.6.0_20
>
> We use a VPS for deployment.
>
> Thanks in advance
> Madu
>
>
>


Solr Deployment Question

2010-05-13 Thread Maduranga Kannangara
Hi

We use separate JVMs to Index and Query.
(Client applications will query only slaves,
while master does only indexing)

Recently we moved two master indexes to
a single JVM. The memory allocations for
the two indexes were 512 MB and 1 GB.

Once we moved both indexes to a single VM,
we thought it would still index using 1 GB, since we
use only one index at a time. But to our surprise
it needed more than that (1.2 GB), even though
only one index was used at a time.

Can you tell me why, or how I can find
out why this is?

Solr 1.4
Java 1.6.0_20

We use a VPS for deployment.

Thanks in advance
Madu




Re: Bitwise Operations on Integer Fields in Lucene and Solr Index

2010-05-13 Thread Israel Ekpo
Correction,

I meant to list

https://issues.apache.org/jira/browse/LUCENE-2460
https://issues.apache.org/jira/browse/SOLR-1913



On Thu, May 13, 2010 at 10:13 PM, Israel Ekpo  wrote:

> I have created two issues as new features
>
> https://issues.apache.org/jira/browse/LUCENE-1560
>
> https://issues.apache.org/jira/browse/SOLR-1913
>
> The first one is for the Lucene Filter.
>
> The second one is for the Solr QParserPlugin
>
> The source code and jar files are attached and the Solr plugin is available
> for use immediately.
>
>
>
>
> On Thu, May 13, 2010 at 6:42 PM, Andrzej Bialecki  wrote:
>
>> On 2010-05-13 23:27, Israel Ekpo wrote:
>> > Hello Lucene and Solr Community
>> >
>> > I have a custom org.apache.lucene.search.Filter that I would like to
>> > contribute to the Lucene and Solr projects.
>> >
>> > So I would need some direction as to how to create an issue or submit a
>> > patch.
>> >
>> > It looks like there have been changes to the way this is done since the
>> > latest merge of the two projects (Lucene and Solr).
>> >
>> > Recently, some Solr users have been looking for a way to perform bitwise
>> > operations between an integer value and some fields in the index.
>> >
>> > So, I wrote a Solr QParser plugin to do this using a custom Lucene
>> Filter.
>> >
>> > This package makes it possible to filter results returned from a query
>> based
>> > on the results of a bitwise operation on an integer field in the
>> documents
>> > returned from the pre-constructed query.
>>
>> Hi,
>>
>> What a coincidence! :) I'm working on something very similar, only the
>> use case that I need to support is slightly different - I want to
>> support a ranked search based on a bitwise overlap of query value and
>> field value. That is, the number of differing bits would reduce the
>> score. This scenario occurs e.g. during near-duplicate detection that
>> uses fuzzy signatures, on document- or sentence levels.
>>
>> I'm going to submit my code early next week, it still needs some
>> polishing. I have two ways to execute this query, neither of which uses
>> filters at the moment:
>>
>> * method 1: during indexing the bits in the fields are turned into
>> on/off terms on the same field, and during search a BooleanQuery is
>> formed from the int value with the same terms. Scoring is courtesy of
>> BooleanScorer. This method supports only a single int value per field.
>>
>> * method 2, incomplete yet - during indexing the bits are turned into
>> terms as before, but this method supports multiple int values per field:
>> terms that correspond to bitmasks on the same value are put at the same
>> positions. Then a specialized Query / Scorer traverses all 32 posting
>> lists in parallel, moving through all matching docs and scoring
>> according to how many terms matched at the same position.
>>
>> I wrapped this in a Solr FieldType, and instead of using a custom
>> QParser plugin I simply implemented FieldType.getFieldQuery().
>>
>> It would be great to work out a convenient user-level API for this
>> feature, both the scoring and the non-scoring case.
>>
>> --
>> Best regards,
>> Andrzej Bialecki <><
>>  ___. ___ ___ ___ _ _   __
>> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
>> ___|||__||  \|  ||  |  Embedded Unix, System Integration
>> http://www.sigram.com  Contact: info at sigram dot com
>>
>>
>> -
>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: dev-h...@lucene.apache.org
>>
>>
>
>
> --
> "Good Enough" is not good enough.
> To give anything less than your best is to sacrifice the gift.
> Quality First. Measure Twice. Cut Once.
> http://www.israelekpo.com/
>



-- 
"Good Enough" is not good enough.
To give anything less than your best is to sacrifice the gift.
Quality First. Measure Twice. Cut Once.
http://www.israelekpo.com/
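Andrzej's ranked-overlap idea quoted above boils down to scoring by how many bits the query value and the field value share. A minimal sketch of that score contribution (my own restatement of the idea, not his actual Scorer code):

```java
public class BitOverlapDemo {
    // More shared bits => higher score; fewer shared bits => lower score,
    // as in the near-duplicate / fuzzy-signature use case described above.
    static int sharedBits(int queryValue, int fieldValue) {
        return Integer.bitCount(queryValue & fieldValue);
    }

    public static void main(String[] args) {
        System.out.println(sharedBits(0b1010, 0b1011)); // two bits in common
    }
}
```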


Re: Best way to handle bitfields in solr...

2010-05-13 Thread Israel Ekpo
William,

This QParserPlugin should solve that problem now.

Check out https://issues.apache.org/jira/browse/SOLR-1913

BitwiseQueryParserPlugin is an org.apache.solr.search.QParserPlugin that
allows users to filter the documents returned from a query by performing
bitwise operations between a particular integer field in the index and the
specified value.

The plugin is available immediately for your use.

On Fri, Dec 4, 2009 at 4:03 PM, Otis Gospodnetic  wrote:

> Would http://wiki.apache.org/solr/FunctionQuery#fieldvalue help?
>
>  Otis
> --
> Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch
>
>
>
> - Original Message 
> > From: William Pierce 
> > To: solr-user@lucene.apache.org
> > Sent: Fri, December 4, 2009 2:43:25 PM
> > Subject: Best way to handle bitfields in solr...
> >
> > Folks:
> >
> > In my db I currently have fields that represent bitmasks. Thus, for
> > example, a mask value of 48 might represent an "undergraduate" (value = 16)
> > and a "graduate" (value = 32). Currently, the corresponding field in Solr
> > is a multi-valued string field called "EdLevel" which will have
> > "Undergraduate" and "Graduate" as its two values (for this example). I do
> > the conversion from the int into the list of values as I do the indexing.
> >
> > Ideally, I'd like Solr to have bitwise operations so that I could store
> > the int value and then simply search using bit operations. However, given
> > that this is not possible, and that there have been recent threads about
> > performance issues with multi-valued fields, is there something better I
> > could do?
> >
> > TIA,
> >
> > - Bill
>
>


-- 
"Good Enough" is not good enough.
To give anything less than your best is to sacrifice the gift.
Quality First. Measure Twice. Cut Once.
http://www.israelekpo.com/
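The int-to-multivalued-field conversion William describes for his "EdLevel" bitmask can be sketched as below. The bit values and labels come from his example; the helper class itself is hypothetical:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class EdLevelDecoder {
    // Labels per bit, in a fixed order (values from William's example).
    static final Map<Integer, String> LABELS = new LinkedHashMap<>();
    static {
        LABELS.put(16, "Undergraduate");
        LABELS.put(32, "Graduate");
    }

    // Decode a bitmask into the list of labels whose bits are set.
    static List<String> decode(int mask) {
        List<String> values = new ArrayList<>();
        for (Map.Entry<Integer, String> e : LABELS.entrySet()) {
            if ((mask & e.getKey()) != 0) {
                values.add(e.getValue());
            }
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(decode(48)); // [Undergraduate, Graduate]
    }
}
```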


Re: Bitwise Operations on Integer Fields in Lucene and Solr Index

2010-05-13 Thread Israel Ekpo
I have created two issues as new features

https://issues.apache.org/jira/browse/LUCENE-1560

https://issues.apache.org/jira/browse/SOLR-1913

The first one is for the Lucene Filter.

The second one is for the Solr QParserPlugin

The source code and jar files are attached and the Solr plugin is available
for use immediately.



On Thu, May 13, 2010 at 6:42 PM, Andrzej Bialecki  wrote:

> On 2010-05-13 23:27, Israel Ekpo wrote:
> > Hello Lucene and Solr Community
> >
> > I have a custom org.apache.lucene.search.Filter that I would like to
> > contribute to the Lucene and Solr projects.
> >
> > So I would need some direction as to how to create an issue or submit a
> > patch.
> >
> > It looks like there have been changes to the way this is done since the
> > latest merge of the two projects (Lucene and Solr).
> >
> > Recently, some Solr users have been looking for a way to perform bitwise
> > operations between an integer value and some fields in the index.
> >
> > So, I wrote a Solr QParser plugin to do this using a custom Lucene
> Filter.
> >
> > This package makes it possible to filter results returned from a query
> based
> > on the results of a bitwise operation on an integer field in the
> documents
> > returned from the pre-constructed query.
>
> Hi,
>
> What a coincidence! :) I'm working on something very similar, only the
> use case that I need to support is slightly different - I want to
> support a ranked search based on a bitwise overlap of query value and
> field value. That is, the number of differing bits would reduce the
> score. This scenario occurs e.g. during near-duplicate detection that
> uses fuzzy signatures, on document- or sentence levels.
>
> I'm going to submit my code early next week, it still needs some
> polishing. I have two ways to execute this query, neither of which uses
> filters at the moment:
>
> * method 1: during indexing the bits in the fields are turned into
> on/off terms on the same field, and during search a BooleanQuery is
> formed from the int value with the same terms. Scoring is courtesy of
> BooleanScorer. This method supports only a single int value per field.
>
> * method 2, incomplete yet - during indexing the bits are turned into
> terms as before, but this method supports multiple int values per field:
> terms that correspond to bitmasks on the same value are put at the same
> positions. Then a specialized Query / Scorer traverses all 32 posting
> lists in parallel, moving through all matching docs and scoring
> according to how many terms matched at the same position.
>
> I wrapped this in a Solr FieldType, and instead of using a custom
> QParser plugin I simply implemented FieldType.getFieldQuery().
>
> It would be great to work out a convenient user-level API for this
> feature, both the scoring and the non-scoring case.
>
> --
> Best regards,
> Andrzej Bialecki <><
>  ___. ___ ___ ___ _ _   __
> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> ___|||__||  \|  ||  |  Embedded Unix, System Integration
> http://www.sigram.com  Contact: info at sigram dot com
>
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>


-- 
"Good Enough" is not good enough.
To give anything less than your best is to sacrifice the gift.
Quality First. Measure Twice. Cut Once.
http://www.israelekpo.com/


Re: Long Lucene queries

2010-05-13 Thread Lance Norskog
No changes are needed. Just experiment with 'curl'.
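As Lance suggests, curl makes the experiment easy: with --data-urlencode it sends a POST whose form-encoded body Solr reads just like GET parameters. The URL and query below are placeholders:

```shell
# POST a long query to Solr; the parameters travel in the request body,
# not in the URL, so URL-length limits no longer apply.
curl http://localhost:8983/solr/select \
  --data-urlencode 'q=title:foo OR body:bar' \
  --data-urlencode 'rows=10'
```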

On Tue, May 11, 2010 at 11:52 PM, Pooja Verlani  wrote:
> Hi,
> Thanks, Erik.
> The search parameters are too long for a GET request, so I am thinking of
> opting for POST. Is it possible to make a POST request to Solr? Are any
> configuration or code changes required? I have many parameters, but only
> one is expected to be very long.
>
> Any suggestions?
>
> Regards,
> Pooja
>
> On Fri, May 7, 2010 at 4:39 PM, Erik Hatcher  wrote:
>
>>
>> On May 7, 2010, at 6:56 AM, Pooja Verlani wrote:
>>
>>> In my web app, I have to fire a query that's too long due to the various
>>> boosts I have to give. The size changes according to the query, and many
>>> times I get a blank page, as I probably exceed Lucene's character limit.
>>> Is it possible to post it to Solr some other way? Should I be using POST
>>> instead of GET here? Any other suggestions?
>>>
>>
>> A few options:
>>
>>  * Use POST (except you won't see the params in the log files)
>>
>>  * Tomcat: <
>> http://wiki.apache.org/solr/SolrTomcat#Enabling_Longer_Query_Requests>
>>
>>  * Jetty: 
>>
>> Or, possibly a lot of your query params can be put into solrconfig.xml, and
>> you send over just what changed.  You can do some tricks with param
>> substitution to streamline this stuff in some cases.  Some examples of what
>> you're sending over would help us see where some improvements could be made.
>>
>>        Erik
>>
>>
>



-- 
Lance Norskog
goks...@gmail.com


Re: NPE When trying to commit

2010-05-13 Thread Kaktu Chakarabati

Also,
the strange thing is that I still get this exception when I try to swap in a
snapshot of the index from a day or two ago.
Are these index commit points saved in some external place? Very strange.
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/NPE-When-trying-to-commit-tp816160p816177.html
Sent from the Solr - User mailing list archive at Nabble.com.


DIH settings

2010-05-13 Thread Blargy

Can you please share your DIH settings and the JDBC driver you are using?

I'll start...

jdbc driver = mysql-connector-java-5.1.12-bin
batchSize = "-1"
readOnly = "true"


Would someone mind explaining what "convertType" and "transactionIsolation"
actually do? The wiki doesn't really explain their purpose. Thanks
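For context, settings like those listed above usually live on the DIH dataSource element; a hypothetical data-config.xml fragment (driver, URL, and credentials are placeholders). With MySQL Connector/J, batchSize="-1" is commonly used to make the driver stream rows one at a time instead of buffering the entire result set in memory:

```xml
<!-- Hypothetical DIH dataSource; adjust driver/url/user/password to your setup. -->
<dataSource type="JdbcDataSource"
            driver="com.mysql.jdbc.Driver"
            url="jdbc:mysql://localhost:3306/mydb"
            user="solr" password="secret"
            batchSize="-1"
            readOnly="true"/>
```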
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/DIH-settings-tp816166p816166.html
Sent from the Solr - User mailing list archive at Nabble.com.


Seattle Hadoop/NoSQL: Facebook, more Discussion. Thurs May 27th

2010-05-13 Thread Bradford Stephens
We've heard your feedback from the last meetup: we're having fewer
speakers and more discussion. Yay!
http://www.meetup.com/Seattle-Hadoop-HBase-NoSQL-Meetup/

We're expecting:

1. Facebook will talk about Hive (a SQL-like language for MapReduce)
2. OpsCode will talk about cluster management with Chef
3. Then we'll break up into groups and have casual Hadoop/NoSQL
related discussions and Q&A with several experts, so you can learn
more!

Also, stay tuned for news on a FREE Seattle Hadoop Community &
Training day in late July. We're going to get some fantastic people,
and you'll have hands-on experience with all the Hadoop ecosystem.

When: Thursday, May 27, 2010 6:45 PM

Where:
Amazon SLU, Von Vorst Building
426 Terry Ave N
Seattle, WA 98109
9044153009

-- 
Bradford Stephens,
Founder, Drawn to Scale
drawntoscalehq.com
727.697.7528

http://www.drawntoscalehq.com --  The intuitive, cloud-scale data
solution. Process, store, query, search, and serve all your data.

http://www.roadtofailure.com -- The Fringes of Scalability, Social
Media, and Computer Science


Re: SolrUser - ERROR:SCHEMA-INDEX-MISMATCH

2010-05-13 Thread Erick Erickson
This is probably a bad idea. You're getting by on backwards-compatibility
behavior. I'd really recommend that you reindex your entire corpus,
possibly getting by on what you already have until you can successfully
reindex.

Have a look at trie fields (this is detailed in the example
schema.xml). Here's another place to look:
http://www.lucidimagination.com/blog/2009/05/13/exploring-lucene-and-solrs-trierange-capabilities/
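For illustration, a trie-based long field in a Solr 1.4-style schema.xml might be declared as below (the names are placeholders; the example schema.xml carries the canonical definitions):

```xml
<!-- Hypothetical schema.xml fragment: a trie long type and a field using it.
     A smaller precisionStep speeds up range queries at the cost of index size. -->
<fieldType name="tlong" class="solr.TrieLongField" precisionStep="8"
           omitNorms="true" positionIncrementGap="0"/>
<field name="my_long_field" type="tlong" indexed="true" stored="true"/>
```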

You also haven't told us what you want to do with
the field, so making recommendations is difficult.

Best
Erick

On Thu, May 13, 2010 at 5:19 PM, Anderson vasconcelos <
anderson.v...@gmail.com> wrote:

> Hi Erick.
> I put fields of type string in my schema.xml. The system went to
> production, and now I see that the field must be a long field.
>
> When I change the field type to long, I get the error
> ERROR:SCHEMA-INDEX-MISMATCH when I search via the Solr admin.
>
> I put "plong", and this works. Is this the way I should go? (Could this
> cause trouble in the future?)
>
> What are the advantages of setting the field type to long? Should I keep
> this field as a string type?
>
> Thanks
>
> 2010/5/13 Erick Erickson 
>
> > Not at present, you must re-index your documents when you redefine your
> > schema
> > to change existing documents.
> >
> > Field updating of documents already indexed is being worked on, but it's
> > not
> > available yet.
> >
> > Best
> > Erick
> >
> > On Thu, May 13, 2010 at 3:58 PM, Anderson vasconcelos <
> > anderson.v...@gmail.com> wrote:
> >
> > > Hi All.
> > >
> > > I have the following fields in my schema:
> > >  > > default="NEW"/>
> > >  > stored="true"
> > > required="true"/>
> > >  stored="true"
> > > required="true"/>
> > >  > > required="true"/>
> > >  multiValued="false"
> > > indexed="true" stored="true"/>
> > >  > > required="false"/>
> > >  > > required="false"/>
> > >
> > > I need to change the Solr index, adding a dynamic field that will
> > > contain all values of the "value" field. Is it possible to get all
> > > index data and reindex, putting the values into my dynamic field?
> > >
> > > Since the data was not stored, I can't find a way to do this.
> > >
> > > Thanks
> > >
> >
>


Re: SolrUser - Reindex

2010-05-13 Thread Erick Erickson
In general, it's hard to just answer since there are many
factors to consider, not the least of which is what you
want it to do. In this case, I suspect the issue is
WordDelimiterFactory, it splits words on all non
alphanumerics by default.

It would probably be a good idea to work with
the various combinations of tokenizers and filters
to get a feel for what they do.

The admin analysis page allows you to put in arbitrary
text and see what the results of analysis are. So if you
define a bunch of different fields in your schema (just for
testing), and then put text in the analysis page you'll
see what transformations occur. This is invaluable for
understanding the differences. And until you get a good
idea what various tokenizers and filters do both in isolation
and in combination, you'll get lots of surprises. Even after
you're familiar with them, you'll *still* get surprises, but at
least you'll have a chance to figure it out...

Best
Erick


On Thu, May 13, 2010 at 5:23 PM, Anderson vasconcelos <
anderson.v...@gmail.com> wrote:

> I'm using the textgen fieldtype on my field as follows:
>
> <fieldType name="textgen" class="solr.TextField" positionIncrementGap="100">
>   <analyzer type="index">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>             words="stopwords.txt" enablePositionIncrements="true"/>
>     <filter class="solr.WordDelimiterFilterFactory"
>             generateWordParts="1" generateNumberParts="1" catenateWords="1"
>             catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>             ignoreCase="true" expand="true"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>             words="stopwords.txt" enablePositionIncrements="true"/>
>     <filter class="solr.WordDelimiterFilterFactory"
>             generateWordParts="1" generateNumberParts="1" catenateWords="0"
>             catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>   </analyzer>
> </fieldType>
>
>
> .
>   stored="true"/>
>
> .
>
> These do not seem to remove the @ symbol. To index the @ symbol, must I
> use HTMLStripStandardTokenizerFactory?
>
> Thanks
>
> 2010/5/13 Erick Erickson 
>
> > Probably your analyzer is removing the @ symbol; it's hard to say if you
> > don't include the relevant parts of your schema.
> >
> > This page might help:
> > http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
> >
> > Best
> > Erick
> >
> > On Thu, May 13, 2010 at 3:59 PM, Anderson vasconcelos <
> > anderson.v...@gmail.com> wrote:
> >
> > > Why does Solr/Lucene not index the character '@'?
> > >
> > > I send email fields like x...@gmail.com to be indexed, and afterwards a
> > > search for to_email:*...@* finds nothing.
> > >
> > > Do I need to do some configuration?
> > >
> > > Thanks
> > >
> >
>


Bitwise Operations on Integer Fields in Lucene and Solr Index

2010-05-13 Thread Israel Ekpo
Hello Lucene and Solr Community

I have a custom org.apache.lucene.search.Filter that I would like to
contribute to the Lucene and Solr projects.

So I would need some direction as to how to create an issue or submit a
patch.

It looks like there have been changes to the way this is done since the
latest merge of the two projects (Lucene and Solr).

Recently, some Solr users have been looking for a way to perform bitwise
operations between an integer value and some fields in the index.

So, I wrote a Solr QParser plugin to do this using a custom Lucene Filter.

This package makes it possible to filter results returned from a query based
on the results of a bitwise operation on an integer field in the documents
returned from the pre-constructed query.

You can perform three basic types of operations on these integer fields

* BitwiseOperation.BITWISE_AND (bitwise AND)
* BitwiseOperation.BITWISE_OR (bitwise inclusive OR)
* BitwiseOperation.BITWISE_XOR (bitwise exclusive OR)

You can also negate the results of these operations.

For example, imagine there is an integer field in the index named "flags"
with a value of 8 (1000 in binary). The following results are expected:

   1. A source value of 8 will match during a BitwiseOperation.BITWISE_AND
operation, with negate set to false.
   2. A source value of 4 will match during a BitwiseOperation.BITWISE_AND
operation, with negate set to true.
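A plain-Java restatement of that matching rule for the BITWISE_AND case (my own sketch of the semantics described above, not the actual Filter code):

```java
public class BitwiseMatchDemo {
    // A document matches when (fieldValue AND source) is non-zero,
    // optionally negated, as in the "flags" example above.
    static boolean matchesAnd(int fieldValue, int source, boolean negate) {
        boolean hit = (fieldValue & source) != 0;
        return negate ? !hit : hit;
    }

    public static void main(String[] args) {
        int flags = 8; // 1000 in binary
        System.out.println(matchesAnd(flags, 8, false)); // case 1: matches
        System.out.println(matchesAnd(flags, 4, true));  // case 2: matches when negated
    }
}
```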

The BitwiseFilter constructor accepts the following values

* The name of the integer field (A string)
* The BitwiseOperation object. Example BitwiseOperation.BITWISE_XOR
* The source value (an integer)
* A boolean value indicating whether or not to negate the results of the
operation
* A pre-constructed org.apache.lucene.search.Query

Here is an example of how you would use it with Solr

http://localhost:8983/solr/bitwise/select/?q={!bitwise field=user_permissions
op=AND source=3 negate=true}state:FL

http://localhost:8983/solr/bitwise/select/?q={!bitwise field=user_permissions
op=AND source=3}state:FL

Here is an example of how you would use it with Lucene

public class BitwiseTestSearch extends BitwiseTestBase {

    public BitwiseTestSearch() {
    }

    public void search() throws IOException, ParseException {
        setupSearch();

        // term
        Term t = new Term(COUNTRY_KEY, "us");

        // term query
        Query q = new TermQuery(t);

        // maximum number of documents to display
        int limit = 1000;

        int sourceValue = 0;

        boolean negate = false;

        BitwiseFilter bitwiseFilter = new BitwiseFilter(USER_PERMS_KEY,
                BitwiseOperation.BITWISE_XOR, sourceValue, negate, q);

        Query fq = new FilteredQuery(q, bitwiseFilter);

        ScoreDoc[] hits = isearcher.search(fq, null, limit).scoreDocs;

        BitwiseResultFilter resultFilter = bitwiseFilter.getResultFilter();

        for (int i = 0; i < hits.length; i++) {

            Document hitDoc = isearcher.doc(hits[i].doc);

            System.out.println(FIRST_NAME_KEY + " field has a value of "
                    + hitDoc.get(FIRST_NAME_KEY));
            System.out.println(LAST_NAME_KEY + " field has a value of "
                    + hitDoc.get(LAST_NAME_KEY));
            System.out.println(ACTIVE_KEY + " field has a value of "
                    + hitDoc.get(ACTIVE_KEY));
            System.out.println(USER_PERMS_KEY + " field has a value of "
                    + hitDoc.get(USER_PERMS_KEY));

            System.out.println("doc ID --> " + hits[i].doc);

            System.out.println("...");
        }

        System.out.println("sourceValue = " + sourceValue + ", operation = "
                + resultFilter.getOperation().getOperationName()
                + ", negate = " + negate);

        System.out.println("A total of " + hits.length
                + " documents were found from the search\n");

        shutdown();
    }

    public static void main(String[] args) throws IOException, ParseException {
        BitwiseTestSearch search = new BitwiseTestSearch();

        search.search();
    }
}

Any guidance would be highly appreciated.

Thanks.


-- 
"Good Enough" is not good enough.
To give anything less than your best is to sacrifice the gift.
Quality First. Measure Twice. Cut Once.
http://www.israelekpo.com/


Re: SolrUser - Reindex

2010-05-13 Thread Anderson vasconcelos
I'm using the textgen fieldtype on my field as follows:

<fieldType name="textgen" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="1"
            catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="0"
            catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>


.
 

.

These do not seem to remove the @ symbol. To index the @ symbol, must I use
HTMLStripStandardTokenizerFactory?

Thanks

2010/5/13 Erick Erickson 

> Probably your analyzer is removing the @ symbol; it's hard to say if you
> don't include the relevant parts of your schema.
>
> This page might help:
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
>
> Best
> Erick
>
> On Thu, May 13, 2010 at 3:59 PM, Anderson vasconcelos <
> anderson.v...@gmail.com> wrote:
>
> > Why does Solr/Lucene not index the character '@'?
> >
> > I send email fields like x...@gmail.com to be indexed, and afterwards a
> > search for to_email:*...@* finds nothing.
> >
> > Do I need to do some configuration?
> >
> > Thanks
> >
>


Re: SolrUser - ERROR:SCHEMA-INDEX-MISMATCH

2010-05-13 Thread Anderson vasconcelos
Hi Erick.
I put fields of type string in my schema.xml. The system went to
production, and now I see that the field must be a long field.

When I change the field type to long, I get the error
ERROR:SCHEMA-INDEX-MISMATCH when I search via the Solr admin.

I put "plong", and this works. Is this the way I should go? (Could this
cause trouble in the future?)

What are the advantages of setting the field type to long? Should I keep
this field as a string type?

Thanks

2010/5/13 Erick Erickson 

> Not at present, you must re-index your documents when you redefine your
> schema
> to change existing documents.
>
> Field updating of documents already indexed is being worked on, but it's
> not
> available yet.
>
> Best
> Erick
>
> On Thu, May 13, 2010 at 3:58 PM, Anderson vasconcelos <
> anderson.v...@gmail.com> wrote:
>
> > Hi All.
> >
> > I have the following fields in my schema:
> >  > default="NEW"/>
> >  stored="true"
> > required="true"/>
> >  > required="true"/>
> >  > required="true"/>
> >  > indexed="true" stored="true"/>
> >  > required="false"/>
> >  > required="false"/>
> >
> > I need to change the Solr index, adding a dynamic field that will
> > contain all values of the "value" field. Is it possible to get all index
> > data and reindex, putting the values into my dynamic field?
> >
> > Since the data was not stored, I can't find a way to do this.
> >
> > Thanks
> >
>


Re: bi-directional replication on solr 1.4?

2010-05-13 Thread Tim Heckman
It looks like SnapPuller.java doesn't allow for the possibility of the
slave having a later index version than the master. It only checks
whether the versions are equal.

It's easy enough to add that check and prevent the index fetch when
the slave has a later version (in fact I'm running it in a sandbox
right now). But I wonder what other problems it might create in a
production environment (or what problems I am overlooking). Does
anyone have any thoughts on this?
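The guard Tim describes amounts to replacing the equality test with a strict comparison. A hypothetical sketch (not the actual SnapPuller code, and assuming a larger version number means a newer index):

```java
public class ReplicationGuard {
    // Fetch only when the master is strictly ahead of the slave,
    // rather than merely "different", so a slave with a newer index
    // does not pull an older one from the master.
    static boolean shouldFetch(long masterVersion, long slaveVersion) {
        return masterVersion > slaveVersion;
    }

    public static void main(String[] args) {
        System.out.println(shouldFetch(10, 9));  // master ahead: fetch
        System.out.println(shouldFetch(9, 10));  // slave ahead: skip
    }
}
```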

thanks,
Tim


On Thu, May 13, 2010 at 2:37 PM, Tim Heckman  wrote:
> Does bi-directional replication work in solr 1.4? In other words, if I
> wanted to have 2 servers that are both master and slave. Call them A
> and B. I would configure things so that normally, A runs a DIH
> periodically to rebuild the index, and then B pulls the updated index
> from A. The idea here is that if A goes down, B could run the data
> imports, and then A would pick up the up-to-date index from B when it
> comes back up.
>
> Based on the note about Repeaters on the wiki, it looks like it might
> be possible, but there aren't enough details for me to know for sure.
>
> thanks,
> Tim
>


Re: SolrUser - Reindex

2010-05-13 Thread Erick Erickson
Probably your analyzer is removing the @ symbol; it's hard to say if you
don't include the relevant parts of your schema.

This page might help:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters

Best
Erick

On Thu, May 13, 2010 at 3:59 PM, Anderson vasconcelos <
anderson.v...@gmail.com> wrote:

> Why does Solr/Lucene not index the character '@'?
>
> I send email fields like x...@gmail.com to be indexed, and afterwards a
> search for to_email:*...@* finds nothing.
>
> Do I need to do some configuration?
>
> Thanks
>


Re: SolrUser - ERROR:SCHEMA-INDEX-MISMATCH

2010-05-13 Thread Erick Erickson
Not at present, you must re-index your documents when you redefine your
schema
to change existing documents.

Field updating of documents already indexed is being worked on, but it's not
available yet.

Best
Erick

On Thu, May 13, 2010 at 3:58 PM, Anderson vasconcelos <
anderson.v...@gmail.com> wrote:

> Hi All.
>
> I have the following fields in my schema:
>  default="NEW"/>
>  required="true"/>
>  required="true"/>
>  required="true"/>
>  indexed="true" stored="true"/>
>  required="false"/>
>  required="false"/>
>
> I need to change the Solr index, adding a dynamic field that will
> contain all values of the "value" field. Is it possible to get all index
> data and reindex, putting the values into my dynamic field?
>
> Since the data was not stored, I can't find a way to do this.
>
> Thanks
>


Re: Help with Embedded Server - SOLVED

2010-05-13 Thread Eric Berry
Thanks for all the help Lance.

I was finally able to get it working by using a more complex initialization
process:
[code lang="groovy"]
def solrHome = ConfigurationHolder.config.universitySearchService?.solrHome
?: ""
def coreName = ConfigurationHolder.config.universitySearchService?.solrCore
?: "universities"
if (!solrHome) {
   throw new IllegalArgumentException("UniversitySearchService configured as
embedded, but no solrHome property is set.")
}
CoreContainer coreContainer = new CoreContainer(solrHome)
File solrHomeDir = new File(solrHome)
File solrDataDir = new File(solrHomeDir, "data")
SolrConfig solrConfig = new SolrConfig(solrHome, "solrconfig.xml", null) //
null input stream so the xml file is used instead.
CoreDescriptor descriptor = new CoreDescriptor(coreContainer, coreName,
solrHome)
SolrCore solrCore = new SolrCore(coreName, solrDataDir.path, solrConfig,
null, descriptor)
coreContainer.register(solrCore, false)
server = new EmbeddedSolrServer(coreContainer, coreName)
[/code]

I basically needed to create a SolrConfig, CoreDescriptor, and SolrCore
manually and register the core with the CoreContainer.

Thanks,
Eric

-- 
Learn from the past. Live in the present. Plan for the future.
Blog: http://www.townsfolkdesigns.com/blogs/elberry
jEdit  - Programmer's Text Editor
Bazaar  - Version Control for Humans


SolrUser - Reindex

2010-05-13 Thread Anderson vasconcelos
Why does Solr/Lucene not index the character '@'?

I index email fields like x...@gmail.com ...and afterwards a search for
to_email:*...@* finds nothing.

Do I need to do some configuration?

Thanks


SolrUser - ERROR:SCHEMA-INDEX-MISMATCH

2010-05-13 Thread Anderson vasconcelos
Hi All.

I have the following fields in my schema:








I need to change the Solr index, adding a dynamic field that will
contain all values of the "value" field. Is it possible to get all the
indexed data and reindex, putting the values into my dynamic field?

Since the data was not stored, I can't find a way to do this.

Thanks


Re: Field Collapsing: How to estimate total number of hits

2010-05-13 Thread Sergey Shinderuk
Finally I got it working. It seems that the latest SOLR-236-trunk.patch
just has some bugs.

I checked out an older revision of solr trunk - rev 899572 (dtd.
2010-01-15) from  http://svn.apache.org/repos/asf/lucene/solr/trunk
and applied SOLR-236.patch dtd. 2010-02-01.

And collapsing works fine. I get correct numFound values after collapsing.

Maybe this can help someone.


2010/5/13 Sergey Shinderuk :
> Joe, thanks for your answer. But it doesn't solve my problem. Below I
> gave a longer description of my problem.
>
> First of all, I checked out solr trunk revision 928303 with last
> change dtd. 2010-03-28. Then I applied the latest patch from SOLR-236
> to get field collapsing component. After that I built the example
> configuration with 'ant example'.
>
> Then I started to experiment with field collapsing:
>
> 1. Query all docs http://localhost:8983/solr/select?q=*:*
> ...
> 
> ...
> There are 19 documents in the index.
>
>
> 2. Same with faceting by manu_exact field:
> http://localhost:8983/solr/select?q=*:*&facet=on&facet.field=manu_exact
> ...
> 
>  
>    4
>    2
>    2
>    2
>    2
>    1
>    1
>    1
>    1
>    1
>    1
>    1
>  
> 
> ...
>
> I got 12 distinct facets.
>
>
> 3. Now collapsing by manu_exact instead of faceting
> http://localhost:8983/solr/select?q=*:*&collapse.field=manu_exact
>
> I get collapse counts for the first 10 rows having distinct manu_exact
> values. But the problem is that i get an odd numFound:
>
> 
>
> numFound is equal to the number of rows returned by solr. (In fact, if
> I add rows=3 to the query string, then I get numFound=3.)
> And I want to get numFound = 12, because there are 12 distinct values
> in the index for manu_exact field as demonstrated in p. 2.
>
>
>
> Joe suggested adding a dummy field with a sole value of 1 and
> performing faceting on this field over the *uncollapsed* result set
>
> http://localhost:8983/solr/select?q=*:*&collapse.field=manu_exact&collapse.facet=after&facet=on&facet.field=dummy&rows=3
>
> And I get numFound = 10 as before and facet count = 19 for the sole
> value of dummy field. And this is the expected result, but not what I
> want.
>
>
> I thought that my question is the one faced immediately if one uses
> field collapsing. If you don't know the total number of results, then
> you cannot paginate through them, at least you don't know the number
> of pages.
>
> In my application I'm trying to collapse near-duplicate documents
> based on document signature. And I need to know how many non-duplicate
> results hit the query.
>
>
> Any help appreciated.
>
>
> 2010/5/12 Joe Calderon :
>> dont know if its the best solution but i have a field i facet on
>> called type its either 0,1, combined with collapse.facet=before i just
>> sum all the values of the facet field to get the total number found
>>
>> if you dont have such a field u can always add a field with a single value
>>
>> --joe
>>
>> On Wed, May 12, 2010 at 10:41 AM, Sergey Shinderuk  
>> wrote:
>>> Hi, fellows!
>>>
>>> I use field collapsing to collapse near-duplicate documents based on
>>> document fuzzy signature calculated at index time.
>>> The problem is that, when field collapsing is enabled, in query
>>> response numFound is equal to the number of rows requested.
>>>
>>> For instance, with solr example schema i can issue the following query
>>>
>>> http://localhost:8983/solr/select?q=*:*&rows=3&collapse.field=manu_exact
>>>
>>> In response i get collapse_counts together with ordinary result list,
>>> but numFound equals 3.
>>> As far as I understand, this is due to the way field collapsing works.
>>>
>>> I want to show the total number of hits to the user and provide a
>>> pagination through the results.
>>>
>>> Any ideas?
>>>
>>> Regards,
>>> Sergey Shinderuk
>>>
>>
>


Re: [resolved] Config issue for deduplication

2010-05-13 Thread Markus Fischer

Got it with the help of Demian Katz, the main developer of VuFind:

The import script of VuFind was bypassing the deduplication parameters 
while writing directly to the Solr index.


By deactivating direct writing to the index and using the standard way 
it now works!


Thanks to all who gave input!

Markus

Markus Fischer wrote:

I use

<bool name="overwriteDupes">true</bool>

and a different field than ID to control duplication. This is about 
bibliographic data coming from different sources with different IDs 
which may have the same content...


I attached solrconfig.xml if you want to take a look.

Thanks a lot!

Markus

Markus Jelsma wrote:
What's your solrconfig? No deduplication happens if overwriteDupes = false 
and the signature field is other than the (unique) doc ID field.
-Original message-

From: Markus Fischer 
Sent: Thu 13-05-2010 17:01
To: solr-user@lucene.apache.org; Subject: Config issue for deduplication

I am trying to configure automatic deduplication for SOLR 1.4 in 
Vufind. I followed:


http://wiki.apache.org/solr/Deduplication

Actually nothing happens. All records are being imported without any 
deduplication.


What am I missing?

Thanks
Markus

I did:

- create a duplicated set of records, only shifted their ID by a fixed 
number


---
solrconfig.xml

<requestHandler name="/update" class="solr.XmlUpdateRequestHandler">
  <lst name="defaults">
    <str name="update.processor">dedupe</str>
  </lst>
</requestHandler>

<updateRequestProcessorChain name="dedupe">
  <processor class="org.apache.solr.update.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <bool name="overwriteDupes">true</bool>
    <str name="signatureField">dedupeHash</str>
    <str name="fields">reference,issn</str>
    <str name="signatureClass">org.apache.solr.update.processor.Lookup3Signature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

---
In schema.xml I added the field

<field name="dedupeHash" type="string" indexed="true" stored="true" multiValued="false" />

--

If I look at the created field "dedupeHash" it seems to be empty...!?



Re: multi-valued associated fields

2010-05-13 Thread Eric Grobler
Hi Ahmed

Thanks again for sharing your insight and experience.
I will discuss the multi-core approach with members of our team.

Regards
Eric

On Wed, May 12, 2010 at 9:24 PM, ahammad  wrote:

>
> In our deployment, we thought that complications might arise when
> attempting
> to hit the Solr server with addresses of too many cores. For instance, we
> have 15+ cores running at the moment. At the worst case, we will have to
> use
> all 15+ addresses of all the cores to search all our data. What we
> eventually did was to combine all the cores into a single core, which will
> basically give us a more clean solution. You will get the simplicity of
> querying one core, but the flexibility of modifying cores separately.
>
> Basically, we have all the cores indexing separately. We set up a script
> that would use the index merge functionality of Solr to combine all the
> indexes into a single index accessible through one core. Yes, there will be
> some overhead on the server, but I believe that it's a good compromise. In
> our case, we have multiple servers at our disposal, so this was not a
> problem to implement. It all depends on your data set and the volume of
> documents that you will be indexing.
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/multi-valued-associated-fields-tp811883p813419.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
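The merge step described above maps onto the CoreAdmin mergeindexes action (available from Solr 1.4); a hedged sketch, with core names and index paths invented. The target core's index should be empty, the source indexes must not change during the merge, and a commit on the target makes the merged data visible:

```
http://localhost:8983/solr/admin/cores?action=mergeindexes
    &core=combined
    &indexDir=/data/solr/core1/data/index
    &indexDir=/data/solr/core2/data/index
```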


Re: Config issue for deduplication

2010-05-13 Thread Markus Fischer

I use

<bool name="overwriteDupes">true</bool>

and a different field than ID to control duplication. This is about 
bibliographic data coming from different sources with different IDs 
which may have the same content...


I attached solrconfig.xml if you want to take a look.

Thanks a lot!

Markus

Markus Jelsma wrote:
What's your solrconfig? No deduplication happens if overwriteDupes = false and the signature field is other than the (unique) doc ID field.
 
-Original message-

From: Markus Fischer 
Sent: Thu 13-05-2010 17:01
To: solr-user@lucene.apache.org; 
Subject: Config issue for deduplication


I am trying to configure automatic deduplication for SOLR 1.4 in Vufind. 
I followed:


http://wiki.apache.org/solr/Deduplication

Actually nothing happens. All records are being imported without any 
deduplication.


What am I missing?

Thanks
Markus

I did:

- create a duplicated set of records, only shifted their ID by a fixed 
number


---
solrconfig.xml

<requestHandler name="/update" class="solr.XmlUpdateRequestHandler">
  <lst name="defaults">
    <str name="update.processor">dedupe</str>
  </lst>
</requestHandler>

<updateRequestProcessorChain name="dedupe">
  <processor class="org.apache.solr.update.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <bool name="overwriteDupes">true</bool>
    <str name="signatureField">dedupeHash</str>
    <str name="fields">reference,issn</str>
    <str name="signatureClass">org.apache.solr.update.processor.Lookup3Signature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

---
In schema.xml I added the field

<field name="dedupeHash" type="string" indexed="true" stored="true" multiValued="false" />

--

If I look at the created field "dedupeHash" it seems to be empty...!?





  
[attachment: solrconfig.xml -- XML markup stripped by the archive. Recoverable content: index defaults (mergeFactor 10, ramBufferSizeMB 32, lockType single), query result and filter caches (filterCache size 1024), dismax request handlers boosting title_short^100, callnumber-label^400, topic^300, title^75, author^75, language^30, publishDate; spellcheck components ("basicSpell" on the spelling field, "default" on spellingShingle); a query elevation component (elevate.xml); highlighting settings; and a "dedupe" updateRequestProcessorChain using Lookup3Signature with signatureField dedupeHash on the title field.]



bi-directional replication on solr 1.4?

2010-05-13 Thread Tim Heckman
Does bi-directional replication work in solr 1.4? In other words, if I
wanted to have 2 servers that are both master and slave. Call them A
and B. I would configure things so that normally, A runs a DIH
periodically to rebuild the index, and then B pulls the updated index
from A. The idea here is that if A goes down, B could run the data
imports, and then A would pick up the up-to-date index from B when it
comes back up.

Based on the note about Repeaters on the wiki, it looks like it might
be possible, but there aren't enough details for me to know for sure.

thanks,
Tim


Re: Advancded Reading

2010-05-13 Thread Peter Sturge
A truly indispensable resource is Yonik's Mastering Solr 1.4 on-demand
webinar:


http://www.lucidimagination.com/solutions/Webinars/mastering-solr-1.4-with-yonik-seeley




On Thu, May 13, 2010 at 6:04 PM, Blargy  wrote:

>
> Does anyone know of any documentation that is more in-depth than the wiki
> and
> the Solr 1.4 book? I'm past the basic usage of Solr and creating simple
> support plugins. I really want to know all about the inner workings of Solr
> and Lucene. Can someone recommend anything?
>
> Thanks
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Advancded-Reading-tp815382p815382.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


maximum recommended document cache size

2010-05-13 Thread Nagelberg, Kallin
I am trying to tune my Solr setup so that the caches are well warmed after the 
index is updated. My documents are quite small, usually under 10k. I currently 
have a document cache size of about 15,000, and am warming up 5,000 with a 
query after each indexing. Autocommit is set at 30 seconds, and my caches are 
warming up easily in just a couple of seconds. I've read of concerns regarding 
garbage collection when your cache is too large. Does anyone have experience 
with this? Ideally I would like to get 90% of all documents from the last month 
in memory after each index, which would be around 25,000. I'm doing extensive 
load testing, but if someone has recommendations I'd love to hear them.

Thanks,
-Kallin Nagelberg
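For reference, the relevant knobs live in solrconfig.xml; a hedged sketch using the sizes discussed above (the numbers are the poster's targets, not general recommendations, and the warm-up sort field is invented):

```xml
<!-- documentCache sized for ~25,000 small (<10k) documents. The
     documentCache cannot be autowarmed (internal doc ids change on
     commit), so warming is done with explicit newSearcher queries. -->
<documentCache class="solr.LRUCache"
               size="25000"
               initialSize="25000"
               autowarmCount="0"/>

<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <!-- hypothetical warm-up query pulling recent documents into the cache -->
    <lst>
      <str name="q">*:*</str>
      <str name="sort">timestamp desc</str>
      <str name="rows">5000</str>
    </lst>
  </arr>
</listener>
```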


Re: synonyms not working with copyfield

2010-05-13 Thread Nick Martin
Hi,

You could use a copyField against all fields and then AND the query terms 
given. Quite restrictive but all terms would then have to be present to match.
I'm still a relative newbie to Solr so perhaps I'm horribly wrong.

Cheers

Nick

On 13 May 2010, at 18:18, surajit wrote:

> 
> Understood and I can work with that limitation by using separate
> fields during indexing. However, my search interface is just a text
> box like Google and I need to take the query and return only those
> documents that match ALL terms in the query and if I am going to take
> the query and match it against each field (separately), how do I get
> back documents matching all user terms? One soln I can think of is to
> take all the field-specific analysis out of solr and do it as a
> pre-process step, but want to make sure there isn't an alternative
> within Solr.
> 
> surajit
> 
> On Thu, May 13, 2010 at 12:42 PM, Chris Hostetter-3 [via Lucene]
>  wrote:
>> : which is good, but the different fields that I copy into the copyfield
>> need
>> : different analysis and I no longer am able to do that. I can, of course,
>> 
>> Fundamentally, Solr can only apply a single analysis chain to all of
>> the text in a given field -- regardless of where it may be copied from.
>> if it didn't, there would be no way to get matches at query time.
>> 
>> the query analysis has to "make sense" for the index analysis, so it has
>> to be consistent.
>> 
>> 
>> 
>> -Hoss
>> 
>> 
>> 
>> 
>> View message @
>> http://lucene.472066.n3.nabble.com/synonyms-not-working-with-copyfield-tp814108p815302.html
>> To unsubscribe from Re: synonyms not working with copyfield, click here.
>> 
> 
> -- 
> View this message in context: 
> http://lucene.472066.n3.nabble.com/synonyms-not-working-with-copyfield-tp814108p815426.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: synonyms not working with copyfield

2010-05-13 Thread surajit

Understood and I can work with that limitation by using separate
fields during indexing. However, my search interface is just a text
box like Google and I need to take the query and return only those
documents that match ALL terms in the query and if I am going to take
the query and match it against each field (separately), how do I get
back documents matching all user terms? One soln I can think of is to
take all the field-specific analysis out of solr and do it as a
pre-process step, but want to make sure there isn't an alternative
within Solr.

surajit

On Thu, May 13, 2010 at 12:42 PM, Chris Hostetter-3 [via Lucene]
 wrote:
> : which is good, but the different fields that I copy into the copyfield
> need
> : different analysis and I no longer am able to do that. I can, of course,
>
> Fundamentally, Solr can only apply a single analysis chain to all of
> the text in a given field -- regardless of where it may be copied from.
> if it didn't, there would be no way to get matches at query time.
>
> the query analysis has to "make sense" for the index analysis, so it has
> to be consistent.
>
>
>
> -Hoss
>
>
>
> 
> View message @
> http://lucene.472066.n3.nabble.com/synonyms-not-working-with-copyfield-tp814108p815302.html
> To unsubscribe from Re: synonyms not working with copyfield, click here.
>

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/synonyms-not-working-with-copyfield-tp814108p815426.html
Sent from the Solr - User mailing list archive at Nabble.com.


Advancded Reading

2010-05-13 Thread Blargy

Does anyone know of any documentation that is more in-depth than the wiki and
the Solr 1.4 book? I'm past the basic usage of Solr and creating simple
support plugins. I really want to know all about the inner workings of Solr
and Lucene. Can someone recommend anything?

Thanks
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Advancded-Reading-tp815382p815382.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: grouping in fq

2010-05-13 Thread Chris Hostetter

: >> (+category:xyz +price:[100 TO *]) -category:xyz
: 
: this one doesn't seem to work (I'm not using a price field, but a text field
: -- using price field here just for example).

it never will; it's saying only things that are in category xyz and above 
100 dollars can match, but anything in category xyz cannot match.

an inherent contradiction.

: (+category:xyz +price:[100 TO *]) (-category:xyz) -- returns only results
: with category xyz and price >=100

you can't have purely negative clauses in a boolean query -- they match 
nothing (by definition: a query that only rejects things doesn't select 
anything) the second set of parens creates a boolean query with one 
negative clause, so it selects nothing, hence you only get docs matching 
the first part.


: (+category:xyz +price:[100 TO *]) (*:* -category:xyz) -- returns results
: with category xyz and price >=100 AND results where category!=xyz

exactly.  *:* selects all docs, and -category:xyz then rejects the ones in 
category xyz.  these are then combined with the docs from the first part 
(in cat xyz and above 100)

so now you have what you want...

: > >> > How do I implement a requirement like "if category is xyz,
: > >> > the price should
: > >> > be greater than 100 for inclusion in the result set".


-Hoss
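The three behaviours above can be checked with plain set algebra; the following Python sketch (documents and ids are made up) mirrors Lucene's clause semantics, where a purely negative clause selects nothing:

```python
# Toy stand-in for an index: three documents (hypothetical data).
docs = [
    {"id": 1, "category": "xyz", "price": 150},  # xyz and >= 100
    {"id": 2, "category": "xyz", "price": 50},   # xyz but cheap
    {"id": 3, "category": "abc", "price": 10},   # not xyz
]

all_ids = {d["id"] for d in docs}                        # *:*
xyz = {d["id"] for d in docs if d["category"] == "xyz"}  # category:xyz
expensive = {d["id"] for d in docs if d["price"] >= 100} # price:[100 TO *]

# (+category:xyz +price:[100 TO *]) -category:xyz
# -> must be in xyz AND must not be in xyz: a contradiction, empty.
contradiction = (xyz & expensive) - xyz

# (+category:xyz +price:[100 TO *]) (-category:xyz)
# -> the second clause is purely negative, so it selects nothing;
#    only the first part contributes matches.
only_first = (xyz & expensive) | set()

# (+category:xyz +price:[100 TO *]) (*:* -category:xyz)
# -> anchoring with *:* makes the subtraction meaningful.
intended = (xyz & expensive) | (all_ids - xyz)

print(contradiction, only_first, intended)
```

Running this yields an empty set for the contradictory form, only doc 1 for the purely negative form, and docs 1 and 3 for the *:*-anchored form, matching the results Satish observed.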



Re: synonyms not working with copyfield

2010-05-13 Thread Chris Hostetter
: which is good, but the different fields that I copy into the copyfield need
: different analysis and I no longer am able to do that. I can, of course,

Fundamentally, Solr can only apply a single analysis chain to all of 
the text in a given field -- regardless of where it may be copied from.  
if it didn't, there would be no way to get matches at query time.

the query analysis has to "make sense" for the index analysis, so it has 
to be consistent.



-Hoss



Re: synonyms not working with copyfield

2010-05-13 Thread Sachin

 take a look at the DismaxRequestHandler:

http://wiki.apache.org/solr/DisMaxRequestHandler
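One reason dismax fits this case: each field in qf keeps its own analysis chain, and the mm (minimum match) parameter can require every query term to match somewhere in the document. A hedged solrconfig.xml sketch (handler and field names invented):

```xml
<requestHandler name="/search" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <!-- each of these fields is analyzed with its own chain -->
    <str name="qf">title body to_email</str>
    <!-- require 100% of the query terms to match (an AND across the doc) -->
    <str name="mm">100%</str>
  </lst>
</requestHandler>
```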

 


 

 

-Original Message-
From: surajit 
To: solr-user@lucene.apache.org
Sent: Thu, May 13, 2010 9:52 pm
Subject: Re: synonyms not working with copyfield



Thanks much! I added a synonym filter to the copyfield and it started working
which is good, but the different fields that I copy into the copyfield need
different analysis and I no longer am able to do that. I can, of course,
search against the individual fields instead of the copyfield, but I want to
return a match only if ALL terms in the query are matched in the overall
document (as in an AND) and if I search against individual fields I am not
sure of an easy way to figure out if all terms matched in the overall
document. Any ideas?

surajit
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/synonyms-not-working-with-copyfield-tp814108p815263.html
Sent from the Solr - User mailing list archive at Nabble.com.

 


RE: confused by simple OR

2010-05-13 Thread Nagelberg, Kallin
Awesome that works, thanks Ahmet. 

-Kallin Nagelberg

-Original Message-
From: Ahmet Arslan [mailto:iori...@yahoo.com] 
Sent: Thursday, May 13, 2010 12:24 PM
To: solr-user@lucene.apache.org
Subject: Re: confused by simple OR


> I must be missing something very
> obvious here. I have a filter query like so:
> 
> (-rootdir:somevalue)
> 
> I get results for that filter
> 
> However, when I OR it with another term like so I get
> nothing:
> 
> ((-rootdir:somevalue) OR (rootdir:somevalue AND
> someboolean:true))
> 

Simply you cannot combine NOT and OR clauses like you did. It should be 
something like: 

((+*:* -rootdir:somevalue) OR (rootdir:somevalue AND someboolean:true))


  


Re: confused by simple OR

2010-05-13 Thread Ahmet Arslan

> I must be missing something very
> obvious here. I have a filter query like so:
> 
> (-rootdir:somevalue)
> 
> I get results for that filter
> 
> However, when I OR it with another term like so I get
> nothing:
> 
> ((-rootdir:somevalue) OR (rootdir:somevalue AND
> someboolean:true))
> 

Simply you cannot combine NOT and OR clauses like you did. It should be 
something like: 

((+*:* -rootdir:somevalue) OR (rootdir:somevalue AND someboolean:true))


  


Re: synonyms not working with copyfield

2010-05-13 Thread surajit

Thanks much! I added a synonym filter to the copyfield and it started working
which is good, but the different fields that I copy into the copyfield need
different analysis and I no longer am able to do that. I can, of course,
search against the individual fields instead of the copyfield, but I want to
return a match only if ALL terms in the query are matched in the overall
document (as in an AND) and if I search against individual fields I am not
sure of an easy way to figure out if all terms matched in the overall
document. Any ideas?

surajit
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/synonyms-not-working-with-copyfield-tp814108p815263.html
Sent from the Solr - User mailing list archive at Nabble.com.


confused by simple OR

2010-05-13 Thread Nagelberg, Kallin
I must be missing something very obvious here. I have a filter query like so:

(-rootdir:somevalue)

I get results for that filter

However, when I OR it with another term like so I get nothing:

((-rootdir:somevalue) OR (rootdir:somevalue AND someboolean:true))

How is this possible? Have I gone mad?

Thanks,
Kallin Nagelberg




RE: Config issue for deduplication

2010-05-13 Thread Markus Jelsma
What's your solrconfig? No deduplication happens if overwriteDupes = false and 
the signature field is other than the (unique) doc ID field.
 
-Original message-
From: Markus Fischer 
Sent: Thu 13-05-2010 17:01
To: solr-user@lucene.apache.org; 
Subject: Config issue for deduplication

I am trying to configure automatic deduplication for SOLR 1.4 in Vufind. 
I followed:

http://wiki.apache.org/solr/Deduplication

Actually nothing happens. All records are being imported without any 
deduplication.

What am I missing?

Thanks
Markus

I did:

- create a duplicated set of records, only shifted their ID by a fixed 
number

---
solrconfig.xml

<requestHandler name="/update" class="solr.XmlUpdateRequestHandler">
  <lst name="defaults">
    <str name="update.processor">dedupe</str>
  </lst>
</requestHandler>

<updateRequestProcessorChain name="dedupe">
  <processor class="org.apache.solr.update.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <bool name="overwriteDupes">true</bool>
    <str name="signatureField">dedupeHash</str>
    <str name="fields">reference,issn</str>
    <str name="signatureClass">org.apache.solr.update.processor.Lookup3Signature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

---
In schema.xml I added the field

<field name="dedupeHash" type="string" indexed="true" stored="true" multiValued="false" />

--

If I look at the created field "dedupeHash" it seems to be empty...!?


Re: Config issue for deduplication

2010-05-13 Thread Markus Fischer
Hmm, I can't find anything about the DataImportHandler in solrconfig.xml 
for VuFind.


So I suppose not; the import function does not use this method. Import is 
done by a script.


Maybe I do not associate

<lst name="defaults">
  <str name="update.processor">dedupe</str>
</lst>


with the correct requestHandler?

I placed it directly after



So kind of having twice this line.

Markus

Ahmet Arslan wrote:

I am trying to configure automatic
deduplication for SOLR 1.4 in Vufind. I followed:

http://wiki.apache.org/solr/Deduplication

Actually nothing happens. All records are being imported
without any deduplication.


Does "being imported" means you are using dataimporthandler? If yes you can use 
this to enable DIH with dedupe.



<requestHandler name="/dataimport"
                class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">data-config.xml</str>
    <str name="update.processor">dedupe</str>
  </lst>
</requestHandler>


Re: Question on pf (Phrase Fields)

2010-05-13 Thread Marco Martinez
I don't know if this solution accomplishes your requirements, but you can use
fq to do the query with only "foo" and q when you search with more terms.

Marco Martínez Bautista
http://www.paradigmatecnologico.com
Avenida de Europa, 26. Ática 5. 3ª Planta
28224 Pozuelo de Alarcón
Tel.: 91 352 59 42


2010/5/13 Blargy 

>
> Is there any way to configure this so it only takes effect if you match more
> than one word?
>
> For example if I search for: "foo" it should have no effect on scoring, but
> if I search for "foo bar" then it should.
>
> Is this possible? Thanks
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Question-on-pf-Phrase-Fields-tp815095p815095.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: grouping in fq

2010-05-13 Thread Satish Kumar
>> (+category:xyz +price:[100 TO *]) -category:xyz

this one doesn't seem to work (I'm not using a price field, but a text field
-- using price field here just for example).

Below are some other variations I tried:

(+category:xyz +price:[100 TO *]) -category:xyz -- zero results
(+category:xyz +price:[100 TO *]) (-category:xyz) -- returns only results
with category xyz and price >=100
(+category:xyz +price:[100 TO *]) (*:* -category:xyz) -- returns results
with category xyz and price >=100 AND results where category!=xyz


On Wed, May 12, 2010 at 2:54 PM, Lance Norskog  wrote:

> Because leading negative clauses don't work. The (*:* AND x) syntax
> means "select everything AND also select x".
>
> You could also do
> (+category:xyz +price:[100 TO *]) -category:xyz
>
> On Tue, May 11, 2010 at 12:36 PM, Satish Kumar
>  wrote:
> > thanks Ahmet.
> >
> > (+category:xyz +price:[100 TO *]) (+*:* -category:xyz)
> > why do we have to use (+*:* -category:xyz) instead of  -category:xyz?
> >
> >
> >
> > On Tue, May 11, 2010 at 3:08 PM, Ahmet Arslan  wrote:
> >
> >> > How do I implement a requirement like "if category is xyz,
> >> > the price should
> >> > be greater than 100 for inclusion in the result set".
> >> >
> >> > In other words, the result set should contain:
> >> > - all matching documents with category value not xyz
> >> > - all matching documents with category value xyz and price
> >> > > 100
> >> >
> >> > I was thinking something like fq=(-category:xyz OR
> >> > (category:xyz AND price >
> >> > 100))
> >> >
> >> > this doesn't seem to work. Any suggestions will be greatly
> >> > appreciated.
> >>
> >> Something like this should work:
> >> (+category:xyz +price:[100 TO *]) (+*:* -category:xyz)
> >>
> >> and your price field must be one of the trie based fields.
> >>
> >>
> >>
> >>
> >
>
>
>
> --
> Lance Norskog
> goks...@gmail.com
>


Question on pf (Phrase Fields)

2010-05-13 Thread Blargy

Is there any way to configure this so it only takes effect if you match more
than one word?

For example if I search for: "foo" it should have no effect on scoring, but
if I search for "foo bar" then it should.

Is this possible? Thanks
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Question-on-pf-Phrase-Fields-tp815095p815095.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Config issue for deduplication

2010-05-13 Thread Ahmet Arslan
> I am trying to configure automatic
> deduplication for SOLR 1.4 in Vufind. I followed:
> 
> http://wiki.apache.org/solr/Deduplication
> 
> Actually nothing happens. All records are being imported
> without any deduplication.

Does "being imported" means you are using dataimporthandler? If yes you can use 
this to enable DIH with dedupe.



<requestHandler name="/dataimport"
                class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">data-config.xml</str>
    <str name="update.processor">dedupe</str>
  </lst>
</requestHandler>


Config issue for deduplication

2010-05-13 Thread Markus Fischer
I am trying to configure automatic deduplication for SOLR 1.4 in Vufind. 
I followed:


http://wiki.apache.org/solr/Deduplication

Actually nothing happens. All records are being imported without any 
deduplication.


What am I missing?

Thanks
Markus

I did:

- create a duplicated set of records, only shifted their ID by a fixed 
number


---
solrconfig.xml

<requestHandler name="/update" class="solr.XmlUpdateRequestHandler">
  <lst name="defaults">
    <str name="update.processor">dedupe</str>
  </lst>
</requestHandler>

<updateRequestProcessorChain name="dedupe">
  <processor class="org.apache.solr.update.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <bool name="overwriteDupes">true</bool>
    <str name="signatureField">dedupeHash</str>
    <str name="fields">reference,issn</str>
    <str name="signatureClass">org.apache.solr.update.processor.Lookup3Signature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

---
In schema.xml I added the field

<field name="dedupeHash" type="string" indexed="true" stored="true" multiValued="false" />

--

If I look at the created field "dedupeHash" it seems to be empty...!?


Re: ContentStreamUpdateRequest - out of memory on a large file

2010-05-13 Thread Grant Ingersoll

On May 12, 2010, at 1:58 PM, Christopher Baird wrote:

> We're running into an out of memory problem when sending a large file to our
> SOLR server using the ContentStreamUpdateRequest.  It appears that this
> happens because when the request method of CommonsHttpSolrServer is called
> (this is called even when using a StreamingUpdateSolrServer instance because
> the ContentStreamUpdateRequest class is not an instance of UpdateRequest) the
> InputStreamRequestEntity used in the PostMethod buffers the content.  The
> buffering happens because the content length is not provided and thus
> defaults to "CONTENT_LENGTH_AUTO", which instructs InputStreamRequestEntity
> to buffer the entire content.
> 
> 
> 
> Is there an existing work-around to this?
> 
> 
> 
> If not, can anyone think of why I wouldn't want to update the code to pass
> in the content-length and avoid the buffering (I don't want to walk down a
> path to find out I really stepped in something).

I can't think of any reason not to put up a patch for it.

Re: Field Collapsing: How to estimate total number of hits

2010-05-13 Thread Sergey Shinderuk
Joe, thanks for your answer. But it doesn't solve my problem. Below I
gave a longer description of my problem.

First of all, I checked out solr trunk revision 928303 with last
change dtd. 2010-03-28. Then I applied the latest patch from SOLR-236
to get field collapsing component. After that I built the example
configuration with 'ant example'.

Then I started to experiment with field collapsing:

1. Query all docs http://localhost:8983/solr/select?q=*:*
...

...
There are 19 documents in the index.


2. Same with faceting by manu_exact field:
http://localhost:8983/solr/select?q=*:*&facet=on&facet.field=manu_exact
...

  
4
2
2
2
2
1
1
1
1
1
1
1
  

...

I got 12 distinct facets.


3. Now collapsing by manu_exact instead of faceting
http://localhost:8983/solr/select?q=*:*&collapse.field=manu_exact

I get collapse counts for the first 10 rows having distinct manu_exact
values. But the problem is that I get an odd numFound:

<result numFound="10" start="0">
...

numFound is equal to the number of rows returned by Solr. (In fact, if
I add rows=3 to the query string, then I get numFound=3.)
But I want numFound = 12, because there are 12 distinct values of the
manu_exact field in the index, as demonstrated in step 2.



Joe suggested adding a dummy field with a sole value of 1 and
performing faceting on this field over the *uncollapsed* result set:

http://localhost:8983/solr/select?q=*:*&collapse.field=manu_exact&collapse.facet=after&facet=on&facet.field=dummy&rows=3

And I get numFound = 10 as before, and facet count = 19 for the sole
value of the dummy field. This is the expected result, but not what I
want.
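For what it's worth, both totals can be recovered from the facet counts themselves. A small sketch using the manu_exact counts from step 2 above: the sum of the counts is the uncollapsed document total (19), while the number of non-zero buckets is the number of distinct groups (12), the figure needed for pagination. (This assumes faceting over the full value set, e.g. facet.limit=-1.)

```java
import java.util.Arrays;
import java.util.List;

// Using the manu_exact facet counts from step 2 above: summing the counts
// gives the uncollapsed document total, while counting the non-zero buckets
// gives the number of distinct (collapsed) groups.
public class FacetTotals {
    public static void main(String[] args) {
        List<Integer> counts = Arrays.asList(4, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1);
        int uncollapsedTotal = counts.stream().mapToInt(Integer::intValue).sum();
        long distinctGroups = counts.stream().filter(c -> c > 0).count();
        System.out.println(uncollapsedTotal + " " + distinctGroups);
    }
}
```

The catch is that this needs all facet buckets returned for a potentially high-cardinality field, so it may be expensive for the near-duplicate-signature case.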


I would think this question arises immediately for anyone who uses
field collapsing: if you don't know the total number of results, you
cannot paginate through them, or at least you don't know the number
of pages.

In my application I'm trying to collapse near-duplicate documents
based on a document signature, and I need to know how many non-duplicate
results match the query.


Any help appreciated.


2010/5/12 Joe Calderon :
> I don't know if it's the best solution, but I have a field I facet on
> called "type" (it's either 0 or 1); combined with collapse.facet=before I
> just sum all the values of the facet field to get the total number found.
>
> If you don't have such a field, you can always add a field with a single value.
>
> --joe
>
> On Wed, May 12, 2010 at 10:41 AM, Sergey Shinderuk  
> wrote:
>> Hi, fellows!
>>
>> I use field collapsing to collapse near-duplicate documents based on
>> document fuzzy signature calculated at index time.
>> The problem is that, when field collapsing is enabled, in query
>> response numFound is equal to the number of rows requested.
>>
>> For instance, with solr example schema i can issue the following query
>>
>> http://localhost:8983/solr/select?q=*:*&rows=3&collapse.field=manu_exact
>>
>> In the response I get collapse_counts together with the ordinary result
>> list, but numFound equals 3.
>> As far as I understand, this is due to the way field collapsing works.
>>
>> I want to show the total number of hits to the user and provide a
>> pagination through the results.
>>
>> Any ideas?
>>
>> Regards,
>> Sergey Shinderuk
>>
>


Re: synonyms not working with copyfield

2010-05-13 Thread Ahmet Arslan
> I have indexed person names in Solr using synonym expansion
> and am getting a match when I explicitly use that field in my query
> (name:query). However, when I copy that field into another field using
> copyField and search on that field, I don't get a match. Below are
> excerpts from schema.xml. I am new to Solr and appreciate any help! Thanks.
> 
> Surajit
> 
>   <fieldType name="..." class="solr.TextField" positionIncrementGap="100">
>     <analyzer>
>       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>       <filter class="solr.WordDelimiterFilterFactory"
>               generateWordParts="1" generateNumberParts="0"
>               catenateWords="1" catenateNumbers="0" catenateAll="0"
>               splitOnCaseChange="1"/>
>       <filter class="solr.LowerCaseFilterFactory"/>
>       <filter class="solr.SynonymFilterFactory"
>               synonyms="person-synonyms.txt" ignoreCase="true"
>               expand="true"/>
>     </analyzer>
>   </fieldType>
>
>   <field name="..." type="..." indexed="true"
>          stored="true" required="false"/>
>
>   <field name="..." type="..." stored="true"
>          multiValued="true"/>
>
>   <copyField source="..." dest="..."/>

CopyField just copies raw text, I mean not analyzed. Do you have a
SynonymFilterFactory in your text fieldType definition?
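In other words, the destination field must itself be declared with a type whose analyzer includes the synonym filter; copyField feeds it raw source text, which is then analyzed by the destination field's own type. A sketch (the field names here are hypothetical, not from the poster's schema):

```xml
<!-- the copyField destination must use a synonym-analyzed type itself;
     the analysis of the source field does not carry over -->
<field name="all_names" type="person_name" indexed="true" stored="false"
       multiValued="true"/>
<copyField source="name" dest="all_names"/>
```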





RE: Strange behavior for certain words

2010-05-13 Thread Ahmet Arslan
Hi,
       Thanks for your response. Attached are the schema.xml and sample docs
that were indexed. The query and response are below. The attachment
Prodsku4270257.xml has a field "paymenttype" whose value is 'prepaid'.

query:
q=prepaid&start=0&rows=10&fl=*%2Cscore&qt=standard&wt=json&debugQuery=on&explainOther=&hl=on

But you are populating your text field from deviceType, features, description,
and color; paymentType is not copied into text, so this behavior is normal.
Either add this copyField declaration:
   <copyField source="paymentType" dest="text"/>
or query the field directly: q=paymentType:prepaid



  

Re: synonyms not working with copyfield

2010-05-13 Thread Gary
Hi Surajit
I'm not sure if this is any help, but I had a similar problem with stop
words: they were not working with dismax queries. To cut a long story short,
it seems that all of the queried fields need to be configured with stopwords.

Maybe the same applies to the synonyms configuration, so your copyField
destination should be defined with a type that is configured with the
SynonymFilterFactory, just like "person_name".

You can find some guidance here:

http://bibwild.wordpress.com/2010/04/14/solr-stop-wordsdismax-gotcha/

Gary





Re: Too many clauses in lucene query

2010-05-13 Thread Ahmet Arslan
> I am forming a query to boost certain ids; the list of ids
> can go up to 2000. I am sometimes getting the error for too
> many clauses in the boolean query, and otherwise I am
> getting a null page. Can you suggest any config changes
> regarding this?
> I am using Solr 1.3.


For too many clauses there is <maxBooleanClauses>1024</maxBooleanClauses> in
solrconfig.xml.
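A sketch of the corresponding solrconfig.xml setting (the value 4096 is illustrative; set it above the largest expected number of clauses):

```xml
<!-- solrconfig.xml: raise the boolean clause limit (default is 1024) -->
<maxBooleanClauses>4096</maxBooleanClauses>
```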

For null page 
http://wiki.apache.org/solr/SolrTomcat#Enabling_Longer_Query_Requests may help.