Indexing AutoCAD files

2011-02-18 Thread Vignesh Raj
Hi team,

Is there a way lucene can index AutoCAD files - "*.dwg" files?

If so, please let me know. 

Can you please provide some insight on the same?

 

Thanks in advance..

 

Regards

Vignesh



Remove part of keywords from existing index and merging new index

2011-02-18 Thread Ryan Chan
Hello,

I am not sure if it is possible.

1. I have a document of 100MB, and I want to remove keywords that start with
a specific pattern, e.g. abc*, so all keywords starting with abc* in
the index will be removed, without my having to reindex the document
again.

2. I have another document of 100KB, and I want to append the new document
to an existing one, without the need to reindex the existing document
again.


I believe (2) is possible, but not sure about (1).

Thanks.


adding a TimerTask

2011-02-18 Thread Tri Nguyen
Hi,

How can I add a TimerTask to Solr?

Tri
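
There is no built-in TimerTask hook in Solr itself, so one common route is to
schedule a plain java.util.TimerTask from code you control - for example from a
custom plugin or from the application that embeds Solr. A minimal sketch,
assuming the task only needs to fire periodically inside the same JVM:

    import java.util.Timer;
    import java.util.TimerTask;

    public class SolrMaintenanceTimer {
        public static void main(String[] args) throws InterruptedException {
            // Daemon thread, so the timer won't keep the JVM alive on shutdown.
            Timer timer = new Timer("solr-maintenance", true);
            timer.scheduleAtFixedRate(new TimerTask() {
                @Override
                public void run() {
                    // Placeholder: ping Solr, trigger a commit, rotate logs, ...
                    System.out.println("periodic Solr task fired");
                }
            }, 0L, 60000L); // run now, then every 60 seconds
            Thread.sleep(180000L); // keep this demo JVM alive for three runs
        }
    }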

Re: DIH threads

2011-02-18 Thread Bill Bell
I used it on 4.0 and it did not help us. We were bound on SQL I/O.

Bill Bell
Sent from mobile


On Feb 18, 2011, at 4:47 PM, Mark  wrote:

> Has anyone applied the DIH threads patch on 1.4.1 
> (https://issues.apache.org/jira/browse/SOLR-1352)?
> 
> Does anyone know if this works and/or does it improve performance?
> 
> Thanks
> 
> 


Re: Best way for a query-expander?

2011-02-18 Thread Chris Hostetter

: I want to implement a query-expander, one that enriches the input by the 
: usage of extra parameters that, for example, a form may provide.
: 
: Is the right way to subclass SearchHandler?
: Or rather to subclass QueryComponent?

This smells like the poster child for an X/Y problem 
(or maybe an "X/(Y OR Z)" problem)...

if you can elaborate a bit more on the type of enrichment you want to do, 
it's highly likely that your goal can be met w/o needing to write a custom 
plugin (i'm thinking particularly of the multitudes of parsers solr 
already has, local params, and variable substitution)
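
For example (the parameter name uq and the handler below are made up), variable
substitution lets a request pull user input into a query without custom code:

    http://localhost:8983/solr/select?q={!lucene v=$uq}&uq=title:solr

and local params in a handler's "appends" section can turn a form parameter
into a filter, assuming the form always sends cat:

    <requestHandler name="/byform" class="solr.SearchHandler">
      <lst name="appends">
        <str name="fq">{!field f=category v=$cat}</str>
      </lst>
    </requestHandler>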

http://people.apache.org/~hossman/#xyproblem
XY Problem

Your question appears to be an "XY Problem" ... that is: you are dealing
with "X", you are assuming "Y" will help you, and you are asking about "Y"
without giving more details about the "X" so that we can understand the
full issue.  Perhaps the best solution doesn't involve "Y" at all?
See Also: http://www.perlmonks.org/index.pl?node_id=542341


-Hoss


Re: Dih sproc call

2011-02-18 Thread Chris Hostetter

: References: 
: In-Reply-To: 
: Subject: Dih sproc call

http://people.apache.org/~hossman/#threadhijack
Thread Hijacking on Mailing Lists

When starting a new discussion on a mailing list, please do not reply to 
an existing message, instead start a fresh email.  Even if you change the 
subject line of your email, other mail headers still track which thread 
you replied to and your question is "hidden" in that thread and gets less 
attention.   It makes following discussions in the mailing list archives 
particularly difficult.



-Hoss


Re: Help migrating from Lucene

2011-02-18 Thread Chris Hostetter
: to our indexing service are defined in a central interface.   Here is an
: example of a query executed from a programmatically constructed Lucene
: query.
...
: solrQuery.setQuery(query.toString());

first of all, be advised that Query.toString() is not guaranteed to produce 
a string that the Lucene QueryParser can parse back into a real query.  If 
you are programmatically building up a Lucene query just to format it back as 
a string, you should probably consider just programmatically building up the 
Solr query string.

Second: you should also consider the fact that there may be better ways to 
express your query to solr that are more efficient, or do what you want 
more than what you had before (ie: some of those MUST clauses you had 
probably are meant to act as "filters", which don't need to influence the 
scores, and are most likely reused on many queries -- in which case 
specifying them using "fq" instead of "q" is going to make things 
simpler/faster and give you better relevancy scores on your real user 
input).


: How can I set the sort into the java client?

Did you look at the "SolrQuery.addSortField" method?
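
A minimal SolrJ sketch pulling both suggestions together (the filter and sort
field names are invented):

    import org.apache.solr.client.solrj.SolrQuery;

    public class QueryBuilderExample {
        public static SolrQuery build(String userInput) {
            SolrQuery q = new SolrQuery();
            q.setQuery(userInput);                 // only the real user input
            q.addFilterQuery("status:published");  // reusable filter: cached, no score impact
            q.addFilterQuery("site:intranet");
            q.addSortField("dateCreated", SolrQuery.ORDER.desc);
            return q;
        }
    }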

: Also, with the annotations of Pojo's outlined here.
...
: How are sets handled?  For instance, how are Lists of other POJO's added to
: the document?

i had no idea, but a google search for "solrj annotation beans" led me...
http://lucene.472066.n3.nabble.com/Does-SolrJ-support-nested-annotated-beans-td868375.html
  ...and then to...
https://issues.apache.org/jira/browse/SOLR-1945
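
For reference, the basic annotation usage looks roughly like this - plain lists
of simple values bind out of the box, while nested annotated beans are exactly
what SOLR-1945 discusses (field names invented):

    import java.util.List;
    import org.apache.solr.client.solrj.beans.Field;

    public class Song {
        @Field
        public String id;

        @Field("title_t")   // map to a differently named Solr field
        public String title;

        @Field("tag_ss")    // a multiValued field binds to a List
        public List<String> tags;
    }

Indexing is then server.addBean(song), and reading back is
response.getBeans(Song.class).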


-Hoss


Re: solr current working directory or reading config files

2011-02-18 Thread Chris Hostetter

: I have a class (in a jar) that reads from properties (text) files.  I have 
these 
: files in the same jar file as the class.
: 
: However, when my class reads those properties files, those files cannot be 
found 
: since solr reads from tomcat's bin directory.

Can you elaborate a bit more on what these Jars are?  ... are these Solr 
Plugins you've written (ie: that know about the internal Solr APIs?) ? ... 
how does your jar relate to solr?  are you building your own solr.war 
containing those jars, or are you loading it using a solr plugin "lib" 
directory? ... what do you mean by "my class reads those properties files" 
? ... what code are you using to "read" them?  what log/error messages are 
you getting?

: I don't really want to put the config files in tomcat's bin directory.

in an ideal world, solr would never use the current working directory, and 
would only ever pay attention to the Solr Home dir and paths things 
specifically mentioned by config directives -- but the world is not ideal, 
and solr definitely has some historic behavior that does utilize the CWD.  
But if you are using Solr's ResourceLoader API in your plugin, it should 
actively try to find your resource in a multitude of places (if it's not 
an absolute path).
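
As an illustration of that last point, a plugin holding a SolrCore reference
can load its properties through the core's resource loader instead of relying
on the CWD (the properties file name here is made up):

    import java.io.IOException;
    import java.io.InputStream;
    import java.util.Properties;
    import org.apache.solr.core.SolrCore;

    public class PluginConfigLoader {
        public static Properties load(SolrCore core) throws IOException {
            Properties props = new Properties();
            // Looks in conf/, the classpath, plugin "lib" jars, etc.
            InputStream in = core.getResourceLoader().openResource("myplugin.properties");
            try {
                props.load(in);
            } finally {
                in.close();
            }
            return props;
        }
    }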

need more specifics to understand exactly what is going wrong for you 
though.

-Hoss

Removing duplicates

2011-02-18 Thread Mark
I know that I can use the SignatureUpdateProcessorFactory to remove 
duplicates but I would like the duplicates in the index but remove them 
conditionally at query time.


Is there any easy way I could accomplish this?
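
One possibility, sketched from the Deduplication wiki page: run
SignatureUpdateProcessorFactory with overwriteDupes=false, so duplicates stay
in the index but each document carries a signature field you can filter or
collapse on at query time (the field names here are illustrative):

    <updateRequestProcessorChain name="dedupe">
      <processor class="solr.processor.SignatureUpdateProcessorFactory">
        <bool name="enabled">true</bool>
        <str name="signatureField">sig</str>
        <bool name="overwriteDupes">false</bool>
        <str name="fields">title,body</str>
        <str name="signatureClass">solr.processor.Lookup3Signature</str>
      </processor>
      <processor class="solr.LogUpdateProcessorFactory" />
      <processor class="solr.RunUpdateProcessorFactory" />
    </updateRequestProcessorChain>

At query time you could then exclude known signatures with an fq, or collapse
on sig once field collapsing (SOLR-236) is available to you.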


DIH threads

2011-02-18 Thread Mark
Has anyone applied the DIH threads patch on 1.4.1 
(https://issues.apache.org/jira/browse/SOLR-1352)?


Does anyone know if this works and/or does it improve performance?

Thanks




Re: Index Design Question

2011-02-18 Thread Andreas Kemkes
Thank you.  These are good general suggestions.

Regarding the optimization for indexing vs. querying: are there any specific 
recommendations for each of those cases available somewhere?  A link, for 
example, would be fabulous.

I'm also still curious about solutions that go further.

For example, there is a 2007 Lucene Overview presentation by Aaron Bannert 
claiming that "Lucene provides built-in methods to allow queries to span 
multiple remote Lucene indexes."  and "A much more involved way to achieving 
high levels of update performance can be had by dividing the data into separate 
“columns”, or “silos”. Each column will hold a subset of the overall data, and 
will only receive updates for data that it controls.  By taking advantage 
of the remote index merging query utility mentioned on an earlier slide, 
the data can still be searched in its entirety without any loss of accuracy and 
with negligible performance impact."

Is this possible using Solr?  How could this be accomplished?  Again, any link 
would be fabulous.

The wiki page http://wiki.apache.org/solr/MergingSolrIndexes seems to describe a 
somewhat different approach to merging.

Is this something that could be integrated into master/slave replication by 
having two masters and one merged slave (in the above sense of separate 
“columns”, or “silos”)?

If yes, what are the performance considerations when using it?
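
For context, Solr's equivalent of the "span multiple remote indexes" feature
above is plain distributed search (http://wiki.apache.org/solr/DistributedSearch):
each "silo" runs as its own core or instance and receives only its own updates,
and a query spans them with the shards parameter (host names invented):

    http://host1:8983/solr/select?q=update+performance&shards=host1:8983/solr,host2:8983/solr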


  

Re: solr.KeepWordsFilterFactory confusion

2011-02-18 Thread Ahmet Arslan


--- On Fri, 2/18/11, Robert Haschart  wrote:

> From: Robert Haschart 
> Subject: Re: solr.KeepWordsFilterFactory confusion
> To: solr-user@lucene.apache.org
> Date: Friday, February 18, 2011, 10:19 PM
> Thanks for your response.  After
> making that change it seemed at first like it made no
> difference, after restarting the jetty server, and
> reindexing the test object, the display still shows:
> 
> 
>   Video
>   Streaming Video
>   Online
>   Gooberhead
>   Book of the Month
> 
> 
> But it turns out that I had been making an incorrect
> assumption.  I was looking at the returned stored
> values for the solr document, and seeing the "Gooberhead"
> entry listed, and thinking that the analyzer wasn't
> running.  However as I have subsequently figured out,
> the analyzers are not run on the data that is to be stored,
> only on the data that is to be indexed. 
> So after making your change to that field type statement,
> if I search
> for   format_facet:Gooberhead   I
> get results = 0 which is what I'd expect.  But seeing
> that the unexpected values are still stored with the solr
> document, it seems that I will have to take a different
> approach.

Facets are populated from indexed values. However deleted documents (and their 
terms) are not really deleted until an optimize. Issuing an optimize may help 
in your case.
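
For instance, an optimize can be issued over HTTP (stock example host/port):

    curl 'http://localhost:8983/solr/update?optimize=true'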





XML Stripping from DIH

2011-02-18 Thread Olson, Ron
Hi all-

I have some XML in a database that I am trying to index and store; I am 
interested in the various pieces of text, but none of the tags. I've been 
trying to figure out a way to strip all the tags out, but haven't found 
anything within Solr to do so; the XML parser seems to want XPath to get the 
various element values, when all I want is to turn the whole thing into one 
blob of text, regardless of whether it makes any "contextual" sense.

Is there something in Solr to do this, or is it something I'd have to write 
myself (which I'm willing to do if necessary)?

Thanks for any info,

Ron
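
One hedged option: DIH ships an HTMLStripTransformer that runs a column through
Lucene's HTML stripper and keeps only the text blob. It targets HTML, but it
strips anything tag-shaped, so it is worth trying on XML too. A sketch with
invented table and column names:

    <entity name="doc" query="SELECT id, xml_col FROM docs"
            transformer="HTMLStripTransformer">
      <field column="id" name="id"/>
      <field column="xml_col" name="body" stripHTML="true"/>
    </entity>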



Re: Best way for a query-expander?

2011-02-18 Thread Paul Libbrecht
it does work!

Le 18 févr. 2011 à 20:48, Paul Libbrecht a écrit :

> using rb.req.getParams().get("blip") inside prepare(ResponseBuilder)'s 
> subclass of QueryComponent I could easily get the extra http request param.
> 
> However, how would I change the query?
> using rb.setQuery(xxx) within that same prepare method seems to have no 
> effect.

Sorry for the noise, it does have the exact desired effect.

Nice pattern.
I believe everyone needs query expansion except maybe if using Dismax.

paul

> 
> Le 18 févr. 2011 à 19:51, Tommaso Teofili a écrit :
> 
>> Hi Paul,
>> me and a colleague worked on a QParserPlugin to "expand" alias field names
>> to many existing field names
>> ex: q=mockfield:val ==> q=actualfield1:val OR actualfield2:val
>> but if you want to be able to use other params that come from the HTTP
>> request you should use a custom RequestHandler I think,
>> My 2 cents,
>> Tommaso
> 
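
For the archives, a minimal sketch of the pattern Paul describes: subclass
QueryComponent, read the extra parameter in prepare(), re-parse, and set the
query. The parameter name blip comes from his mail; the tag: field and the
expansion rule are invented, and this targets the 1.4/3.x APIs:

    import java.io.IOException;
    import org.apache.solr.common.params.CommonParams;
    import org.apache.solr.common.params.SolrParams;
    import org.apache.solr.handler.component.QueryComponent;
    import org.apache.solr.handler.component.ResponseBuilder;
    import org.apache.solr.search.QParser;

    public class ExpandingQueryComponent extends QueryComponent {
        @Override
        public void prepare(ResponseBuilder rb) throws IOException {
            super.prepare(rb); // parses the original q parameter
            SolrParams params = rb.req.getParams();
            String blip = params.get("blip"); // the extra HTTP request parameter
            String q = params.get(CommonParams.Q);
            if (blip != null && q != null) {
                // Naive expansion: AND the extra form value into the user query.
                String expanded = "(" + q + ") AND tag:" + blip;
                try {
                    QParser parser = QParser.getParser(expanded, null, rb.req);
                    rb.setQuery(parser.getQuery());
                } catch (Exception e) { // ParseException in this era of Solr
                    throw new IOException(e);
                }
            }
        }
    }

Registered under the name "query" in solrconfig.xml, it replaces the stock
query component.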



Re: solr.KeepWordsFilterFactory confusion

2011-02-18 Thread Robert Haschart
Thanks for your response.  After making that change it seemed at first 
like it made no difference, after restarting the jetty server, and 
reindexing the test object, the display still shows:



  Video
  Streaming Video
  Online
  Gooberhead
  Book of the Month


But it turns out that I had been making an incorrect assumption.  I was 
looking at the returned stored values for the solr document, and seeing 
the "Gooberhead" entry listed, and thinking that the analyzer wasn't 
running.  However as I have subsequently figured out, the analyzers are 
not run on the data that is to be stored, only on the data that is to 
be indexed. 

So after making your change to that field type statement, if I search 
for   format_facet:Gooberhead   I get results = 0 which is what I'd 
expect.  But seeing that the unexpected values are still stored with the 
solr document, it seems that I will have to take a different approach.


Thanks again.

-Bob Haschart

Ahmet Arslan wrote:


I've added a new field type in schema.xml:

[fieldType definition stripped by the mail archive]
class="solr.StrField" should be class="solr.TextField"
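
Since the archive stripped the XML, a plausible reconstruction of the corrected
field type (the type and word-file names are invented):

    <fieldType name="keepFormats" class="solr.TextField">
      <analyzer>
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.KeepWordFilterFactory"
                words="keepformats.txt" ignoreCase="true"/>
      </analyzer>
    </fieldType>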


 
 





Re: Best way for a query-expander?

2011-02-18 Thread Paul Libbrecht

using rb.req.getParams().get("blip") inside prepare(ResponseBuilder)'s subclass 
of QueryComponent I could easily get the extra http request param.

However, how would I change the query?
using rb.setQuery(xxx) within that same prepare method seems to have no effect.


paul

Le 18 févr. 2011 à 19:51, Tommaso Teofili a écrit :

> Hi Paul,
> me and a colleague worked on a QParserPlugin to "expand" alias field names
> to many existing field names
> ex: q=mockfield:val ==> q=actualfield1:val OR actualfield2:val
> but if you want to be able to use other params that come from the HTTP
> request you should use a custom RequestHandler I think,
> My 2 cents,
> Tommaso
> 
> 
> 2011/2/18 Em 
> 
>> 
>> Hi Paul,
>> 
>> what do you understand by saying "extra parameters"?
>> 
>> Regards
>> 
>> 
>> Paul Libbrecht-4 wrote:
>>> 
>>> 
>>> Hello Solr-friends,
>>> 
>>> I want to implement a query-expander, one that enriches the input by the
>>> usage of extra parameters that, for example, a form may provide.
>>> 
>>> Is the right way to subclass SearchHandler?
>>> Or rather to subclass QueryComponent?
>>> 
>>> thanks in advance
>>> 
>>> paul
>>> 
>> 
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/Best-way-for-a-query-expander-tp2528194p2528736.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>> 



Re: Dih sproc does not work

2011-02-18 Thread Bill Bell
When I use 'call sprocname' it does call the procedure, but I am not getting the 
select into Solr.

It shows 0 docs added. I am only returning 1 result set.

Bill Bell
Sent from mobile


On Feb 18, 2011, at 11:49 AM, Bill Bell  wrote:

> I am trying to call a stored procedure using query= in DIH. I tried exec 
> name, call name, and plain name, and none of them works.
> 
> This is SQL server 2008.
> 
> Bill Bell
> Sent from mobile
> 
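
For anyone hitting this later, a data-config sketch for calling a SQL Server
procedure from DIH (the driver class is the standard Microsoft one; everything
else is invented). A common gotcha: the procedure has to hand JDBC a plain
result set, and adding SET NOCOUNT ON inside the procedure often helps, since
stray update counts can hide the result set from the driver:

    <dataConfig>
      <dataSource driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
                  url="jdbc:sqlserver://dbhost;databaseName=mydb"
                  user="solr" password="secret"/>
      <document>
        <entity name="doc" query="EXEC dbo.GetSolrDocs">
          <field column="id" name="id"/>
          <field column="title" name="title"/>
        </entity>
      </document>
    </dataConfig>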


Understanding multi-field queries with q and fq

2011-02-18 Thread mrw


After searching this list, Google, and looking through the Pugh book, I am a
little confused about the right way to structure a query.

The Packt book uses the example of the MusicBrainz DB full of song metadata. 
What if they also had the song lyrics in English and German as files on
disk, and wanted to index them along with the metadata, so that each
document would basically have song title, artist, publisher, date, ...,
All_Metadata (copy field of all metadata fields), Text_English, and
Text_German fields?  

There can only be one default field, correct?  So if we want to search for
all songs containing (zeppelin AND (dog OR merle)) do we 

repeat the entire query text for all three major fields in the 'q' clause
(assuming we don't want to use the cache):

q=(+All_Metadata:(zeppelin AND (dog OR merle)) +Text_English:(zeppelin AND (dog
OR merle)) +Text_German:(zeppelin AND (dog OR merle)))

or repeat the entire query text for all three major fields in the 'fq'
clause (assuming we want to use the cache):

q=*:*&fq=(+All_Metadata:(zeppelin AND (dog OR merle)) +Text_English:(zeppelin
AND (dog OR merle)) +Text_German:(zeppelin AND (dog OR merle)))

?

Thanks!


-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Understanding-multi-field-queries-with-q-and-fq-tp2528866p2528866.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Best way for a query-expander?

2011-02-18 Thread Tommaso Teofili
Hi Paul,
me and a colleague worked on a QParserPlugin to "expand" alias field names
to many existing field names
ex: q=mockfield:val ==> q=actualfield1:val OR actualfield2:val
but if you want to be able to use other params that come from the HTTP
request you should use a custom RequestHandler I think,
My 2 cents,
Tommaso


2011/2/18 Em 

>
> Hi Paul,
>
> what do you understand by saying "extra parameters"?
>
> Regards
>
>
> Paul Libbrecht-4 wrote:
> >
> >
> > Hello Solr-friends,
> >
> > I want to implement a query-expander, one that enriches the input by the
> > usage of extra parameters that, for example, a form may provide.
> >
> > Is the right way to subclass SearchHandler?
> > Or rather to subclass QueryComponent?
> >
> > thanks in advance
> >
> > paul
> >
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Best-way-for-a-query-expander-tp2528194p2528736.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
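
A toy sketch of that idea - not Tommaso's actual code. The field names come
from his example, the string rewrite is deliberately naive, and it compiles
against the 1.4/3.x APIs where parse() throws the Lucene ParseException:

    import org.apache.lucene.queryParser.ParseException;
    import org.apache.lucene.search.Query;
    import org.apache.solr.common.params.SolrParams;
    import org.apache.solr.common.util.NamedList;
    import org.apache.solr.request.SolrQueryRequest;
    import org.apache.solr.search.QParser;
    import org.apache.solr.search.QParserPlugin;

    public class AliasQParserPlugin extends QParserPlugin {
        @Override
        public void init(NamedList args) {}

        @Override
        public QParser createParser(String qstr, SolrParams localParams,
                                    SolrParams params, SolrQueryRequest req) {
            // mockfield:val ==> actualfield1:val OR actualfield2:val
            final String rewritten =
                "(" + qstr.replace("mockfield:", "actualfield1:") + ") OR ("
                    + qstr.replace("mockfield:", "actualfield2:") + ")";
            return new QParser(rewritten, localParams, params, req) {
                @Override
                public Query parse() throws ParseException {
                    return subQuery(rewritten, "lucene").getQuery();
                }
            };
        }
    }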


Re: Best way for a query-expander?

2011-02-18 Thread Paul Libbrecht
Erm... extra web-request-parameters simply.

paul


Le 18 févr. 2011 à 19:37, Em a écrit :

> 
> Hi Paul,
> 
> what do you understand by saying "extra parameters"?
> 
> Regards
> 
> 
> Paul Libbrecht-4 wrote:
>> 
>> 
>> Hello Solr-friends,
>> 
>> I want to implement a query-expander, one that enriches the input by the
>> usage of extra parameters that, for example, a form may provide.
>> 
>> Is the right way to subclass SearchHandler?
>> Or rather to subclass QueryComponent?
>> 
>> thanks in advance



Dih sproc does not work

2011-02-18 Thread Bill Bell
I am trying to call a stored procedure using query= in DIH. I tried exec name, 
call name, and plain name, and none of them works.

This is SQL server 2008.

Bill Bell
Sent from mobile



Re: Best way for a query-expander?

2011-02-18 Thread Em

Hi Paul,

what do you understand by saying "extra parameters"?

Regards


Paul Libbrecht-4 wrote:
> 
> 
> Hello Solr-friends,
> 
> I want to implement a query-expander, one that enriches the input by the
> usage of extra parameters that, for example, a form may provide.
> 
> Is the right way to subclass SearchHandler?
> Or rather to subclass QueryComponent?
> 
> thanks in advance
> 
> paul
> 

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Best-way-for-a-query-expander-tp2528194p2528736.html
Sent from the Solr - User mailing list archive at Nabble.com.


Dih sproc call

2011-02-18 Thread Bill Bell
I am trying to call a stored procedure using query= in DIH. I tried exec name, 
call name, and plain name, and none of them works.

This is SQL server 2008.

Bill Bell
Sent from mobile


On Feb 18, 2011, at 10:27 AM, Paul Libbrecht  wrote:

> 
> Hello Solr-friends,
> 
> I want to implement a query-expander, one that enriches the input by the 
> usage of extra parameters that, for example, a form may provide.
> 
> Is the right way to subclass SearchHandler?
> Or rather to subclass QueryComponent?
> 
> thanks in advance
> 
> paul


Best way for a query-expander?

2011-02-18 Thread Paul Libbrecht

Hello Solr-friends,

I want to implement a query-expander, one that enriches the input by the usage 
of extra parameters that, for example, a form may provide.

Is the right way to subclass SearchHandler?
Or rather to subclass QueryComponent?

thanks in advance

paul

Re: Validate Query Syntax of Solr Request Before Sending

2011-02-18 Thread csj

Hi,

FYI, I found out. I'm using the SolrQueryParser (tadaa...)

It needs the solrconfig.xml and the solr.xml files in order to validate the
query.

Then I'm able to validate any query before sending it to the Solrserver,
thereby preventing unnecessary requests.

/Christian
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Validate-Query-Syntax-of-Solr-Request-Before-Sending-tp2515797p2528183.html
Sent from the Solr - User mailing list archive at Nabble.com.
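
If all you need is a syntax check with no schema awareness, a bare Lucene
QueryParser needs no Solr config files at all - a sketch against the 3.x API:

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.queryParser.ParseException;
    import org.apache.lucene.queryParser.QueryParser;
    import org.apache.lucene.util.Version;

    public class QuerySyntaxCheck {
        public static boolean isValid(String q) {
            QueryParser parser = new QueryParser(Version.LUCENE_30, "text",
                    new StandardAnalyzer(Version.LUCENE_30));
            try {
                parser.parse(q);
                return true;  // syntactically OK
            } catch (ParseException e) {
                return false; // reject before sending to Solr
            }
        }
    }

The trade-off is that this cannot catch schema problems such as unknown field
names - that is what the SolrQueryParser approach above buys you.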


Re: GET or POST for large queries?

2011-02-18 Thread Jan Høydahl
OK.

I would ask on the mailing list of ManifoldCF to see if they have some 
experience with OLS.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 18. feb. 2011, at 17.29, mrw wrote:

> 
> Thanks for the tip.  No, I did not know about that.  Unfortunately, we use
> Oracle OLS which does not appear to be supported.
> 
> 
> Jan Høydahl / Cominvent wrote:
>> 
>> Hi,
>> 
>> There are better ways to combat row level security in search than sending
>> huge lists of users over the wire.
>> 
>> Have you checked out the ManifoldCF project with which you can integrate
>> security to Solr? http://incubator.apache.org/connectors/
>> 
>> --
>> Jan Høydahl, search solution architect
>> Cominvent AS - www.cominvent.com
>> 
>> 
>> 
> 
> -- 
> View this message in context: 
> http://lucene.472066.n3.nabble.com/GET-or-POST-for-large-queries-tp2521700p2527765.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: GET or POST for large queries?

2011-02-18 Thread mrw

Thanks for the tip.  No, I did not know about that.  Unfortunately, we use
Oracle OLS which does not appear to be supported.


Jan Høydahl / Cominvent wrote:
> 
> Hi,
> 
> There are better ways to combat row level security in search than sending
> huge lists of users over the wire.
> 
> Have you checked out the ManifoldCF project with which you can integrate
> security to Solr? http://incubator.apache.org/connectors/
> 
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
> 
> 
> 

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/GET-or-POST-for-large-queries-tp2521700p2527765.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: GET or POST for large queries?

2011-02-18 Thread Jan Høydahl
Hi,

There are better ways to combat row level security in search than sending huge 
lists of users over the wire.

Have you checked out the ManifoldCF project with which you can integrate 
security to Solr? http://incubator.apache.org/connectors/

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 18. feb. 2011, at 15.30, mrw wrote:

> 
> Thanks for the response.
> 
> Yes, the queries are fairly large.  Basically, the corporate security policy
> dictates that we use row-level security attributes from the DB for access
> control to Solr.   So,  we bake row-level security attributes from the
> database into the index, and then, at query time, ask for those same
> attributes from the DB and pass them as part of the Solr query.  So, imagine
> a bank VP with access to tens of thousands of customer records and
> transactions, and all those access attributes get sent to Solr.  The system
> works well for the low-level account managers and low-entitlement users, but
> cannot scale for the high-level folks.
> 
> POSTing the data appears to avoid the header threshold issue, but it breaks
> because of the "too many boolean clauses" error.
> 
> 
> 
> 
> gearond wrote:
>> 
>> Probably you could do it, and solving a problem in business supersedes 
>> 'rightness' concerns, much to the dismay of geeks and 'those who like
>> rightness 
>> and say the word "Neemph!" '. 
>> 
>> 
>> the not rightness about this is that:
>> POST, PUT, DELETE are assumed to make changes to the URL's backend.
>> GET is assumed NOT to make changes.
>> 
>> So if your POST does not make a change . . . it breaks convention. But if
>> it 
>> solves the problem . . . :-)
>> 
>> Another way would be to GET with a 'query file' location, and then have
>> the 
>> server fetch that query and execute it.
>> 
>> Boy!!! I'd love to see one of your queries!!! You must have a few ANDs/ORs
>> in 
>> them :-)
>> 
>> Dennis Gearon
>> 
> 
> -- 
> View this message in context: 
> http://lucene.472066.n3.nabble.com/GET-or-POST-for-large-queries-tp2521700p2526934.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: My Plan to Scale Solr

2011-02-18 Thread Walter Underwood
He misspelled it as "LSA". The original post says "'m not sure if it will work 
out in a real production environment, which has a tight SLA pending." Clearly a 
Service Level Agreement, not Latent Semantic Analysis.

Since we're working on search engines, let's all try to figure stuff out for 
ourselves at least once, before we interrupt a few hundred people with 
questions.

wunder

On Feb 17, 2011, at 11:47 PM, Lance Norskog wrote:

> Or even better, search with 'LSA'.
> 
> On Thu, Feb 17, 2011 at 9:22 AM, Walter Underwood  
> wrote:
>> http://lmgtfy.com/?q=SLA
>> 
>> wunder
>> 
>> On Feb 17, 2011, at 11:04 AM, Dennis Gearon wrote:
>> 
>>> What's an 'LSA'
>>> 
>>> Dennis Gearon
>>> 
>>> 
>>> Signature Warning
>>> 
>>> It is always a good idea to learn from your own mistakes. It is usually a 
>>> better
>>> idea to learn from others’ mistakes, so you do not have to make them 
>>> yourself.
>>> from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'
>>> 
>>> 
>>> EARTH has a Right To Life,
>>> otherwise we all die.
>>> 
>>> 
>>> 
>>> 
>>> 
>>> From: Stijn Vanhoorelbeke 
>>> To: solr-user@lucene.apache.org; bing...@asu.edu
>>> Sent: Thu, February 17, 2011 4:28:13 AM
>>> Subject: Re: My Plan to Scale Solr
>>> 
>>> Hi,
>>> 
>>> I'm currently looking at SolrCloud. I've managed to set up a scalable
>>> cluster with ZooKeeper.
>>> ( see the examples in http://wiki.apache.org/solr/SolrCloud for a quick
>>> understanding )
>>> This way, all different shards / replicas are stored in a centralised
>>> configuration.
>>> 
>>> Moreover the ZooKeeper contains out-of-the-box loadbalancing.
>>> So, lets say - you have 2 different shards and each is replicated 2 times.
>>> Your zookeeper config will look like this:
>>> 
>>> \config
>>> ...
>>>   /live_nodes (v=6 children=4)
>>>  lP_Port:7500_solr (ephemeral v=0)
>>>  lP_Port:7574_solr (ephemeral v=0)
>>>  lP_Port:8900_solr (ephemeral v=0)
>>>  lP_Port:8983_solr (ephemeral v=0)
>>> /collections (v=20 children=1)
>>>  collection1 (v=0 children=1) "configName=myconf"
>>>   shards (v=0 children=2)
>>>shard1 (v=0 children=3)
>>> lP_Port:8983_solr_ (v=4)
>>> "node_name=lP_Port:8983_solr url=http://lP_Port:8983/solr/";
>>> lP_Port:7574_solr_ (v=1)
>>> "node_name=lP_Port:7574_solr url=http://lP_Port:7574/solr/";
>>> lP_Port:8900_solr_ (v=1)
>>> "node_name=lP_Port:8900_solr url=http://lP_Port:8900/solr/";
>>>shard2 (v=0 children=2)
>>> lP_Port:7500_solr_ (v=0)
>>> "node_name=lP_Port:7500_solr url=http://lP_Port:7500/solr/";
>>> lP_Port:7574_solr_ (v=1)
>>> "node_name=lP_Port:7574_solr url=http://lP_Port:7574/solr/";
>>> 
>>> --> This setup can be realised, by 1 ZooKeeper module - the other solr
>>> machines need just to know the IP_Port were the zookeeper is active & that's
>>> it.
>>> --> So no configuration / installing is needed to realise quick a scalable /
>>> load balanced cluster.
>>> 
>>> Disclaimer:
>>> ZooKeeper is a relative new feature - I'm not sure if it will work out in a
>>> real production environment, which has a tight SLA pending.
>>> But - definitely keep your eyes on this stuff - this will mature quickly!
>>> 
>>> Stijn Vanhoorelbeke
>> 






Re: [solrCloud] Distributed IDF - scoring in the cloud

2011-02-18 Thread Yonik Seeley
On Fri, Feb 18, 2011 at 7:07 AM, Thorsten Scherler  wrote:
> Is there a general interest to bring 1632 to the trunk (especially for
> solrCloud)?

Definitely - distributed idf is needed (as an option).

-Yonik
http://lucidimagination.com


Re: Solr multi cores or not

2011-02-18 Thread Marc SCHNEIDER
Multi-core was first added in 1.3 version and matured in 1.4. And as far as
I understand the Solr team encourages the use of multi-core.

Marc.

On Fri, Feb 18, 2011 at 3:04 PM, Thumuluri, Sai <
sai.thumul...@verizonwireless.com> wrote:

> Thank you, I will go the multi-core route and see how that works out. I
> guess, if we have to run queries across the cores, I may have to just
> run separate queries.
>
> -Original Message-
> From: Marc SCHNEIDER [mailto:marc.schneide...@gmail.com]
> Sent: Friday, February 18, 2011 8:01 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr multi cores or not
>
> Hi,
>
> It depends on what kind of data you are indexing between your multiple
> applications.
> If app1 has many fields to be indexed and app2 too and if theses fields
> are
> differents then it would probably be better to have multi cores.
> If you have a lot of common fields between app1 and app2 then one index
> is
> probably the best choice as it will avoid you configuring / implementing
> several indexes. In this case you can also have a differentiating field
> (like 'type') so that you can get data corresponding to your app.
> It really depends on your data structure.
>
> Hope this helps,
> Marc.
>
> On Wed, Feb 16, 2011 at 9:45 PM, Thumuluri, Sai <
> sai.thumul...@verizonwireless.com> wrote:
>
> > Hi,
> >
> > I have a need to index multiple applications using Solr, I also have
> the
> > need to share indexes or run a search query across these application
> > indexes. Is solr multi-core - the way to go?  My server config is
> > 2virtual CPUs @ 1.8 GHz and has about 32GB of memory. What is the
> > recommendation?
> >
> > Thanks,
> > Sai Thumuluri
> >
> >
> >
>


Re: GET or POST for large queries?

2011-02-18 Thread Markus Jelsma
Increase the setting in solrconfig
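
That is, in solrconfig.xml (1024 is the default; raising it has a performance
cost, as noted elsewhere in this thread):

    <query>
      <maxBooleanClauses>1024</maxBooleanClauses>
    </query>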

On Friday 18 February 2011 15:30:11 mrw wrote:
> Thanks for the response.
> 
> POSTing the data appears to avoid the header threshold issue, but it breaks
> because of the "too many boolean clauses" error.
> 
> gearond wrote:
> > Probably you could do it, and solving a problem in business supersedes
> > 'rightness' concerns, much to the dismay of geeks and 'those who like
> > rightness
> > and say the word "Neemph!" '.
> > 
> > 
> > the not rightness about this is that:
> > POST, PUT, DELETE are assumed to make changes to the URL's backend.
> > GET is assumed NOT to make changes.
> > 
> > So if your POST does not make a change . . . it breaks convention. But if
> > it
> > solves the problem . . . :-)
> > 
> > Another way would be to GET with a 'query file' location, and then have
> > the
> > server fetch that query and execute it.
> > 
> > Boy!!! I'd love to see one of your queries!!! You must have a few
> > ANDs/ORs in
> > them :-)
> > 
> >  Dennis Gearon

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350


Re: GET or POST for large queries?

2011-02-18 Thread mrw

Thanks for the response and info.

I'll try that.  


Jonathan Rochkind wrote:
> 
> Yes, I think it's 1024 by default.  I think you can raise it in your 
> config. But your performance may suffer.
> 
> Best would be to try and find a better way to do what you want without 
> using thousands of clauses. This might require some custom Java plugins 
> to Solr though.
> 
> 
> 

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/GET-or-POST-for-large-queries-tp2521700p2526950.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: GET or POST for large queries?

2011-02-18 Thread mrw

Thanks for the response.

Yes, the queries are fairly large.  Basically, the corporate security policy
dictates that we use row-level security attributes from the DB for access
control to Solr.   So,  we bake row-level security attributes from the
database into the index, and then, at query time, ask for those same
attributes from the DB and pass them as part of the Solr query.  So, imagine
a bank VP with access to tens of thousands of customer records and
transactions, and all those access attributes get sent to Solr.  The system
works well for the low-level account managers and low-entitlement users, but
cannot scale for the high-level folks.

POSTing the data appears to avoid the header threshold issue, but it breaks
because of the "too many boolean clauses" error.




gearond wrote:
> 
> Probably you could do it, and solving a problem in business supersedes 
> 'rightness' concerns, much to the dismay of geeks and 'those who like
> rightness 
> and say the word "Neemph!" '. 
> 
> 
> the not rightness about this is that:
> POST, PUT, DELETE are assumed to make changes to the URL's backend.
> GET is assumed NOT to make changes.
> 
> So if your POST does not make a change . . . it breaks convention. But if
> it 
> solves the problem . . . :-)
> 
> Another way would be to GET with a 'query file' location, and then have
> the 
> server fetch that query and execute it.
> 
> Boy!!! I'd love to see one of your queries!!! You must have a few ANDs/ORs
> in 
> them :-)
> 
>  Dennis Gearon
> 

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/GET-or-POST-for-large-queries-tp2521700p2526934.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Solr multi cores or not

2011-02-18 Thread Thumuluri, Sai
Thank you, I will go the multi-core route and see how that works out. I
guess, if we have to run queries across the cores, I may have to just
run separate queries. 

-Original Message-
From: Marc SCHNEIDER [mailto:marc.schneide...@gmail.com] 
Sent: Friday, February 18, 2011 8:01 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr multi cores or not

Hi,

It depends on what kind of data you are indexing between your multiple
applications.
If app1 has many fields to be indexed and app2 too and if theses fields
are
differents then it would probably be better to have multi cores.
If you have a lot of common fields between app1 and app2 then one index
is
probably the best choice as it will avoid you configuring / implementing
several indexes. In this case you can also have a differentiating field
(like 'type') so that you can get data corresponding to your app.
It really depends on your data structure.

Hope this helps,
Marc.

On Wed, Feb 16, 2011 at 9:45 PM, Thumuluri, Sai <
sai.thumul...@verizonwireless.com> wrote:

> Hi,
>
> I have a need to index multiple applications using Solr, I also have
the
> need to share indexes or run a search query across these application
> indexes. Is solr multi-core - the way to go?  My server config is
> 2virtual CPUs @ 1.8 GHz and has about 32GB of memory. What is the
> recommendation?
>
> Thanks,
> Sai Thumuluri
>
>
>


Re: Solr multi cores or not

2011-02-18 Thread Marc SCHNEIDER
Hi,

It depends on what kind of data you are indexing between your multiple
applications.
If app1 has many fields to be indexed and app2 too and if theses fields are
differents then it would probably be better to have multi cores.
If you have a lot of common fields between app1 and app2 then one index is
probably the best choice as it will avoid you configuring / implementing
several indexes. In this case you can also have a differentiating field
(like 'type') so that you can get data corresponding to your app.
It really depends on your data structure.

Hope this helps,
Marc.

On Wed, Feb 16, 2011 at 9:45 PM, Thumuluri, Sai <
sai.thumul...@verizonwireless.com> wrote:

> Hi,
>
> I have a need to index multiple applications using Solr, I also have the
> need to share indexes or run a search query across these application
> indexes. Is solr multi-core - the way to go?  My server config is
> 2virtual CPUs @ 1.8 GHz and has about 32GB of memory. What is the
> recommendation?
>
> Thanks,
> Sai Thumuluri
>
>
>


string field_type query

2011-02-18 Thread Isha Garg

I had declared a field with field_name=category, field_type=string.
Now I am querying category:Crime but it did not show any results. But
when I query for *:* it shows values related to this category.

Can anyone tell me the problem?


[solrCloud] Distributed IDF - scoring in the cloud

2011-02-18 Thread Thorsten Scherler
Hi all,

doing the solrCloud examples and one thing I am not clear about is the
scoring in a distributed search.

I did a small test where I used the "Example A: Simple two shard
cluster" from wiki:SolrCloud and additional added 

java -Durl=http://localhost:7574/solr/collection1/update -jar post.jar
ipod_other.xml

java -Durl=http://localhost:8983/solr/collection1/update -jar post.jar
monitor2.xml

Now requesting
http://localhost:8983/solr/collection1/select?distrib=true&q=electronics&fl=score&shards=localhost:8983/solr,localhost:7574/solr
for both host will return the same result. Here we get the score for
each hit based on the shard specific score and merge them into one
result doc.

However when I add monitor2.xml to 7574 as well, which previously did not
contain it, the scoring changes depending on the server I request.

The score returned for 8983 is always 0.09289607, whether distrib=true or false.

The score returned for 7574 is always 0.121383816, whether distrib=true or false.

So is it correct to assume that if a document is indexed in both shards
the score which will predominate is the one from the host which has been
requested?

My client plans to distribute the current index into different shards.
For example each "Consejería" (counseling) should be hosted in a shard.
The critical point for the client is that the scoring is the same as in
the big unique index they use right now for a distributed search.

As I understand the current solrCloud implementation there is no concern
about harmonizing the score.

In my research I came across
http://markmail.org/message/bhhfwymz5y7lvoj7
"The "IDF" part of the relevancy score is the only place that
distributed search scoring won't "match up" with no distributed
scoring because the document frequency used for the term is local to
every core instead of global.  If you distribute your documents fairly
randomly to the different shards, this won't matter.

There is a patch in the works to add global idf, but I think that even
when it's committed, it will default to off because of the higher cost
associated with it." the patch is
https://issues.apache.org/jira/browse/SOLR-1632

However last comment is from 26/Jul/10 reporting the patch failed and a
comment from Yonik give the impression that is not ready to use:

"It looks like the issue is this: rewrite() doesn't work for function
queries (there is no propagation mechanism to go through value sources).
This is a problem when real queries are embedded in function queries."

Is there a general interest to bring 1632 to the trunk (especially for
solrCloud)? 

Or might it be better to look into something that aims to scale the index
into hbase so the client does not lose the scoring?

TIA for your feedback
-- 
Thorsten Scherler 
codeBusters S.L. - web based systems

http://www.codebusters.es/





Re: SolrCloud new....

2011-02-18 Thread Stijn Vanhoorelbeke

Hi,

I'm busy doing the exact same thing.
I figured things out - all by myself - the wiki page is a nice 'first view',
but doesn't go into depth...

Let's go ahead:
1) Should I copy the libraries from cloud to trunk???
2) Should I keep the cloud module in every system???

A: Yes, you should.
You should get yourself the latest dev trunk and compile it.

The steps I followed:
+ grab the latest trunk & build solr
+ back up all solr config files
+ in dir tomcat6/webapps/ remove the dir 'solr'
+ copy the new solr.war ( which you built in the first step ) to tomcat6/webapps
+ On your Solr_home/conf dir solrconfig.xml needs to be replaced by a new one
( taken from the example dir of your build ) -- some other config files ( like
schema.xml ) you may keep using the old ones.
+ Adapt the new files to represent the old configuration
+ restart tomcat and it will install the new version of solr

It seems the index isn't compatible - so you need to flush your whole index
and re-index all data.
And finally you have your solr system back with zookeeper integrated in
/admin zone :)


3) I am not using any cores in solr. It is a single solr in every
system. Can solrcloud support it??

A: Actually you are using one core - so it gives no problem.
But be sure to check you have a solr.xml file in your solr_home dir.
This file just mentions all cores - in your case just one core;
( you can find examples of the layout of this file easily on
http://wiki.apache.org/solr/CoreAdmin )

4) the example is given in jetty.Is it the same way to make it in tomcat???

A: Right now - it is the same way.
You have to edit your /etc/init.d/tomcat6 startup script. In the start)
section you can specify all the JAVA_OPTS ( the ones the solrcloud wiki
mentions).

Be sure to set following one:
export JAVA_OPTS="$JAVA_OPTS -DhostPort=8080" ( if tomcat runs on port 8080
)

At first I didn't -->  my zookeeper pointed to standard 8983 port, which
gave errors.


In the above I gave you a quick peek at how to get the SolrCloud feature.
In the above the Zookeeper is embedded in one of your solr machines. If you
don't want this you may place zookeeper on a different machine ( like I'm
doing right now ).

If you need more help - you can contact me.
Stijn Vanhoorelbeke,


-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrCloud-new-tp1528872p2526080.html
Sent from the Solr - User mailing list archive at Nabble.com.