Re: nutch and solr

2012-02-21 Thread tamanjit.bin...@yahoo.co.in
Try this command.

 bin/nutch crawl urls/<domain>/<domain>.txt -dir crawl/<domain> -threads 10 -depth 2 -topN 1000

Your folder structure will look like this:

urls/
  <domain>/
    <domain>.txt
crawl/
  <domain>/

The <domain> folder name will differ for different domains. So for each domain
folder in the urls folder there has to be a corresponding folder (with the same
name) in the crawl folder.


--
View this message in context: 
http://lucene.472066.n3.nabble.com/nutch-and-solr-tp3765166p3765607.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to merge an "autofacet" with a predefined facet

2012-02-21 Thread Chris Hostetter

: But i don't know if it's possible to merge this "autocreated" facet with a
: facet already predefined ? i tried to use <copyField> (adding this to my
: code in my previous post)

copyField applies to the raw input of those fields -- so the special logic 
you have in the analyzer for your text_tag_facet won't be applied yet when 
it's copied to your predefined_facet field (copyField happens first)
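A minimal schema.xml sketch of the pitfall Hoss describes (field names are the
ones from this thread; the exact declaration was stripped from the quoted
mail, so this is a reconstruction):

```xml
<!-- this copies the *raw* input of text_tag_facet, not its analyzed output;
     the tagging logic in text_tag_facet's analyzer never runs on what lands
     in predefined_facet -->
<copyField source="text_tag_facet" dest="predefined_facet"/>
```

If you want the tagging applied to predefined_facet, the analyzer chain has to live on the destination field's type.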

: It's maybe because (As I understood) the real (stored) value of this dynamic
: facet is still the initial fulltext  ?? (or maybe i'm wrong ...)

stored values are different from indexed values -- but stored values are 
never a factor in faceting; the stored value is just 
what is returned when you get results back (ie: the "doc list") ... your 
problem has nothing to do with stored values.


-Hoss


Re: Fast Vector Highlighter Working for some records only

2012-02-21 Thread Koji Sekiguchi

(12/02/22 11:58), dhaivat wrote:

Thanks for reply,

But can you please tell me why it's working for some documents and not for
others?


As Solr 1.4.1 does not recognize the hl.useFastVectorHighlighter flag, Solr just
ignores it, but because hl=true is there, Solr tries to create highlight snippets
using the existing (traditional, i.e. non-FVH) Highlighter.
Since a Highlighter (including FVH) sometimes cannot produce snippets for various
reasons, you can use the hl.alternateField parameter.

http://wiki.apache.org/solr/HighlightingParameters#hl.alternateField
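For example, a request-parameter sketch ("content" is a hypothetical field
name; hl.maxAlternateFieldLength caps the length of the fallback text):

```
hl=true&hl.fl=content&hl.alternateField=content&hl.maxAlternateFieldLength=150
```

When the highlighter cannot build a snippet for a document, the first 150 characters of the content field are returned instead of nothing.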

koji
--
Query Log Visualizer for Apache Solr
http://soleami.com/


Re: Fast Vector Highlighter Working for some records only

2012-02-21 Thread dhaivat

Koji Sekiguchi wrote
> 
> (12/02/21 21:22), dhaivat wrote:
>> Hi  Koji,
>>
>> Thanks for quick reply, i am using solr 1.4.1
>>
> 
> Uh, you cannot use FVH on Solr 1.4.1. FVH is available in Solr 3.1 or later.
> So your hl.useFastVectorHighlighter=true flag is ignored.
> 
> koji
> -- 
> Query Log Visualizer for Apache Solr
> http://soleami.com/
> 

Thanks for reply,

But can you please tell me why it's working for some documents and not for
others?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Fast-Vector-Highlighter-Working-for-some-records-only-tp3763286p3765458.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Help with MMapDirectoryFactory in 3.5

2012-02-21 Thread Chris Hostetter

: How do I see the setting in the log or in stats.jsp ? I cannot find a place
: that indicates it is set or not.

I don't think the DirectoryFactory plugin hook was ever set up so that it 
can report its info/stats ... it doesn't look like it implements 
SolrInfoMBean, so it can't really report anything about itself.

: I would assume StandardDirectoryFactory is being used but I do see (when I
: set it or NOT set it)
...
: readerDir :  
: org.apache.lucene.store.MMapDirectory@C:\solr\jetty\example\solr\providersea
: rch\data\index 

this is because StandardDirectoryFactory uses FSDirectory.open() ... if you 
check out those docs you'll see...

http://lucene.apache.org/core/old_versioned_docs/versions/3_5_0/api/all/org/apache/lucene/store/FSDirectory.html#open%28java.io.File%29

>> Currently this returns MMapDirectory for most Solaris and Windows 
>> 64-bit JREs, NIOFSDirectory for other non-Windows JREs, and 
>> SimpleFSDirectory for other JREs on Windows.
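If you want to take that JRE-dependent default out of the equation, you can pin
the factory explicitly in solrconfig.xml (a sketch; verify the class name
against your 3.5 distribution):

```xml
<!-- force memory-mapped index access instead of the FSDirectory.open() default -->
<directoryFactory name="DirectoryFactory" class="solr.MMapDirectoryFactory"/>
```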



-Hoss


Re: Solrj Stream Server memory leak

2012-02-21 Thread Chris Hostetter

: I am using the SolrJ client's StreamingUpdateSolrServer and whenever i
: stop tomcat, it throws a memory leak warning. sample error message:
: 
: SEVERE: The web application [/MyApplication] appears to have started a
: thread named [pool-1004-thread-1] but has failed to stop it. This is very
: likely to create a memory leak.

as part of the SolrCloud work (SOLR-2358) a "shutdown()" method was added 
to CommonsHttpSolrServer (and StreamingUpdateSolrServer) to instruct it to 
shutdown the HttpClient it wraps (if it created it).  So if you are using 
trunk, you should call that when you are done with the 
StreamingUpdateSolrServer object.

As a workaround in 3x, you can instantiate the HttpClient yourself using 
the MultiThreadedHttpConnectionManager, and pass it to the 
StreamingUpdateSolrServer constructor.  Then when your app shuts down, you 
can call shutdown on the HttpClient.

Alternately: the minimal amount of change you can make to work 
around this would be to add a call to the static method...

  MultiThreadedHttpConnectionManager.shutdownAll();

...somewhere in your app's shutdown code (assuming it doesn't cause 
problems with any subsequent shutdown code)
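The 3.x workaround might look like this (an untested sketch against the SolrJ
3.x and commons-httpclient 3.x APIs; the URL, queue size and thread count are
illustrative values, not recommendations):

```java
import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.MultiThreadedHttpConnectionManager;
import org.apache.solr.client.solrj.impl.StreamingUpdateSolrServer;

public class IndexerLifecycle {
    private final MultiThreadedHttpConnectionManager mgr =
        new MultiThreadedHttpConnectionManager();
    private final StreamingUpdateSolrServer server;

    public IndexerLifecycle() throws Exception {
        // pass our own HttpClient so we control its lifecycle
        server = new StreamingUpdateSolrServer(
            "http://localhost:8983/solr", new HttpClient(mgr), 20, 4);
    }

    public void shutdown() {
        // releases the pooled connections and their background thread
        mgr.shutdown();
        // or, as the minimal global hammer:
        // MultiThreadedHttpConnectionManager.shutdownAll();
    }
}
```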


-Hoss


Re: Date filter query

2012-02-21 Thread Erick Erickson
bq: How could I overlook it?

Easy, the same way I did for a year and more 

Best
Erick

On Tue, Feb 21, 2012 at 6:50 PM, Em  wrote:
> Erick,
>
> damn!
>
> The NOW of now isn't the same NOW a second later. So obviously. How
> could I overlook it?
>
> Kind regards,
> Em
>
> Am 22.02.2012 00:17, schrieb Erick Erickson:
>> Be a little careful here. Any "fq" that references NOW will probably
>> NOT be effectively cached. Think of the fq cache as a map, with
>> the key being the fq clause and the value being the set of
>> documents that match that value.
>>
>> So something like NOW gives
>> 2012-01-23T00:00:00Z
>> but issuing that a second later gives:
>> 2012-01-23T00:00:01Z
>>
>> so the keys don't match, they're considered
>> different fq clauses and the calculations are all
>> done all over again.
>>
>> Using the rounding for date math will help here,
>> something like NOW/DAY+1DAY to get midnight tonight
>> will give you something that's re-used, similarly for
>> NOW/DAY-30DAY etc.
>>
>> All that said, your query times are pretty long. I doubt
>> that your fq clause is really the culprit here. You need
>> to find out what the bottleneck is here, consider using
>> jconsole to see what your machine is occupying its
>> time with. Examine your cache statistics to see
>> if you're getting good usage from your cache. You
>> haven't detailed what you're measuring. If this is just
>> a half-dozen queries after starting Solr, you may get
>> much better performance if you autowarm.
>>
>> You may have too little memory allocated. You may be
>> swapping to disk a lot. You may.
>>
>> What have you tried and what have the results been?
>>
>> In short, these times are very suspect and you haven't
>> really provided much info to go on.
>>
>> Best
>> Erick
>>
>> On Tue, Feb 21, 2012 at 5:25 PM, Em  wrote:
>>> Hi,
>>>
 But they [the cache configurations] are default for both tests, can it
>>> affect the
 results?
>>> Yes, they affect both results. Try to increase the values for
>>> queryResultCache and documentCache from 512 to 1024 (provided that you
>>> got two distinct queries "bay" and "girl"). In general they should fit
>>> the amount of documents and results you are expecting to have in a way
>>> that chances are good to have a cache-hit.
>>>
 Maybe it is that I use shards. I have 11 shards, summary ~310M docs.
>>> 11 shards on the same machine? Could lead to decreased performance due
>>> to disk-io.
>>>
>>> Did you try my advice of adjusting the precisionSteps of your
>>> TrieDateFields and reindexing your documents afterwards?
>>>
>>> Kind regards,
>>> Em
>>>
>>>
>>> Am 21.02.2012 22:52, schrieb ku3ia:
 Hi,

>> First: I am really surprised that the difference between explicit
>> Date-Values and the more friendly date-keywords is that large.
 Maybe it is that I use shards. I have 11 shards, summary ~310M docs.

 Did you do a server restart between both tests?
 I tried to run these tests one after another, I'd rebooted my tomcats, I'd
 run the second test first and vice versa.

>> Second: Could you show us your solrconfig to make sure that your caches
>> are configured well?
 I'm using solrconfig from solr/example directory. The difference is that I
 only commented out unused components. Filter, document and query result
 cache is default. But they are default for both tests, can it affect the
 results?

>> Furthermore: Take into consideration, whether you really need 500 rows
>> per request.
 Yes, I need 500 rows.

 Thanks

 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Date-filter-query-tp3764349p3764941.html
 Sent from the Solr - User mailing list archive at Nabble.com.

>>


Re: Fast Vector Highlighter Working for some records only

2012-02-21 Thread Koji Sekiguchi

(12/02/21 21:22), dhaivat wrote:

Hi  Koji,

Thanks for quick reply, i am using solr 1.4.1



Uh, you cannot use FVH on Solr 1.4.1. FVH is available in Solr 3.1 or later.
So your hl.useFastVectorHighlighter=true flag is ignored.

koji
--
Query Log Visualizer for Apache Solr
http://soleami.com/


Re: Date filter query

2012-02-21 Thread Em
Erick,

damn!

The NOW of now isn't the same NOW a second later. So obviously. How
could I overlook it?

Kind regards,
Em

Am 22.02.2012 00:17, schrieb Erick Erickson:
> Be a little careful here. Any "fq" that references NOW will probably
> NOT be effectively cached. Think of the fq cache as a map, with
> the key being the fq clause and the value being the set of
> documents that match that value.
> 
> So something like NOW gives
> 2012-01-23T00:00:00Z
> but issuing that a second later gives:
> 2012-01-23T00:00:01Z
> 
> so the keys don't match, they're considered
> different fq clauses and the calculations are all
> done all over again.
> 
> Using the rounding for date math will help here,
> something like NOW/DAY+1DAY to get midnight tonight
> will give you something that's re-used, similarly for
> NOW/DAY-30DAY etc.
> 
> All that said, your query times are pretty long. I doubt
> that your fq clause is really the culprit here. You need
> to find out what the bottleneck is here, consider using
> jconsole to see what your machine is occupying its
> time with. Examine your cache statistics to see
> if you're getting good usage from your cache. You
> haven't detailed what you're measuring. If this is just
> a half-dozen queries after starting Solr, you may get
> much better performance if you autowarm.
> 
> You may have too little memory allocated. You may be
> swapping to disk a lot. You may.
> 
> What have you tried and what have the results been?
> 
> In short, these times are very suspect and you haven't
> really provided much info to go on.
> 
> Best
> Erick
> 
> On Tue, Feb 21, 2012 at 5:25 PM, Em  wrote:
>> Hi,
>>
>>> But they [the cache configurations] are default for both tests, can it
>> affect the
>>> results?
>> Yes, they affect both results. Try to increase the values for
>> queryResultCache and documentCache from 512 to 1024 (provided that you
>> got two distinct queries "bay" and "girl"). In general they should fit
>> the amount of documents and results you are expecting to have in a way
>> that chances are good to have a cache-hit.
>>
>>> Maybe it is that I use shards. I have 11 shards, summary ~310M docs.
>> 11 shards on the same machine? Could lead to decreased performance due
>> to disk-io.
>>
>> Did you try my advice of adjusting the precisionSteps of your
>> TrieDateFields and reindexing your documents afterwards?
>>
>> Kind regards,
>> Em
>>
>>
>> Am 21.02.2012 22:52, schrieb ku3ia:
>>> Hi,
>>>
> First: I am really surprised that the difference between explicit
> Date-Values and the more friendly date-keywords is that large.
>>> Maybe it is that I use shards. I have 11 shards, summary ~310M docs.
>>>
> Did you do a server restart between both tests?
>>> I tried to run these tests one after another, I'd rebooted my tomcats, I'd
>>> run the second test first and vice versa.
>>>
> Second: Could you show us your solrconfig to make sure that your caches
> are configured well?
>>> I'm using solrconfig from solr/example directory. The difference is that I
>>> only commented out unused components. Filter, document and query result
>>> cache is default. But they are default for both tests, can it affect the
>>> results?
>>>
> Furthermore: Take into consideration, whether you really need 500 rows
> per request.
>>> Yes, I need 500 rows.
>>>
>>> Thanks
>>>
>>> --
>>> View this message in context: 
>>> http://lucene.472066.n3.nabble.com/Date-filter-query-tp3764349p3764941.html
>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>
> 


nutch and solr

2012-02-21 Thread alessio crisantemi
I tried to configure Nutch (1.4) with my Solr 3.2.
But when I try a crawl command

"bin/nutch inject crawl/crawldb urls"

it doesn't work, and it replies with "can't convert a empty path"



why, in your opinion?

tx

a.


Re: filter query or boolean?

2012-02-21 Thread Erick Erickson
Apples and oranges here.

Filter queries do NOT contribute to score. But they are cached so
if you have a frequent use-case for filtering, you'll get much
faster performance. OTOH, if your filter queries are never
repeated, filter queries aren't helpful.

So if correctness isn't defined by the fq clause being included
in the relevance score, you're *usually*  better off using filter
queries...
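For example (a sketch; the field names are hypothetical), these two requests
match the same documents, but only the second caches the category restriction
for reuse and keeps it out of the relevance score:

```
q=title:solr AND category:books

q=title:solr&fq=category:books
```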

Best
Erick

On Tue, Feb 21, 2012 at 1:25 PM,   wrote:
>
> Hi,
>  Which is faster for boolean compound expressions. filter queries or a
> single query with boolean expressions?
> For that matter, is there any difference other than maybe speed?
>
> thanks


Re: Date filter query

2012-02-21 Thread Erick Erickson
Be a little careful here. Any "fq" that references NOW will probably
NOT be effectively cached. Think of the fq cache as a map, with
the key being the fq clause and the value being the set of
documents that match that value.

So something like NOW gives
2012-01-23T00:00:00Z
but issuing that a second later gives:
2012-01-23T00:00:01Z

so the keys don't match, they're considered
different fq clauses and the calculations are all
done all over again.

Using the rounding for date math will help here,
something like NOW/DAY+1DAY to get midnight tonight
will give you something that's re-used, similarly for
NOW/DAY-30DAY etc.
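The cache-key effect can be illustrated with a tiny self-contained model (this
is not Solr code; it only mimics the fact that the filterCache key is the fq
clause after date math has been resolved):

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

public class FqCacheKeyDemo {

    // A raw NOW resolves to the current instant, so the cache key changes
    // every time the query is issued.
    static String rawNowKey(Instant now) {
        return "Date:[" + now.minus(30, ChronoUnit.DAYS) + " TO " + now + "]";
    }

    // NOW/DAY-30DAY TO NOW/DAY+1DAY rounds both endpoints to day boundaries,
    // so every request within the same day produces the identical key.
    static String roundedKey(Instant now) {
        Instant day = now.truncatedTo(ChronoUnit.DAYS);
        return "Date:[" + day.minus(30, ChronoUnit.DAYS)
                + " TO " + day.plus(1, ChronoUnit.DAYS) + "]";
    }

    public static void main(String[] args) {
        Instant t1 = Instant.parse("2012-01-23T10:15:00Z");
        Instant t2 = t1.plusSeconds(1); // the same query, one second later

        System.out.println(rawNowKey(t1).equals(rawNowKey(t2)));   // prints false -> cache miss
        System.out.println(roundedKey(t1).equals(roundedKey(t2))); // prints true  -> cache hit
    }
}
```

With raw NOW the two keys differ after one second, so the second request recomputes the whole filter; with the rounded form both requests produce the identical key and the second one is a cache hit.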

All that said, your query times are pretty long. I doubt
that your fq clause is really the culprit here. You need
to find out what the bottleneck is here, consider using
jconsole to see what your machine is occupying its
time with. Examine your cache statistics to see
if you're getting good usage from your cache. You
haven't detailed what you're measuring. If this is just
a half-dozen queries after starting Solr, you may get
much better performance if you autowarm.

You may have too little memory allocated. You may be
swapping to disk a lot. You may.

What have you tried and what have the results been?

In short, these times are very suspect and you haven't
really provided much info to go on.

Best
Erick

On Tue, Feb 21, 2012 at 5:25 PM, Em  wrote:
> Hi,
>
>> But they [the cache configurations] are default for both tests, can it
> affect the
>> results?
> Yes, they affect both results. Try to increase the values for
> queryResultCache and documentCache from 512 to 1024 (provided that you
> got two distinct queries "bay" and "girl"). In general they should fit
> the amount of documents and results you are expecting to have in a way
> that chances are good to have a cache-hit.
>
>> Maybe it is that I use shards. I have 11 shards, summary ~310M docs.
> 11 shards on the same machine? Could lead to decreased performance due
> to disk-io.
>
> Did you try my advice of adjusting the precisionSteps of your
> TrieDateFields and reindexing your documents afterwards?
>
> Kind regards,
> Em
>
>
> Am 21.02.2012 22:52, schrieb ku3ia:
>> Hi,
>>
 First: I am really surprised that the difference between explicit
 Date-Values and the more friendly date-keywords is that large.
>> Maybe it is that I use shards. I have 11 shards, summary ~310M docs.
>>
 Did you do a server restart between both tests?
>> I tried to run these tests one after another, I'd rebooted my tomcats, I'd
>> run the second test first and vice versa.
>>
 Second: Could you show us your solrconfig to make sure that your caches
 are configured well?
>> I'm using solrconfig from solr/example directory. The difference is that I
>> only commented out unused components. Filter, document and query result
>> cache is default. But they are default for both tests, can it affect the
>> results?
>>
 Furthermore: Take into consideration, whether you really need 500 rows
 per request.
>> Yes, I need 500 rows.
>>
>> Thanks
>>
>> --
>> View this message in context: 
>> http://lucene.472066.n3.nabble.com/Date-filter-query-tp3764349p3764941.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>


Re: mixed indexing through DIH and other ways

2012-02-21 Thread Em
Hi Ramo,

sorry for confusing you.

Forget everything that I said after "However" - it was wrong (I mixed
something up here).

Yes, you can index documents via any UpdateRequestHandler you like while
using the DIH.

Kind regards,
Em

Am 21.02.2012 23:41, schrieb Ramo Karahasan:
> Hi,
> 
> what do you mean? Are you referring to the time I add a new document? But that
> should be okay, all documents changed since the last time I've indexed will be
> added by the delta import, right?
> 
> Thanks,
> Ramo
> 
> -Ursprüngliche Nachricht-
> Von: Em [mailto:mailformailingli...@yahoo.de] 
> Gesendet: Dienstag, 21. Februar 2012 23:27
> An: solr-user@lucene.apache.org
> Betreff: Re: mixed indexing through dhi and other ways
> 
> Hi Ramo,
> 
> yes, it's possible.
> However keep in mind that your cURL, CSV, XML, JSON etc. update-requests
> store the information that is needed to do delta-updates with your DIH (if
> needed!).
> 
> Kind regards,
> Em
> 
> Am 21.02.2012 23:18, schrieb Ramo Karahasan:
>> Hi,
>>
>>  
>>
>> currently I'm indexing via DIH and delta imports.
>>
>> Is it possible to additionally index data via cURL as XML or JSON into
>> the index which was created via DIH, for example for
>> "real-time" indexing of data, like comments on a question?
>>
>>  
>>
>> Thank you,
>>
>> Ramo
>>
>>
> 
> 


Solr Highlighting not working with PayloadTermQueries

2012-02-21 Thread Nitin Arora
Hi, 

I'm using SOLR and Lucene in my application for search. 

I'm facing an issue where highlighting with FastVectorHighlighter does not work
when I use PayloadTermQueries as clauses of a BooleanQuery.

After debugging I found that in DefaultSolrHighlighter.java,
fvh.getFieldQuery does not return any term in the termMap.

FastVectorHighlighter fvh = new FastVectorHighlighter(
    // FVH cannot process the hl.usePhraseHighlighter parameter on a per-field basis
    params.getBool( HighlightParams.USE_PHRASE_HIGHLIGHTER, true ),
    // FVH cannot process the hl.requireFieldMatch parameter on a per-field basis
    params.getBool( HighlightParams.FIELD_MATCH, false ) );

FieldQuery fieldQuery = fvh.getFieldQuery( query );

The reason for the empty termMap is that PayloadTermQuery is discarded while
constructing the FieldQuery.

void flatten( Query sourceQuery, Collection<Query> flatQueries ){
  if( sourceQuery instanceof BooleanQuery ){
    BooleanQuery bq = (BooleanQuery)sourceQuery;
    for( BooleanClause clause : bq.getClauses() ){
      if( !clause.isProhibited() )
        flatten( clause.getQuery(), flatQueries );
    }
  }
  else if( sourceQuery instanceof DisjunctionMaxQuery ){
    DisjunctionMaxQuery dmq = (DisjunctionMaxQuery)sourceQuery;
    for( Query query : dmq ){
      flatten( query, flatQueries );
    }
  }
  else if( sourceQuery instanceof TermQuery ){
    if( !flatQueries.contains( sourceQuery ) )
      flatQueries.add( sourceQuery );
  }
  else if( sourceQuery instanceof PhraseQuery ){
    if( !flatQueries.contains( sourceQuery ) ){
      PhraseQuery pq = (PhraseQuery)sourceQuery;
      if( pq.getTerms().length > 1 )
        flatQueries.add( pq );
      else if( pq.getTerms().length == 1 ){
        flatQueries.add( new TermQuery( pq.getTerms()[0] ) );
      }
    }
  }
  // else discard queries
}

What is the best way to get highlighting working with Payload Term Queries? 
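One possible workaround (an untested sketch, not an official API; it assumes
the Lucene 3.x mutable BooleanQuery and that payload scoring is irrelevant for
snippet generation) is to rewrite payload clauses into plain TermQuerys before
handing the query to the highlighter, so that flatten() keeps them:

```java
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.payloads.PayloadTermQuery;

public final class HighlightQueryRewriter {
    private HighlightQueryRewriter() {}

    /** Replace PayloadTermQuery clauses with equivalent TermQuerys. */
    public static Query rewrite(Query q) {
        if (q instanceof PayloadTermQuery) {
            // same term, no payload scoring -- enough for highlighting
            return new TermQuery(((PayloadTermQuery) q).getTerm());
        }
        if (q instanceof BooleanQuery) {
            BooleanQuery rewritten = new BooleanQuery();
            for (BooleanClause c : ((BooleanQuery) q).getClauses()) {
                rewritten.add(rewrite(c.getQuery()), c.getOccur());
            }
            return rewritten;
        }
        return q; // other query types pass through unchanged
    }
}
```

The idea is to pass rewrite(query) instead of query to fvh.getFieldQuery(...).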

Thanks 
Nitin

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Highlighting-not-working-with-PayloadTermQueries-tp3765093p3765093.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: reader/searcher refresh after replication (commit)

2012-02-21 Thread Em
Eks,

that sounds strange!

Am I getting you right?
You have a master which indexes batch-updates from time to time.
Furthermore you got some slaves, pulling data from that master to keep
them up-to-date with the newest batch-updates.
Additionally your slaves index own content in soft-commit mode that
needs to be available as soon as possible.
In consequence the slaves are not in sync with the master.

I am not 100% certain, but chances are good that Solr's
replication-mechanism only changes those segments that are not in sync
with the master.

What would you expect a BeforeCommitListener to do for you, if one
existed?

Kind regards,
Em

Am 21.02.2012 21:10, schrieb eks dev:
> Thanks Mark,
> Hmm, I would like to have this information asap, not to wait until the
> first search gets executed (depends on the user). Is Solr going to create
> a new searcher as part of the "replication transaction"...
> 
> Just to make it clear why I need it...
> I have simple master, many slaves config where master does "batch"
> updates in big chunks (things user can wait longer to see on search
> side) but slaves work in soft commit mode internally where I permit
> them to run away slightly from master in order to know where
> "incremental update" should start, I read it from UserData 
> 
> Basically, ideally, before commit (after successful replication is
> finished) ends, I would like to read in these counters to let
> "incremental update" run from the right point...
> 
> I need to prevent updating the "replicated index" before I read this
> information (duplicates can appear). Are there any "IndexWriter"
> listeners around?
> 
> 
> Thanks again,
> eks.
> 
> 
> 
> On Tue, Feb 21, 2012 at 8:03 PM, Mark Miller  wrote:
>> Post commit calls are made before a new searcher is opened.
>>
>> Might be easier to try to hook in with a new searcher listener?
>>
>> On Feb 21, 2012, at 8:23 AM, eks dev wrote:
>>
>>> Hi all,
>>> I am a bit confused with IndexSearcher refresh lifecycles...
>>> In a master slave setup, I override postCommit listener on slave
>>> (solr trunk version) to read some user information stored in
>>> userCommitData on master
>>>
>>> --
>>> @Override
>>> public final void postCommit() {
>>> // This returns "stale" information that was present before
>>> replication finished
>>> RefCounted<SolrIndexSearcher> refC = core.getNewestSearcher(true);
>>> Map<String,String> userData =
>>> refC.get().getIndexReader().getIndexCommit().getUserData();
>>> }
>>> 
>>> I expected core.getNewestSearcher(true); to return refreshed
>>> SolrIndexSearcher, but it didn't
>>>
>>> When is this information going to be refreshed to the status from the
>>> replicated index, I repeat this is postCommit listener?
>>>
>>> What is the way to get the information from the last commit point?
>>>
>>> Maybe like this?
>>> core.getDeletionPolicy().getLatestCommit().getUserData();
>>>
>>> Or do I need to explicitly open a new searcher (doesn't Solr do this behind
>>> the scenes?)
>>> core.openNewSearcher(false, false)
>>>
>>> Not critical, reopening new searcher works, but I would like to
>>> understand these lifecycles, when solr loads latest commit point...
>>>
>>> Thanks, eks
>>
>> - Mark Miller
>> lucidimagination.com
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
> 


AW: mixed indexing through DIH and other ways

2012-02-21 Thread Ramo Karahasan
Hi,

what do you mean? Are you referring to the time I add a new document? But that
should be okay, all documents changed since the last time I've indexed will be
added by the delta import, right?

Thanks,
Ramo

-Ursprüngliche Nachricht-
Von: Em [mailto:mailformailingli...@yahoo.de] 
Gesendet: Dienstag, 21. Februar 2012 23:27
An: solr-user@lucene.apache.org
Betreff: Re: mixed indexing through dhi and other ways

Hi Ramo,

yes, it's possible.
However keep in mind that your cURL, CSV, XML, JSON etc. update-requests
store the information that is needed to do delta-updates with your DIH (if
needed!).

Kind regards,
Em

Am 21.02.2012 23:18, schrieb Ramo Karahasan:
> Hi,
> 
>  
> 
> currently I'm indexing via DIH and delta imports.
> 
> Is it possible to additionally index data via cURL as XML or JSON into 
> the index which was created via DIH, for example for 
> "real-time" indexing of data, like comments on a question?
> 
>  
> 
> Thank you,
> 
> Ramo
> 
> 



Re: SOLR - Just for search or whole site DB?

2012-02-21 Thread Em
Hi Spadez,

MySQL, as well as any other SQL-database, needs the same amount of work
to integrate its data into Solr.
Choose your favorite database and get started!
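With MySQL the usual route is the DataImportHandler; a minimal data-config.xml
sketch (the driver class is the Connector/J 5.x name; database, table and
column names are hypothetical):

```xml
<dataConfig>
  <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/mydb" user="solr" password="secret"/>
  <document>
    <!-- each row becomes one Solr document -->
    <entity name="item" query="SELECT id, name FROM item">
      <field column="id" name="id"/>
      <field column="name" name="name"/>
    </entity>
  </document>
</dataConfig>
```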

Best,
Em

Am 21.02.2012 18:32, schrieb Spadez:
> Thank you for the information Damien. 
> 
> Is there a better database to use at the core of the site which is more
> compatible with Solr than MySQL, or is hooking MySQL up with Solr simple
> enough?
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/SOLR-Just-for-search-or-whole-site-DB-tp3763439p3764254.html
> Sent from the Solr - User mailing list archive at Nabble.com.
> 


Re: mixed indexing through DIH and other ways

2012-02-21 Thread Em
Hi Ramo,

yes, it's possible.
However keep in mind that your cURL, CSV, XML, JSON etc. update-requests
store the information that is needed to do delta-updates with your DIH
(if needed!).

Kind regards,
Em

Am 21.02.2012 23:18, schrieb Ramo Karahasan:
> Hi,
> 
>  
> 
> currently I'm indexing via DIH and delta imports.
> 
> Is it possible to additionally index data via cURL as XML or JSON into the
> index which was created via DIH, for example for "real-time" indexing of
> data, like comments on a question?
> 
>  
> 
> Thank you,
> 
> Ramo 
> 
> 


Re: Date filter query

2012-02-21 Thread Em
Hi,

> But they [the cache configurations] are default for both tests, can it
affect the
> results?
Yes, they affect both results. Try to increase the values for
queryResultCache and documentCache from 512 to 1024 (provided that you
got two distinct queries "bay" and "girl"). In general they should fit
the number of documents and results you expect, so that the
chances of a cache-hit are good.
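In solrconfig.xml that suggestion would look roughly like this (a sketch; the
autowarmCount is an illustrative value):

```xml
<queryResultCache class="solr.LRUCache" size="1024" initialSize="1024" autowarmCount="256"/>
<documentCache   class="solr.LRUCache" size="1024" initialSize="1024"/>
```

Note that the documentCache cannot be meaningfully autowarmed, since its keys are internal document ids that change between searchers.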

> Maybe it is that I use shards. I have 11 shards, summary ~310M docs.
11 shards on the same machine? Could lead to decreased performance due
to disk-io.

Did you try my advice of adjusting the precisionSteps of your
TrieDateFields and reindexing your documents afterwards?

Kind regards,
Em


Am 21.02.2012 22:52, schrieb ku3ia:
> Hi,
> 
>>> First: I am really surprised that the difference between explicit 
>>> Date-Values and the more friendly date-keywords is that large. 
> Maybe it is that I use shards. I have 11 shards, summary ~310M docs.
> 
>>> Did you do a server restart between both tests?
> I tried to run these tests one after another, I'd rebooted my tomcats, I'd
> run the second test first and vice versa.
> 
>>> Second: Could you show us your solrconfig to make sure that your caches 
>>> are configured well?
> I'm using solrconfig from solr/example directory. The difference is that I
> only commented out unused components. Filter, document and query result
> cache is default. But they are default for both tests, can it affect the
> results?
> 
>>> Furthermore: Take into consideration, whether you really need 500 rows 
>>> per request. 
> Yes, I need 500 rows.
> 
> Thanks
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Date-filter-query-tp3764349p3764941.html
> Sent from the Solr - User mailing list archive at Nabble.com.
> 


mixed indexing through DIH and other ways

2012-02-21 Thread Ramo Karahasan
Hi,

 

currently I'm indexing via DIH and delta imports.

Is it possible to additionally index data via cURL as XML or JSON into the
index which was created via DIH, for example for "real-time" indexing of data,
like comments on a question?

 

Thank you,

Ramo 



Re: Date filter query

2012-02-21 Thread ku3ia
Hi,

>>First: I am really surprised that the difference between explicit 
>>Date-Values and the more friendly date-keywords is that large. 
Maybe it is that I use shards. I have 11 shards, summary ~310M docs.

>>Did you do a server restart between both tests?
I tried to run these tests one after another, I'd rebooted my tomcats, I'd
run the second test first and vice versa.

>>Second: Could you show us your solrconfig to make sure that your caches 
>>are configured well?
I'm using the solrconfig from the solr/example directory. The difference is that
I only commented out unused components. The filter, document and query result
caches are default. But they are default for both tests, can that affect the
results?

>>Furthermore: Take into consideration, whether you really need 500 rows 
>>per request. 
Yes, I need 500 rows.

Thanks

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Date-filter-query-tp3764349p3764941.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Date filter query

2012-02-21 Thread Em
Hi,

your QTimes are somewhat slow!
First: I am really surprised that the difference between explicit
Date-Values and the more friendly date-keywords is that large.
Did you do a server restart between both tests?

Second: Could you show us your solrconfig to make sure that your caches
are configured well?

How many documents are part of that test-index?

I suggest you adjust the precisionStep definition of your TrieDateField.
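For reference, the tdate type in the example schema looks like this (a sketch;
verify against your schema version). A lower precisionStep indexes more terms
per value, which speeds up range queries at the cost of index size:

```xml
<fieldType name="tdate" class="solr.TrieDateField" omitNorms="true"
           precisionStep="6" positionIncrementGap="0"/>
```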

Furthermore: Take into consideration, whether you really need 500 rows
per request.

Kind regards,
Em


Am 21.02.2012 21:49, schrieb ku3ia:
> Hi, Em, thanks for your response. But it seems I have a problem.
> I wrote a script which sends queries (curl-based) with a certain delay.
> I had made a dictionary of matched words. I ran my script with a 500ms delay
> for 60 seconds. Take a look at the catalina logs:
> 
> INFO: [] webapp=/solr path=/select
> params={fl=fileds_from_schema_here,score&sort=score+desc&start=0&q="baby"&wt=json&fq=Date:[2012-01-23T00:00:00Z+TO+2012-02-21T23:59:59Z]&rows=500}
> status=0 QTime=1735
> INFO: [] webapp=/solr path=/select
> params={fl=fileds_from_schema_here,score&sort=score+desc&start=0&q="girl"&wt=json&fq=Date:[2012-01-23T00:00:00Z+TO+2012-02-21T23:59:59Z]&rows=500}
> status=0 QTime=9794
> 
> INFO: [] webapp=/solr path=/select
> params={fl=fileds_from_schema_here,score&sort=score+desc&start=0&q="baby"&wt=json&fq=Date:[NOW-30DAY+TO+NOW]&rows=500}
> status=0 QTime=13885
> INFO: [] webapp=/solr path=/select
> params={fl=fileds_from_schema_here,score&sort=score+desc&start=0&q="girl"&wt=json&fq=Date:[NOW-30DAY+TO+NOW]&rows=500}
> status=0 QTime=33995
> 
> Note, that not all queries from the second test are slower, for example:
> INFO: [] webapp=/solr path=/select
> params={fl=fileds_from_schema_here,score&sort=score+desc&start=0&q="with"&wt=json&fq=Date:[2012-01-23T00:00:00Z+TO+2012-02-21T23:59:59Z]&rows=500}
> status=0 QTime=18645 
> 
> INFO: [] webapp=/solr path=/select
> params={fl=fileds_from_schema_here,score&sort=score+desc&start=0&q="with"&wt=json&fq=Date:[NOW-30DAY+TO+NOW]&rows=500}
> status=0 QTime=7877
> 
> but in average I have:
> *** Date:[2012-01-23T00:00:00Z+TO+2012-02-21T23:59:59Z] ***
> Queries processed: 110
> Queries cancelled: 4
> Max QTime is: 22728 ms
> Avg QTime is: 6681.31 ms
> Min QTime is:  ms
> 
> *** Date:[NOW-30DAY+TO+NOW] ***
> Queries processed: 20
> Queries cancelled: 94
> Max QTime is: 45203 ms
> Avg QTime is: 39195.2 ms
> Min QTime is:  ms
> 
> I repeated this test more times - results seems equal. Is it true, that
> [2012-01-23T00:00:00Z+TO+2012-02-21T23:59:59Z] is faster than
> [NOW-30DAY+TO+NOW]
> ?
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Date-filter-query-tp3764349p3764781.html
> Sent from the Solr - User mailing list archive at Nabble.com.
> 


Re: reader/searcher refresh after replication (commit)

2012-02-21 Thread eks dev
And drinks on me to those who decoupled implicit commit from close...
that was a tricky trap

On Tue, Feb 21, 2012 at 9:10 PM, eks dev  wrote:
> Thanks Mark,
> Hmm, I would like to have this information asap, not to wait until the
> first search gets executed (depends on user) . Is solr going to create
> new searcher as a part of "replication transaction"...
>
> Just to make it clear why I need it...
> I have simple master, many slaves config where master does "batch"
> updates in big chunks (things user can wait longer to see on search
> side) but slaves work in soft commit mode internally where I permit
> them to run away slightly from master in order to know where
> "incremental update" should start, I read it from UserData 
>
> Basically, ideally, before commit (after successful replication is
> finished) ends, I would like to read in these counters to let
> "incremental update" run from the right point...
>
> I need to prevent updating "replicated index" before I read this
> information (duplicates can appear) are there any "IndexWriter"
> listeners around?
>
>
> Thanks again,
> eks.
>
>
>
> On Tue, Feb 21, 2012 at 8:03 PM, Mark Miller  wrote:
>> Post commit calls are made before a new searcher is opened.
>>
>> Might be easier to try to hook in with a new searcher listener?
>>
>> On Feb 21, 2012, at 8:23 AM, eks dev wrote:
>>
>>> Hi all,
>>> I am a bit confused with IndexSearcher refresh lifecycles...
>>> In a master slave setup, I override postCommit listener on slave
>>> (solr trunk version) to read some user information stored in
>>> userCommitData on master
>>>
>>> --
>>> @Override
>>> public final void postCommit() {
>>> // This returnes "stale" information that was present before
>>> replication finished
>>> RefCounted refC = core.getNewestSearcher(true);
>>> Map userData =
>>> refC.get().getIndexReader().getIndexCommit().getUserData();
>>> }
>>> 
>>> I expected core.getNewestSearcher(true); to return refreshed
>>> SolrIndexSearcher, but it didn't
>>>
>>> When is this information going to be refreshed to the status from the
>>> replicated index, I repeat this is postCommit listener?
>>>
>>> What is the way to get the information from the last commit point?
>>>
>>> Maybe like this?
>>> core.getDeletionPolicy().getLatestCommit().getUserData();
>>>
>>> Or I need to explicitly open new searcher (isn't solr does this behind
>>> the scenes?)
>>> core.openNewSearcher(false, false)
>>>
>>> Not critical, reopening new searcher works, but I would like to
>>> understand these lifecycles, when solr loads latest commit point...
>>>
>>> Thanks, eks
>>
>> - Mark Miller
>> lucidimagination.com
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>


Re: Date filter query

2012-02-21 Thread ku3ia
Hi, Em, thanks for your response. But it seems I have a problem.
I wrote a script that sends queries (curl-based) with a certain delay.
I made a dictionary of matched words and ran my script with a 500ms delay
for 60 seconds. Take a look at the catalina logs:

INFO: [] webapp=/solr path=/select
params={fl=fileds_from_schema_here,score&sort=score+desc&start=0&q="baby"&wt=json&fq=Date:[2012-01-23T00:00:00Z+TO+2012-02-21T23:59:59Z]&rows=500}
status=0 QTime=1735
INFO: [] webapp=/solr path=/select
params={fl=fileds_from_schema_here,score&sort=score+desc&start=0&q="girl"&wt=json&fq=Date:[2012-01-23T00:00:00Z+TO+2012-02-21T23:59:59Z]&rows=500}
status=0 QTime=9794

INFO: [] webapp=/solr path=/select
params={fl=fileds_from_schema_here,score&sort=score+desc&start=0&q="baby"&wt=json&fq=Date:[NOW-30DAY+TO+NOW]&rows=500}
status=0 QTime=13885
INFO: [] webapp=/solr path=/select
params={fl=fileds_from_schema_here,score&sort=score+desc&start=0&q="girl"&wt=json&fq=Date:[NOW-30DAY+TO+NOW]&rows=500}
status=0 QTime=33995

Note that not all queries from the second test are slower; for example:
INFO: [] webapp=/solr path=/select
params={fl=fileds_from_schema_here,score&sort=score+desc&start=0&q="with"&wt=json&fq=Date:[2012-01-23T00:00:00Z+TO+2012-02-21T23:59:59Z]&rows=500}
status=0 QTime=18645 

INFO: [] webapp=/solr path=/select
params={fl=fileds_from_schema_here,score&sort=score+desc&start=0&q="with"&wt=json&fq=Date:[NOW-30DAY+TO+NOW]&rows=500}
status=0 QTime=7877

but in average I have:
*** Date:[2012-01-23T00:00:00Z+TO+2012-02-21T23:59:59Z] ***
Queries processed: 110
Queries cancelled: 4
Max QTime is: 22728 ms
Avg QTime is: 6681.31 ms
Min QTime is:  ms

*** Date:[NOW-30DAY+TO+NOW] ***
Queries processed: 20
Queries cancelled: 94
Max QTime is: 45203 ms
Avg QTime is: 39195.2 ms
Min QTime is:  ms

I repeated this test several more times; the results look the same. Is it true that
[2012-01-23T00:00:00Z+TO+2012-02-21T23:59:59Z] is faster than
[NOW-30DAY+TO+NOW]?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Date-filter-query-tp3764349p3764781.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Unique key constraint and optimistic locking (versioning)

2012-02-21 Thread Em
Hi Per,

Solr provides the so-called UniqueKey field.
Refer to the Wiki to learn more:
http://wiki.apache.org/solr/UniqueKey

> Optimistic locking (versioning)
... is not provided by Solr out of the box. If you add a new document
with the same UniqueKey, it replaces the old one.
You have to do the versioning on your own (and keep in mind concurrent
updates).

Kind regards,
Em

Am 21.02.2012 13:50, schrieb Per Steffensen:
> Hi
> 
> Does solr/lucene provide any mechanism for "unique key constraint" and
> "optimistic locking (versioning)"?
> Unique key constraint: That a client will not succeed creating a new
> document in solr/lucene if a document already exists having the same
> value in some field (e.g. an id field). Of course implemented right, so
> that even though two or more threads are concurrently trying to create a
> new document with the same value in this field, only one of them will
> succeed.
> Optimistic locking (versioning): That a client will only succeed
> updating a document if this updated document is based on the version of
> the document currently stored in solr/lucene. Implemented in the
> optimistic way that clients during an update have to tell which version
> of the document they fetched from Solr and that they therefore have used
> as a starting-point for their updated document. So basically having a
> version field on the document that clients increase by one before
> sending to solr for update, and some code in Solr that only makes the
> update succeed if the version number of the updated document is exactly
> one higher than the version number of the document already stored. Of
> course again implemented right, so that even though two or more thrads
> are concurrently trying to update a document, and they all have their
> updated document based on the current version in solr/lucene, only one
> of them will succeed.
> 
> Or do I have to do stuff like this myself outside solr/lucene - e.g. in
> the client using solr.
> 
> Regards, Per Steffensen
> 

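Em's answer (last write with the same UniqueKey wins, so versioning is the client's job) can be sketched as the compare-and-set scheme Per describes. This is a hedged illustration only: a plain dict stands in for the index, it is not Solr code, and a real implementation must make the check-and-write atomic to survive the concurrent updates Per worries about.

```python
class VersionConflict(Exception):
    """Update was based on a stale version of the document."""

def update(store, doc):
    """Optimistic locking: accept the write only if doc['version'] is exactly
    one higher than the stored version (or 1 for a brand-new document)."""
    current = store.get(doc["id"])
    expected = 1 if current is None else current["version"] + 1
    if doc["version"] != expected:
        raise VersionConflict(f"expected version {expected}, got {doc['version']}")
    store[doc["id"]] = doc

store = {}
update(store, {"id": "a", "version": 1, "text": "first"})
update(store, {"id": "a", "version": 2, "text": "second"})      # based on v1: ok
try:
    update(store, {"id": "a", "version": 2, "text": "stale"})   # based on v1 again: rejected
    conflict_detected = False
except VersionConflict:
    conflict_detected = True
```

Only one of two clients that both start from version 1 can commit version 2; the loser must re-read and retry.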

Re: reader/searcher refresh after replication (commit)

2012-02-21 Thread eks dev
Thanks Mark,
Hmm, I would like to have this information asap, not wait until the
first search gets executed (which depends on the user). Is Solr going to create
a new searcher as part of the "replication transaction"?

Just to make it clear why I need it...
I have a simple master/many-slaves config where the master does "batch"
updates in big chunks (things the user can wait longer to see on the search
side), but the slaves work in soft commit mode internally, where I permit
them to run away slightly from the master. In order to know where the
"incremental update" should start, I read it from UserData.

Basically, ideally, before commit (after successful replication is
finished) ends, I would like to read in these counters to let
"incremental update" run from the right point...

I need to prevent updating the "replicated index" before I read this
information (duplicates can appear). Are there any "IndexWriter"
listeners around?


Thanks again,
eks.



On Tue, Feb 21, 2012 at 8:03 PM, Mark Miller  wrote:
> Post commit calls are made before a new searcher is opened.
>
> Might be easier to try to hook in with a new searcher listener?
>
> On Feb 21, 2012, at 8:23 AM, eks dev wrote:
>
>> Hi all,
>> I am a bit confused with IndexSearcher refresh lifecycles...
>> In a master slave setup, I override postCommit listener on slave
>> (solr trunk version) to read some user information stored in
>> userCommitData on master
>>
>> --
>> @Override
>> public final void postCommit() {
>> // This returnes "stale" information that was present before
>> replication finished
>> RefCounted refC = core.getNewestSearcher(true);
>> Map userData =
>> refC.get().getIndexReader().getIndexCommit().getUserData();
>> }
>> 
>> I expected core.getNewestSearcher(true); to return refreshed
>> SolrIndexSearcher, but it didn't
>>
>> When is this information going to be refreshed to the status from the
>> replicated index, I repeat this is postCommit listener?
>>
>> What is the way to get the information from the last commit point?
>>
>> Maybe like this?
>> core.getDeletionPolicy().getLatestCommit().getUserData();
>>
>> Or I need to explicitly open new searcher (isn't solr does this behind
>> the scenes?)
>> core.openNewSearcher(false, false)
>>
>> Not critical, reopening new searcher works, but I would like to
>> understand these lifecycles, when solr loads latest commit point...
>>
>> Thanks, eks
>
> - Mark Miller
> lucidimagination.com
>
>
>
>
>
>
>
>
>
>
>


Re: How to merge an "autofacet" with a predefined facet

2012-02-21 Thread Em
Well, you could create a keyword-file out of your database and join it
with your self-maintained keywordslist.
Doing so, keep in mind that you have to reload your SolrCore in order to
make the changes visible to the indexing process (and note that you have
to reindex those documents that match your new keyword list but currently
do not have those keywords assigned).

Kind regards,
Em

Am 21.02.2012 19:53, schrieb Xavier:
> In a way I agree that it would be easier to do that but i really wants to
> avoid this solution because it prefer to work "harder" on preparing my index
> than adding field requests on my front query :)
> 
> So the only solution i see right now is to do that on my own in order to
> have my database fully prepared to be indexed ... but i had hope that solr
> could handle it ... so if anyone see any solution to handle it directly with
> solr you are welcome :p
> 
> Anyways thanks for your help Em ;)
> 
> Best regards,
> Xavier
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/How-to-merge-an-autofacet-with-a-predefined-facet-tp3763988p3764506.html
> Sent from the Solr - User mailing list archive at Nabble.com.
> 


Re: reader/searcher refresh after replication (commit)

2012-02-21 Thread Mark Miller
Post commit calls are made before a new searcher is opened.

Might be easier to try to hook in with a new searcher listener?

On Feb 21, 2012, at 8:23 AM, eks dev wrote:

> Hi all,
> I am a bit confused with IndexSearcher refresh lifecycles...
> In a master slave setup, I override postCommit listener on slave
> (solr trunk version) to read some user information stored in
> userCommitData on master
> 
> --
> @Override
> public final void postCommit() {
> // This returnes "stale" information that was present before
> replication finished
> RefCounted refC = core.getNewestSearcher(true);
> Map userData =
> refC.get().getIndexReader().getIndexCommit().getUserData();
> }
> 
> I expected core.getNewestSearcher(true); to return refreshed
> SolrIndexSearcher, but it didn't
> 
> When is this information going to be refreshed to the status from the
> replicated index, I repeat this is postCommit listener?
> 
> What is the way to get the information from the last commit point?
> 
> Maybe like this?
> core.getDeletionPolicy().getLatestCommit().getUserData();
> 
> Or I need to explicitly open new searcher (isn't solr does this behind
> the scenes?)
> core.openNewSearcher(false, false)
> 
> Not critical, reopening new searcher works, but I would like to
> understand these lifecycles, when solr loads latest commit point...
> 
> Thanks, eks

- Mark Miller
lucidimagination.com













Re: How to merge an "autofacet" with a predefined facet

2012-02-21 Thread Xavier
In a way I agree that it would be easier to do that, but I really want to
avoid this solution because I prefer to work "harder" on preparing my index
than to add field requests to my front-end query :)

So the only solution I see right now is to do it on my own, in order to
have my database fully prepared to be indexed ... but I had hoped that Solr
could handle it ... so if anyone sees a way to handle it directly with
Solr, you are welcome :p

Anyways thanks for your help Em ;)

Best regards,
Xavier

--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-merge-an-autofacet-with-a-predefined-facet-tp3763988p3764506.html
Sent from the Solr - User mailing list archive at Nabble.com.


filter query or boolean?

2012-02-21 Thread darren

Hi,
  Which is faster for compound boolean expressions: filter queries or a
single query with boolean expressions?
For that matter, is there any difference other than maybe speed?

thanks


Re: Date filter query

2012-02-21 Thread Em
Hi,

1) and 2) should have equal performance, given that several searches are
performed with the same fq param.
Since the filters are cached, 1) and 2) perform better than 3) and 4).

Kind regards,
Em

Am 21.02.2012 19:06, schrieb ku3ia:
> Hi all! 
> 
> Please advice me:
> 1) q=test&fq=date:[NOW-30DAY+TO+NOW]
> 2) q=test&fq=date:[2012-01-23T00:00:00Z+TO+2012-02-21T23:59:59Z]
> 3) q=test+AND+date:[NOW-30DAY+TO+NOW]
> 4) q=test+AND+date:[2012-01-23T00:00:00Z+TO+2012-02-21T23:59:59Z]
> 
> where date:
>  precisionStep="6" positionIncrementGap="0"/>
> 
> 
> Which of these queries will be faster by QTime at Solr 3.5? Thanks!
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Date-filter-query-tp3764349p3764349.html
> Sent from the Solr - User mailing list archive at Nabble.com.
> 

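The caching point above is also why ku3ia's NOW-based variant can look slower in practice: NOW resolves to the current time (millisecond precision) on every request, so each fq=Date:[NOW-30DAY TO NOW] yields a different filter and misses the filterCache, while a fixed literal range is byte-identical across requests and gets cached. A common workaround is to round with date math (e.g. Date:[NOW/DAY-30DAYS TO NOW/DAY+1DAY]) so the filter is stable for a whole day. A small client-side sketch of the same rounding idea (the helper name is mine, not from the thread):

```python
from datetime import datetime, timedelta, timezone

FMT = "%Y-%m-%dT%H:%M:%SZ"

def fq_day_rounded(now):
    """Cache-friendly fq: round to midnight so every request issued on the
    same day produces the identical filter string (one filterCache entry)."""
    day = now.replace(hour=0, minute=0, second=0, microsecond=0)
    start = day - timedelta(days=30)
    return f"Date:[{start.strftime(FMT)} TO {day.strftime(FMT)}]"

# Two requests at different times of the same day reuse one cached filter.
a = fq_day_rounded(datetime(2012, 2, 21, 14, 30, tzinfo=timezone.utc))
b = fq_day_rounded(datetime(2012, 2, 21, 18, 45, tzinfo=timezone.utc))
```

The unrounded NOW form would embed the request's exact timestamp, so no two filter strings would ever match.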

Date filter query

2012-02-21 Thread ku3ia
Hi all! 

Please advise me:
1) q=test&fq=date:[NOW-30DAY+TO+NOW]
2) q=test&fq=date:[2012-01-23T00:00:00Z+TO+2012-02-21T23:59:59Z]
3) q=test+AND+date:[NOW-30DAY+TO+NOW]
4) q=test+AND+date:[2012-01-23T00:00:00Z+TO+2012-02-21T23:59:59Z]

where date:



Which of these queries will be faster by QTime at Solr 3.5? Thanks!

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Date-filter-query-tp3764349p3764349.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SOLR - Just for search or whole site DB?

2012-02-21 Thread Spadez
Thank you for the information Damien. 

Is there a better database to use at the core of the site which is more
compatible with SOLR than MYSQL, or is hooking MYSQL up with SOLR simple
enough?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/SOLR-Just-for-search-or-whole-site-DB-tp3763439p3764254.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to merge an "autofacet" with a predefined facet

2012-02-21 Thread Em
Wouldn't it be easier to store both types in different fields?
At query-time you are able to do a facet on both and can combine the
results client-side to present them within the GUI.

Kind regards,
Em

Am 21.02.2012 17:52, schrieb Xavier:
> Sure, the difference between my 2 facets are :
> 
> - 'predefined_facets' contains values already filled in my database like :
> 'web langage', 'cooking', 'fishing' 
> 
> - 'text_tag_facets' will contain the same possible value but determined
> automatically from a given wordslist by searching in the document text as
> shown in my previous post
> 
> 
> Why i want to do that ? because sometimes my 'predefined_facets' is not
> defined, and even if it is, i want to defined it the more as possible.
> 
> Best regards,
> Xavier
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/How-to-merge-an-autofacet-with-a-predefined-facet-tp3763988p3764116.html
> Sent from the Solr - User mailing list archive at Nabble.com.
> 


Re: How to merge an "autofacet" with a predefined facet

2012-02-21 Thread Xavier
Sure, the difference between my 2 facets are :

- 'predefined_facets' contains values already filled in my database like :
'web langage', 'cooking', 'fishing' 

- 'text_tag_facets' will contain the same possible value but determined
automatically from a given wordslist by searching in the document text as
shown in my previous post


Why do I want to do that? Because sometimes my 'predefined_facets' is not
defined, and even when it is, I want to define it as fully as possible.

Best regards,
Xavier

--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-merge-an-autofacet-with-a-predefined-facet-tp3763988p3764116.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to merge an "autofacet" with a predefined facet

2012-02-21 Thread Em
Hi Xavier,

> It's maybe because (As I understood) the real (stored) value of this
dynamic
> facet is still the initial fulltext  ?? (or maybe i'm wrong ...)
Exactly.
CopyField does not copy the analyzed result of a field into another one.
Instead, the original content given to that field (the unanalyzed raw
input) is copied.

Could you explain the difference between your text_tag_facets
and your predefined facets?

Kind regards,
Em

Am 21.02.2012 17:11, schrieb Xavier:
> Hi everyone,
> 
> Like explained in this post :
> http://lucene.472066.n3.nabble.com/How-to-index-a-facetfield-by-searching-words-matching-from-another-Textfield-td3761201.html
> 
> I have created a dynamic facet at indexation by searching terms in a
> fulltext field.
> 
> But i don't know if it's possible to merge this "autocreated" facet with a
> facet already predefined ? i tried to used  (adding this to my
> code in my previous post) : 
> **
> 
>  but it's not seems to work ... (my text_tag_facet is always working, but
> didnt merged with my predefined_facet)
> 
> It's maybe because (As I understood) the real (stored) value of this dynamic
> facet is still the initial fulltext  ?? (or maybe i'm wrong ...)
> 
> I'm a little confused about this and i'm certainly doing it wrong but i
> begin to feel that those kinds of manipulation arent feasible into
> schema.xml 
> 
> Best regards.
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/How-to-merge-an-autofacet-with-a-predefined-facet-tp3763988p3763988.html
> Sent from the Solr - User mailing list archive at Nabble.com.
> 


Re: How to index a facetfield by searching words matching from another Textfield

2012-02-21 Thread Xavier
Thanks for this answer.

I have posted my new question (related to this post) in a new topic ;)

(
http://lucene.472066.n3.nabble.com/How-to-merge-an-quot-autofacet-quot-with-a-predefined-facet-td3763988.html
)


Best regards

--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-index-a-facetfield-by-searching-words-matching-from-another-Textfield-tp3761201p3763993.html
Sent from the Solr - User mailing list archive at Nabble.com.


How to merge an "autofacet" with a predefined facet

2012-02-21 Thread Xavier
Hi everyone,

Like explained in this post :
http://lucene.472066.n3.nabble.com/How-to-index-a-facetfield-by-searching-words-matching-from-another-Textfield-td3761201.html

I have created a dynamic facet at indexation by searching terms in a
fulltext field.

But I don't know if it's possible to merge this "autocreated" facet with an
already predefined facet? I tried to use copyField (adding this to my
code in my previous post):
**

but it doesn't seem to work ... (my text_tag_facet is always working, but
it didn't get merged with my predefined_facet)

Maybe it's because (as I understood it) the real (stored) value of this dynamic
facet is still the initial fulltext?? (or maybe I'm wrong ...)

I'm a little confused about this and I'm certainly doing it wrong, but I'm
beginning to feel that these kinds of manipulations aren't feasible in
schema.xml 

Best regards.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-merge-an-autofacet-with-a-predefined-facet-tp3763988p3763988.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: SOLR - Just for search or whole site DB?

2012-02-21 Thread Demian Katz
I would strongly recommend using Solr just for search.  Solr is designed for 
doing fast search lookups.  It is really not designed for performing all the 
functions of a relational database system.  You certainly COULD use Solr for 
everything, and the software is constantly being enhanced to make it more 
flexible, but you'll still probably find it awkward and inconvenient for 
certain tasks that are simple with MySQL.  It's also useful to be able to throw 
away and rebuild your Solr index at will, so you can upgrade to a new version 
or tweak your indexing rules.  If you store mission-critical data in Solr 
itself, this becomes more difficult.  The way I like to look at it is, as the 
name says, as an index.  You use one system for actually managing your data, 
and then you use Solr to create an index of that data for fast look-up.  

- Demian

> -Original Message-
> From: Spadez [mailto:james_will...@hotmail.com]
> Sent: Tuesday, February 21, 2012 7:45 AM
> To: solr-user@lucene.apache.org
> Subject: SOLR - Just for search or whole site DB?
> 
> 
> I am new to this but I wanted to pitch a setup to you. I have a website
> being coded at the moment, in the very early stages, but is effectively a
> full text scrapper and search engine. We have decided on SOLR for the search
> system.
> 
> We basically have two sets of data:
> 
> One is the content for the search engine, which is around 100K records at
> any one time. The entire system is built on PHP and currently put into a
> MSQL database. We want very quick relevant searches, this is critical. Our
> plan is to import our records into SOLR each night from the MYSQL database.
> 
> The second set of data is other parts of the site, such as our ticket
> system, stats about the number of clicks etc. The performance on this is not
> performance critical at all.
> 
> So, I have two questions:
> 
> Firstly, should everything be run through the SOLR search system, including
> tickets and site stats? Alterntively, is it better to keep only the main
> full text searches on SOLR and do the ticketing etc through normal MYSQL
> queries?
> 
> Secondly, which is probably dependant on the first question. If everything
> should go through SOLR, should we even use a MYSQL database at all? If not,
> what is the alternative? We use an XML file as a .SQL replacement for
> content including tickets, stats, users, passwords etc.
> 
> Sorry if these questions are basic, but I’m out of my depth here (but
> learning!)
> 
> James
> 
> 
> --
> View this message in context: http://lucene.472066.n3.nabble.com/SOLR-Just-
> for-search-or-whole-site-DB-tp3763439p3763439.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to index a facetfield by searching words matching from another Textfield

2012-02-21 Thread Erick Erickson
setting stored="true" simply places a verbatim copy
of the input in the index. Returning that field in
a document will simply return that verbatim copy,
there's no way to do anything else.

The facet *values* you get back in your response should
be what you put in your index though, why doesn't that
suffice?

BTW, it's best to start a new thread rather than switch
topics mid-stream, see:

http://people.apache.org/~hossman/#threadhijack

Best
Erick


On Tue, Feb 21, 2012 at 8:35 AM, Xavier  wrote:
> Seems that's an error from the documentation with the 'Factory' missing in
> the classname !!?
>
> I found
>
> 
>
> That is working fine !!!
>
> Conclusion i have this files :
> *synonymswords.txt :*
> php,mysql,html,css=>web_langage
>
> And
>
> *keepwords.txt :*
> web langage
>
> With this fieldType :
>
>  omitNorms="true">
>        
>                
>                
>                 synonyms="synonymswords.txt"/>
>                 replacement=" "/>
>                 words="keepwords.txt" ignoreCase="true"/>
>        
>    
>
>
> And it's working fine ;)
>
>
> But I have another question, my fields are configured like that :
>
> 
>  multiValued="true"/>
>
> But if I turn "stored" to "true", it always return the full original text in
> my documents field value for "text_tag_facet" and not the facets created
> (like 'web langage')
>
> How can i get the result of the facet in the stored field of the document ?
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/How-to-index-a-facetfield-by-searching-words-matching-from-another-Textfield-tp3761201p3763551.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to index a facetfield by searching words matching from another Textfield

2012-02-21 Thread Xavier
It seems that's an error in the documentation, with 'Factory' missing from
the class name!?

I found 



That is working fine !!!

Conclusion i have this files :
*synonymswords.txt :*
php,mysql,html,css=>web_langage

And

*keepwords.txt :*
web langage

With this fieldType : 












And it's working fine ;)


But I have another question, my fields are configured like that :




But if I turn "stored" to "true", it always returns the full original text in
my documents field value for "text_tag_facet" and not the facets created
(like 'web langage')

How can i get the result of the facet in the stored field of the document ?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-index-a-facetfield-by-searching-words-matching-from-another-Textfield-tp3761201p3763551.html
Sent from the Solr - User mailing list archive at Nabble.com.
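The fieldType markup above was eaten by the list archive. From the files Xavier describes (synonymswords.txt mapping php,mysql,html,css=>web_langage and keepwords.txt containing "web langage") and the partial attributes that survive in the quoted copies of this thread, the chain was presumably something like the following reconstruction; the tokenizer choice and the underscore-to-space pattern are guesses:

```xml
<!-- Hedged reconstruction; the original tags were lost in the archive. -->
<fieldType name="text_tag" class="solr.TextField" omitNorms="true">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonymswords.txt"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="_" replacement=" "/>
    <filter class="solr.KeepWordFilterFactory" words="keepwords.txt" ignoreCase="true"/>
  </analyzer>
</fieldType>
```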


reader/searcher refresh after replication (commit)

2012-02-21 Thread eks dev
Hi all,
I am a bit confused with IndexSearcher refresh lifecycles...
In a master slave setup, I override postCommit listener on slave
(solr trunk version) to read some user information stored in
userCommitData on master

--
@Override
public final void postCommit() {
// This returns "stale" information that was present before
replication finished
RefCounted<SolrIndexSearcher> refC = core.getNewestSearcher(true);
Map<String,String> userData =
refC.get().getIndexReader().getIndexCommit().getUserData();
}

I expected core.getNewestSearcher(true) to return a refreshed
SolrIndexSearcher, but it didn't.

When is this information going to be refreshed to the status from the
replicated index? I repeat, this is a postCommit listener.

What is the way to get the information from the last commit point?

Maybe like this?
core.getDeletionPolicy().getLatestCommit().getUserData();

Or do I need to explicitly open a new searcher (doesn't Solr do this behind
the scenes?)
core.openNewSearcher(false, false)

Not critical; reopening a new searcher works, but I would like to
understand these lifecycles, i.e. when Solr loads the latest commit point...

Thanks, eks


Unique key constraint and optimistic locking (versioning)

2012-02-21 Thread Per Steffensen

Hi

Does solr/lucene provide any mechanism for "unique key constraint" and 
"optimistic locking (versioning)"?
Unique key constraint: That a client will not succeed creating a new 
document in solr/lucene if a document already exists having the same 
value in some field (e.g. an id field). Of course implemented right, so 
that even though two or more threads are concurrently trying to create a 
new document with the same value in this field, only one of them will 
succeed.
Optimistic locking (versioning): That a client will only succeed 
updating a document if this updated document is based on the version of 
the document currently stored in solr/lucene. Implemented in the 
optimistic way that clients during an update have to tell which version 
of the document they fetched from Solr and that they therefore have used 
as a starting-point for their updated document. So basically having a 
version field on the document that clients increase by one before 
sending to solr for update, and some code in Solr that only makes the 
update succeed if the version number of the updated document is exactly 
one higher than the version number of the document already stored. Of 
course again implemented right, so that even though two or more threads 
are concurrently trying to update a document, and they all have their 
updated document based on the current version in solr/lucene, only one 
of them will succeed.


Or do I have to do stuff like this myself outside solr/lucene - e.g. in 
the client using Solr?


Regards, Per Steffensen


SOLR - Just for search or whole site DB?

2012-02-21 Thread Spadez

I am new to this but I wanted to pitch a setup to you. I have a website
being coded at the moment, in the very early stages, but it is effectively a
full-text scraper and search engine. We have decided on SOLR for the search
system.

We basically have two sets of data:

One is the content for the search engine, which is around 100K records at
any one time. The entire system is built on PHP and currently put into a
MYSQL database. We want very quick, relevant searches; this is critical. Our
plan is to import our records into SOLR each night from the MYSQL database.

The second set of data is other parts of the site, such as our ticket
system, stats about the number of clicks etc. The performance on this is not
performance critical at all.

So, I have two questions:

Firstly, should everything be run through the SOLR search system, including
tickets and site stats? Alternatively, is it better to keep only the main
full text searches on SOLR and do the ticketing etc through normal MYSQL
queries?

Secondly, and this is probably dependent on the first question: if everything
should go through SOLR, should we even use a MYSQL database at all? If not,
what is the alternative? Would we use an XML file as a SQL replacement for
content including tickets, stats, users, passwords, etc.?

Sorry if these questions are basic, but I’m out of my depth here (but
learning!)

James


--
View this message in context: 
http://lucene.472066.n3.nabble.com/SOLR-Just-for-search-or-whole-site-DB-tp3763439p3763439.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Fast Vector Highlighter Working for some records only

2012-02-21 Thread dhaivat
Hi  Koji,

Thanks for the quick reply; I am using Solr 1.4.1.

I am querying *"camera"*

Here is an example of the documents which match:


70
  Electronics/Cell Phones
  /b/l/blackberry-8100-pearl-2.jpg
  349.99
  BlackBerry 8100 Pearl sports a large 240 x 260 screen
that supports over 65,000 colors-- plenty of real estate to view your
e-mails, Web browser content, messaging sessions, and
attachments.
  Silver
  blackberry-8100-pearl.html
  Like the BlackBerry 7105t, the BlackBerry 8100 Pearl is  The
BlackBerry 8100 Pearl sports a large 240 x 260 screen that supports over
65,000 colors-- plenty of real estate to view your e-mails, Web browser
content, messaging sessions, and attachments. The venerable BlackBerry
trackwheel has been replaced on this model with an innovative four-way
trackball placed below the screen. On the rear of the handheld, you'll
find a 1.3-megapixel camera and a self portrait mirror. The handheld's
microSD memory card slot is located inside the device, behind the battery.
There's also a standard 2.5mm headset jack that can be used with the
included headset, as well as a mini-USB port for data
connectivity.
  BlackBerry 8100 Pearl
  
    • 1.3 mega pixel camera to capture those special moments
    • MP3 player lets you listen to your favorite music on the go
    • Menu and escape keys on the front of the device for easier access
    • Bluetooth technology lets you experience hands free and wire free features
    • Package Contents: phone,AC adapter,software CD,headset,USB cable,sim- card,get started poster,reference guide
  89
  Electronics/Cameras/Accessories
  /u/n/universal-camera-case-2.jpg
  34.0
  Universal Camera Case
  Green
  universal-camera-case.html
  A stylish digital camera demands stylish protection. This leather
carrying case will defend your camera from the dings and scratches of travel
and everyday use while looking smart all the time.
  Universal Camera Case

For the above documents I get a highlighting response for documentId = 89 but
not for documentId = 70, even though the word "camera" appears in the document
with id = 70. For your information, I am using a custom analyser for indexing
and querying.

Thanks
Dhaivat

Re: Fast Vector Highlighter Working for some records only

2012-02-21 Thread Koji Sekiguchi

Dhaivat,

Can you give us the concrete document that you are trying to search and make
a highlight snippet? And what is your Solr version?

koji
--
Query Log Visualizer for Apache Solr
http://soleami.com/

(12/02/21 20:29), dhaivat wrote:


Hi

I am a newbie to Solr and I am using the SolrJ client to create the index and
query the Solr data. When querying the data I want to use the highlighting
feature of Solr, so I am using the Fast Vector Highlighter to highlight
words. I found that it works fine for some documents, but for other documents
it does not return any highlighted words even though the document's field
contains the word. I am using the following parameters with the SolrJ client:

query.add("hl","true");
query.add("hl.q",term);
query.add("hl.fl","contents");
query.add("hl.snippets","100");
query.add("hl.fragsize","10");
query.add("hl.maxAnalyzedChars","10");
query.add("hl.useFastVectorHighlighter","true");
query.add("hl.highlightMultiTerm","true");
query.add("hl.regex.slop","0.5");
query.add("hl.regex.pattern","[-\\w ,/\n\\\"']*");

query.setHighlightSimplePre("*");
query.setHighlightSimplePost("*");

My solrconfig is pretty straightforward; I haven't specified anything related
to the highlighter there.

this is how my solrConfig looks like :


   

   
 
   

   
   
   





   
 solr
   



I have also enabled termVectors, termOffsets, and termPositions on the field
that I am indexing.


can anyone tell me where i am going wrong ?

thanks in advance

Dhaivat



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Fast-Vector-Highlighter-Working-for-some-records-only-tp3763286p3763286.html
Sent from the Solr - User mailing list archive at Nabble.com.





Re: lucene operators interfearing in edismax

2012-02-21 Thread jmlucjav
Ok thanks.

But I reviewed some of my searches and the '-' was not surrounded by
whitespace in all cases, so I'll have to strip Lucene operators from the user
input myself. I understand there is no predefined way to do so.
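A minimal sketch of one way to do that: prefix every Lucene operator character with a backslash before handing the string to edismax. (SolrJ also ships `org.apache.solr.client.solrj.util.ClientUtils.escapeQueryChars`, which does something similar; the class and method names below are illustrative, not from the thread.)

```java
public class QueryEscaper {
    // Characters that the Lucene query parser treats as syntax/operators.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&;/";

    public static String escape(String input) {
        StringBuilder sb = new StringBuilder(input.length());
        for (char c : input.toCharArray()) {
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');   // escape the operator character
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // '-' and parentheses get escaped; plain words pass through unchanged
        System.out.println(escape("foo-bar (baz)"));   // prints: foo\-bar \(baz\)
    }
}
```

Note this escapes everything indiscriminately; if you want users to still be able to use quotes or wildcards, whitelist those characters instead.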

--
View this message in context: 
http://lucene.472066.n3.nabble.com/lucene-operators-interfearing-in-edismax-tp3761577p3763324.html
Sent from the Solr - User mailing list archive at Nabble.com.


Fast Vector Highlighter Working for some records only

2012-02-21 Thread dhaivat

Hi

I am a newbie to Solr and I am using the SolrJ client to create the index and
query the Solr data. When querying the data I want to use the highlighting
feature of Solr, so I am using the Fast Vector Highlighter to highlight
words. I found that it works fine for some documents, but for other documents
it does not return any highlighted words even though the document's field
contains the word. I am using the following parameters with the SolrJ client:

   query.add("hl","true");
query.add("hl.q",term);
query.add("hl.fl","contents");
query.add("hl.snippets","100");
query.add("hl.fragsize","10");
query.add("hl.maxAnalyzedChars","10");
query.add("hl.useFastVectorHighlighter","true");
query.add("hl.highlightMultiTerm","true");
query.add("hl.regex.slop","0.5");
query.add("hl.regex.pattern","[-\\w ,/\n\\\"']*");

query.setHighlightSimplePre("*");
query.setHighlightSimplePost("*");

My solrconfig is pretty straightforward; I haven't specified anything related
to the highlighter there.

this is how my solrConfig looks like :


  

  

  
  
  
  
  
  

   
 
   
  
solr
  



I have also enabled termVectors, termOffsets, and termPositions on the field
that I am indexing.
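For reference, the FastVectorHighlighter only works on fields that are indexed with term vectors, positions, and offsets. A field declaration along these lines in schema.xml enables all three (the field and type names here are illustrative, not taken from the original config):

```xml
<!-- FVH requires termVectors, termPositions and termOffsets on the field -->
<field name="contents" type="text" indexed="true" stored="true"
       termVectors="true" termPositions="true" termOffsets="true"/>
```

Documents indexed before these attributes were added must be re-indexed for the term vector data to exist.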


can anyone tell me where i am going wrong ? 

thanks in advance

Dhaivat



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Fast-Vector-Highlighter-Working-for-some-records-only-tp3763286p3763286.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to index a facetfield by searching words matching from another Textfield

2012-02-21 Thread Xavier
That's it! Thanks :)

It's the first time I've seen that documentation page (which is really
helpful):
http://lucidworks.lucidimagination.com/display/solr/Filter+Descriptions#FilterDescriptions-KeepWordsFilter

So now I want to "associate" a word list with a value of an existing facet,
so I tried to combine synonyms and keepwords in my analyzer chain.
It works very well, but my problem now is that I want the synonyms to produce
values containing whitespace and have them match my keepwords (because the
values of my facet contain whitespace).

For example, when the term 'php' is seen, my synonyms file maps it to 'web
langage', and I want to keep the whole phrase 'web langage'.

So my files are : 
synonymswords.txt : php=>web langage
keepwords.txt : web langage

The problem is that each word is analyzed separately and I don't know how to
handle the whitespace (the synonym filter emits 'web' and 'langage' as
separate tokens, so they don't match 'web langage').

I tried to use solr.PatternReplaceFilter with a chosen character '_' standing
in for the space, but I get an error, so if you have another tip for me it
would be great :p
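One common workaround for multi-word synonym targets is to join the words with an underscore so they survive the per-token filters, then restore the space at the end. A sketch of such an analyzer chain (the fieldType name is illustrative; the file names are from the post, and the underscore convention is an assumption you would have to apply in both files, e.g. `php=>web_langage` in synonymswords.txt and `web_langage` in keepwords.txt):

```xml
<fieldType name="text_tag_facet" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- map single tokens to underscore-joined phrases: php=>web_langage -->
    <filter class="solr.SynonymFilterFactory" synonyms="synonymswords.txt"
            ignoreCase="true" expand="false"/>
    <!-- keepwords.txt lists the same underscore-joined phrases -->
    <filter class="solr.KeepWordFilterFactory" words="keepwords.txt"
            ignoreCase="true"/>
    <!-- finally turn the underscore back into a space for the facet value -->
    <filter class="solr.PatternReplaceFilterFactory" pattern="_"
            replacement=" " replace="all"/>
  </analyzer>
</fieldType>
```

Because each filter sees one token at a time, the phrase only stays intact if it is a single token throughout the chain, which is exactly what the underscore trick buys you.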



--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-index-a-facetfield-by-searching-words-matching-from-another-Textfield-tp3761201p3763247.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Do SOLR supports Lemmatization

2012-02-21 Thread Dirceu Vieira
Hi,

Have a look at the following link:

http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters?highlight=%28Lemmatization%29#Stemming
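For example, a minimal stemming analyzer in schema.xml might look like this (the fieldType name is illustrative):

```xml
<fieldType name="text_stemmed" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- reduces tokens to their stems, e.g. "running" -> "run" -->
    <filter class="solr.SnowballPorterFilterFactory" language="English"/>
  </analyzer>
</fieldType>
```

Note that stemming is rule-based truncation, not true dictionary-based lemmatization; for lemma-like behaviour you would need a custom or third-party filter.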


Regards,

Dirceu

On Tue, Feb 21, 2012 at 11:18 AM, dsy99  wrote:

> Dear all,
> I want to know, do SOLR support Lemmatization? If yes, which in-built
> Lemmatizer class  should be included in SOLR schema file to analyze the
> tokens using lemmatization rather than stemming.
>
> Thanks in advance.
>
> With Thanks & Regds:
> Divakar Yadav
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Do-SOLR-supports-Lemmatization-tp3763139p3763139.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Dirceu Vieira Júnior
---
+47 9753 2473
dirceuvjr.blogspot.com
twitter.com/dirceuvjr


Do SOLR supports Lemmatization

2012-02-21 Thread dsy99
Dear all,
I want to know: does SOLR support lemmatization? If yes, which built-in
lemmatizer class should be included in the SOLR schema file to analyze tokens
using lemmatization rather than stemming?

Thanks in advance.

With Thanks & Regds:
Divakar Yadav

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Do-SOLR-supports-Lemmatization-tp3763139p3763139.html
Sent from the Solr - User mailing list archive at Nabble.com.


Query regarding Lucene Indexing Method

2012-02-21 Thread syed kather
Hi Team,

Is there any article or site where I can learn about the Lucene index
format: how it is written and maintained?

And one quick question: regarding the standard method that Lucene uses to
handle indexes, is it an Apache package, or does Lucene have its own
index-writing method? Does Lucene use memory-mapped files?




Thanks and Regards,
S SYED ABDUL KATHER