Re: how to index data in solr from database automatically

2011-06-23 Thread Romi
Yeah, I am using data-import to get data from the database and index it. But
what is cron? Can you please provide a link for it?

-
Thanks & Regards
Romi


Re: Re; DIH Scheduling

2011-06-23 Thread Noble Paul നോബിള്‍ नोब्ळ्
On Thu, Jun 23, 2011 at 9:13 PM, simon  wrote:
> The Wiki page describes a design for a scheduler, which has not been
> committed to Solr yet (I checked). I did see a patch the other day
> (see https://issues.apache.org/jira/browse/SOLR-2305) but it didn't
> look well tested.
>
> I think that you're basically stuck with something like cron at this
> time. If your application is written in java, take a look at the
> Quartz scheduler - http://www.quartz-scheduler.org/

It was considered and decided against.
>
> -Simon
>



-- 
Noble Paul


Re: gui for solr index

2011-06-23 Thread Bùi Văn Quý

Please use the Lucene Luke tool:
www.getopt.org/luke/

On 6/24/2011 1:29 PM, Алексей Цой wrote:

is there a standard solution for Apache Solr (from trunk) for the following:
- a GUI to view the Solr index?





Re: how to index data in solr from database automatically

2011-06-23 Thread Anshum
How about having a delta-import and a cron to trigger the post?
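
For example (a sketch; the endpoint http://localhost:8983/solr/dataimport and
the use of curl are assumptions), a crontab entry that fires a delta-import
every 15 minutes could look like:

# trigger a DIH delta-import every 15 minutes without wiping the index
*/15 * * * * curl -s "http://localhost:8983/solr/dataimport?command=delta-import&clean=false" > /dev/null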

--
Anshum Gupta
http://ai-cafe.blogspot.com


On Fri, Jun 24, 2011 at 11:13 AM, Romi  wrote:

> I have a MySQL database for my application. I implemented Solr search and
> used the DataImportHandler (DIH) to index data from the database into Solr.
> My question is: is there any way that when the database gets updated, my
> Solr indexes automatically get updated with the new data added in the
> database? It means I would not need to run the indexing process manually
> every time the database tables change. If yes, then please tell me how I
> can achieve this.
>
> -
> Thanks & Regards
> Romi
>


gui for solr index

2011-06-23 Thread Алексей Цой
is there a standard solution for Apache Solr (from trunk) for the following:
- a GUI to view the Solr index?


how to index data in solr from database automatically

2011-06-23 Thread Romi
I have a MySQL database for my application. I implemented Solr search and
used the DataImportHandler (DIH) to index data from the database into Solr.
My question is: is there any way that when the database gets updated, my
Solr indexes automatically get updated with the new data added in the
database? It means I would not need to run the indexing process manually
every time the database tables change. If yes, then please tell me how I
can achieve this.

-
Thanks & Regards
Romi


Re: Updating the data-config file

2011-06-23 Thread sabman
Ahh! That's interesting!

I understand what you mean. Since RSS and Atom feeds have the same structure,
parsing them would be the same, but I can do the same for each of the
different URLs. These URLs can be obtained from a db, a file or through the
request parameters, right?
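
A rough sketch of that idea, with the URLs coming from a database (the JDBC
details and the feeds table with its url column are illustrative
assumptions):

<dataConfig>
  <dataSource name="db" driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/myapp" user="..." password="..."/>
  <dataSource name="web" type="HttpDataSource"/>
  <document>
    <!-- outer entity: one row per feed URL stored in the database -->
    <entity name="feeds" dataSource="db" query="SELECT url FROM feeds">
      <!-- inner entity: fetch and parse each feed with the same XPath rules -->
      <entity name="feed" dataSource="web" pk="link"
              url="${feeds.url}"
              processor="XPathEntityProcessor"
              forEach="/RDF/channel | /RDF/item"
              transformer="DateFormatTransformer">
        <field column="title" xpath="/RDF/item/title" />
        <field column="link" xpath="/RDF/item/link" />
      </entity>
    </entity>
  </document>
</dataConfig>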



Garbage Collection: I have given bad advice in the past!

2011-06-23 Thread Shawn Heisey
In the past I have told people on this list and in the IRC channel #solr 
what I use for Java GC settings.  A couple of days ago, I cleaned up my 
testing methodology to more closely mimic real production queries, and 
discovered that my GC settings were woefully inadequate.  Here's what I 
was using on a virtual machine with 9GB of RAM.  I've been using this 
for several months, and chose it because I had read several things 
praising it.  I should have done more research.


-Xms512M -Xmx2048M -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode

On my backup servers, I am in the process of getting 3.2.0 ready to 
replace our 1.4.1 index.  I ran into a situation where committing a 
delta-import of only a few thousand records took longer than 3 minutes 
(Perl LWP default timeout) on every shard, where normally in production 
on 1.4.1 it only takes a few seconds.  This was shortly after I had hit 
the distributed index pretty hard with my improved benchmarking.


Using jstat, I found that while under benchmarking load, the system was 
spending 10-15% of its time doing garbage collection, and that most of 
the garbage collections were from the young generation.  First I tried 
increasing the young generation size with the -XX:NewSize=1024M 
parameter.  This helped on the total GC count, but didn't really help 
with how much time was spent doing them.


A good command to see these statistics on Linux, and an Oracle link 
explaining what it all means:


jstat -gc -t `pgrep java` 5000
http://download.oracle.com/javase/6/docs/technotes/tools/share/jstat.html

I've learned that Solr will keep most of its data in the young generation 
(eden) unless that memory pool is too small, in which case it will move 
data to the tenured generation.  The key for good performance seems to be 
creating a large enough young generation.  You do need to have a good 
chunk of tenured available, unless the solr instance has no index itself 
and exists only to distribute queries to shards living on other solr 
instances.  In that case, it hardly uses the tenured generation.  It 
turns out that CMSIncrementalMode causes more young generation 
collections and makes them take longer, which is exactly what Solr does 
NOT need.


After messing around with it for quite a while, I came up with the 
following settings, which included an increase in heap size:


-Xms3072M -Xmx3072M -XX:NewSize=1536M -XX:+UseParNewGC 
-XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled
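
Applied to a Jetty-based Solr start command, that might look like the
following (a sketch; the start.jar launcher and working directory are
assumptions about your setup):

java -Xms3072M -Xmx3072M -XX:NewSize=1536M \
     -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled \
     -jar start.jar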


With these settings, it spends very little time doing garbage 
collections.  One of my shards has been up for nearly 24 hours, has been 
hit with the benchmarking script repeatedly, and it has only done 62 
young generation collections, and zero full collections, with 6.8 
seconds total GC time.  I am thinking of increasing the NewSize yet 
again, because the tenured generation (1.5GB in size) is only one third 
utilized after nearly 24 hours.


My settings will probably not work for everyone, but I hope this post 
will make it easier for others to find the right solution for themselves.


Thanks,
Shawn



Re: Understanding query explain information

2011-06-23 Thread Alexander Ramos Jardim
Yes, I am using synonyms at index time.

2011/6/22 lee carroll 

> Hi are you using synonyms ?
>
>
>
> On 22 June 2011 10:30, Alexander Ramos Jardim
>  wrote:
> > Hi guys,
> >
> > I have some doubts about how to correctly understand the debugQuery
> > output. I have a field named itemName in my index. This is a text field,
> > just that. When I query a simple ?q=itemName:iPad , I end up with the
> > following query result.
> >
> > I am simply trying to understand why these strings generated such scores;
> > as far as I can understand, the only difference between them is the field
> > norms, as all the other factors stay the same.
> >
> > Now, how do I get these field norm values? fieldNorm is the result of this
> > formula, right?
> >
> > fieldNorm = 1 / sqrt(terms), where terms is the number of terms in my
> > field after it is indexed
> >
> > Well, if this is true, the field norm for my first document should be 0.5
> > (1/sqrt(4)), as "Livro - IPAD - O Guia do Profissional" ends up with the
> > terms "livro|ipad|guia|profissional" as tokens.
> >
> > What am I forgetting to take into account?
> >
> > [response header and document list, with the XML tags stripped by the
> > archive; the recoverable score / itemName pairs were:]
> >
> >   3.6808658  Livro - IPAD - O Guia do Profissional
> >   3.1550279  Leitor de Cartão para Ipad - Mobimax
> >   3.1550279  Sleeve para iPad
> >   3.1550279  Sleeve de Neoprene para iPad
> >   3.1550279  Carregador de parede para iPad
> >   2.6291897  Case Envelope para iPad - Black - Built NY
> >   2.6291897  Case Protetora p/ IPad de Silicone Duo - Browm - Iskin
> >   2.6291897  Case Protetora p/ IPad de Silicone Duo - Clear - Iskin
> >   2.6291897  Case p/ iPad Sleeve - Black - Built NY
> >   2.6291897  Bolsa de Proteção p/ iPad Preta - Geonav
> >
> > [debug section: the raw and parsed query strings were all itemName:ipad;
> > the per-document score explanations follow:]
> >
> >  
> > 3.6808658 = (MATCH) fieldWeight(itemName:ipad in 102507), product of:
> >  1.0 = tf(termFreq(itemName:ipad)=1)
> >  8.413407 = idf(docFreq=165, maxDocs=275239)
> >  0.4375 = fieldNorm(field=itemName, doc=102507)
> > 
> >  
> > 3.1550279 = (MATCH) fieldWeight(itemName:ipad in 226401), product of:
> >  1.0 = tf(termFreq(itemName:ipad)=1)
> >  8.413407 = idf(docFreq=165, maxDocs=275239)
> >  0.375 = fieldNorm(field=itemName, doc=226401)
> > 
> >  
> > 3.1550279 = (MATCH) fieldWeight(itemName:ipad in 226409), product of:
> >  1.0 = tf(termFreq(itemName:ipad)=1)
> >  8.413407 = idf(docFreq=165, maxDocs=275239)
> >  0.375 = fieldNorm(field=itemName, doc=226409)
> > 
> >  
> > 3.1550279 = (MATCH) fieldWeight(itemName:ipad in 226447), product of:
> >  1.0 = tf(termFreq(itemName:ipad)=1)
> >  8.413407 = idf(docFreq=165, maxDocs=275239)
> >  0.375 = fieldNorm(field=itemName, doc=226447)
> > 
> >  
> >
> > 3.1550279 = (MATCH) fieldWeight(itemName:ipad in 226583), product of:
> >  1.0 = tf(termFreq(itemName:ipad)=1)
> >  8.413407 = idf(docFreq=165, maxDocs=275239)
> >  0.375 = fieldNorm(field=itemName, doc=226583)
> > 
> >  
> > 2.6291897 = (MATCH) fieldWeight(itemName:ipad in 223178), product of:
> >  1.0 = tf(termFreq(itemName:ipad)=1)
> >  8.413407 = idf(docFreq=165, maxDocs=275239)
> >  0.3125 = fieldNorm(field=itemName, doc=223178)
> > 
> >  
> > 2.6291897 = (MATCH) fieldWeight(itemName:ipad in 223196), product of:
> >  1.0 = tf(termFreq(itemName:ipad)=1)
> >  8.413407 = idf(docFreq=165, maxDocs=275239)
> >  0.3125 = fieldNorm(field=itemName, doc=223196)
> > 
> >  
> > 2.6291897 = (MATCH) fieldWeight(itemName:ipad in 223831), product of:
> >  1.0 = tf(termFreq(itemName:ipad)=1)
> >  8.413407 = idf(docFreq=165, maxDocs=275239)
> >  0.3125 = fieldNorm(field=itemName, doc=223831)
> > 
> >  
> > 2.6291897 = (MATCH) fieldWeight(itemName:ipad in 223856), product of:
> >  1.0 = tf(termFreq(itemName:ipad)=1)
> >  8.413407 = idf(docFreq=165, maxDocs=275239)
> >  0.3125 = fieldNorm(field=itemName, doc=223856)
> >
> > 
> >  
> > 2.6291897 = (MATCH) fieldWeight(itemName:ipad in 223908), product of:
> >  1.0 = tf(termFreq(itemName:ipad)=1)
> >  8.413407 = idf(docFreq=165, maxDocs=275239)
> >  0.3125 = fieldNorm(field=itemName, doc=223908)
> > 
> >
> > QParser: LuceneQParser
> >
> > [the remainder of the debug output (per-component prepare/process
> > timings) is garbled and truncated in the archive]

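A quick way to sanity-check the fieldNorm values quoted above is Lucene's
single-byte norm encoding. A minimal sketch, assuming Lucene 3.x on the
classpath: note that 1/sqrt(5) ≈ 0.447 is stored as 0.4375, so one extra
token (an index-time synonym, for example) would explain the observed value:

import org.apache.lucene.util.SmallFloat;

public class NormCheck {
    public static void main(String[] args) {
        for (int terms = 3; terms <= 8; terms++) {
            float raw = (float) (1.0 / Math.sqrt(terms)); // lengthNorm
            byte b = SmallFloat.floatToByte315(raw);      // the single byte Lucene stores
            float stored = SmallFloat.byte315ToFloat(b);  // the value explain() reports
            System.out.printf("terms=%d raw=%.4f stored=%.4f%n", terms, raw, stored);
        }
    }
}
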
Re: Updating the data-config file

2011-06-23 Thread Ahmet Arslan
> So you mean I cannot update the
> data-config programmatically? 

Yes, you can update it and reload it via the command 
dataimport?command=reload-config. However, there is no built-in mechanism in 
solr for editing it.

> I don't
> understand how the request parameters be of use to me.

Maybe you can use a different url in each import request:
dataimport?command=full-import&clean=false&url=myNewlyAddedURL
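
For that to work, the entity would reference the request parameter instead
of a hard-coded address, along these lines (a sketch; the parameter name
url is illustrative):

<entity name="slashdot" pk="link"
        url="${dataimporter.request.url}"
        processor="XPathEntityProcessor" ...>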

 
> This is how my data-config file looks:
>
> <dataConfig>
>     <dataSource type="HttpDataSource" />
>     <document>
>         <entity name="slashdot" pk="link"
>                 url="http://rss.slashdot.org/Slashdot/slashdot"
>                 processor="XPathEntityProcessor"
>                 forEach="/RDF/channel | /RDF/item"
>                 transformer="DateFormatTransformer">
>             <field column="title" xpath="/RDF/item/title" />
>             <field column="link" xpath="/RDF/item/link" />
>             <field column="description" xpath="/RDF/item/description" />
>             <field column="date" xpath="/RDF/item/date"
>                    dateTimeFormat="yyyy-MM-dd'T'hh:mm:ss" />
>         </entity>
>     </document>
> </dataConfig>
> I am running a Flash based application as the front end UI to show the
> search results. Now I want the user to be able to add new RSS feed data
> sources.

How about fetching the urls ("http://rss.slashdot.org/Slashdot/slashdot") from 
another data source, like a database table, a text file in a file system, etc.?


Re: Updating the data-config file

2011-06-23 Thread sabman
So you mean I cannot update the data-config programmatically? I don't
understand how the request parameters would be of use to me.

This is how my data-config file looks:

<dataConfig>
    <dataSource type="HttpDataSource" />
    <document>
        <entity name="slashdot" pk="link"
                url="http://rss.slashdot.org/Slashdot/slashdot"
                processor="XPathEntityProcessor"
                forEach="/RDF/channel | /RDF/item"
                transformer="DateFormatTransformer">
            <field column="title" xpath="/RDF/item/title" />
            <field column="link" xpath="/RDF/item/link" />
            <field column="description" xpath="/RDF/item/description" />
            <field column="date" xpath="/RDF/item/date"
                   dateTimeFormat="yyyy-MM-dd'T'hh:mm:ss" />
        </entity>
    </document>
</dataConfig>
I am running a Flash based application as the front end UI to show the
search results. Now I want the user to be able to add new RSS feed data
sources. 




Re: How to index correctly a text save with tinyMCE

2011-06-23 Thread Ariel
Steven A Rowe, the solution you proposed doesn't work, thanks anyway.

Regards

On 6/23/11, Steven A Rowe  wrote:
> Hi Ariel,
>
> On 6/23/2011 at 12:34 PM, Ariel wrote:
>> But it still doesn't convert the code to the correct character; for
>> instance, Espa&ntilde;a must be converted to España but it still
>> remains as Espa&ntilde;a.
>
> So it looks like your text processing tool(s) escape markup meta-characters
> (e.g. "&" -> "&amp;") after escaping above-ASCII characters to their named
> entity equivalents (e.g. "n" with a tilde to "&ntilde;").  This two-level
> escaping appears to be the problem.
>
> According to the analysis.jsp output you sent, your original text
> "Espa&amp;ntilde;a" was converted to "Espa&ntilde;a" - the first level of
> escaping was reversed.
>
> I suspect you could fix the problem by including HTMLStripCharFilter twice,
> e.g.:
>
>    <charFilter class="solr.HTMLStripCharFilterFactory"/>
>    <charFilter class="solr.HTMLStripCharFilterFactory"/>
>    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>    ...
>
> Good luck,
> Steve
>
>


Re: Updating the data-config file

2011-06-23 Thread Ahmet Arslan
> So I have some RSS feeds that I want
> to index using Solr. I am using the
> DataImportHandler and I have added the instructions on how
> to parse the
> feeds in the data-config file. 
> 
> Now if a user wants to add more RSS feeds to index, do I
> have to
> programatically instruct Solr to update the config file? Is
> there a HTTP
> POST or GET I can send to update the data-config file?

AFAIK there is no such thing to edit the data-config file.

However, you can pass an argument when triggering an import, if that helps.
http://wiki.apache.org/solr/DataImportHandler#Accessing_request_parameters

Also, you can save your rss urls in a db and use multiple data sources. Then 
you only update the relevant table.
 


testing subscription.

2011-06-23 Thread Esteban Donato



Updating the data-config file

2011-06-23 Thread sabman
So I have some RSS feeds that I want to index using Solr. I am using the
DataImportHandler and I have added the instructions on how to parse the
feeds in the data-config file. 

Now if a user wants to add more RSS feeds to index, do I have to
programmatically instruct Solr to update the config file? Is there an HTTP
POST or GET I can send to update the data-config file?



Re: velocity: hyperlinking to documents

2011-06-23 Thread okayndc
Yes, from the handy /browse view.

 I'll give this a try. Thanks Erik! 



Server Restart Required for Schema Changes After Document Delete All?

2011-06-23 Thread Brandon Fish
Are there any schema changes that would cause problems with the following
procedure from the FAQ?

1. Use the "match all docs" query in a delete-by-query command before
   shutting down Solr: <delete><query>*:*</query></delete>
2. Reload the core
3. Re-Index your data

Would this work when dynamic fields are removed?


Re: How to index correctly a text save with tinyMCE

2011-06-23 Thread Marek Tichy
Or fix the problem at its source; I think you need to google for
entity_encoding : "raw"
on tinyMCE.


> Hi Ariel,
>
> On 6/23/2011 at 12:34 PM, Ariel wrote:
>   
>> But it still doesn't convert the code to the correct character; for
>> instance, Espa&ntilde;a must be converted to España but it still
>> remains as Espa&ntilde;a.
>> 
>
> So it looks like your text processing tool(s) escape markup meta-characters
> (e.g. "&" -> "&amp;") after escaping above-ASCII characters to their named
> entity equivalents (e.g. "n" with a tilde to "&ntilde;").  This two-level
> escaping appears to be the problem.
>
> According to the analysis.jsp output you sent, your original text
> "Espa&amp;ntilde;a" was converted to "Espa&ntilde;a" - the first level of
> escaping was reversed.
>
> I suspect you could fix the problem by including HTMLStripCharFilter twice, 
> e.g.:
>
>    <charFilter class="solr.HTMLStripCharFilterFactory"/>
>    <charFilter class="solr.HTMLStripCharFilterFactory"/>
>    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>    ...
>
> Good luck,
> Steve
>
>   



RE: How to index correctly a text save with tinyMCE

2011-06-23 Thread Steven A Rowe
Hi Ariel,

On 6/23/2011 at 12:34 PM, Ariel wrote:
> But it still doesn't convert the code to the correct character; for
> instance, Espa&ntilde;a must be converted to España but it still
> remains as Espa&ntilde;a.

So it looks like your text processing tool(s) escape markup meta-characters 
(e.g. "&" -> "&amp;") after escaping above-ASCII characters to their named 
entity equivalents (e.g. "n" with a tilde to "&ntilde;").  This two-level 
escaping appears to be the problem.

According to the analysis.jsp output you sent, your original text 
"Espa&amp;ntilde;a" was converted to "Espa&ntilde;a" - the first level of 
escaping was reversed.

I suspect you could fix the problem by including HTMLStripCharFilter twice, 
e.g.:

   <charFilter class="solr.HTMLStripCharFilterFactory"/>
   <charFilter class="solr.HTMLStripCharFilterFactory"/>
   <tokenizer class="solr.WhitespaceTokenizerFactory"/>
   ...

Good luck,
Steve



Re: How to index correctly a text save with tinyMCE

2011-06-23 Thread Ariel
I'm sorry to bother you again, but this doesn't work. I have written
this configuration in my schema.xml file:

<fieldType name="text" class="solr.TextField">
  <analyzer>
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    ...
  </analyzer>
</fieldType>

But it still doesn't convert the code to the correct character; for
instance, Espa&ntilde;a must be converted to España but it still
remains as Espa&ntilde;a.
I have included in this email an attachment with the results of the
analysis.jsp application.

Any help would be really appreciated.
Regards,
Ariel

On 6/16/11, Steven A Rowe  wrote:
> Hi Ariel,
>
> As Shawn says, char filters come before tokenizers.
>
> You need to use a  tag instead of  tag.
>
> I've updated the HTMLStripCharFilter documentation on the Solr wiki to
> include this information:
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.HTMLStripCharFilterFactory
>
> Steve
>
>> -Original Message-
>> From: Shawn Heisey [mailto:s...@elyograg.org]
>> Sent: Thursday, June 16, 2011 1:32 PM
>> To: solr-user@lucene.apache.org
>> Subject: Re: How to index correctly a text save with tinyMCE
>>
>> On 6/16/2011 11:12 AM, Ariel wrote:
>> > Thanks for your answer. I have just put the filter in my schema.xml but it
>> > doesn't work. I am using solr 1.4 and my conf is:
>> >
>> > <fieldType name="text" class="solr.TextField">
>> >   <analyzer>
>> >     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>> >     <filter class="solr.StopFilterFactory" words="stopwords.txt"/>
>> >     ...
>> >     <filter class="solr.SnowballPorterFilterFactory" language="Spanish"/>
>> >     <filter class="solr.HTMLStripCharFilterFactory"/>
>> >   </analyzer>
>> > </fieldType>
>> >
>> >
>> > But it doesn't work in tomcat 6 logs I get this error:
>> >
>> >   java.lang.ClassCastException:
>> > org.apache.solr.analysis.HTMLStripCharFilterFactory cannot be cast to
>> > org.apache.solr.analysis.TokenFilterFactory
>>
>> According to the wiki, the output of that filter must be passed to
>> either another CharFilter or a Tokenizer.  Try moving it before
>> WhitespaceTokenizerFactory.
>>
>> Shawn
>
>


analysis.rar
Description: application/rar


Re: Removing duplicate documents from search results

2011-06-23 Thread simon
Have you checked out the deduplication process that's available at
indexing time? It includes a fuzzy hash algorithm.

http://wiki.apache.org/solr/Deduplication

-Simon
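
For reference, the configuration on that wiki page looks roughly like this
(using TextProfileSignature, the fuzzy hash; the signature and source field
names are illustrative):

<updateRequestProcessorChain name="dedupe">
  <processor class="solr.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <str name="signatureField">signature</str>
    <bool name="overwriteDupes">true</bool>
    <str name="fields">title,content</str>
    <str name="signatureClass">org.apache.solr.update.processor.TextProfileSignature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>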

On Thu, Jun 23, 2011 at 5:55 AM, Pranav Prakash  wrote:
> This approach would definitely work if the two documents are *exactly* the
> same. But this is very fragile. Even if one extra space has been added, the
> whole hash would change. What I am really looking for is some percentage
> similarity between documents, removing those documents which are more than
> 95% similar.
>
> *Pranav Prakash*
>
> "temet nosce"
>
> Twitter  | Blog  |
> Google 
>
>
> On Thu, Jun 23, 2011 at 15:16, Omri Cohen  wrote:
>
>> What you need to do, is to calculate some HASH (using any message digest
>> algorithm you want, md5, sha-1 and so on), then do some reading on solr
>> field collapse capabilities. Should not be too complicated..
>>
>> *Omri Cohen*
>>
>>
>>
>> Co-founder @ yotpo.com | o...@yotpo.com | +972-50-7235198 | +972-3-6036295
>>
>>
>>
>>
>>
>>
>>
>> -- Forwarded message --
>> From: Pranav Prakash 
>> Date: Thu, Jun 23, 2011 at 12:26 PM
>> Subject: Removing duplicate documents from search results
>> To: solr-user@lucene.apache.org
>>
>>
>> How can I remove very similar documents from search results?
>>
>> My scenario is that there are documents in the index which are almost
>> similar (people submitting the same stuff multiple times, sometimes
>> different people submitting the same stuff). Now when a search is performed
>> for "keyword", in the top N results, quite frequently, the same document
>> comes up multiple times. I want to remove those duplicate (or possible
>> duplicate) documents. Very similar to what Google does when they say "In
>> order to show you the most relevant results, duplicates have been removed".
>> How can I achieve this functionality using Solr? Does Solr have anything
>> built in, or a plugin, which could help me with it?
>>
>>
>> *Pranav Prakash*
>>
>> "temet nosce"
>>
>> Twitter  | Blog > >
>> |
>> Google 
>>
>


Re: response time for pdf indexing

2011-06-23 Thread simon
How long are the documents? Indexing a large document can be slow
(although 2 seconds is very slow indeed).

2011/6/22 Rode González (libnova) :
> Hi !
>
>
>
> We are using Zend Search, based on Lucene. Our queries against indexed pdfs
> take longer than 2 seconds.
>
> We want to change to solr to try to solve this problem.
>
> i. Can anyone tell me the response time for queries on pdf documents on solr?
>
> ii. Can anyone tell me some strategies to reduce this response time?
>
> Note: the pdfs are not indexed in a simple way. Each pdf is converted to text
> first and then indexed with some additional information that we need.
>
> Thank you.
> ---
>
> Rode González
>
>


Re; DIH Scheduling

2011-06-23 Thread simon
The Wiki page describes a design for a scheduler, which has not been
committed to Solr yet (I checked). I did see a patch the other day
(see https://issues.apache.org/jira/browse/SOLR-2305) but it didn't
look well tested.

I think that you're basically stuck with something like cron at this
time. If your application is written in java, take a look at the
Quartz scheduler - http://www.quartz-scheduler.org/
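
A minimal Quartz sketch of that idea (assumptions: Quartz 2.x, and a DIH
endpoint at http://localhost:8983/solr/dataimport) that fires a delta-import
at 2 AM every night:

import org.quartz.*;
import org.quartz.impl.StdSchedulerFactory;
import java.io.InputStream;
import java.net.URL;

public class DeltaImportJob implements Job {
    public void execute(JobExecutionContext ctx) throws JobExecutionException {
        try {
            // hit the DataImportHandler endpoint; adjust host/core to your setup
            URL url = new URL("http://localhost:8983/solr/dataimport"
                    + "?command=delta-import&clean=false");
            InputStream in = url.openStream();
            in.close();
        } catch (Exception e) {
            throw new JobExecutionException(e);
        }
    }

    public static void main(String[] args) throws SchedulerException {
        Scheduler scheduler = StdSchedulerFactory.getDefaultScheduler();
        JobDetail job = JobBuilder.newJob(DeltaImportJob.class)
                .withIdentity("deltaImport").build();
        Trigger trigger = TriggerBuilder.newTrigger()
                .withSchedule(CronScheduleBuilder.cronSchedule("0 0 2 * * ?"))
                .build();
        scheduler.start();
        scheduler.scheduleJob(job, trigger);
    }
}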

-Simon


Re: SEVERE: java.lang.NoSuchFieldError: core Solr branch3.x

2011-06-23 Thread Markus Jelsma
The usual ant clean won't help either. A fresh check out did the trick.

On Thursday 23 June 2011 03:24:42 Yonik Seeley wrote:
> I just tried branch_3x and couldn't reproduce this.
> Looks like maybe there is something wrong with your build, or some old
> class files left over somewhere being picked up.
> 
> -Yonik
> http://www.lucidimagination.com
> 
> 
> 
> On Wed, Jun 22, 2011 at 10:15 AM, Markus Jelsma
> 
>  wrote:
> > Hi,
> > 
> > Today's checkout (Solr Specification Version: 3.4.0.2011.06.22.16.10.08)
> > produces the exception below on start up. The same exception with a very
> > similar stack trace comes when committing an add. Example schema and
> > docs will reproduce the error.
> > 
> > Jun 22, 2011 4:11:57 PM org.apache.solr.common.SolrException log
> > SEVERE: java.lang.NoSuchFieldError: core
> >   at org.apache.lucene.index.SegmentTermDocs.<init>(SegmentTermDocs.java:48)
> >   at org.apache.lucene.index.SegmentReader.termDocs(SegmentReader.java:491)
> >   at org.apache.lucene.index.IndexReader.termDocs(IndexReader.java:1005)
> >   at org.apache.lucene.index.SegmentReader.termDocs(SegmentReader.java:484)
> >   at org.apache.solr.search.SolrIndexReader.termDocs(SolrIndexReader.java:321)
> >   at org.apache.lucene.search.TermQuery$TermWeight.scorer(TermQuery.java:101)
> >   at org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer(BooleanQuery.java:298)
> >   at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:524)
> >   at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:320)
> >   at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1178)
> >   at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1066)
> >   at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:358)
> >   at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:258)
> >   at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:194)
> >   at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
> >   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1368)
> >   at org.apache.solr.core.QuerySenderListener.newSearcher(QuerySenderListener.java:54)
> >   at org.apache.solr.core.SolrCore$3.call(SolrCore.java:1177)
> >   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> >   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> >   at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> >   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> >   at java.lang.Thread.run(Thread.java:662)
> > 
> > 
> > 
> > --
> > Markus Jelsma - CTO - Openindex
> > http://www.linkedin.com/in/markus17
> > 050-8536620 / 06-50258350

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350


Re: Dismax + spatial constraints

2011-06-23 Thread kaiserwaseem
I am using dismax to boost my fields as placeName^1.8 schemeName^1.5 text^1.0.
Now I also want to boost my results with respect to distance, to show the
closest areas first. I sorted with geodist, but it shows irrelevant results on
top. I also tried

q={!boost b=recip(geodist(50.1, -0.86, myGeoField), 1, 1000,
1000)}foo:bar&...

but in vain. Any help?
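
One pattern that may help (a sketch, untested; geodist() with no arguments
reads the sfield/pt parameters, per the SpatialSearch wiki) is to keep
dismax for scoring and wrap it in a boost via local params:

q={!boost b=$distboost defType=dismax v=$qq}
&qq=foo
&distboost=recip(geodist(),1,1000,1000)
&sfield=myGeoField
&pt=50.1,-0.86
&qf=placeName^1.8 schemeName^1.5 text^1.0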





Re: Complex situation

2011-06-23 Thread roySolr
Hello Lee,

I thought maybe this is a solution:

Every night I can index the correct openinghours for the next day. So
tonight (00:01) I can index the openinghours for 2011-06-24. The query in my
dih can look like this:

select *
from OPENINGHOURS o
where o.startdate <= NOW() AND o.enddate >= NOW()
AND o.companyid = '${OTHER_ENTITY.companyid}'

With this query I only save the openinghours for today, so I have only one
field (openinghours).

Openinghours
18:00

Then I can facet easily on openinghours (facet.field=openinghours).

I don't know if I can update it every night without problems? Can I use the
delta import?



Re: Removing duplicate documents from search results

2011-06-23 Thread pravesh
Would you care to even index the duplicate documents? Finding duplication in
content fields would not be as easy as in some untokenized/keyword field.
Maybe you could do this filtering at indexing time, before sending the
document to SOLR. Then the question comes: which document should go (from
a group of duplicates)? The latest one?



Re: [ANNOUNCEMENT] PHP Solr Extension 1.0.1 Stable Has Been Released

2011-06-23 Thread Israel Ekpo
I am working on that; I hope to have an answer within a month or so.

On Tue, Jun 21, 2011 at 9:51 AM, roySolr  wrote:

> Are you working on some changes to support earlier versions of PHP?
>
>



-- 
°O°
"Good Enough" is not good enough.
To give anything less than your best is to sacrifice the gift.
Quality First. Measure Twice. Cut Once.
http://www.israelekpo.com/


Re: Problem with SolrTestCaseJ4

2011-06-23 Thread Robert Muir
On Thu, Jun 23, 2011 at 4:10 AM, Tarjei Huse  wrote:
> On 06/20/2011 01:51 PM, Robert Muir wrote:
>> you must use junit 4.7.x, not junit 4.8.x
> Is there a way around this?
>

No, the only thing we can do is decide to require 4.8

> Depending on a specific Junit version is bound to cause problems when
> working with other packages. For example, the Spring 2.5.6 test framework
> does not work with junit versions newer than 4.4.
>

Then maybe it would be a good idea to ask the junit team for better
backwards compatibility!
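
If the project builds with Maven (an assumption), pinning the test
dependency to a 4.7.x release looks like:

<dependency>
  <groupId>junit</groupId>
  <artifactId>junit</artifactId>
  <version>4.7</version>
  <scope>test</scope>
</dependency>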


Re: Complex situation

2011-06-23 Thread lee carroll
Hi Roy,

You have no relationship between time and date due to the
de-normalising of your data.

I don't have a good answer to this and I guess this is a "classic" question.

One approach is maybe to do the following:

Make sure you have field collapsing available (trunk, or a patch maybe).

index not at the shop entity level but at the shop-opening level, so your
records are:

shop  fromDate     toDate       closingTime
1     12/12/2010   12/12/2011   18:00
1     12/12/2011   12/12/2012   20:00

Field collapse on shop id. Note this impacts your number of records
and could be a lot of change for your app :-)
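
On trunk that would be something along these lines (a sketch; the group
parameter names assume the trunk field-collapsing API):

q=*:*&fq=fromDate:[* TO NOW]&fq=toDate:[NOW TO *]
&group=true&group.field=shop
&facet=true&facet.field=closingTime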

I'm also not sure if field collapsing will have the desired effect on
the facet counts and will behave as expected. Anyone with better
knowledge? Is there a better way?

Anyway, good luck with it Roy


On 23 June 2011 08:29, roySolr  wrote:
> Hello,
>
> I have changed my db dates to the correct format, like 2011-01-11T00:00:00Z.
>
> Now I have the following data:
>
>
> Manchester Store   2011-01-01T00:00:00Z   2011-03-31T00:00:00Z   18:00
> Manchester Store   2011-04-01T00:00:00Z   2011-12-31T00:00:00Z   20:00
>
> The "Manchester Store" has 2 seasons with different closing times(18:00 and
> 20:00). Now i have
> 4 fields in SOLR.
>
> Companyname             Manchester Store
> startdate(multiV)          2011-01-01T00:00:00Z, 2011-01-04T00:00:00Z
> enddate(multiV)           2011-31-03T00:00:00Z, 2011-31-12T00:00:00Z
> closingTime(multiV)      18:00, 20:00
>
> I want some facets like this:
>
> Open today (2011-06-23):
> 20:00(1)
>
> The facet query needs to look at what the current date is and use that
> closing time. My facet.query looks like this:
>
> facet.query=startdate:[* TO NOW] AND enddate:[NOW TO *] AND
> closingTime:"18:00"
>
> This returns 1 count like this: 18:00(1)
>
> When i use this facet.query it returns also 1 result:
>
> facet.query=startdate:[* TO NOW] AND enddate:[NOW TO *] AND
> closingTime:"20:00"
>
> This result is not correct, because NOW (2011-06-23) it's not open till 20:00.
> It looks like there is no link between the season and the closingTime. Can
> somebody help me? Are the fields in SOLR not correct?
>
> Thanks Roy
>
>
>
>


Re: Removing duplicate documents from search results

2011-06-23 Thread Pranav Prakash
This approach would definitely work if the two documents are *exactly* the
same. But this is very fragile. Even if one extra space has been added, the
whole hash would change. What I am really looking for is some percentage
similarity between documents, removing those documents which are more than
95% similar.

*Pranav Prakash*

"temet nosce"

Twitter  | Blog  |
Google 


On Thu, Jun 23, 2011 at 15:16, Omri Cohen  wrote:

> What you need to do, is to calculate some HASH (using any message digest
> algorithm you want, md5, sha-1 and so on), then do some reading on solr
> field collapse capabilities. Should not be too complicated..
>
> *Omri Cohen*
>
>
>
> Co-founder @ yotpo.com | o...@yotpo.com | +972-50-7235198 | +972-3-6036295
>
>
>
>
>
>
>
> -- Forwarded message --
> From: Pranav Prakash 
> Date: Thu, Jun 23, 2011 at 12:26 PM
> Subject: Removing duplicate documents from search results
> To: solr-user@lucene.apache.org
>
>
> How can I remove very similar documents from search results?
>
> My scenario is that there are documents in the index which are almost
> similar (people submitting the same stuff multiple times, sometimes
> different people submitting the same stuff). Now when a search is performed
> for "keyword", in the top N results, quite frequently, the same document
> comes up multiple times. I want to remove those duplicate (or possible
> duplicate) documents. Very similar to what Google does when they say "In
> order to show you the most relevant results, duplicates have been removed".
> How can I achieve this functionality using Solr? Does Solr have anything
> built in, or a plugin, which could help me with it?
>
>
> *Pranav Prakash*
>
> "temet nosce"
>
> Twitter  | Blog  >
> |
> Google 
>


Re: Removing duplicate documents from search results

2011-06-23 Thread Omri Cohen
What you need to do, is to calculate some HASH (using any message digest
algorithm you want, md5, sha-1 and so on), then do some reading on solr
field collapse capabilities. Should not be too complicated..
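
A sketch of the hashing half, using the JDK's built-in MessageDigest (how
you attach the result to your documents is up to your indexing code):

import java.math.BigInteger;
import java.security.MessageDigest;

public class ContentHash {
    // hash the document body; store the result in a string field and
    // collapse/group on that field at query time
    public static String md5(String content) throws Exception {
        MessageDigest md = MessageDigest.getInstance("MD5");
        byte[] digest = md.digest(content.getBytes("UTF-8"));
        return new BigInteger(1, digest).toString(16);
    }
}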

*Omri Cohen*



Co-founder @ yotpo.com | o...@yotpo.com | +972-50-7235198 | +972-3-6036295







-- Forwarded message --
From: Pranav Prakash 
Date: Thu, Jun 23, 2011 at 12:26 PM
Subject: Removing duplicate documents from search results
To: solr-user@lucene.apache.org


How can I remove very similar documents from search results?

My scenario is that there are documents in the index which are almost
similar (people submitting the same stuff multiple times, sometimes different
people submitting the same stuff). Now when a search is performed for
"keyword", in the top N results, quite frequently, the same document comes up
multiple times. I want to remove those duplicate (or possible duplicate)
documents. Very similar to what Google does when they say "In order to show
you the most relevant results, duplicates have been removed". How can I
achieve this functionality using Solr? Does Solr have anything built in, or a
plugin, which could help me with it?


*Pranav Prakash*

"temet nosce"

Twitter  | Blog 
|
Google 


Removing duplicate documents from search results

2011-06-23 Thread Pranav Prakash
How can I remove very similar documents from search results?

My scenario is that there are documents in the index which are almost
similar (people submitting the same stuff multiple times, sometimes different
people submitting the same stuff). Now when a search is performed for
"keyword", in the top N results, quite frequently, the same document comes up
multiple times. I want to remove those duplicate (or possible duplicate)
documents. Very similar to what Google does when they say "In order to show
you the most relevant results, duplicates have been removed". How can I
achieve this functionality using Solr? Does Solr have anything built in, or a
plugin, which could help me with it?


*Pranav Prakash*

"temet nosce"

Twitter  | Blog  |
Google 


solr scale on trie fields

2011-06-23 Thread Omri Cohen
Hello,

I am trying to normalize the values of a certain field and then use them in a
function query. For that I need to know the maximum and minimum values the
field takes. I am thinking of using the scale(x, minTarget, maxTarget)
function query, but I read in the Solr book (Solr 1.4 Enterprise Search
Server by Eric Pugh and David Smiley) that "*scale will traverse the entire
document set and evaluate the function to determine the smallest and largest
values for each query invocation, and it is not cached*". That makes me ask
two questions:

   1. Is this also true for TrieFields (such as solr.TrieIntField)? Because,
   as far as I understand, they are supposed to have the values sorted in
   some manner, so checking for the min and max values should happen in
   constant time complexity.
   2. Why are the results not cached?!?! Is there any way to define them
   to be cached?

The first question is far more important to me than the second..

thanks

*Omri Cohen*



Co-founder @ yotpo.com | o...@yotpo.com | +972-50-7235198 | +972-3-6036295






Re: Problem with SolrTestCaseJ4

2011-06-23 Thread Tarjei Huse
On 06/20/2011 01:51 PM, Robert Muir wrote:
> you must use junit 4.7.x, not junit 4.8.x
Is there a way around this?

Depending on a specific Junit version is bound to cause problems when
working with other packages. For example, the Spring 2.5.6 test framework
does not work with junit versions newer than 4.4.

Kind regards,
Tarjei
> On Mon, Jun 20, 2011 at 6:21 AM, Jakob Vad Nielsen
>  wrote:
>> Hi,
>>
>> I'm trying to create some integrations tests within my project using JUnit
>> and the SolrTestCaseJ4 (from Solr-test-framework 3.2.0) helper class. The
>> problem is that I'm getting an AssertionError for LuceneTestCase.java:
>>
>> java.lang.AssertionError: ensure your setUp() calls super.setUp()!!!
>>   at org.junit.Assert.fail(Assert.java:91)
>>   at org.junit.Assert.assertTrue(Assert.java:43)
>>   at org.apache.lucene.util.LuceneTestCase$1.starting(LuceneTestCase.java:361)
>>   at org.junit.rules.TestWatchman$1.evaluate(TestWatchman.java:46)
>>   at org.junit.runners.BlockJUnit4ClassRunner.runNotIgnored(BlockJUnit4ClassRunner.java:79)
>>   at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:71)
>>   at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1206)
>>   at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1124)
>>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193)
>>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52)
>>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191)
>>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42)
>>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184)
>>   at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
>>   at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31)
>>   at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
>>   at org.junit.runner.JUnitCore.run(JUnitCore.java:157)
>>   at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:71)
>>   at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:199)
>>   at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:62)
>>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>   at com.intellij.rt.execution.application.AppMain.main(AppMain.java:120)
>>
>> My code goes like this:
>>
>> public class SolrExpressionBuilderIntegrationTest extends SolrTestCaseJ4 {
>>
>>private SolrServer solrServer;
>>
>>@BeforeClass
>>public static void beforeClass() throws Exception {
>>initCore("src/test/resources/solrConfig.xml",
>> "src/test/resources/schema.xml", "src/test/resources");
>>}
>>
>>@Test
>>public void testEmptyServer() throws Exception {
>>   // test code here
>>}
>>
>> }
>>
>> Any ideas on what I'm doing wrong?
>>
>>
>> Regards,
>>
>> Jakob
>>


-- 
Regards / Med vennlig hilsen
Tarjei Huse
Mobil: 920 63 413



Re: Complex situation

2011-06-23 Thread roySolr
Hello,

I have changed my db dates to the correct format, like 2011-01-11T00:00:00Z.

Now I have the following data:


Manchester Store   2011-01-01T00:00:00Z   2011-03-31T00:00:00Z   18:00
Manchester Store   2011-04-01T00:00:00Z   2011-12-31T00:00:00Z   20:00

The "Manchester Store" has 2 seasons with different closing times(18:00 and
20:00). Now i have
4 fields in SOLR. 

Companyname Manchester Store
startdate(multiV)  2011-01-01T00:00:00Z, 2011-01-04T00:00:00Z
enddate(multiV)   2011-31-03T00:00:00Z, 2011-31-12T00:00:00Z
closingTime(multiV)  18:00, 20:00

I want some facets like this:

Open today (2011-06-23):
20:00(1)

The facet query needs to look at what the current date is and use that
closing time. My facet.query looks like this:

facet.query=startdate:[* TO NOW] AND enddate:[NOW TO *] AND
closingTime:"18:00"

This returns 1 count like this: 18:00(1)

When i use this facet.query it returns also 1 result:

facet.query=startdate:[* TO NOW] AND enddate:[NOW TO *] AND
closingTime:"20:00"

This result is not correct, because NOW (2011-06-23) it's not open till 20:00.
It looks like there is no link between the season and the closingTime. Can
somebody help me? Are the fields in SOLR not correct?

Thanks Roy


