Re: How to post in-memory[not residing on local disks] Xml files to Solr server for indexing?

2009-04-27 Thread Noble Paul നോബിള്‍ नोब्ळ्
The Solr distro contains all the jar files. You can take either the
latest release (1.3) or a nightly build.

On Tue, Apr 28, 2009 at 11:34 AM, ahmed baseet  wrote:
> As far as I know, Maven is a build/mgmt tool for Java projects, quite similar
> to Ant, right? No, I'm not using it, so I think I don't need to worry
> about those pom files.
> But I'm still not able to figure out the classpath/jar error I
> mentioned in my previous mails. Shall I try getting those jar files,
> specifically the solr-solrj jar that contains the CommonsHttpSolrServer
> class? If so, can you tell me where to get those jar files on the
> web? Has anyone ever faced similar problems? Please help me fix
> these silly issues.
>
> Thanks,
> Ahmed.
> On Mon, Apr 27, 2009 at 6:59 PM, Shalin Shekhar Mangar <
> shalinman...@gmail.com> wrote:
>
>> On Mon, Apr 27, 2009 at 6:27 PM, ahmed baseet wrote:
>>
>> > Can anyone help me selecting the proper pom.xml file out of the bunch of
>> > *-pom.xml.templates available.
>> >
>>
>> Ahmed, are you using Maven? If not, then you do not need these pom files.
>> If
>> you are using Maven, then you need to add a dependency to solrj.
>>
>>
>> http://wiki.apache.org/solr/Solrj#head-674dd7743df665fdd56e8eccddce16fc2de20e6e
>>
>> --
>> Regards,
>> Shalin Shekhar Mangar.
>>
>



-- 
--Noble Paul
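
For reference, a minimal SolrJ sketch of the approach in question: build the
document in memory and let SolrJ serialize and POST it, instead of posting an
XML file from disk. The URL and field names are illustrative; the solr-solrj
jar and its dependencies from the distro must be on the classpath.

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class InMemoryIndexer {
    public static void main(String[] args) throws Exception {
        // point SolrJ at a running Solr instance (URL is illustrative)
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");

        // build the document in memory; no XML file on disk is involved
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "doc-1");
        doc.addField("title", "An in-memory document");

        server.add(doc);  // SolrJ POSTs the serialized document
        server.commit();  // make it visible to searchers
    }
}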


Re: How to post in-memory[not residing on local disks] Xml files to Solr server for indexing?

2009-04-27 Thread ahmed baseet
As far as I know, Maven is a build/mgmt tool for Java projects, quite similar
to Ant, right? No, I'm not using it, so I think I don't need to worry
about those pom files.
But I'm still not able to figure out the classpath/jar error I
mentioned in my previous mails. Shall I try getting those jar files,
specifically the solr-solrj jar that contains the CommonsHttpSolrServer
class? If so, can you tell me where to get those jar files on the
web? Has anyone ever faced similar problems? Please help me fix
these silly issues.

Thanks,
Ahmed.
On Mon, Apr 27, 2009 at 6:59 PM, Shalin Shekhar Mangar <
shalinman...@gmail.com> wrote:

> On Mon, Apr 27, 2009 at 6:27 PM, ahmed baseet wrote:
>
> > Can anyone help me selecting the proper pom.xml file out of the bunch of
> > *-pom.xml.templates available.
> >
>
> Ahmed, are you using Maven? If not, then you do not need these pom files.
> If
> you are using Maven, then you need to add a dependency to solrj.
>
>
> http://wiki.apache.org/solr/Solrj#head-674dd7743df665fdd56e8eccddce16fc2de20e6e
>
> --
> Regards,
> Shalin Shekhar Mangar.
>


Re: half width katakana

2009-04-27 Thread Ashish P

After this, should I keep using the same CJKAnalyzer or use a CharFilter?
Thanks,
Ashish


Koji Sekiguchi-2 wrote:
> 
> Ashish P wrote:
>> I want to convert half-width katakana to full-width katakana. I tried
>> using the CJK analyzer but it is not working.
>> Does CJKAnalyzer do it, or is there another way?
>>   
> 
> The CharFilter that comes with trunk/Solr 1.4 covers exactly this type of
> problem.
> If you are using Solr 1.3, try the patch attached below:
> 
> https://issues.apache.org/jira/browse/SOLR-822
> 
> Koji
> 
> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/half-width-katakana-tp23270186p23270453.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: half width katakana

2009-04-27 Thread Koji Sekiguchi

Ashish P wrote:

I want to convert half-width katakana to full-width katakana. I tried using
the CJK analyzer but it is not working.
Does CJKAnalyzer do it, or is there another way?
  


The CharFilter that comes with trunk/Solr 1.4 covers exactly this type of problem.
If you are using Solr 1.3, try the patch attached below:

https://issues.apache.org/jira/browse/SOLR-822

Koji
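
For illustration, a CharFilter is wired into a field type in schema.xml. This
is a sketch assuming Solr 1.4's MappingCharFilterFactory; the mapping file
name is hypothetical (it would hold half-width-to-full-width katakana
mappings), and the tokenizer choice is illustrative.

<fieldType name="text_ja" class="solr.TextField">
  <analyzer>
    <!-- normalizes characters before tokenization,
         e.g. half-width katakana to full-width -->
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-japanese.txt"/>
    <tokenizer class="solr.CJKTokenizerFactory"/>
  </analyzer>
</fieldType>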




Re: MacOS "Failed to initialize DataSource:db"+ DataimportHandler ???

2009-04-27 Thread Noble Paul നോബിള്‍ नोब्ळ्
Apparently you do not have the JDBC driver on the classpath. Drop your driver
jar into ${solr.home}/lib.

On Tue, Apr 28, 2009 at 4:42 AM, gateway0  wrote:
>
> Hi,
>
> sure:
> "
> message Severe errors in solr configuration. Check your log files for more
> detailed information on what may be wrong. If you want solr to continue
> after configuration errors, change:
> <abortOnConfigurationError>false</abortOnConfigurationError> in null
> -
> org.apache.solr.common.SolrException: FATAL: Could not create importer.
> DataImporter config invalid at
> org.apache.solr.handler.dataimport.DataImportHandler.inform(DataImportHandler.java:114)
> at
> org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:311)
> at org.apache.solr.core.SolrCore.<init>(SolrCore.java:480) at
> org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:119)
> at
> org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:69)
> at
> org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:275)
> at
> org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:397)
> at
> org.apache.catalina.core.ApplicationFilterConfig.<init>(ApplicationFilterConfig.java:108)
> at
> org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:3709)
> at org.apache.catalina.core.StandardContext.start(StandardContext.java:4363)
> at
> org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:791)
> at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:771)
> at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:525) at
> org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:627)
> at
> org.apache.catalina.startup.HostConfig.deployDescriptors(HostConfig.java:553)
> at org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:488) at
> org.apache.catalina.startup.HostConfig.start(HostConfig.java:1149) at
> org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:311)
> at
> org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:117)
> at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1053) at
> org.apache.catalina.core.StandardHost.start(StandardHost.java:719) at
> org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1045) at
> org.apache.catalina.core.StandardEngine.start(StandardEngine.java:443) at
> org.apache.catalina.core.StandardService.start(StandardService.java:516) at
> org.apache.catalina.core.StandardServer.start(StandardServer.java:710) at
> org.apache.catalina.startup.Catalina.start(Catalina.java:578) at
> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:585) at
> org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:288) at
> org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:413) Caused by:
> org.apache.solr.handler.dataimport.DataImportHandlerException: Failed to
> initialize DataSource: mydb Processing Document # at
> org.apache.solr.handler.dataimport.DataImporter.getDataSourceInstance(DataImporter.java:308)
> at
> org.apache.solr.handler.dataimport.DataImporter.addDataSource(DataImporter.java:273)
> at
> org.apache.solr.handler.dataimport.DataImporter.initEntity(DataImporter.java:228)
> at
> org.apache.solr.handler.dataimport.DataImporter.<init>(DataImporter.java:98)
> at
> org.apache.solr.handler.dataimport.DataImportHandler.inform(DataImportHandler.java:106)
> ... 31 more Caused by: org.apache.solr.common.SolrException: Could not load
> driver: com.mysql.jdbc.Driver at
> org.apache.solr.handler.dataimport.JdbcDataSource.createConnectionFactory(JdbcDataSource.java:112)
> at
> org.apache.solr.handler.dataimport.JdbcDataSource.init(JdbcDataSource.java:65)
> at
> org.apache.solr.handler.dataimport.DataImporter.getDataSourceInstance(DataImporter.java:306)
> ... 35 more Caused by: java.lang.ClassNotFoundException: Unable to load
> com.mysql.jdbc.Driver or
> org.apache.solr.handler.dataimport.com.mysql.jdbc.Driver at
> org.apache.solr.handler.dataimport.DocBuilder.loadClass(DocBuilder.java:587)
> at
> org.apache.solr.handler.dataimport.JdbcDataSource.createConnectionFactory(JdbcDataSource.java:110)
> ... 37 more Caused by: org.apache.solr.common.SolrException: Error loading
> class 'com.mysql.jdbc.Driver' at
> org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:273)
> at
> org.apache.solr.handler.dataimport.DocBuilder.loadClass(DocBuilder.java:577)
> ... 38 more Caused by: java.lang.ClassNotFoundException:
> com.mysql.jdbc.Driver at
> org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1387)
> at
> org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1233)
> at java.lang.ClassLoader.loadClassInte

Re: DataImportHandler Questions-Load data in parallel and temp tables

2009-04-27 Thread Noble Paul നോബിള്‍ नोब्ळ्
There is already an issue open for writing to the index in a separate thread.

https://issues.apache.org/jira/browse/SOLR-1089

On Tue, Apr 28, 2009 at 4:15 AM, Shalin Shekhar Mangar
 wrote:
> On Tue, Apr 28, 2009 at 3:43 AM, Amit Nithian  wrote:
>
>> All,
>> I have a few questions regarding the data import handler. We have some
>> pretty gnarly SQL queries to load our indices and our current loader
>> implementation is extremely fragile. I am looking to migrate over to the
>> DIH; however, I am looking to use SolrJ + EmbeddedSolr + some custom stuff
>> to remotely load the indices so that my index loader and main search engine
>> are separated.
>
>
> Currently if you want to use DIH then the Solr master doubles up as the
> index loader as well.
>
>
>>
>> Currently, unless I am missing something, the data gathering from the
>> entity
>> and the data processing (i.e. conversion to a Solr Document) is done
>> sequentially and I was looking to make this execute in parallel so that I
>> can have multiple threads processing different parts of the resultset and
>> loading documents into Solr. Secondly, I need to create temporary tables to
>> store results of a few queries and use them later for inner joins, and I was
>> wondering how best to go about this?
>>
>> I am thinking to add support in DIH for the following:
>> 1) Temporary tables (maybe call it temporary entities)? --Specific only to
>> SQL though unless it can be generalized to other sources.
>
>
> Pretty specific to DBs. However, isn't this something that can be done in
> your database with views?
>
>
>>
>> 2) Parallel support
>
>
> Parallelizing import of root-entities might be the easiest to attempt.
> There's also an issue open to write to Solr (tokenization/analysis) in a
> separate thread. Look at https://issues.apache.org/jira/browse/SOLR-1089
>
> We actually wrote a multi-threaded DIH during the initial iterations. But we
> discarded it because we found that the bottleneck was usually the database
> (too many queries) or Lucene indexing itself (analysis, tokenization) etc.
> The improvement was ~10% but it made the code substantially more complex.
>
> The only scenario in which it helped a lot was when importing from HTTP or a
> remote database (slow networks). But if you think it can help in your
> scenario, I'd say go for it.
>
>
>>
>>  - Including some mechanism to get the number of records (whether it be
>> count or the MAX(custom_id)-MIN(custom_id))
>
>
> Not sure what you mean here.
>
>
>>
>> 3) Support in DIH or Solr to post documents to a remote index (i.e. create
>> a
>> new UpdateHandler instead of DirectUpdateHandler2).
>>
>
> Solrj integration would be helpful to many I think. There's an issue open.
> Look at https://issues.apache.org/jira/browse/SOLR-853
>
> --
> Regards,
> Shalin Shekhar Mangar.
>



-- 
--Noble Paul


half width katakana

2009-04-27 Thread Ashish P

I want to convert half width katakana to full width katakana. I tried using
cjk analyzer but not working.
Does cjkAnalyzer do it or is there any other way??
-- 
View this message in context: 
http://www.nabble.com/half-width-katakana-tp23270186p23270186.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Fwd: Question about MoreLikeThis

2009-04-27 Thread Otis Gospodnetic

Hello,

Well, if you want documents similar to a specific document, then just make sure 
the query ("q") matches that one document.  You can do that by using the 
uniqueKey field in the query, e.g. q=id:123 .  Then you will get documents 
similar to that one document that matched your id:123 query.


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
> From: "jli...@gmail.com" 
> To: solr-user@lucene.apache.org
> Sent: Sunday, April 26, 2009 5:49:26 AM
> Subject: Fwd: Question about MoreLikeThis
> 
> I think I understand it now. It means to return MoreLikeThis
> docs for every doc in the result.
> 
> ===8<==Original message text===
> Hi, I have a question about what MoreLikeThis means - I suppose
> it means "get more documents that are similar to _this_ document".
> So I expect the query always takes a known document as argument.
> I wonder how I should interpret this query:
> 
> http://localhost:8983/solr/select?q=apache&mlt=true&mlt.fl=manu,cat&mlt.mindf=1&mlt.mintf=1&fl=id,score
> 
> It doesn't seem to specify a document. So what's the "This" in
> MoreLikeThis in this case? Or, "this" means something else, and
> not a document?
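
In other words, a MoreLikeThis request pinned to a single document might look
like this (the id value is illustrative; the other parameters are taken from
the query quoted above):

http://localhost:8983/solr/select?q=id:123&mlt=true&mlt.fl=manu,cat&mlt.mindf=1&mlt.mintf=1&fl=id,score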



Re: SOLRizing advice?

2009-04-27 Thread Otis Gospodnetic

My turn to help, Paul.

There is no such page on the Solr Wiki, but I agree with Paul, this can really 
be a quick and painless migration for typical Lucene/Solr setups.  This is 
roughly how I'd do things:

- I'd set up Solr
- I'd create the schema.xml mimicking the fields in the existing Lucene index
- I'd copy over the Lucene index, keeping in mind Lucene jar versions, 
Solr/Lucene jar versions, and index compatibility
- Start Solr
- Go to Admin page and run test queries
- Go to schema/solrconfig.xml and add various other things - proper cache 
sizes, index replication, dismax, spellchecker, etc.

- Go to Lucene-based indexer classes and change them to use Solrj
- Go to Lucene-based searcher classes and change them to use Solrj

I'd leave embedded Solr and dynamic fields for phase 2 of the migration, unless 
those things really are necessary.
I don't think you'd need to do anything with web.xml - Solr comes packaged as
a webapp, which contains its own web.xml.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
> From: Paul Libbrecht 
> To: solr-user@lucene.apache.org
> Sent: Monday, April 27, 2009 12:35:59 AM
> Subject: SOLRizing advice?
> 
> 
> Hello list,
> 
> I am surely not the only one who wishes to migrate from bare lucene to solr.
> Many different reasons can drive this, e.g. faceting, web-externalization, 
> ease 
> of update... what interests me here are the steps needed in the form of 
> advice 
> as to what to use.
> 
> Here's a few hints. I would love a web-page grouping all these:
> 
> - first change references to indexwriter/indexreader/indexsearch to be those 
> of 
> SOLR using embedded-solr-server
> 
> - make a first solr schema with appropriate analyzers by defining particular 
> dynamic fields
> 
> - slowly replace the query methods with Solr queries, gradually taking 
> advantage 
> of Solr features
> 
> - web-expose the solr core for at least admin by merging the web.xml
> 
> Does such a web-page already exist?
> 
> thanks in advance
> 
> paul



Re: Term highlighting with MoreLikeThisHandler?

2009-04-27 Thread Otis Gospodnetic

Eric,

Have you tried using MLT with parameters described on 
http://wiki.apache.org/solr/HighlightingParameters ?


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
> From: Eric Sabourin 
> To: solr-user@lucene.apache.org
> Sent: Monday, April 27, 2009 10:31:38 AM
> Subject: Term highlighting with MoreLikeThisHandler?
> 
> I submit a query to the MoreLikeThisHandler to find documents similar to a
> specified document.  This works and I've configured my request handler to
> also return the interesting terms.
> 
> Is it possible to have MLT return to me highlight snippets in the similar
> documents it returns? I mean generate hl snippets of the interesting terms?
> If so how?
> 
> Thanks... Eric
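
An untested sketch of combining the two parameter sets (handler path, id, and
field name are illustrative; whether the MoreLikeThisHandler honors the
highlighting parameters is exactly the open question here):

http://localhost:8983/solr/mlt?q=id:123&mlt.fl=body&mlt.interestingTerms=details&hl=true&hl.fl=body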



Re: Solr 1.4 Release Date

2009-04-27 Thread Otis Gospodnetic

Gurjot, please see http://wiki.apache.org/solr/Solr1.4 - we are currently 33 
JIRA issues away.

 Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
> From: Gurjot Singh 
> To: solr-user@lucene.apache.org
> Sent: Monday, April 27, 2009 12:45:32 PM
> Subject: Solr 1.4 Release Date
> 
> Hi, I am curious to know when is the scheduled/tentative release date of
> Solr 1.4.
> 
> Thanks,
> Gurjot



highlighting html content

2009-04-27 Thread Matt Mitchell
Hi,

I've been looking around but can't seem to find any clear instruction on how
to do this... I'm storing html content and would like to enable highlighting
on the html content. The problem is that the search can sometimes match html
element names or attributes, and when the highlighter adds the highlight
tags, the html is bad.

I've been toying with setting custom pre/post delimiters and then removing
them in the client, but I thought I'd ask the list before I go too far with
that idea :)

Thanks,
Matt
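
A sketch of the delimiter idea, assuming the standard hl.simple.pre and
hl.simple.post parameters; the marker strings are arbitrary:

  &hl=true&hl.fl=content&hl.simple.pre=@@HL@@&hl.simple.post=@@/HL@@

Client side, escape the snippet first and only then turn the markers into real
tags, so matched terms get wrapped without corrupting the stored HTML (uses
commons-lang, which already appears in the Solr lib listings):

  String escaped = org.apache.commons.lang.StringEscapeUtils.escapeHtml(snippet);
  String safe = escaped.replace("@@HL@@", "<em>").replace("@@/HL@@", "</em>");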


Re: facet results in order of rank

2009-04-27 Thread Gene Campbell
Thanks for the reply

Your thoughts are what I initially was thinking.  But, given some more
consideration, I imagined a system that would take all the docs that
would be returned for a given facet, and get an average score based on
their scores from the original search that produced the facets.  This
would be the facet values rank.  So, a higher ranked facet value would
be more likely to return higher ranked results.

The idea is that you run a broad, loose search over a large dataset and
order the results by rank, so you get the most relevant results at the
top, e.g. the first page in a search engine website.  You might have
pages and pages of results, but it's the first few, highly ranked pages
that most users actually see.  As the relevance tapers off, they
generally run another search.

However, if you compute facet values on these results, you have no way
of knowing if one facet value for a field is more or less likely to
return higher scored, relevant records for the user.  You end up
getting facet values that match records that are often totally
irrelevant.

We can sort by index order, or by count of docs returned.  What I would
like is a sort based on score, such that it would be
sum(scores)/count.

I would assume that most users would be interested in the higher
ranked ones more often.  So, a more efficient UI could be built to
show just the high ranked facets on this score, and provide a control
to show all the facets (not just the high ranked ones.)

Does this clear up my post at all?

Perhaps this wouldn't be too hard for me to implement.  I have lots of
Java experience, but no experience with Lucene or Solr code.
Thoughts?

thanks
gene




On Tue, Apr 28, 2009 at 10:56 AM, Shalin Shekhar Mangar
 wrote:
> On Fri, Apr 24, 2009 at 12:25 PM, ristretto.rb wrote:
>
>> Hello,
>>
>> Is it possible to order the facet results on some ranking score?
>> I've had a look at the facet.sort param,
>> (
>> http://wiki.apache.org/solr/SimpleFacetParameters#head-569f93fb24ec41b061e37c702203c99d8853d5f1
>> )
>> but that seems to order the facet either by count or by index value
>> (in my case alphabetical.)
>>
>
> Facets are not ranked because there is no criteria for determining relevancy
> for them. They are just the count of documents for each term in a given
> field computed for the current result set.
>
>
>>
>> We are facing a big number of facet results for multiple termed
>> queries that are OR'ed together.  We want to keep the OR nature of our
>> queries,
>> but, we want to know which facet values are likely to give you higher
>> ranked results.  We could AND together the terms, to get the facet
>> list to be
>> more manageable, but we would be filtering out too many results.  We
>> prefer to OR terms and let the ranking bring the good stuff to the
>> top.
>>
>> For example, suppose we have a index of all known animals and
>> each doc has a field AO for animal-origin.
>>
>> Suppose we search for:  wolf grey forest Europe
>> And generate facets AO.  We might get the following
>> facet results:
>>
>> For the AO field, lots of countries of the world probably have grey or
>> forest or wolf or Europe in their indexing data, so I'm asserting we'd
>> get a big list here.
>> But, only some of the countries will have all 4 terms, and those are
>> the facets that will be the most interesting to drill down on.  Is
>> there
>> a way to figure out which facet is the most highly ranked like this?
>>
>
> Suppose 10 documents match the query you described. If you facet on AO, then
> it would just go through all the terms in AO and give you the number of
> documents which have that term. There's no question of relevance at all
> here. The returned documents themselves are of course ranked according to
> the relevancy score.
>
> Perhaps I've misunderstood the query?
>
> --
> Regards,
> Shalin Shekhar Mangar.
>
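
A client-side sketch of the sum(scores)/count ranking Gene describes, using
SolrJ; the field name and URL are illustrative, and it only sees the rows
actually fetched, not the full result set:

import java.util.HashMap;
import java.util.Map;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrDocument;

public class FacetByScore {
    public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer server =
            new CommonsHttpSolrServer("http://localhost:8983/solr");
        SolrQuery q = new SolrQuery("wolf grey forest Europe");
        q.setFields("AO", "score"); // ask for the score pseudo-field
        q.setRows(1000);            // approximate: rank facets from the top N docs

        // facet value -> {sum of scores, doc count}
        Map<String, double[]> stats = new HashMap<String, double[]>();
        for (SolrDocument doc : server.query(q).getResults()) {
            Float score = (Float) doc.getFieldValue("score");
            Object ao = doc.getFieldValue("AO");
            if (score == null || ao == null) continue;
            double[] s = stats.get(ao.toString());
            if (s == null) stats.put(ao.toString(), s = new double[2]);
            s[0] += score;
            s[1]++;
        }
        // average score per facet value = sum(scores) / count
        for (Map.Entry<String, double[]> e : stats.entrySet())
            System.out.println(e.getKey() + " -> " + e.getValue()[0] / e.getValue()[1]);
    }
}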


Re: MacOS "Failed to initialize DataSource:db"+ DataimportHandler ???

2009-04-27 Thread gateway0

Hi,

sure:
"
message Severe errors in solr configuration. Check your log files for more
detailed information on what may be wrong. If you want solr to continue
after configuration errors, change:
<abortOnConfigurationError>false</abortOnConfigurationError> in null
-
org.apache.solr.common.SolrException: FATAL: Could not create importer.
DataImporter config invalid at
org.apache.solr.handler.dataimport.DataImportHandler.inform(DataImportHandler.java:114)
at
org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:311)
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:480) at
org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:119)
at
org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:69)
at
org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:275)
at
org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:397)
at
org.apache.catalina.core.ApplicationFilterConfig.<init>(ApplicationFilterConfig.java:108)
at
org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:3709)
at org.apache.catalina.core.StandardContext.start(StandardContext.java:4363)
at
org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:791)
at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:771)
at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:525) at
org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:627)
at
org.apache.catalina.startup.HostConfig.deployDescriptors(HostConfig.java:553)
at org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:488) at
org.apache.catalina.startup.HostConfig.start(HostConfig.java:1149) at
org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:311)
at
org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:117)
at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1053) at
org.apache.catalina.core.StandardHost.start(StandardHost.java:719) at
org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1045) at
org.apache.catalina.core.StandardEngine.start(StandardEngine.java:443) at
org.apache.catalina.core.StandardService.start(StandardService.java:516) at
org.apache.catalina.core.StandardServer.start(StandardServer.java:710) at
org.apache.catalina.startup.Catalina.start(Catalina.java:578) at
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:585) at
org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:288) at
org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:413) Caused by:
org.apache.solr.handler.dataimport.DataImportHandlerException: Failed to
initialize DataSource: mydb Processing Document # at
org.apache.solr.handler.dataimport.DataImporter.getDataSourceInstance(DataImporter.java:308)
at
org.apache.solr.handler.dataimport.DataImporter.addDataSource(DataImporter.java:273)
at
org.apache.solr.handler.dataimport.DataImporter.initEntity(DataImporter.java:228)
at
org.apache.solr.handler.dataimport.DataImporter.<init>(DataImporter.java:98)
at
org.apache.solr.handler.dataimport.DataImportHandler.inform(DataImportHandler.java:106)
... 31 more Caused by: org.apache.solr.common.SolrException: Could not load
driver: com.mysql.jdbc.Driver at
org.apache.solr.handler.dataimport.JdbcDataSource.createConnectionFactory(JdbcDataSource.java:112)
at
org.apache.solr.handler.dataimport.JdbcDataSource.init(JdbcDataSource.java:65)
at
org.apache.solr.handler.dataimport.DataImporter.getDataSourceInstance(DataImporter.java:306)
... 35 more Caused by: java.lang.ClassNotFoundException: Unable to load
com.mysql.jdbc.Driver or
org.apache.solr.handler.dataimport.com.mysql.jdbc.Driver at
org.apache.solr.handler.dataimport.DocBuilder.loadClass(DocBuilder.java:587)
at
org.apache.solr.handler.dataimport.JdbcDataSource.createConnectionFactory(JdbcDataSource.java:110)
... 37 more Caused by: org.apache.solr.common.SolrException: Error loading
class 'com.mysql.jdbc.Driver' at
org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:273)
at
org.apache.solr.handler.dataimport.DocBuilder.loadClass(DocBuilder.java:577)
... 38 more Caused by: java.lang.ClassNotFoundException:
com.mysql.jdbc.Driver at
org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1387)
at
org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1233)
at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:374) at
java.lang.Class.forName0(Native Method) at
java.lang.Class.forName(Class.java:242) at
org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:257)
... 39 more -
org.apache.solr.handler.dataimport.DataImportHandler

Re: Phonetic analysis with the spell-check component?

2009-04-27 Thread Shalin Shekhar Mangar
On Sun, Apr 26, 2009 at 11:55 PM, David Smiley @MITRE.org  wrote:

>
> It appears to me that the spell-check component can't build a dictionary
> based on phonetic similarity (i.e. using a Phonetic analysis filter).
>  Sure,
> you can go ahead and configure the spell check component to use a field
> type
> that uses a phonetic filter but the suggestions presented to the user are
> based on the indexed values (i.e. phonemes), not the original words.  Thus
> the user will be presented with a suggested phoneme which is a poor user
> experience.  It's not clear how this shortcoming could be rectified because
> for a given phoneme, there are potentially multiple words to choose from
> that could be encoded to a given phoneme.
>

Hmm. I think the problem here is that the spell checker creates its own index
with the indexed tokens of a Solr field. So it does not have the original
words anymore. But if we could have an option to store the original words as
well into the spell check index, we could return them as suggestions.

Do you mind creating a Jira issue so that we don't forget about this?

-- 
Regards,
Shalin Shekhar Mangar.
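
For context, the kind of field type David describes would look roughly like
this in schema.xml (a sketch; with inject="false" the index holds only the
phonemes, which is what produces the unreadable suggestions):

<fieldType name="spell_phonetic" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- indexes DoubleMetaphone codes instead of the original words -->
    <filter class="solr.PhoneticFilterFactory" encoder="DoubleMetaphone" inject="false"/>
  </analyzer>
</fieldType>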


Re: Authenticated Indexing Not working

2009-04-27 Thread Shalin Shekhar Mangar
On Sun, Apr 26, 2009 at 11:04 AM, Allahbaksh Asadullah <
allahbaks...@gmail.com> wrote:

> HI Otis,
> I am using HTTPClient for authentication. When I use the server with
> Authentication for searching it works fine. But when I use it for
> indexing it throws error.
>

What is the error? Is it thrown by Solr or your servlet container?

One difference between a search request and update request with Solrj is
that a search request uses HTTP GET by default but an update request uses an
HTTP POST by default. Perhaps your authentication scheme is not configured
correctly for POST requests?

-- 
Regards,
Shalin Shekhar Mangar.


Re: Get the field value that caused the result

2009-04-27 Thread Shalin Shekhar Mangar
On Sat, Apr 25, 2009 at 8:25 PM, Wouter Samaey wrote:

>
> I'm looking into a way to determine the value of a field that caused
> the result to be returned.
>

Can highlighting help here? It returns the snippet from the document which
matched the query.

http://wiki.apache.org/solr/HighlightingParameters

-- 
Regards,
Shalin Shekhar Mangar.
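
That is, something along these lines (field names are illustrative); the
fields listed in hl.fl that come back with snippets are the ones that matched:

http://localhost:8983/solr/select?q=smith&hl=true&hl.fl=title,author,body&fl=id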


Re: facet results in order of rank

2009-04-27 Thread Shalin Shekhar Mangar
On Fri, Apr 24, 2009 at 12:25 PM, ristretto.rb wrote:

> Hello,
>
> Is it possible to order the facet results on some ranking score?
> I've had a look at the facet.sort param,
> (
> http://wiki.apache.org/solr/SimpleFacetParameters#head-569f93fb24ec41b061e37c702203c99d8853d5f1
> )
> but that seems to order the facet either by count or by index value
> (in my case alphabetical.)
>

Facets are not ranked because there is no criteria for determining relevancy
for them. They are just the count of documents for each term in a given
field computed for the current result set.


>
> We are facing a big number of facet results for multiple termed
> queries that are OR'ed together.  We want to keep the OR nature of our
> queries,
> but, we want to know which facet values are likely to give you higher
> ranked results.  We could AND together the terms, to get the facet
> list to be
> more manageable, but we would be filtering out too many results.  We
> prefer to OR terms and let the ranking bring the good stuff to the
> top.
>
> For example, suppose we have a index of all known animals and
> each doc has a field AO for animal-origin.
>
> Suppose we search for:  wolf grey forest Europe
> And generate facets AO.  We might get the following
> facet results:
>
> For the AO field, lots of countries of the world probably have grey or
> forest or wolf or Europe in their indexing data, so I'm asserting we'd
> get a big list here.
> But, only some of the countries will have all 4 terms, and those are
> the facets that will be the most interesting to drill down on.  Is
> there
> a way to figure out which facet is the most highly ranked like this?
>

Suppose 10 documents match the query you described. If you facet on AO, then
it would just go through all the terms in AO and give you the number of
documents which have that term. There's no question of relevance at all
here. The returned documents themselves are of course ranked according to
the relevancy score.

Perhaps I've misunderstood the query?

-- 
Regards,
Shalin Shekhar Mangar.


Re: Solr test anyone?

2009-04-27 Thread Shalin Shekhar Mangar
Yes, look at AbstractSolrTestCase which is the base class of almost all Solr
tests.

http://svn.apache.org/repos/asf/lucene/solr/trunk/src/java/org/apache/solr/util/AbstractSolrTestCase.java

On Mon, Apr 27, 2009 at 6:38 PM, Eric Pugh
wrote:

> Look into the test code that Solr uses, there is a lot of good stuff on how
> to do testing.
> http://svn.apache.org/repos/asf/lucene/solr/trunk/src/test/.
>
> Eric
>
>
> On Apr 27, 2009, at 6:25 AM, tarjei wrote:
>
>  Hi, I'm looking for ways to test that my indexing methods work correctly
>> with my Solr schema.
>>
>> Therefore I'm wondering if someone has created a test setup where they
>> start a Solr instance and then add some documents to the instance - as a
>> JUnit/TestNG test - preferably with working Maven dependencies for it as
>> well.
>>
>> I've tried googling for this as well as setting it up myself, but I have
>> never managed to get a test working like I want it to.
>>
>>
>> Kind regards,
>> Tarjei
>>
>
> -
> Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 |
> http://www.opensourceconnections.com
> Free/Busy: http://tinyurl.com/eric-cal
>
>
>
>
>


-- 
Regards,
Shalin Shekhar Mangar.
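
A minimal sketch of such a test, assuming the Solr 1.3/1.4 test utilities
(the schema and config names must point at your own test resources):

import org.apache.solr.util.AbstractSolrTestCase;

public class MyIndexingTest extends AbstractSolrTestCase {
    // point the embedded test core at your own schema and config
    public String getSchemaFile()     { return "schema.xml"; }
    public String getSolrConfigFile() { return "solrconfig.xml"; }

    public void testAddAndQuery() {
        // index a document through the update handler
        assertU(adoc("id", "1", "title", "hello world"));
        assertU(commit());
        // assert against the XML response of a query
        assertQ(req("title:hello"), "//result[@numFound='1']");
    }
}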


Re: DataImportHandler Questions-Load data in parallel and temp tables

2009-04-27 Thread Shalin Shekhar Mangar
On Tue, Apr 28, 2009 at 3:43 AM, Amit Nithian  wrote:

> All,
> I have a few questions regarding the data import handler. We have some
> pretty gnarly SQL queries to load our indices and our current loader
> implementation is extremely fragile. I am looking to migrate over to the
> DIH; however, I am looking to use SolrJ + EmbeddedSolr + some custom stuff
> to remotely load the indices so that my index loader and main search engine
> are separated.


Currently if you want to use DIH then the Solr master doubles up as the
index loader as well.


>
> Currently, unless I am missing something, the data gathering from the
> entity
> and the data processing (i.e. conversion to a Solr Document) is done
> sequentially and I was looking to make this execute in parallel so that I
> can have multiple threads processing different parts of the resultset and
> loading documents into Solr. Secondly, I need to create temporary tables to
> store results of a few queries and use them later for inner joins, and I was
> wondering how best to go about this?
>
> I am thinking to add support in DIH for the following:
> 1) Temporary tables (maybe call it temporary entities)? --Specific only to
> SQL though unless it can be generalized to other sources.


Pretty specific to DBs. However, isn't this something that can be done in
your database with views?


>
> 2) Parallel support


Parallelizing import of root-entities might be the easiest to attempt.
There's also an issue open to write to Solr (tokenization/analysis) in a
separate thread. Look at https://issues.apache.org/jira/browse/SOLR-1089

We actually wrote a multi-threaded DIH during the initial iterations. But we
discarded it because we found that the bottleneck was usually the database
(too many queries) or Lucene indexing itself (analysis, tokenization) etc.
The improvement was ~10% but it made the code substantially more complex.

The only scenario in which it helped a lot was when importing from HTTP or a
remote database (slow networks). But if you think it can help in your
scenario, I'd say go for it.


>
>  - Including some mechanism to get the number of records (whether it be
> count or the MAX(custom_id)-MIN(custom_id))


Not sure what you mean here.


>
> 3) Support in DIH or Solr to post documents to a remote index (i.e. create
> a
> new UpdateHandler instead of DirectUpdateHandler2).
>

Solrj integration would be helpful to many I think. There's an issue open.
Look at https://issues.apache.org/jira/browse/SOLR-853

-- 
Regards,
Shalin Shekhar Mangar.


Re: offline solr indexing

2009-04-27 Thread Shalin Shekhar Mangar
On Tue, Apr 28, 2009 at 12:38 AM, Charles Federspiel <
charles.federsp...@gmail.com> wrote:

> Solr Users,
> Our app servers are set up on read-only filesystems.  Is there a way
> to perform indexing from the command line, then copy the index files to the
> app-server and use Solr to perform search from inside the servlet
> container?


If the filesystem is read-only, then how can you index at all?

But what I think you are describing is the regular master-slave setup that
we use. A dedicated master on which writes are performed. Multiple slaves on
which searches are performed. The index is replicated to slaves through
script or the new java based replication.


> If the Solr implementation is bound to http requests, can Solr perform
> searches against an index that I create with Lucene?
> thank you,


It can but it is a little tricky to get the schema and analysis correct
between your Lucene writer and Solr searcher.

-- 
Regards,
Shalin Shekhar Mangar.
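
A sketch of the new Java-based replication setup (Solr 1.4): on the master,
solrconfig.xml gets something like the following, and each slave registers its
own ReplicationHandler with a masterUrl pointing at the master's /replication
URL (host names are illustrative):

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
  </lst>
</requestHandler>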


Re: Date faceting - howto improve performance

2009-04-27 Thread Shalin Shekhar Mangar
Sorry, I'm late in this thread.

Did you try using Trie fields (new in 1.4)? The regular date faceting won't
work out-of-the-box for trie fields I think. But you could use facet.query
to achieve the same effect. On my simple benchmarks I've found trie fields
to give a huge improvement in range searches.

On Sat, Apr 25, 2009 at 4:24 PM, Marcus Herou wrote:

> Hi.
>
> One of our faceting use-cases:
> We are creating trend graphs of how many blog posts that contains a certain
> term and groups it by day/week/year etc. with the nice DateMathParser
> functions.
>
> The performance degrades really fast and consumes a lot of memory which
> forces OOM from time to time
> We think it is due the fact that the cardinality of the field publishedDate
> in our index is huge, almost equal to the nr of documents in the index.
>
> We need to address that...
>
> Some questions:
>
> 1. Can a datefield have other date-formats than the default of yyyy-MM-dd
> HH:mm:ssZ ?
>
> 2. We are thinking of adding a field to the index which has the format
> yyyy-MM-dd to reduce the cardinality, if that field can't be a date, it
> could perhaps be a string, but the question then is if faceting can be used
> ?
>
> 3. Since we now already have such a huge index, is there a way to add a
> field afterwards and apply it to all documents without actually reindexing
> the whole shebang ?
>
> 4. If the field cannot be a string can we just leave out the
> hour/minute/second information and to reduce the cardinality and improve
> performance ? Example: 2009-01-01 00:00:00Z
>
> 5. I am afraid that we need to reindex everything to get this to work
> (negates Q3). We have 8 shards as of current, what would the most efficient
> way be to reindexing the whole shebang ? Dump the entire database to disk
> (sigh), create many xml file splits and use curl in a
> random/hash(numServers) manner on them ?
>
>
> Kindly
>
> //Marcus
>
>
>
>
>
>
>
> --
> Marcus Herou CTO and co-founder Tailsweep AB
> +46702561312
> marcus.he...@tailsweep.com
> http://www.tailsweep.com/
> http://blogg.tailsweep.com/
>



-- 
Regards,
Shalin Shekhar Mangar.
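
For example, facet.query can emulate date faceting with one query per bucket
over a trie field (field name and ranges are illustrative):

http://localhost:8983/solr/select?q=*:*&rows=0&facet=true
    &facet.query=publishedDate:[2009-04-20T00:00:00Z TO 2009-04-27T00:00:00Z]
    &facet.query=publishedDate:[2009-04-27T00:00:00Z TO 2009-05-04T00:00:00Z]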


Re: MacOS "Failed to initialize DataSource:db"+ DataimportHandler ???

2009-04-27 Thread Shalin Shekhar Mangar
On Tue, Apr 28, 2009 at 1:18 AM, gateway0  wrote:

>
>
> Everything works fine except for the dataimporthandler in solr, I get this
> error message:
> "org.apache.solr.handler.dataimport.DataImportHandlerException: Failed to
> initialize DataSource: mydb Processing Document # at
>
> org.apache.solr.handler.dataimport.DataImporter.getDataSourceInstance(DataImporter.java:308)
> at "
>
>
Can you please post the complete stack trace?

-- 
Regards,
Shalin Shekhar Mangar.


Re: adding plug-in after search is done

2009-04-27 Thread Shalin Shekhar Mangar
On Tue, Apr 28, 2009 at 12:04 AM, siping liu  wrote:

>
> Trying to manipulate search results (like further filtering out unwanted
> ones) and ordering the results differently. Where is the suitable place for
> doing it? I've been using QueryResponseWriter but that doesn't seem to be
> the right place.
>
>
You should probably look at writing your own SearchComponent. Also look at
the QueryElevationComponent which can help with fixing the positions of some
documents in the result set.

-- 
Regards,
Shalin Shekhar Mangar.


Re: Configuration of format and type index with solr

2009-04-27 Thread Shalin Shekhar Mangar
On Mon, Apr 27, 2009 at 10:40 PM, hpn1975 nasc  wrote:

>
>   1- Guarantee that my searcher (Solr) ALWAYS searches my index in *memory*
> (use RAMDirectory), not a cache.


It is possible to disable all caches. But it is not possible to use
RAMDirectory right now. This is in progress.

https://issues.apache.org/jira/browse/SOLR-465


>
>   2- Guarantee that my searcher (Solr) ALWAYS searches my index on the *file
> system* (use FSDirectory).


Yes, that is the default and only way currently.


>
>   3- Persist the generated index in only one file in the file system
> (optimized)


The useCompoundFile setting in solrconfig.xml can help here.


>
>   4- Persist my index (RAMDirectory) as a serialized Java file. I need
> to create a loader that loads my .ser file, deserializes the RAMDirectory,
> and sets it
> in the searcher class.


I don't think you can use RAMDirectory right now. However if the use-case
behind serializing a ram directory is for replication, then there are
alternate methods available.

http://wiki.apache.org/solr/CollectionDistribution
http://wiki.apache.org/solr/SolrReplication


> Is possible add any component that manipule the index
> ?
>

Yes. You can write your own request handlers and search components.

-- 
Regards,
Shalin Shekhar Mangar.
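
That is, in solrconfig.xml (a sketch; the compound format keeps each segment
in a single .cfs file, so a fully optimized index comes down to a handful of
files):

<indexDefaults>
  <useCompoundFile>true</useCompoundFile>
</indexDefaults>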


DataImportHandler Questions-Load data in parallel and temp tables

2009-04-27 Thread Amit Nithian
All,
I have a few questions regarding the data import handler. We have some
pretty gnarly SQL queries to load our indices and our current loader
implementation is extremely fragile. I am looking to migrate over to the
DIH; however, I am looking to use SolrJ + EmbeddedSolr + some custom stuff
to remotely load the indices so that my index loader and main search engine
are separated.
Currently, unless I am missing something, the data gathering from the entity
and the data processing (i.e. conversion to a Solr Document) is done
sequentially and I was looking to make this execute in parallel so that I
can have multiple threads processing different parts of the resultset and
loading documents into Solr. Secondly, I need to create temporary tables to
store results of a few queries and use them later for inner joins, and I was
wondering how best to go about this?

I am thinking to add support in DIH for the following:
1) Temporary tables (maybe call it temporary entities)? --Specific only to
SQL though unless it can be generalized to other sources.
2) Parallel support
  - Including some mechanism to get the number of records (whether it be
count or the MAX(custom_id)-MIN(custom_id))
3) Support in DIH or Solr to post documents to a remote index (i.e. create a
new UpdateHandler instead of DirectUpdateHandler2).

If any of these exist or anyone else is working on this (OR you have better
suggestions), please let me know.

Thanks!
Amit


Re: Date faceting - howto improve performance

2009-04-27 Thread Marcus Herou
Yes that's exactly what I meant.

I think adding "new" fields to a separate index and using ParallelReader at
query time would be something to investigate at the Solr level.
I think I can spend some time creating a patch for this if you think it is a
good idea and if you think it would be merged into the repo haha.
It is not very mainstream, but I think everyone with more than a million
docs curses a lot over needing to stop the entire service for a couple
of days just to add a field :)

We have 60M rows now and 50 000M index size (shit 800k per doc, man that is
too much) so we are getting into a state where reindexing is starting to
become impossible...

Keep up the fantastic work

//Marcus



On Mon, Apr 27, 2009 at 5:09 PM, Ning Li  wrote:

> You mean doc A and doc B will become one doc after adding index 2 to
> index 1? I don't think this is currently supported either at Lucene
> level or at Solr level. If index 1 has m docs and index 2 has n docs,
> index 1 will have m+n docs after adding index 2 to index 1. Documents
> themselves are not modified by index merge.
>
> Cheers,
> Ning
>
>
> On Sat, Apr 25, 2009 at 4:03 PM, Marcus Herou
>  wrote:
> > Hmm looking in the code for the IndexMerger in Solr
> > (org.apache.solr.update.DirectUpdateHandler(2)
> >
> > See that the IndexWriter.addIndexesNoOptimize(dirs) is used (union of
> > indexes) ?
> >
> > And the test class
> org.apache.solr.client.solrj.MergeIndexesExampleTestBase
> > suggests:
> > add doc A to index1 with id=AAA,name=core1
> > add doc B to index2 with id=BBB,name=core2
> > merge the two indexes into one index which then contains both docs.
> > The resulting index will have 2 docs.
> >
> > Great but in my case I think it should work more like this.
> >
> > add doc A to index1 with id=X,title=blog entry title,description=blog
> entry
> > description
> > add doc B to index2 with id=X,score=1.2
> > somehow add index2 to index1 so id=XX has score=1.2 when searching in
> index1
> > The resulting index should have 1 doc.
> >
> > So this is not really what I want right ?
> >
> > Sorry for being a smart-ass...
> >
> > Kindly
> >
> > //Marcus
> >
> >
> >
> >
> >
> > On Sat, Apr 25, 2009 at 5:10 PM, Marcus Herou <
> marcus.he...@tailsweep.com>wrote:
> >
> >> Guys!
> >>
> >> Thanks for these insights, I think we will head for Lucene level merging
> >> strategy (two or more indexes).
> >> When merging I guess the second index need to have the same doc ids
> >> somehow. This is an internal id in Lucene, not that easy to get hold of
> >> right ?
> >>
> >> So you are saying the the solr: ExternalFileField + FunctionQuery stuff
> >> would not work very well performance wise or what do you mean ?
> >>
> >> I sure like bleeding edge :)
> >>
> >> Cheers dudes
> >>
> >> //Marcus
> >>
> >>
> >>
> >>
> >>
> >> On Sat, Apr 25, 2009 at 3:46 PM, Otis Gospodnetic <
> >> otis_gospodne...@yahoo.com> wrote:
> >>
> >>>
> >>> I should emphasize that the PR trick I mentioned is something you'd do
> at
> >>> the Lucene level, outside Solr, and then you'd just slip the modified
> index
> >>> back into Solr.
> >>> Of, if you like the bleeding edge, perhaps you can make use of Ning
> Li's
> >>> Solr index merging functionality (patch in JIRA).
> >>>
> >>>
> >>> Otis --
> >>> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> >>>
> >>>
> >>>
> >>> - Original Message 
> >>> > From: Otis Gospodnetic 
> >>> > To: solr-user@lucene.apache.org
> >>> > Sent: Saturday, April 25, 2009 9:41:45 AM
> >>> > Subject: Re: Date faceting - howto improve performance
> >>> >
> >>> >
> >>> > Yes, you could simply round the date, no need for a non-date type
> field.
> >>> > Yes, you can add a field after the fact by making use of
> ParallelReader
> >>> and
> >>> > merging (I don't recall the details, search the ML for ParallelReader
> >>> and
> >>> > Andrzej), I remember he once provided the working recipe.
> >>> >
> >>> >
> >>> > Otis --
> >>> > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> >>> >
> >>> >
> >>> >
> >>> > - Original Message 
> >>> > > From: Marcus Herou
> >>> > > To: solr-user@lucene.apache.org
> >>> > > Sent: Saturday, April 25, 2009 6:54:02 AM
> >>> > > Subject: Date faceting - howto improve performance
> >>> > >
> >>> > > Hi.
> >>> > >
> >>> > > One of our faceting use-cases:
> >>> > > We are creating trend graphs of how many blog posts that contains a
> >>> certain
> >>> > > term and groups it by day/week/year etc. with the nice
> DateMathParser
> >>> > > functions.
> >>> > >
> >>> > > The performance degrades really fast and consumes a lot of memory
> >>> which
> >>> > > forces OOM from time to time
> >>> > > We think it is due the fact that the cardinality of the field
> >>> publishedDate
> >>> > > in our index is huge, almost equal to the nr of documents in the
> >>> index.
> >>> > >
> >>> > > We need to address that...
> >>> > >
> >>> > > Some questions:
> >>> > >
> >>> > > 1. Can a datefield have other date-formats than the default of
> >

Re: offline solr indexing

2009-04-27 Thread Amit Nithian
Not sure if this helps but could you make this a solr server that is not
accessible by any other means (except internal), perform your index build
using the dataimporthandler and use Solr's replication mechanisms to move
the indices across?
You can issue the HTTP request to rebuild the index from the command line
(i.e. GET ..)

On Mon, Apr 27, 2009 at 12:08 PM, Charles Federspiel <
charles.federsp...@gmail.com> wrote:

> Solr Users,
> Our app servers are set up on read-only filesystems.  Is there a way
> to perform indexing from the command line, then copy the index files to the
> app-server and use Solr to perform search from inside the servlet
> container?
>
> If the Solr implementation is bound to http requests, can Solr perform
> searches against an index that I create with Lucene?
> thank you,
> Charles Federspiel
>
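
For instance, the rebuild can be kicked off from a script with a plain HTTP
GET against the DataImportHandler on the indexing box (host and handler path
are the usual defaults; adjust to your setup), after which the slaves pull
the new index via replication:

http://indexer-host:8983/solr/dataimport?command=full-import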


MacOS "Failed to initialize DataSource:db"+ DataimportHandler ???

2009-04-27 Thread gateway0

Hi,

I want to transfer my solr project to mac os leopard so I installed xampp
for mac and Tomcat 6.0.18.

Everything works fine except for the DataImportHandler in Solr; I get this
error message:
"org.apache.solr.handler.dataimport.DataImportHandlerException: Failed to
initialize DataSource: mydb Processing Document # at
org.apache.solr.handler.dataimport.DataImporter.getDataSourceInstance(DataImporter.java:308)
at "

It seems to fail on my data-config.xml file. Strange thing though: the exact
same file structure works under Windows. Anyway, here is the file:
"


  

 

 



  

"

I have no idea what could cause the error. The only difference from my
Windows setup is that xampp and the Tomcat container run separately, because
there is no xampp Tomcat plugin for Mac OS; but without the
DataImportHandler, Solr works just fine anyway.

Ideas?

best regards, Sebastian 
-- 
View this message in context: 
http://www.nabble.com/MacOS-%22Failed-to-initialize-DataSource%3Adb%22%2B-DataimportHandler-tp23263640p23263640.html
Sent from the Solr - User mailing list archive at Nabble.com.



offline solr indexing

2009-04-27 Thread Charles Federspiel
Solr Users,
Our app servers are set up on read-only filesystems.  Is there a way
to perform indexing from the command line, then copy the index files to the
app-server and use Solr to perform search from inside the servlet container?

If the Solr implementation is bound to http requests, can Solr perform
searches against an index that I create with Lucene?
thank you,
Charles Federspiel


Re: fail to create or find snapshoot

2009-04-27 Thread Jian Han Guo
Actually, I found the snapshot in the directory where Solr was launched. Is
this done on purpose? Shouldn't it be in the data directory?

Thanks,

Jianhan


On Mon, Apr 27, 2009 at 11:43 AM, Jian Han Guo  wrote:

> Hi,
>
> According to Solr's wiki page http://wiki.apache.org/solr/SolrReplication,
> if I send the following request to master, a snapshoot will be created
>
> http://master_host:port/solr/replication?command=snapshoot
>
>
> But after I did it, nothing seemed happening.
>
> I got this response back,
>
> <response>
> <lst name="responseHeader">
>   <int name="status">0</int><int name="QTime">2</int>
> </lst>
> </response>
>
> and I checked the data directory, no snapshoot was created.
>
> I am not sure what to expect after making the request, and where to find
> the snapshoot files (and what they are).
>
> Thanks,
>
> Jianhan
>
>
>
>
>
>


fail to create or find snapshoot

2009-04-27 Thread Jian Han Guo
Hi,

According to Solr's wiki page http://wiki.apache.org/solr/SolrReplication,
if I send the following request to master, a snapshoot will be created

http://master_host:port/solr/replication?command=snapshoot


But after I did it, nothing seemed happening.

I got this response back,



02


and I checked the data directory, no snapshoot was created.

I am not sure what to expect after making the request, and where to find the
snapshoot files (and what they are).

Thanks,

Jianhan


adding plug-in after search is done

2009-04-27 Thread siping liu

Trying to manipulate search results (like further filtering out unwanted ones), and
ordering the results differently. Where is the suitable place for doing it?
I've been using QueryResponseWriter, but that doesn't seem to be the right place.

thanks.


Re: Solr Performance bottleneck

2009-04-27 Thread Walter Underwood
This isn't a new problem, NFS was 100X slower than local disk for me
with Solr 1.1.

Backing up indexes is very tricky. You need to do it while they are
not being updated, or you'll get a corrupt copy. If your indexes
aren't large, you are probably better off backing up the source
documents and building new indexes from scratch.

wunder

On 4/27/09 11:27 AM, "Jon Bodner"  wrote:

> 
> As a follow-up note, we solved our problem by moving the indexes to local
> store and upgrading to Solr 1.4.  I did a thread dump against our 1.3 Solr
> instance and it was spending lots of time blocking on index section loading.
> The NIO implementation in 1.4 solved that problem and copying to local store
> almost certainly reduced file loading time.
> 
> Trying to point multiple Solrs  on multiple boxes at a single shared
> directory is almost certainly doomed to failure; the read-only Solrs won't
> know when the read/write Solr instance has updated the index.
> 
> We are going to try to move our indexes back to shared disk, as our backup
> solutions are all tied to the shared disk.  Also, if an individual box
> fails, we can bring up a new box and point it at the shared disk.  Are there
> any known problems with NIO and NFS that will cause this to fail?  Can
> anyone suggest a better solution?
> 
> Thanks,
> 
> Jon



Re: Solr Performance bottleneck

2009-04-27 Thread Jon Bodner

As a follow-up note, we solved our problem by moving the indexes to local
store and upgrading to Solr 1.4.  I did a thread dump against our 1.3 Solr
instance and it was spending lots of time blocking on index section loading. 
The NIO implementation in 1.4 solved that problem and copying to local store
almost certainly reduced file loading time.

Trying to point multiple Solrs  on multiple boxes at a single shared
directory is almost certainly doomed to failure; the read-only Solrs won't
know when the read/write Solr instance has updated the index.

We are going to try to move our indexes back to shared disk, as our backup
solutions are all tied to the shared disk.  Also, if an individual box
fails, we can bring up a new box and point it at the shared disk.  Are there
any known problems with NIO and NFS that will cause this to fail?  Can
anyone suggest a better solution?

Thanks,

Jon

-- 
View this message in context: 
http://www.nabble.com/Solr-Performance-bottleneck-tp23209595p23262198.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: How to index the contents from SVN repository

2009-04-27 Thread Ryan McKinley

I would suggest looking at Apache commons VFS and using the solrj API:

http://commons.apache.org/vfs/

With SVN, you may be able to use the webdav provider.

ryan



On Apr 26, 2009, at 4:08 AM, Ashish P wrote:



Is there any way to index contents of SVN rep in Solr ??
--
View this message in context: 
http://www.nabble.com/How-to-index-the-contents-from-SVN-repository-tp23240110p23240110.html
Sent from the Solr - User mailing list archive at Nabble.com.





Re: ExtractingRequestHandler and SolrRequestHandler issue

2009-04-27 Thread francisco treacy
Thanks for your answers. Still no success.

>> These need to be in your Solr home lib, not example/lib.  I sometimes get
>> confused on this one, too, forgetting that I need to go down a few more
>> directories.  The example/lib directory is where the Jetty stuff lives,
>> example/solr/lib is the lib where the plugins go.

Well, actually I need the libs in example, because I'm launching it like so:

java -Dsolr.solr.home="/my/path/to/solr" -jar start.jar

Anyway, I tried copying libraries to solr home lib, this didn't help.
I keep getting the aforementioned ClassCastException.

>> In fact, if you run "ant
>> example" from the top level (or contrib/extraction) it should place the JARs
>> in the right places for the example.

Also, if I try to run "ant example" it fails with some other
exception (some Mozilla JS class not found). I will try some
workaround here.

The sharedLib setting in solr.xml didn't help either.

Should I be using the Jetty provided with the example while I'm in
development? It has worked great so far, but I'm stuck with
extraction. Will let you know, but please ping me if you have any
other ideas.

Thanks

Francisco
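
For reference, the handler registration in solrconfig.xml is just:

<requestHandler name="/update/extract"
                class="org.apache.solr.handler.extraction.ExtractingRequestHandler"/>

and a ClassCastException like the one below usually means a second copy of
the Solr core classes is on the classpath (note apache-solr-core-nightly.jar
in the lib listing), so SolrRequestHandler gets loaded twice by different
classloaders and the cast fails.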



2009/4/22 Peter Wolanin :
> I had problems with this when trying to set this up with multiple
> cores - I had to set the shared lib as:
>
> <solr sharedLib="lib">
>
> in example/solr/solr.xml in order for it to find the jars in example/solr/lib
>
> -Peter
>
> On Wed, Apr 22, 2009 at 11:43 AM, Grant Ingersoll  wrote:
>>
>> On Apr 20, 2009, at 12:46 PM, francisco treacy wrote:
>>
>>> Additionally, here's what I've got in example/lib:
>>
>> These need to be in your Solr home lib, not example/lib.  I sometimes get
>> confused on this one, too, forgetting that I need to go down a few more
>> directories.  The example/lib directory is where the Jetty stuff lives,
>> example/solr/lib is the lib where the plugins go.  In fact, if you run "ant
>> example" from the top level (or contrib/extraction) it should place the JARs
>> in the right places for the example.
>>
>>>
>>>
>>> apache-solr-cell-nightly.jar   bcmail-jdk14-132.jar
>>> commons-lang-2.1.jar       icu4j-3.8.jar         log4j-1.2.14.jar
>>> poi-3.5-beta5.jar             slf4j-api-1.5.5.jar
>>> xml-apis-1.0.b2.jar
>>> apache-solr-core-nightly.jar   bcprov-jdk14-132.jar
>>> commons-logging-1.0.4.jar  jetty-6.1.3.jar       nekohtml-1.9.9.jar
>>> poi-ooxml-3.5-beta5.jar       slf4j-jdk14-1.5.5.jar
>>> xmlbeans-2.3.0.jar
>>> apache-solr-solrj-nightly.jar  commons-codec-1.3.jar  dom4j-1.6.1.jar
>>>         jetty-util-6.1.3.jar  ooxml-schemas-1.0.jar
>>> poi-scratchpad-3.5-beta5.jar  tika-0.3.jar
>>> asm-3.1.jar                    commons-io-1.4.jar
>>> fontbox-0.1.0-dev.jar      jsp-2.1               pdfbox-0.7.3.jar
>>> servlet-api-2.5-6.1.3.jar     xercesImpl-2.8.1.jar
>>>
>>> Actually I wasn't very accurate. Following the wiki didn't suffice. I
>>> had to add other jars, in order to avoid ClassNotFoundExceptions at
>>> startup. These are
>>>
>>> apache-solr-core-nightly.jar
>>> apache-solr-solrj-nightly.jar
>>> slf4j-api-1.5.5.jar
>>> slf4j-jdk14-1.5.5.jar
>>>
>>> even while using solr nightly war (in example/webapps).
>>>
>>> Perhaps something wrong with jar versions?
>>>
>>> Francisco
>>>
>>>
>>> 2009/4/20 francisco treacy :

 Hi Grant,

 Here is the full stacktrace:

 20-Apr-2009 12:36:39 org.apache.solr.common.SolrException log
 SEVERE: java.lang.ClassCastException:
 org.apache.solr.handler.extraction.ExtractingRequestHandler cannot be
 cast to org.apache.solr.request.SolrRequestHandler
       at
 org.apache.solr.core.RequestHandlers$1.create(RequestHandlers.java:154)
       at
 org.apache.solr.core.RequestHandlers$1.create(RequestHandlers.java:163)
       at
 org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:141)
       at
 org.apache.solr.core.RequestHandlers.initHandlersFromConfig(RequestHandlers.java:171)
       at org.apache.solr.core.SolrCore.<init>(SolrCore.java:535)
       at
 org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:122)
       at
 org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:69)
       at
 org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:99)
       at
 org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
       at
 org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:594)
       at org.mortbay.jetty.servlet.Context.startContext(Context.java:139)
       at
 org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1218)
       at
 org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:500)
       at
 org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:448)
       at
 org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
       at
 org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:

Configuration of format and type index with solr

2009-04-27 Thread hpn1975 nasc
Hi,

  I have worked with Lucene for several years and I use some advanced features
of the library, such as different index formats and types of persistence. Now
I would like to use Solr.

  Is it possible to configure these features in Solr? My doubts concern these
four points:

   1- Guarantee that my searcher (Solr) ALWAYS searches my index in *memory*
(using RAMDirectory), not a cache.
   2- Guarantee that my searcher (Solr) ALWAYS searches my index on the *file
system* (using FSDirectory).
   3- Persist my index as a single optimized file on the file system.
   4- Persist my index (RAMDirectory) as a serialized Java file. I need to
create a loader that loads my .ser file, deserializes the RAMDirectory, and
sets it on the searcher. Is it possible to add a component that manipulates
the index in this way?

  Thanks

  Haroldo


Solr 1.4 Release Date

2009-04-27 Thread Gurjot Singh
Hi, I am curious to know when is the scheduled/tentative release date of
Solr 1.4.

Thanks,
Gurjot


RE: How to index the contents from SVN repository

2009-04-27 Thread Steven A Rowe
Hi Ashish,

The excellent SVN/CVS repo browser ViewVC <http://viewvc.tigris.org/> has tools to 
record SVN/CVS commit metadata in a database - seeing how they do it may give 
you some hints.

The INSTALL file gives pointers to the relevant tools (look for the "SQL 
CHECKIN DATABASE" section):

http://viewvc.tigris.org/svn/viewvc/trunk/INSTALL

ViewVC doesn't have file content search capabilities yet - maybe while you're 
at it, you could contribute your work to that project :).

Good luck,
Steve

On 4/27/2009 at 1:12 AM, Ashish P wrote:
> Right. But is there a way to track file updates and diffs.
> Thanks,
> Ashish
> 
> Noble Paul നോബിള്‍  नोब्ळ् wrote:
> >
> > If you can check it out into a directory using SVN command then you
> > may use DIH to index the content.
> >
> > a combination of FileListEntityProcessor and PlainTextEntityProcessor
> > may help
> >
> > On Sun, Apr 26, 2009 at 1:38 PM, Ashish P 
> > wrote:
> >>
> >> Is there any way to index contents of SVN rep in Solr ??


Re: Date faceting - howto improve performance

2009-04-27 Thread Ning Li
You mean doc A and doc B will become one doc after adding index 2 to
index 1? I don't think this is currently supported either at Lucene
level or at Solr level. If index 1 has m docs and index 2 has n docs,
index 1 will have m+n docs after adding index 2 to index 1. Documents
themselves are not modified by index merge.
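For reference, a minimal Lucene-level sketch of that union merge (Lucene
2.x-era API; the index paths are hypothetical):

  import org.apache.lucene.analysis.standard.StandardAnalyzer;
  import org.apache.lucene.index.IndexWriter;
  import org.apache.lucene.store.Directory;
  import org.apache.lucene.store.FSDirectory;

  public class MergeIndexes {
    public static void main(String[] args) throws Exception {
      Directory target = FSDirectory.getDirectory("/path/to/index1");
      Directory source = FSDirectory.getDirectory("/path/to/index2");
      // Open index1 for appending (create=false) and add index2 to it.
      IndexWriter writer = new IndexWriter(target, new StandardAnalyzer(), false);
      // Union merge: index1 ends up with m+n docs; no documents are combined.
      writer.addIndexesNoOptimize(new Directory[] { source });
      writer.close();
    }
  }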

Cheers,
Ning


On Sat, Apr 25, 2009 at 4:03 PM, Marcus Herou wrote:
> Hmm looking in the code for the IndexMerger in Solr
> (org.apache.solr.update.DirectUpdateHandler(2)
>
> See that the IndexWriter.addIndexesNoOptimize(dirs) is used (union of
> indexes) ?
>
> And the test class org.apache.solr.client.solrj.MergeIndexesExampleTestBase
> suggests:
> add doc A to index1 with id=AAA,name=core1
> add doc B to index2 with id=BBB,name=core2
> merge the two indexes into one index which then contains both docs.
> The resulting index will have 2 docs.
>
> Great but in my case I think it should work more like this.
>
> add doc A to index1 with id=X,title=blog entry title,description=blog entry
> description
> add doc B to index2 with id=X,score=1.2
> somehow add index2 to index1 so id=X has score=1.2 when searching in index1
> The resulting index should have 1 doc.
>
> So this is not really what I want right ?
>
> Sorry for being a smart-ass...
>
> Kindly
>
> //Marcus
>
>
>
>
>
> On Sat, Apr 25, 2009 at 5:10 PM, Marcus Herou wrote:
>
>> Guys!
>>
>> Thanks for these insights, I think we will head for Lucene level merging
>> strategy (two or more indexes).
>> When merging I guess the second index need to have the same doc ids
>> somehow. This is an internal id in Lucene, not that easy to get hold of
>> right ?
>>
>> So you are saying the the solr: ExternalFileField + FunctionQuery stuff
>> would not work very well performance wise or what do you mean ?
>>
>> I sure like bleeding edge :)
>>
>> Cheers dudes
>>
>> //Marcus
>>
>>
>>
>>
>>
>> On Sat, Apr 25, 2009 at 3:46 PM, Otis Gospodnetic <
>> otis_gospodne...@yahoo.com> wrote:
>>
>>>
>>> I should emphasize that the PR trick I mentioned is something you'd do at
>>> the Lucene level, outside Solr, and then you'd just slip the modified index
>>> back into Solr.
>>> Of, if you like the bleeding edge, perhaps you can make use of Ning Li's
>>> Solr index merging functionality (patch in JIRA).
>>>
>>>
>>> Otis --
>>> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>>>
>>>
>>>
>>> - Original Message 
>>> > From: Otis Gospodnetic 
>>> > To: solr-user@lucene.apache.org
>>> > Sent: Saturday, April 25, 2009 9:41:45 AM
>>> > Subject: Re: Date faceting - howto improve performance
>>> >
>>> >
>>> > Yes, you could simply round the date, no need for a non-date type field.
>>> > Yes, you can add a field after the fact by making use of ParallelReader
>>> and
>>> > merging (I don't recall the details, search the ML for ParallelReader
>>> and
>>> > Andrzej), I remember he once provided the working recipe.
>>> >
>>> >
>>> > Otis --
>>> > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>>> >
>>> >
>>> >
>>> > - Original Message 
>>> > > From: Marcus Herou
>>> > > To: solr-user@lucene.apache.org
>>> > > Sent: Saturday, April 25, 2009 6:54:02 AM
>>> > > Subject: Date faceting - howto improve performance
>>> > >
>>> > > Hi.
>>> > >
>>> > > One of our faceting use-cases:
>>> > > We are creating trend graphs of how many blog posts that contains a
>>> certain
>>> > > term and groups it by day/week/year etc. with the nice DateMathParser
>>> > > functions.
>>> > >
>>> > > The performance degrades really fast and consumes a lot of memory
>>> which
>>> > > forces OOM from time to time
>>> > > We think it is due the fact that the cardinality of the field
>>> publishedDate
>>> > > in our index is huge, almost equal to the nr of documents in the
>>> index.
>>> > >
>>> > > We need to address that...
>>> > >
>>> > > Some questions:
>>> > >
>>> > > 1. Can a datefield have other date-formats than the default of
>>> -MM-dd
>>> > > HH:mm:ssZ ?
>>> > >
>>> > > 2. We are thinking of adding a field to the index which have the
>>> format
>>> > > -MM-dd to reduce the cardinality, if that field can't be a date,
>>> it
>>> > > could perhaps be a string, but the question then is if faceting can be
>>> used
>>> > > ?
>>> > >
>>> > > 3. Since we now already have such a huge index, is there a way to add
>>> a
>>> > > field afterwards and apply it to all documents without actually
>>> reindexing
>>> > > the whole shebang ?
>>> > >
>>> > > 4. If the field cannot be a string can we just leave out the
>>> > > hour/minute/second information and to reduce the cardinality and
>>> improve
>>> > > performance ? Example: 2009-01-01 00:00:00Z
>>> > >
>>> > > 5. I am afraid that we need to reindex everything to get this to work
>>> > > (negates Q3). We have 8 shards as of current, what would the most
>>> efficient
>>> > > way be to reindexing the whole shebang ? Dump the entire database to
>>> disk
>>> > > (sigh), create many xml file splits and use curl in a
>>> > > 

Term highlighting with MoreLikeThisHandler?

2009-04-27 Thread Eric Sabourin
I submit a query to the MoreLikeThisHandler to find documents similar to a
specified document.  This works and I've configured my request handler to
also return the interesting terms.

Is it possible to have MLT return to me highlight snippets in the similar
documents it returns? I mean generate hl snippets of the interesting terms?
If so how?

Thanks... Eric


Re: boost qf weight between 0 and 10

2009-04-27 Thread sunnyfr

Hi Hoss, 
thanks for this answer. Is there a way to get the weight of a field, like the 
queryWeight below, and use it in the bf?


  0.14232224 = (MATCH) weight(text:chien^0.2 in 9412049), product of:
0.0813888 = queryWeight(text:chien^0.2), product of:
  0.2 = boost
  6.5946517 = idf(docFreq=55585, numDocs=14951742)
  0.061708186 = queryNorm


thanks 


hossman wrote:
> 
> 
> : I don't really get it, I try to boost a field according to another one but I've
> : a huge weight when I'm using qf boost like:
> : 
> : /select?qt=dismax&fl=*&q="obama
> : meeting"&debugQuery=true&qf=title&bf=product(title,stat_views)
> 
> bf is a boost function -- you are using a product function to multiply the 
> "title" field by the "stat_views" field ... this doesn't make sense to me?
> 
> i'm assuming the "title" field contains text (the rest of your score 
> explanation confirms this).  when you try to do a math function on a 
> string based field it deals with the "ordinal" value -- the higher the 
> string sorts lexicographically compared to all other docs, the higher the 
> ordinal value.
> 
> i have no idea what's in your stat_views field -- but i can't imagine any 
> way in which multiplying it by the ordinal value of your text field would 
> make sense...
> 
> :   5803675.5 = (MATCH)
> FunctionQuery(product(ord(title),sint(stat_views))),
> : product of:
> : 9.5142968E7 = product(ord(title)=1119329,sint(stat_views)=85)
> : 1.0 = boost
> : 0.06099952 = queryNorm
> 
> : But this is not equilibrate between this boost in qf and bf, how can I
> do ?
> 
> when it comes to function query, you're on your own to figure out an 
> appropriate query boost to balance the scores out -- when you use a 
> product function the scores are going to get huge like this unless you 
> balance it somehow (and that ord(title) is just making this massively 
> worse)
> 
> 
> -Hoss
> 
> 
> 
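For balancing such boosts, one pattern worth trying (a sketch, not from this
thread; it assumes stat_views is a positive numeric field) is to boost on the
numeric field alone, dampened and scaled with a query boost:

  /select?qt=dismax&q="obama meeting"&qf=title&bf=log(stat_views)^0.5&debugQuery=true

log() keeps large view counts from swamping the text score, and the ^0.5
scales the function's contribution; tune that boost by inspecting the
debugQuery explain output.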

-- 
View this message in context: 
http://www.nabble.com/boost-qf-weight-between-0-and-10-tp22081396p23257545.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: How to post in-memory[not residing on local disks] Xml files to Solr server for indexing?

2009-04-27 Thread Shalin Shekhar Mangar
On Mon, Apr 27, 2009 at 6:27 PM, ahmed baseet wrote:

> Can anyone help me selecting the proper pom.xml file out of the bunch of
> *-pom.xml.templates available.
>

Ahmed, are you using Maven? If not, then you do not need these pom files. If
you are using Maven, then you need to add a dependency to solrj.

http://wiki.apache.org/solr/Solrj#head-674dd7743df665fdd56e8eccddce16fc2de20e6e
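For Maven users, the solrj dependency looks roughly like this (the version is
an assumption; match it to your Solr release):

  <dependency>
    <groupId>org.apache.solr</groupId>
    <artifactId>solr-solrj</artifactId>
    <version>1.3.0</version>
  </dependency>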

-- 
Regards,
Shalin Shekhar Mangar.


Re: Solr test anyone?

2009-04-27 Thread Eric Pugh
Look into the test code that Solr uses, there is a lot of good stuff
on how to do testing: http://svn.apache.org/repos/asf/lucene/solr/trunk/src/test/
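A minimal sketch of such a test using the SolrJ embedded server (the solr
home path and field names are hypothetical; wrap it in your JUnit/TestNG
fixture of choice):

  import org.apache.solr.client.solrj.SolrServer;
  import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
  import org.apache.solr.common.SolrInputDocument;
  import org.apache.solr.core.CoreContainer;

  public class EmbeddedSolrSmokeTest {
    public static void main(String[] args) throws Exception {
      // Point solr.solr.home at a test copy of your conf/ and schema.
      System.setProperty("solr.solr.home", "src/test/resources/solr");
      CoreContainer container = new CoreContainer.Initializer().initialize();
      SolrServer server = new EmbeddedSolrServer(container, "");

      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", "test-1");
      server.add(doc);
      server.commit();
      // ... query back and assert on the response here ...
      container.shutdown();
    }
  }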


Eric

On Apr 27, 2009, at 6:25 AM, tarjei wrote:

Hi, I'm looking for ways to test that my indexing methods work  
correctly with my Solr schema.


Therefore I'm wondering if someone has created a test setup where  
they start a Solr instance and then add some documents to the  
instance - as a Junit/testng test - preferably with a working Maven  
dependencies for it as well.


I've tried googling for this as well as setting it up myself, but I  
have never managed to get a test working like I want it to.



Kind regards,
Tarjei


-
Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | 
http://www.opensourceconnections.com
Free/Busy: http://tinyurl.com/eric-cal






Re: How to post in-memory[not residing on local disks] Xml files to Solr server for indexing?

2009-04-27 Thread ahmed baseet
Can anyone help me selecting the proper pom.xml file out of the bunch of
*-pom.xml.templates available.
I got the following when searched for pom.xml files,
solr-common-csv-pom.xml
solr-lucene-analyzers-pom.xml
solr-lucene-contrib-pom.xml
solr-lucene-*-pom.xml [ a lot of solr-lucene-... pom files are available,
hence shortened to avoid typing all]
solr-dataimporthandler-pom.xml
solr-common-pom.xml
solr-core-pom.xml
solr-parent-pom.xml
solr-solr-pom.xml

Thanks,
Ahmed.

On Mon, Apr 27, 2009 at 5:38 PM, Shalin Shekhar Mangar <
shalinman...@gmail.com> wrote:

> On Mon, Apr 27, 2009 at 4:53 PM, ahmed baseet wrote:
>
> >
> > To be precise it gives me the following error,
> >  .cannot find symbol:
> > symbol : class CommonsHttpSolrServer
> >
> > I rechecked to make sure that "commons-httpclient-3.1.jar" is in the
> class
> > path. Can someone please point me what is the issue?
> >
> > I'm working on Windows and my classpath variable is this:
> >
> > .;E:\Program Files\Java\jdk1.6.0_05\bin;D:\firefox
> >
> >
> download\apache-solr-1.3.0\apache-solr-1.3.0\dist\solrj-lib\commons-httpclient-3.1.jar;D:\firefox
> >
> >
> download\apache-solr-1.3.0\apache-solr-1.3.0\dist\solrj-lib\apache-solr-common.jar;D:\firefox
> >
> >
> download\apache-solr-1.3.0\apache-solr-1.3.0\dist\solrj-lib\apache-solr-1.3.0.jar;D:\firefox
> >
> >
> download\apache-solr-1.3.0\apache-solr-1.3.0\dist\solrj-lib\solr-solrj-1.3.0.jar;D:\firefox
> >
> >
> download\apache-solr-1.3.0\apache-solr-1.3.0\dist\solrj-lib\commons-io-1.3.1.jar;D:\firefox
> >
> >
> download\apache-solr-1.3.0\apache-solr-1.3.0\dist\solrj-lib\commons-codec-1.3.jar;D:\firefox
> >
> >
> download\apache-solr-1.3.0\apache-solr-1.3.0\dist\solrj-lib\commons-logging-1.0.4.jar
> >
>
> The jars look right. It is likely a problem with your classpath.
> CommonsHttpSolrServer is in the solr-solrj jar.
>
> If you are using Maven, then you'd need to change your pom.xml
>
> --
> Regards,
> Shalin Shekhar Mangar.
>


Re: How to post in-memory[not residing on local disks] Xml files to Solr server for indexing?

2009-04-27 Thread Shalin Shekhar Mangar
On Mon, Apr 27, 2009 at 4:53 PM, ahmed baseet wrote:

>
> To be precise it gives me the following error,
>  .cannot find symbol:
> symbol : class CommonsHttpSolrServer
>
> I rechecked to make sure that "commons-httpclient-3.1.jar" is in the class
> path. Can someone please point me what is the issue?
>
> I'm working on Windows and my classpath variable is this:
>
> .;E:\Program Files\Java\jdk1.6.0_05\bin;D:\firefox
>
> download\apache-solr-1.3.0\apache-solr-1.3.0\dist\solrj-lib\commons-httpclient-3.1.jar;D:\firefox
>
> download\apache-solr-1.3.0\apache-solr-1.3.0\dist\solrj-lib\apache-solr-common.jar;D:\firefox
>
> download\apache-solr-1.3.0\apache-solr-1.3.0\dist\solrj-lib\apache-solr-1.3.0.jar;D:\firefox
>
> download\apache-solr-1.3.0\apache-solr-1.3.0\dist\solrj-lib\solr-solrj-1.3.0.jar;D:\firefox
>
> download\apache-solr-1.3.0\apache-solr-1.3.0\dist\solrj-lib\commons-io-1.3.1.jar;D:\firefox
>
> download\apache-solr-1.3.0\apache-solr-1.3.0\dist\solrj-lib\commons-codec-1.3.jar;D:\firefox
>
> download\apache-solr-1.3.0\apache-solr-1.3.0\dist\solrj-lib\commons-logging-1.0.4.jar
>

The jars look right. It is likely a problem with your classpath.
CommonsHttpSolrServer is in the solr-solrj jar.
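For example, a compile on Windows might look like this (jar names and paths
are illustrative; point -cp at the exact files in your dist directory):

  javac -cp ".;D:\solr\dist\apache-solr-solrj-1.3.0.jar;D:\solr\dist\solrj-lib\commons-httpclient-3.1.jar;D:\solr\dist\solrj-lib\commons-logging-1.0.4.jar;D:\solr\dist\solrj-lib\commons-codec-1.3.jar;D:\solr\dist\solrj-lib\commons-io-1.3.1.jar" MyIndexer.java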

If you are using Maven, then you'd need to change your pom.xml

-- 
Regards,
Shalin Shekhar Mangar.


Re: How to post in-memory[not residing on local disks] Xml files to Solr server for indexing?

2009-04-27 Thread ahmed baseet
Hi,
After going through the solrj wiki I found that we've to set some
dependencies in pom.xml for using Solrj, which I haven't done yet. So I
googled to know how to do that but no help. I searched the solr directory
and found a bunch of *-pom.template files [like solr-core-pom.xml,
solr-solrj-pom.xml etc] and I'm not able to figure out which one to use. Any
help would be appreciated.

Thanks,
Ahmed.

On Mon, Apr 27, 2009 at 4:53 PM, ahmed baseet wrote:

> Shalin, thanks for your quick response.
>
> Actually I'm trying to pull plaintext from html pages and trying to make
> xml files for each page. I went through the SolrJ webpage and found that
> we have to add all the fields and their contents anyway, right? But yes, it
> makes adding/updating etc. much easier than using the SimplePostTool.
>  I tried to use the SolrJ client but it does not seem to be working. I added all
> the jar files mentioned in the SolrJ wiki to the classpath but it still gives me
> an error.
>
> To be precise it gives me the following error,
>  .cannot find symbol:
> symbol : class CommonsHttpSolrServer
>
> I rechecked to make sure that "commons-httpclient-3.1.jar" is in the class
> path. Can someone please point me what is the issue?
>
> I'm working on Windows and my classpath variable is this:
>
> .;E:\Program Files\Java\jdk1.6.0_05\bin;D:\firefox
> download\apache-solr-1.3.0\apache-solr-1.3.0\dist\solrj-lib\commons-httpclient-3.1.jar;D:\firefox
> download\apache-solr-1.3.0\apache-solr-1.3.0\dist\solrj-lib\apache-solr-common.jar;D:\firefox
> download\apache-solr-1.3.0\apache-solr-1.3.0\dist\solrj-lib\apache-solr-1.3.0.jar;D:\firefox
> download\apache-solr-1.3.0\apache-solr-1.3.0\dist\solrj-lib\solr-solrj-1.3.0.jar;D:\firefox
> download\apache-solr-1.3.0\apache-solr-1.3.0\dist\solrj-lib\commons-io-1.3.1.jar;D:\firefox
> download\apache-solr-1.3.0\apache-solr-1.3.0\dist\solrj-lib\commons-codec-1.3.jar;D:\firefox
> download\apache-solr-1.3.0\apache-solr-1.3.0\dist\solrj-lib\commons-logging-1.0.4.jar
>
> Thank you very much.
> Ahmed.
>
>
>
> On Mon, Apr 27, 2009 at 3:55 PM, Shalin Shekhar Mangar <
> shalinman...@gmail.com> wrote:
>
>> On Mon, Apr 27, 2009 at 3:30 PM, ahmed baseet wrote:
>>
>> > Hi All,
>> > I'm trying to post some files to Solr server. I've done this using the
>> > post.jar files for posting xml files residing on my local disk[I tried
>> > posting all those xml files from example directory]. Now I'm trying to
>> > generate xml files on the fly, with required text to be indexed included
>> > therein though, and want to post these files to solr. As per the
>> examples
> > we've used "SimplePostTool" for posting locally residing files but can
>> > some
>> > one give me direction on indexing in-memory xml files[files generated on
>> > the
>> > fly]. Actually I want to automate this process in a loop, so that I'll
>> > extract some information and put that to xml file and push it off to
>> Solr
>> > for indexing.
>> > Thanks in appreciation.
>> >
>>
>>
>> You can use the Solrj client to avoid building the intermediate XML
>> yourself. Extract the information, use the Solrj api to add the extracted
>> text to fields and send them to the solr server.
>>
>> http://wiki.apache.org/solr/Solrj
>>
>> --
>> Regards,
>> Shalin Shekhar Mangar.
>>
>
>


Re: Solr index

2009-04-27 Thread aidahaj

Thanks a lot,
I have taken a look at these classes.
But what I exactly want to do is detect whether a Document (in the Solr index)
has changed when I recrawl a site with Nutch.
Not to block deduplication, but to detect whether a Document has changed and
extract the changes to a file without writing them over the old Document.
After that I decide whether to rewrite the Document or to keep both of them,
the old and the new one.
I hope that is more precise.
Thanks, and excuse my poor English.


-- 
View this message in context: 
http://www.nabble.com/Solr-index-tp23219842p23254601.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: How to post in-memory[not residing on local disks] Xml files to Solr server for indexing?

2009-04-27 Thread ahmed baseet
Shalin, thanks for your quick response.

Actually I'm trying to pull plaintext from html pages and trying to make xml
files for each page. I went through the SolrJ webpage and found that we have
to add all the fields and their contents anyway, right? But yes, it makes
adding/updating etc. much easier than using the SimplePostTool.
 I tried to use the SolrJ client but it does not seem to be working. I added all
the jar files mentioned in the SolrJ wiki to the classpath but it still gives me
an error.

To be precise it gives me the following error,
 .cannot find symbol:
symbol : class CommonsHttpSolrServer

I rechecked to make sure that "commons-httpclient-3.1.jar" is in the class
path. Can someone please point me what is the issue?

I'm working on Windows and my classpath variable is this:

.;E:\Program Files\Java\jdk1.6.0_05\bin;D:\firefox
download\apache-solr-1.3.0\apache-solr-1.3.0\dist\solrj-lib\commons-httpclient-3.1.jar;D:\firefox
download\apache-solr-1.3.0\apache-solr-1.3.0\dist\solrj-lib\apache-solr-common.jar;D:\firefox
download\apache-solr-1.3.0\apache-solr-1.3.0\dist\solrj-lib\apache-solr-1.3.0.jar;D:\firefox
download\apache-solr-1.3.0\apache-solr-1.3.0\dist\solrj-lib\solr-solrj-1.3.0.jar;D:\firefox
download\apache-solr-1.3.0\apache-solr-1.3.0\dist\solrj-lib\commons-io-1.3.1.jar;D:\firefox
download\apache-solr-1.3.0\apache-solr-1.3.0\dist\solrj-lib\commons-codec-1.3.jar;D:\firefox
download\apache-solr-1.3.0\apache-solr-1.3.0\dist\solrj-lib\commons-logging-1.0.4.jar

Thank you very much.
Ahmed.


On Mon, Apr 27, 2009 at 3:55 PM, Shalin Shekhar Mangar <
shalinman...@gmail.com> wrote:

> On Mon, Apr 27, 2009 at 3:30 PM, ahmed baseet wrote:
>
> > Hi All,
> > I'm trying to post some files to Solr server. I've done this using the
> > post.jar files for posting xml files residing on my local disk[I tried
> > posting all those xml files from example directory]. Now I'm trying to
> > generate xml files on the fly, with required text to be indexed included
> > therein though, and want to post these files to solr. As per the examples
> > we've used "SimplePostTool" for posting locally residing files but can
> > some
> > one give me direction on indexing in-memory xml files[files generated on
> > the
> > fly]. Actually I want to automate this process in a loop, so that I'll
> > extract some information and put that to xml file and push it off to Solr
> > for indexing.
> > Thanks in appreciation.
> >
>
>
> You can use the Solrj client to avoid building the intermediate XML
> yourself. Extract the information, use the Solrj api to add the extracted
> text to fields and send them to the solr server.
>
> http://wiki.apache.org/solr/Solrj
>
> --
> Regards,
> Shalin Shekhar Mangar.
>


Solr test anyone?

2009-04-27 Thread tarjei
Hi, I'm looking for ways to test that my indexing methods work correctly 
with my Solr schema.


Therefore I'm wondering if someone has created a test setup where they 
start a Solr instance and then add some documents to the instance - as a 
Junit/testng test - preferably with a working Maven dependencies for it 
as well.


I've tried googling for this as well as setting it up myself, but I have 
never managed to get a test working like I want it to.



Kind regards,
Tarjei


Re: How to post in-memory[not residing on local disks] Xml files to Solr server for indexing?

2009-04-27 Thread Shalin Shekhar Mangar
On Mon, Apr 27, 2009 at 3:30 PM, ahmed baseet wrote:

> Hi All,
> I'm trying to post some files to Solr server. I've done this using the
> post.jar files for posting xml files residing on my local disk[I tried
> posting all those xml files from example directory]. Now I'm trying to
> generate xml files on the fly, with required text to be indexed included
> therein though, and want to post these files to solr. As per the examples
> we've used "SimplePostTool" for posting locally residing files but can
> some
> one give me direction on indexing in-memory xml files[files generated on
> the
> fly]. Actually I want to automate this process in a loop, so that I'll
> extract some information and put that to xml file and push it off to Solr
> for indexing.
> Thanks in appreciation.
>


You can use the Solrj client to avoid building the intermediate XML
yourself. Extract the information, use the Solrj api to add the extracted
text to fields and send them to the solr server.

http://wiki.apache.org/solr/Solrj
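A minimal sketch (the server URL, id, and field names are hypothetical and
must match your schema):

  import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
  import org.apache.solr.common.SolrInputDocument;

  public class InMemoryIndexer {
    public static void main(String[] args) throws Exception {
      CommonsHttpSolrServer server =
          new CommonsHttpSolrServer("http://localhost:8983/solr");
      // Build the document directly from extracted text -- no XML on disk.
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", "page-001");
      doc.addField("text", "plaintext extracted from the html page");
      server.add(doc);
      server.commit();
    }
  }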

-- 
Regards,
Shalin Shekhar Mangar.


How to post in-memory[not residing on local disks] Xml files to Solr server for indexing?

2009-04-27 Thread ahmed baseet
Hi All,
I'm trying to post some files to Solr server. I've done this using the
post.jar files for posting xml files residing on my local disk[I tried
posting all those xml files from example directory]. Now I'm trying to
generate xml files on the fly, with required text to be indexed included
therein though, and want to post these files to solr. As per the examples
we've used "SimplePostTool" for posting locally residing files but can some
one give me direction on indexing in-memory xml files[files generated on the
fly]. Actually I want to automate this process in a loop, so that I'll
extract some information and put that to xml file and push it off to Solr
for indexing.
Thanks in appreciation.

--Ahmed.