Re: How to Facet on a price range

2010-11-13 Thread Govind Kanshi
Kudos to Jan's pre-compute option and gwk's range facet answer.
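For anyone wiring this up later, summing the per-step counts client-side looks roughly like the sketch below. It is only a sketch: it assumes a Solr/SolrJ build where the range-faceting patch has landed as facet.range (stock in later releases), and the price field, bounds, and step size are invented.

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;
    import org.apache.solr.client.solrj.response.RangeFacet;

    public class SliderCounts {
        public static void main(String[] args) throws Exception {
            SolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");

            // One facet bucket per discrete slider step (here: price, steps of 10).
            SolrQuery q = new SolrQuery("*:*");
            q.setFacet(true);
            q.add("facet.range", "price");
            q.add("f.price.facet.range.start", "0");
            q.add("f.price.facet.range.end", "1000");
            q.add("f.price.facet.range.gap", "10");
            QueryResponse rsp = solr.query(q);

            // When the user drags the sliders to [low, high), sum the buckets in
            // that range client-side - no extra round-trip to Solr is needed.
            double low = 100, high = 300;
            long total = 0;
            RangeFacet<?, ?> price = rsp.getFacetRanges().get(0);
            for (RangeFacet.Count c : price.getCounts()) {
                double bucketStart = Double.parseDouble(c.getValue());
                if (bucketStart >= low && bucketStart < high) {
                    total += c.getCount();
                }
            }
            System.out.println("Matches in [" + low + ", " + high + "): " + total);
        }
    }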

On Wed, Nov 10, 2010 at 2:52 PM, Geert-Jan Brits  wrote:

> Ah I see: like you said it's part of the facet range implementation.
> Frontend is already working, just need the 'update-on-slide' behavior.
>
> Thanks
> Geert-Jan
>
> 2010/11/10 gwk 
>
> > On 11/9/2010 7:32 PM, Geert-Jan Brits wrote:
> >
> >> when you drag the sliders, an update of how many results would match is
> >> immediately shown. I really like this. How did you do this? Is this
> >> out-of-the-box available with the suggested Facet_by_range patch?
> >>
> >
> > Hi,
> >
> > With the range facets you get the facet counts for every discrete step of
> > the slider. These values are requested in the AJAX request whenever the
> > search criteria change; then, when someone uses the sliders, we simply
> > check the selected range and add up the discrete values within it to get
> > the expected number of results. So yes, it is available, but as Solr is
> > just the search backend, the frontend stuff you'll have to write yourself.
> >
> > Regards,
> >
> > gwk
> >
>


Re: A Newbie Question

2010-11-13 Thread Govind Kanshi
Another point of view you might want to think about: what kind of search do
you want? Just plain full-text search, or is there something more to those
text files? Are they grouped in folders? Do the folders imply some kind of
grouping/hierarchy/tagging?

I was recently trying to help somebody who had files across a lot of places,
grouped by date/subject/author - he wanted to ensure these become "fields"
which can also act as filters/navigators.

Just an input - ignore it if you only want plain full-text search.
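To make that concrete, here is a small SolrJ sketch (the field names and the schema behind them are invented for illustration): each file becomes a document carrying its folder-derived metadata, and those fields can then drive filter queries and facets.

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class MetadataAsFilters {
        public static void main(String[] args) throws Exception {
            SolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");

            // Index a file with its folder-derived metadata as separate fields.
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "/mnt/nfs1/2010/smith/report.txt");
            doc.addField("author", "smith");
            doc.addField("subject", "report");
            doc.addField("date", "2010-11-13T00:00:00Z");
            doc.addField("text", "...file contents here...");
            solr.add(doc);
            solr.commit();

            // Full-text search narrowed by a metadata filter, faceted by author.
            SolrQuery q = new SolrQuery("text:budget");
            q.addFilterQuery("author:smith");
            q.setFacet(true);
            q.addFacetField("author");
            System.out.println(solr.query(q).getResults().getNumFound());
        }
    }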

On Sat, Nov 13, 2010 at 11:25 AM, Lance Norskog  wrote:

> About web servers: Solr is a servlet war file and needs a Java web server
> "container" to run. The example/ folder in the Solr disribution uses
> 'Jetty', and this is fine for small production-quality projects.  You can
> just copy the example/ directory somewhere to set up your own running Solr;
> that's what I always do.
>
> About indexing programs: if you know Unix scripting, it may be easiest to
> walk the file system yourself with the 'find' program and create Solr input
> XML files.
>
> But yes, you definitely want the Solr 1.4 Enterprise manual. I spent months
> learning this stuff very slowly, and the book would have been great back
> then.
>
> Lance
>
>
> Erick Erickson wrote:
>
>> Think of the data import handler (DIH) as Solr pulling data to index
>> from some source based on configuration. So, once you set up
>> your DIH config to point to your file system, you issue a command
>> to solr like "OK, do your data import thing". See the
>> FileListEntityProcessor.
>> http://wiki.apache.org/solr/DataImportHandler
>>
>> SolrJ is a client library you'd use to push data to Solr. Basically, you
>> write a Java program that uses SolrJ to walk the file system, find
>> documents, create a Solr document for each, and send it to Solr. It's not
>> nearly as complex as it sounds. See:
>> http://wiki.apache.org/solr/Solrj
>>
>> It's probably worth your while to get a copy of "Solr 1.4 Enterprise
>> Search Server" by Eric Pugh and David Smiley.
>>
>> Best
>> Erick
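A bare-bones version of the SolrJ program Erick describes might look like the sketch below. It is a sketch, not a drop-in: it assumes a SolrJ 1.4-era client on the classpath, a schema with id and text fields, and plain .txt files only.

    import java.io.File;
    import java.util.Scanner;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    // Recursively walk a directory (e.g. an NFS mount) and index every .txt file.
    public class WalkAndIndex {

        static void walk(File dir, SolrServer solr) throws Exception {
            File[] files = dir.listFiles();
            if (files == null) return; // unreadable directory
            for (File f : files) {
                if (f.isDirectory()) {
                    walk(f, solr);
                } else if (f.getName().endsWith(".txt")) {
                    Scanner sc = new Scanner(f);
                    SolrInputDocument doc = new SolrInputDocument();
                    doc.addField("id", f.getAbsolutePath()); // path doubles as uniqueKey
                    doc.addField("text", sc.hasNext() ? sc.useDelimiter("\\Z").next() : "");
                    sc.close();
                    solr.add(doc);
                }
            }
        }

        public static void main(String[] args) throws Exception {
            SolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");
            walk(new File(args[0]), solr);
            solr.commit();
        }
    }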
>>
>> On Fri, Nov 12, 2010 at 8:37 AM, K. Seshadri Iyer wrote:
>>
>>> Hi Lance,
>>>
>>> Thank you very much for responding (not sure how I reply to the group,
>>> so writing to you).
>>>
>>> Can you please expand on your suggestion? I am not a web guy and so
>>> don't know where to start.
>>>
>>> What is the difference between SolrJ and DataImportHandler? Do I need
>>> to set up web servers on all my storage boxes?
>>>
>>> Apologies for the basic level of questions, but hope I can get started
>>> and implement this before the year end (you know why :o)
>>>
>>> Thanks,
>>>
>>> Sesh
>>>
>>> On 12 November 2010 13:31, Lance Norskog  wrote:
>>>
>>>
>>>
 Using 'curl' is fine. There is a library called SolrJ for Java and
 other libraries for other scripting languages that let you upload with
 more control. There is a thing in Solr called the DataImportHandler
 that lets you script walking a file system.

 On Thu, Nov 11, 2010 at 8:38 PM, K. Seshadri Iyer wrote:

> Hi,
>
> Pardon me if this sounds very elementary, but I have a very basic question
> regarding Solr search. I have about 10 storage devices running Solaris with
> hundreds of thousands of text files (there are other files, as well, but my
> target is these text files). The directories on the Solaris boxes are
> exported and are available as NFS mounts.
>
> I have installed Solr 1.4 on a Linux box and have tested the installation,
> using curl to post documents. However, the manual says that curl is not the
> recommended way of posting documents to Solr. Could someone please tell me
> what is the preferred approach in such an environment? I am not a
> programmer and would appreciate some hand-holding here :o)
>
> Thanks in advance,
>
> Sesh

 --
 Lance Norskog
 goks...@gmail.com


Re: Searching problem

2010-11-13 Thread Govind Kanshi
You must spend time on
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
- wildcard queries are not analyzed, so if your field's analyzer stems
"panasonic" at index time, the unstemmed prefix in "panasonic*" will no
longer match the indexed term.
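A quick way to see this is to run the text through your field's analyzer and look at what actually gets indexed. A minimal Lucene sketch (assuming Lucene 3.x on the classpath; EnglishAnalyzer is a stand-in for whatever your field type really uses, and the stemmed output shown is illustrative):

    import java.io.StringReader;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.en.EnglishAnalyzer;
    import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
    import org.apache.lucene.util.Version;

    // Print the tokens the analyzer would put in the index for "panasonic".
    public class ShowTokens {
        public static void main(String[] args) throws Exception {
            EnglishAnalyzer analyzer = new EnglishAnalyzer(Version.LUCENE_36);
            TokenStream ts = analyzer.tokenStream("name", new StringReader("panasonic"));
            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
            ts.reset();
            while (ts.incrementToken()) {
                // A stemming analyzer prints e.g. "panason" - which the
                // un-analyzed wildcard query "panasonic*" can never match.
                System.out.println(term.toString());
            }
            ts.end();
            ts.close();
        }
    }

The analysis page in the Solr admin UI shows the same thing without writing any code.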




On Sat, Nov 13, 2010 at 10:42 AM, M.Rizwan  wrote:

> Hi All,
>
> Do you have any idea why a Solr search for "panasonic*" (without
> quotes) does not match "panasonic"?
> If we search "panasonic" it matches a result, but if we search with
> "panasonic*" it does not find it.
>
> What needs to be done here?
>
> Thanks
>
> Riz
>


Re: Color search for images

2010-09-18 Thread Govind Kanshi
Not exactly sure how one would capture the context of which object is more
dominant than another.
Think of a landscape with snow, green mountains, and a set of flowers of
varied colors, including a rose.

On Fri, Sep 17, 2010 at 8:43 PM, Shashi Kant  wrote:

> >
> > What I am envisioning (at least to start) is have all this add two
> > fields in the index.  One would be for color information for the color
> > similarity search.  The other would be a simple multivalued text field
> > that we put keywords into based on what OpenCV can detect about the
> > image.  If it detects faces, we would put "face" into this field.
> > Other things that it can detect would result in other keywords.
> >
> > For the color search, I have a few inter-related hurdles.  I've got to
> > figure out what form the color data actually takes and how to represent
> > it in Solr.  I need Java code for Solr that can take an input color
> > value and find similar values in the index.  Then I need some code that
> > can go in our feed processing scripts for new content.  That code would
> > also go into a crawler script to handle existing images.
> >
>
> You are on the right track. You can create a set of representative
> keywords from the image. OpenCV gets a color histogram from the image -
> you can set the bin values to be as granular as you need, and create a
> look-up list of color names to generate a multivalued field (MVF)
> representative of the image.
> If you want to get more sophisticated, represent the colors with
> payloads in correlation with the distribution of the color in the
> image.
>
> Another approach would be to segment the image and extract colors from
> each. So if you have a red rose with all white background, the textual
> representation would be something like:
>
> white, white...red...white, white
>
> Play around and see which works best.
>
> HTH
>
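For what it's worth, the keyword idea above can be sketched in a few lines. Everything here is invented for illustration - the bin names, the histogram values, and the proportional-repetition trick standing in for true payloads:

    import java.util.ArrayList;
    import java.util.List;

    // Map histogram bins to human color names and repeat each name in
    // proportion to its share of the image; a real histogram would come
    // from OpenCV's calcHist output.
    public class ColorKeywords {
        public static List<String> toKeywords(String[] binNames, double[] histogram, int totalTerms) {
            List<String> keywords = new ArrayList<String>();
            double sum = 0;
            for (double v : histogram) sum += v;
            for (int i = 0; i < binNames.length; i++) {
                // proportional repetition approximates a term-frequency "payload"
                long repeats = Math.round(totalTerms * histogram[i] / sum);
                for (long r = 0; r < repeats; r++) keywords.add(binNames[i]);
            }
            return keywords;
        }

        public static void main(String[] args) {
            String[] names = {"white", "red", "green"};
            double[] hist = {0.90, 0.08, 0.02}; // e.g. a red rose on a white background
            System.out.println(toKeywords(names, hist, 20)); // mostly "white", a couple of "red"
        }
    }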


Re: DatImportHandler and cron issue

2010-07-07 Thread Govind Kanshi
How did you verify it was not processed? Did you:
1. Query for docs - with no results?
2. Use the Solr Admin tool?
3. Bypass the DataImportHandler and see if a plain doc post/commit works?
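Also note what the logs below show: each full-import request returns immediately (status=0 QTime=0) because DIH runs the import on a background thread, and each run finishes in well under a second ("Time taken = 0:0:0.93"), which suggests the imports start but process little or nothing. Rather than a fixed 75-second sleep and a regex on the kickoff response, you could poll the handler's status page between requests. A rough sketch - the handler URL is taken from your script, and the "busy"/"idle" and "Total Documents Processed" strings are what DIH's status response normally contains:

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.URL;

    // Wait until DIH is idle, then report what the last run actually did.
    public class WaitForDih {

        static String fetch(String url) throws Exception {
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(new URL(url).openStream()));
            StringBuilder sb = new StringBuilder();
            for (String line; (line = in.readLine()) != null; ) {
                sb.append(line).append('\n');
            }
            in.close();
            return sb.toString();
        }

        public static void main(String[] args) throws Exception {
            String handler = "http://test.solr.ddtc.cmgdigital.com:8080"
                    + "/solr/npmetrosearch_statesman/dataimport";
            String status = fetch(handler + "?command=status");
            while (status.contains("busy")) {      // DIH reports "busy" or "idle"
                Thread.sleep(5000);
                status = fetch(handler + "?command=status");
            }
            // The idle response includes counters such as "Total Documents Processed".
            System.out.println(status);
        }
    }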


On Tue, Jun 15, 2010 at 10:29 PM, iboppana wrote:

>
> Hi All,
>
> We are trying to implement Solr for our newspaper site's search.
> To build out the index with all the articles published so far, we are
> running a script which sends requests to the DataImportHandler with
> different dates.
> What we are seeing is that the request is dispatched to the Solr server,
> but it's not being processed.
> Just wanted to check if it's some kind of threading issue, and what's the
> best approach to achieve this.
>
> We are sleeping for 75 secs between requests:
>
>
> while (($date+=86400) < $now) {
>   $curdate=strftime("%D", localtime($date));
>
>   print "Updating index for $curdate\n";
>
>   $curdate=uri_escape($curdate);
>
>   my $url = 'http://test.solr.ddtc.cmgdigital.com:8080/solr/npmetrosearch_statesman/dataimport?command=full-import&entity=initialLoad&clean=false&commit=true&forDate='
>     . $curdate
>     . '&numArticles=-1&server=app5&site=statesman&articleTypes=story,slideshow,video,poll,specialArticle,list';
>
>   print "Sending: $url\n";
>
>   #if (system("wget -q -O - \'$url\' | egrep -q \'$regex_pat\'")) {
>   if (system("curl -s \'$url\' | egrep -q \'$regex_pat\'")) {
>     print "Failed to match expected regex reply: \"$regex_pat\"\n";
>     exit 1;
>   }
>
>   sleep 75;
> }
>
>
>
>
> This is what we are seeing in the server logs:
> 2010-06-14 12:51:01,328 INFO  [org.apache.solr.core.SolrCore]
> (http-0.0.0.0-8080-1) [npmetrosearch_statesman] webapp=/solr
> path=/dataimport
>
> params={site=statesman&forDate=03/24/10&articleTypes=story,slideshow,video,poll,specialArticle,list&clean=false&commit=true&entity=initialLoad&command=full-import&numArticles=-1&server=app5}
> status=0 QTime=0
> 2010-06-14 12:51:01,329 INFO
> [org.apache.solr.handler.dataimport.DataImporter] (Thread-378) Starting
> Full
> Import
> 2010-06-14 12:51:01,332 INFO
> [org.apache.solr.handler.dataimport.SolrWriter] (Thread-378) Read
> dataimport.properties
> 2010-06-14 12:51:01,425 INFO
> [org.apache.solr.handler.dataimport.DocBuilder] (Thread-378) Time taken =
> 0:0:0.93
> 2010-06-14 12:51:16,338 INFO  [org.apache.solr.core.SolrCore]
> (http-0.0.0.0-8080-1) [npmetrosearch_statesman] webapp=/solr
> path=/dataimport
>
> params={site=statesman&forDate=03/25/10&articleTypes=story,slideshow,video,poll,specialArticle,list&clean=false&commit=true&entity=initialLoad&command=full-import&numArticles=-1&server=app5}
> status=0 QTime=0
> 2010-06-14 12:51:16,338 INFO
> [org.apache.solr.handler.dataimport.DataImporter] (Thread-379) Starting
> Full
> Import
> 2010-06-14 12:51:16,338 INFO
> [org.apache.solr.handler.dataimport.SolrWriter] (Thread-379) Read
> dataimport.properties
> 2010-06-14 12:51:16,465 INFO
> [org.apache.solr.handler.dataimport.DocBuilder] (Thread-379) Time taken =
> 0:0:0.126
>
> Appreciate any thoughts on this.
>
> Thanks
>  Indrani
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/DatImportHandler-and-cron-issue-tp897698p897698.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Adding new elements to index

2010-07-07 Thread Govind Kanshi
Just for testing purposes, I would:
1. Use curl to create new docs.
2. Use SolrJ to go to the individual DBs and collect docs.
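For option 2, a minimal sketch is below. It is only a sketch: the JDBC URL mirrors the config quoted further down, but the query, the name field, and the "c_" id prefix are placeholders I made up (you said other entities use prefixes other than "s").

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    // Pull the Oracle rows yourself and post them with SolrJ, bypassing DIH,
    // to see whether they index at all.
    public class PostOracleRows {
        public static void main(String[] args) throws Exception {
            SolrServer solr = new CommonsHttpSolrServer("http://localhost:8080/solr");
            Class.forName("oracle.jdbc.OracleDriver");
            Connection conn = DriverManager.getConnection(
                    "jdbc:oracle:thin:@xxx.xxx.xxx.xxx:1521:SID", "user", "password");
            ResultSet rs = conn.createStatement()
                    .executeQuery("SELECT id, name FROM carrers"); // placeholder query
            while (rs.next()) {
                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", "c_" + rs.getString("id")); // hypothetical "c_" prefix
                doc.addField("name", rs.getString("name"));
                solr.add(doc);
            }
            solr.commit();
            conn.close();
        }
    }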



On Wed, Jul 7, 2010 at 12:45 PM, Xavier Rodriguez  wrote:

> Thanks for the quick reply!
>
> In fact it was a typo: the 200 rows I got were from postgres. I tried to
> say that the full-import was omitting the 100 oracle rows.
>
> When I run the full import, I run it as a single job, using the URL
> command=full-import. I've tried to clear the index both using the clean
> command and by manually deleting it, but when I run the full-import, the
> number of indexed documents equals the number of documents coming from
> postgres.
>
> To be sure that the id field is unique, I build the id by putting a letter
> before the id value. When indexed, the id looks like s_123, and that's the
> id 123 for an entity identified as "s". Other entities use different
> prefixes, but never "s".
>
> I used DIH to index the data. My configuration is the following:
>
> File db-data-config.xml
>
> <dataSource type="JdbcDataSource"
>    name="ds_ora"
>    driver="oracle.jdbc.OracleDriver"
>    url="jdbc:oracle:thin:@xxx.xxx.xxx.xxx:1521:SID"
>    user="user"
>    password="password"
> />
>
> <dataSource type="JdbcDataSource"
>    name="ds_pg"
>    driver="org.postgresql.Driver"
>    url="jdbc:postgresql://xxx.xxx.xxx.yyy:5432/sid"
>    user="user"
>    password="password"
> />
>
> <document>
>   <entity name="hidrants" ...>
>   <entity name="carrers" ...>
> </document>
> [the entity definitions were stripped by the mail archive; the names are
> recovered from the text below]
>
> --
>
> In that configuration, all the fields coming from ds_pg are indexed, and
> the fields coming from ds_ora are not. As I've said, the strange behaviour
> for me is that no error is logged in Tomcat; the number of documents
> created is the number of rows returned by "hidrants", while the number of
> rows returned is the sum of the rows from "hidrants" and "carrers".
>
> Thanks in advance.
>
> Xavi.
>
>
> On 7 July 2010 02:46, Erick Erickson  wrote:
>
> > First: do you have a uniqueKey defined in your schema.xml? If you
> > do, some of those 300 rows could be replacing earlier rows.
> >
> > You say: "if I have 200 rows indexed from postgres and 100 rows from
> > Oracle, the full-import process only indexes 200 documents from oracle,
> > although it shows clearly that the query returned 300 rows."
> >
> > Which really looks like a typo: if you have 100 rows from Oracle, how
> > did you get 200 rows from Oracle?
> >
> > Are you perhaps doing this in two different jobs and deleting the
> > first import before running the second?
> >
> > And if this is irrelevant, could you provide more details like how you're
> > indexing things (I'm assuming DIH, but you don't state that anywhere).
> > If it *is* DIH, providing that configuration would help.
> >
> > Best
> > Erick
> >
> > On Tue, Jul 6, 2010 at 11:19 AM, Xavier Rodriguez 
> > wrote:
> >
> > > Hi,
> > >
> > > I have a SOLR installed on a Tomcat application server. This solr
> > instance
> > > has some data indexed from a postgres database. Now I need to add some
> > > entities from an Oracle database. When I run the full-import command,
> the
> > > documents indexed are only documents from postgres. In fact, if I have
> > 200
> > > rows indexed from postgres and 100 rows from Oracle, the full-import
> > > process
> > > only indexes 200 documents from oracle, although it shows clearly that
> > > the query returned 300 rows.
> > >
> > > I'm not doing a delta-import, simply a full import. I've tried to clean
> > > the index, reload the configuration, and manually remove
> > > dataimport.properties because it's the only metadata I found. Is there
> > > any other file to check or modify just to get all 300 rows indexed?
> > >
> > > Of course, I searched for one of the Oracle fields, with no results.
> > >
> > > Thanks a lot,
> > >
> > > Xavier Rodriguez.
> > >
> >
>


Re: Field Collapsing SOLR-236

2010-06-23 Thread Govind Kanshi
The error "fieldType: analyzer without class or tokenizer & filter list"
points to your schema config - every <analyzer> in schema.xml must either
name an Analyzer class or contain a <tokenizer> plus an optional list of
<filter>s. You may want to correct that fieldType.


On Wed, Jun 23, 2010 at 3:09 PM, Rakhi Khatwani  wrote:

> Hi,
> I checked out modules & lucene from the trunk.
> Performed a build using the following commands
> ant clean
> ant compile
> ant example
>
> Which compiled successfully.
>
>
> I then put my existing index (using schema.xml from solr1.4.0/conf/solr/)
> in the multicore folder, configured solr.xml and started the server.
>
> When I go to http://localhost:8983/solr
>
> I get the following error:
> org.apache.solr.common.SolrException: Plugin init failure for [schema.xml]
> fieldType:analyzer without class or tokenizer & filter list
> at
>
> org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:168)
> at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:480)
> at org.apache.solr.schema.IndexSchema.<init>(IndexSchema.java:122)
> at org.apache.solr.core.CoreContainer.create(CoreContainer.java:429)
> at org.apache.solr.core.CoreContainer.load(CoreContainer.java:286)
> at org.apache.solr.core.CoreContainer.load(CoreContainer.java:198)
> at
>
> org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:123)
> at
> org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:86)
> at org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:97)
> at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
> at
>
> org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:662)
> at org.mortbay.jetty.servlet.Context.startContext(Context.java:140)
> at
>
> org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1250)
> at
> org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:517)
> at org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:467)
> at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
> at
>
> org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:152)
> at
>
> org.mortbay.jetty.handler.ContextHandlerCollection.doStart(ContextHandlerCollection.java:156)
> at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
> at
>
> org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:152)
> at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
> at
> org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:130)
> at org.mortbay.jetty.Server.doStart(Server.java:224)
> at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
> at org.mortbay.xml.XmlConfiguration.main(XmlConfiguration.java:985)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
>
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at
>
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.mortbay.start.Main.invokeMain(Main.java:194)
> at org.mortbay.start.Main.start(Main.java:534)
> at org.mortbay.start.Main.start(Main.java:441)
> at org.mortbay.start.Main.main(Main.java:119)
> Caused by: org.apache.solr.common.SolrException: analyzer without class or
> tokenizer & filter list
> at org.apache.solr.schema.IndexSchema.readAnalyzer(IndexSchema.java:908)
> at org.apache.solr.schema.IndexSchema.access$100(IndexSchema.java:60)
> at org.apache.solr.schema.IndexSchema$1.create(IndexSchema.java:450)
> at org.apache.solr.schema.IndexSchema$1.create(IndexSchema.java:435)
> at
>
> org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:142)
> ... 32 more
>
>
> Then I picked up an existing index (schema.xml from solr1.3/solr/conf),
> put it in the multicore folder, configured solr.xml and restarted my
> server.
>
> Collapsing worked fine.
>
> Any pointers on which part of schema.xml (Solr 1.4) is causing this
> exception?
>
> Regards,
> Raakhi
>
>
>
> On Wed, Jun 23, 2010 at 1:35 PM, Rakhi Khatwani wrote:
>
> >
> > Oops, this is probably because I didn't check out the modules folder
> > from the trunk. Doing that right now :)
> >
> > Regards
> > Raakhi
> >
> > On Wed, Jun 23, 2010 at 1:12 PM, Rakhi Khatwani wrote:
> >
> >> Hi,
> >> Patching did work, but when I build the trunk, I get the following
> >> exception:
> >>
> >> [SolrTrunk]# ant compile
> >> Buildfile: /testWorkspace/SolrTrunk/build.xml
> >>
> >> init-forrest-entities:
> >>   [mkdir] Created dir: /testWorkspace/SolrTrunk/build
> >>   [mkdir] Created dir: /testWorkspace/SolrTrunk/build/web
> >>
> >> compile-lucene:
> >>
> >> BUILD FAILED
> >> /testWorkspace/SolrTrunk/common-build.xml:207:
> >> /testWorkspace/modules/analysis/common does not exist.
> >>
> >> Regards,
> >> Raakhi
> >>
> >> On Wed, Jun 23, 2010 at 2:39 AM, Martijn v Groningen <
> >> martijn.is.h...@gmail.com> wrote:
> >>
> >>> What exactly did not work? Patching, compiling or running it?
> >>>
> >>> On 22 June 2010 16:06, 

Re: Nested table support ability

2010-06-23 Thread Govind Kanshi
Amit - unless you test, it won't be apparent. The key piece is, as Otis
mentioned, to "flatten everything". This requires effort on your side to
actually create documents in a manner suitable for your searches: the
relationships need to be "merged" into each document. To avoid storing text
representations, you may want to store just an "identifier" and use the
front end to translate between human-readable text and the stored
identifier. Taking your case further: rather than storing ADMIN, store just
a representation, maybe a smallint, with the customer information.
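To make that concrete, a hedged sketch (all field names and values are invented; the point is the shape - one document per customer with child-table rows merged in):

    import org.apache.solr.common.SolrInputDocument;

    // One flattened document per customer: child-table rows become multivalued
    // fields, and the role is stored as a compact identifier rather than text.
    public class FlattenCustomer {
        public static SolrInputDocument flatten() {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "cust_42");
            doc.addField("name", "Some Customer");
            doc.addField("role_id", 1);          // front end maps 1 -> "ADMIN"
            doc.addField("order_id", "o_1001");  // multivalued: one value per child row
            doc.addField("order_id", "o_1002");
            doc.addField("address_city", "Mumbai");
            return doc;
        }
    }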

On Wed, Jun 23, 2010 at 11:30 AM, amit_ak  wrote:

>
> Hi Otis, Thanks for the update.
>
> My parametric search has to span the customer table and 30 child tables.
> We have close to 1 million customers. Do you think Lucene/Solr is the
> right solution for such requirements, or would a database search be more
> optimal?
>
> Regards,
> Amit
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Nested-table-support-ability-tp905253p916087.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>