Re: Searching problem

2010-11-13 Thread Govind Kanshi
You must spend time on -
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
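A likely cause, assuming a typical text field (the schema isn't shown, so this is a guess): wildcard queries are not analyzed, so panasonic* is matched against the raw indexed terms. If index-time analysis rewrites the token (a stemmer, for instance), the prefix no longer matches, while a plain search for panasonic still works because the query term gets the same analysis. A field type of this shape in schema.xml would show the behavior:

  <fieldType name="text" class="solr.TextField">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <!-- wildcard terms bypass these filters at query time -->
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.SnowballPorterFilterFactory" language="English"/>
    </analyzer>
  </fieldType>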




On Sat, Nov 13, 2010 at 10:42 AM, M.Rizwan griz...@gmail.com wrote:

 Hi All,

 Do you have any idea why a Solr search for panasonic* (without quotes) does
 not match panasonic? If we search for panasonic, it matches a result, but if
 we search with panasonic*, it does not find it.

 What needs to be done here?

 Thanks

 Riz



Re: A Newbie Question

2010-11-13 Thread Govind Kanshi
Another POV you might want to think about: what kind of search do you want?
Just plain full-text search, or is there something more to those text files?
Are they grouped in folders? Do the folders imply a certain kind of
grouping/hierarchy/tagging?

I was recently trying to help somebody who had files scattered across a lot
of places, grouped by date/subject/author - he wanted to ensure these fields
could also act as filters/navigators.

Just an input - ignore it if you just want plain full-text search.
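If those fields do exist in the schema, each one becomes a filter or facet at query time; a hypothetical request (field names invented for illustration):

  http://localhost:8983/solr/select?q=quarterly+report
      &fq=author:smith
      &facet=true&facet.field=subject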

On Sat, Nov 13, 2010 at 11:25 AM, Lance Norskog goks...@gmail.com wrote:

 About web servers: Solr is a servlet WAR file and needs a Java servlet
 container to run. The example/ folder in the Solr distribution uses Jetty,
 and this is fine for small production-quality projects. You can just copy
 the example/ directory somewhere to set up your own running Solr; that's
 what I always do.

 About indexing programs: if you know Unix scripting, it may be easiest to
 walk the file system yourself with the 'find' program and create Solr input
 XML files.
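The Solr XML update format those files would use is minimal; a sketch (the id and text field names are assumptions about your schema):

  <add>
    <doc>
      <field name="id">/net/box1/docs/report-2010.txt</field>
      <field name="text">...file contents, XML-escaped...</field>
    </doc>
  </add>

Each file is then posted to the /update handler, followed by a <commit/>.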

 But yes, you definitely want the Solr 1.4 Enterprise manual. I spent months
 learning this stuff very slowly, and the book would have been great back
 then.

 Lance


 Erick Erickson wrote:

 Think of the Data Import Handler (DIH) as Solr pulling the data to index
 from some source, based on configuration. So, once you set up your DIH
 config to point at your file system, you issue a command to Solr like
 'OK, do your data import thing'. See the FileListEntityProcessor:
 http://wiki.apache.org/solr/DataImportHandler
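A rough, untested sketch of such a config (the paths, file pattern, and Solr field names are placeholders; PlainTextEntityProcessor reads each whole file into a plainText column):

  <dataConfig>
    <dataSource type="FileDataSource" encoding="UTF-8"/>
    <document>
      <!-- outer entity walks the directory tree; inner entity reads each file -->
      <entity name="f" processor="FileListEntityProcessor"
              baseDir="/mnt/nfs/textfiles" fileName=".*\.txt"
              recursive="true" rootEntity="false" dataSource="null">
        <field column="fileAbsolutePath" name="id"/>
        <entity name="txt" processor="PlainTextEntityProcessor"
                url="${f.fileAbsolutePath}">
          <field column="plainText" name="text"/>
        </entity>
      </entity>
    </document>
  </dataConfig>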

 SolrJ is a client library you'd use to push data to Solr. Basically, you
 write a Java program that uses SolrJ to walk the file system, find
 documents, create a Solr document, and send that to Solr. It's not
 nearly as complex as it sounds <g>. See:
 http://wiki.apache.org/solr/Solrj
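A bare-bones sketch of that program against the Solr 1.4-era SolrJ API (the URL, the .txt filter, and the field names are assumptions about your setup):

  import java.io.BufferedReader;
  import java.io.File;
  import java.io.FileReader;
  import java.io.IOException;
  import org.apache.solr.client.solrj.SolrServer;
  import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
  import org.apache.solr.common.SolrInputDocument;

  public class FileIndexer {
      public static void main(String[] args) throws Exception {
          SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
          index(server, new File("/mnt/nfs/textfiles"));
          server.commit();
      }

      // Walk the directory tree, turning each .txt file into a Solr document.
      static void index(SolrServer server, File dir) throws Exception {
          File[] files = dir.listFiles();
          if (files == null) return;
          for (File f : files) {
              if (f.isDirectory()) {
                  index(server, f);
              } else if (f.getName().endsWith(".txt")) {
                  SolrInputDocument doc = new SolrInputDocument();
                  doc.addField("id", f.getAbsolutePath()); // path doubles as unique key
                  doc.addField("text", readFile(f));
                  server.add(doc);
              }
          }
      }

      static String readFile(File f) throws IOException {
          BufferedReader r = new BufferedReader(new FileReader(f));
          StringBuilder sb = new StringBuilder();
          for (String line; (line = r.readLine()) != null; ) sb.append(line).append('\n');
          r.close();
          return sb.toString();
      }
  }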

 It's probably worth your while to get a copy of Solr 1.4 Enterprise Search
 Server by Eric Pugh and David Smiley.

 Best
 Erick

 On Fri, Nov 12, 2010 at 8:37 AM, K. Seshadri Iyer seshadri...@gmail.com wrote:



 Hi Lance,

 Thank you very much for responding (not sure how I reply to the group, so,
 writing to you).

 Can you please expand on your suggestion? I am not a web guy and so don't
 know where to start.

 What is the difference between SolrJ and DataImportHandler? Do I need to
 set up web servers on all my storage boxes?

 Apologies for the basic level of questions, but I hope I can get started
 and implement this before the year end (you know why :o)

 Thanks,

 Sesh

 On 12 November 2010 13:31, Lance Norskog goks...@gmail.com wrote:



 Using 'curl' is fine. There is a library called SolrJ for Java and
 other libraries for other scripting languages that let you upload with
 more control. There is a thing in Solr called the DataImportHandler
 that lets you script walking a file system.

 On Thu, Nov 11, 2010 at 8:38 PM, K. Seshadri Iyer seshadri...@gmail.com wrote:


 Hi,

 Pardon me if this sounds very elementary, but I have a very basic question
 regarding Solr search. I have about 10 storage devices running Solaris with
 hundreds of thousands of text files (there are other files, as well, but my
 target is these text files). The directories on the Solaris boxes are
 exported and are available as NFS mounts.

 I have installed Solr 1.4 on a Linux box and have tested the installation,
 using curl to post documents. However, the manual says that curl is not the
 recommended way of posting documents to Solr. Could someone please tell me
 what is the preferred approach in such an environment? I am not a
 programmer and would appreciate some hand-holding here :o)

 Thanks in advance,

 Sesh





 --
 Lance Norskog
 goks...@gmail.com










Re: How to Facet on a price range

2010-11-13 Thread Govind Kanshi
Kudos to Jan's pre-compute option and gwk's range facet answer.
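For anyone reading along: the per-step counts gwk describes below come from a range-facet request along these lines (syntax per the range-faceting work that grew out of the patch discussed in this thread; the price field, bounds, and gap are made-up):

  http://localhost:8983/solr/select?q=*:*&rows=0
      &facet=true
      &facet.range=price
      &facet.range.start=0
      &facet.range.end=500
      &facet.range.gap=10

The response carries one count per 10-unit bucket, and the slider UI simply sums the buckets inside the selected interval.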

On Wed, Nov 10, 2010 at 2:52 PM, Geert-Jan Brits gbr...@gmail.com wrote:

 Ah, I see: like you said, it's part of the facet range implementation.
 The frontend is already working; I just need the 'update-on-slide' behavior.

 Thanks
 Geert-Jan

 2010/11/10 gwk g...@eyefi.nl

  On 11/9/2010 7:32 PM, Geert-Jan Brits wrote:
 
  when you drag the sliders, an update of how many results would match is
  immediately shown. I really like this. How did you do this? Is this
  out-of-the-box available with the suggested Facet_by_range patch?
 
 
  Hi,
 
  With the range facets you get the facet counts for every discrete step of
  the slider. These values are requested in the AJAX request whenever the
  search criteria change; then, when someone uses the sliders, we simply
  check the range that is selected and add up the discrete values of that
  range to get the expected number of results. So yes, it is available, but
  as Solr is just the search backend, the frontend stuff you'll have to
  write yourself.
 
  Regards,
 
  gwk
 



Re: Color search for images

2010-09-18 Thread Govind Kanshi
Not exactly sure how one would capture the context of which object is more
dominant than another. Think of a landscape with snow, green mountains, and
a set of flowers of varied colors, including a rose.

On Fri, Sep 17, 2010 at 8:43 PM, Shashi Kant sk...@sloan.mit.edu wrote:

 
  What I am envisioning (at least to start) is to have all this add two
  fields in the index. One would be for color information for the color
  similarity search. The other would be a simple multivalued text field
  that we put keywords into based on what OpenCV can detect about the
  image. If it detects faces, we would put "face" into this field. Other
  things that it can detect would result in other keywords.

  For the color search, I have a few inter-related hurdles. I've got to
  figure out what form the color data actually takes and how to represent
  it in Solr. I need Java code for Solr that can take an input color value
  and find similar values in the index. Then I need some code that can go
  in our feed processing scripts for new content. That code would also go
  into a crawler script to handle existing images.
 

 You are on the right track. You can create a set of representative
 keywords from the image. OpenCV gets a color histogram from the image
 (you can set the bin values to be as granular as you need), and you can
 create a look-up list of color names to generate a multivalued field (MVF)
 representative of the image.
 If you want to get more sophisticated, represent the colors with
 payloads in correlation with the distribution of the color in the
 image.

 Another approach would be to segment the image and extract colors from
 each segment. So if you have a red rose on an all-white background, the
 textual representation would be something like:

 white, white...red...white, white

 Play around and see which works best.

 HTH
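A minimal sketch of the look-up-list idea in plain Java (no OpenCV; the tiny palette and the sampling step are arbitrary choices for illustration):

  import java.awt.image.BufferedImage;
  import java.io.File;
  import java.util.HashMap;
  import java.util.Map;
  import javax.imageio.ImageIO;

  public class ColorKeywords {
      // Tiny palette; a real look-up list would have many more named colors.
      private static final String[] NAMES = {"white", "black", "red", "green", "blue", "yellow"};
      private static final int[] RGBS = {0xFFFFFF, 0x000000, 0xFF0000, 0x00FF00, 0x0000FF, 0xFFFF00};

      public static void main(String[] args) throws Exception {
          BufferedImage img = ImageIO.read(new File(args[0]));
          Map<String, Integer> counts = new HashMap<String, Integer>();
          // Sample every 4th pixel and bucket it into the nearest named color.
          for (int y = 0; y < img.getHeight(); y += 4) {
              for (int x = 0; x < img.getWidth(); x += 4) {
                  String name = nearest(img.getRGB(x, y));
                  Integer c = counts.get(name);
                  counts.put(name, c == null ? 1 : c + 1);
              }
          }
          // The dominant names become the keywords for the multivalued field.
          System.out.println(counts);
      }

      private static String nearest(int rgb) {
          int r = (rgb >> 16) & 0xFF, g = (rgb >> 8) & 0xFF, b = rgb & 0xFF;
          String best = NAMES[0];
          long bestDist = Long.MAX_VALUE;
          for (int i = 0; i < NAMES.length; i++) {
              int rr = (RGBS[i] >> 16) & 0xFF, rg = (RGBS[i] >> 8) & 0xFF, rb = RGBS[i] & 0xFF;
              long d = (long) (r - rr) * (r - rr) + (long) (g - rg) * (g - rg)
                     + (long) (b - rb) * (b - rb);
              if (d < bestDist) { bestDist = d; best = NAMES[i]; }
          }
          return best;
      }
  }

Repeating each name in proportion to its count (as Shashi suggests) would weight the keywords by dominance.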



Re: Adding new elements to index

2010-07-07 Thread Govind Kanshi
Just for testing purposes, I would:
1. Use curl to create new docs.
2. Use SolrJ to go to the individual DBs and collect docs.
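For example (the field names are taken from Xavier's config below; the /dataimport path is an assumption about how the handler is registered in solrconfig.xml):

  # 1. post a hand-made doc to verify the schema accepts an Oracle-style id
  curl 'http://localhost:8983/solr/update' -H 'Content-Type: text/xml' \
      --data-binary '<add><doc><field name="identificador">s_999</field><field name="Nom">test</field></doc></add>'
  curl 'http://localhost:8983/solr/update' -H 'Content-Type: text/xml' --data-binary '<commit/>'

  # 2. run the import for just the Oracle entity; DIH's entity parameter
  #    helps isolate which source is failing
  curl 'http://localhost:8983/solr/dataimport?command=full-import&entity=carrers&clean=false'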



On Wed, Jul 7, 2010 at 12:45 PM, Xavier Rodriguez xee...@gmail.com wrote:

 Thanks for the quick reply!

 In fact it was a typo: the 200 rows I got were from Postgres. I meant to say
 that the full-import was omitting the 100 Oracle rows.

 When I run the full import, I run it as a single job, using the URL
 command=full-import. I've tried to clear the index both using the clean
 command and by deleting it manually, but when I run the full-import, the
 number of indexed documents is just the number of documents coming from
 Postgres.

 To be sure that the id field is unique, I build the id by putting a letter
 before the id value. When indexed, the id looks like s_123, that is, id 123
 for an entity identified as s. Other entities use different prefixes, but
 never s.

 I used DIH to index the data. My configuration is the following:

 File db-data-config.xml

  <dataSource
      type="JdbcDataSource"
      name="ds_ora"
      driver="oracle.jdbc.OracleDriver"
      url="jdbc:oracle:thin:@xxx.xxx.xxx.xxx:1521:SID"
      user="user"
      password="password"
  />

  <dataSource
      type="JdbcDataSource"
      name="ds_pg"
      driver="org.postgresql.Driver"
      url="jdbc:postgresql://xxx.xxx.xxx.yyy:5432/sid"
      user="user"
      password="password"
  />

  <entity name="carrers" dataSource="ds_ora"
          query="select 's_'||id as id_carrer, 'a' as tooltip from imi_carrers">
      <field column="id_carrer" name="identificador" />
      <field column="tooltip" name="Nom" />
  </entity>

  <entity name="hidrants" dataSource="ds_pg"
          query="select 'h_'||id as id_hidrant, parc as tooltip from hidrants">
      <field column="id_hidrant" name="identificador" />
      <field column="tooltip" name="Nom" />
  </entity>

 --

 In that configuration, all the fields coming from ds_pg are indexed, and the
 fields coming from ds_ora are not. As I've said, the strange behaviour for
 me is that no error is logged in Tomcat: the number of documents created is
 the number of rows returned by hidrants, while the number of rows fetched
 is the sum of the rows from hidrants and carrers.

 Thanks in advance.

 Xavi.







 On 7 July 2010 02:46, Erick Erickson erickerick...@gmail.com wrote:

  First, do you have a unique key defined in your schema.xml? If you do,
  some of those 300 rows could be replacing earlier rows.

  You say: "if I have 200 rows indexed from postgres and 100 rows from
  Oracle, the full-import process only indexes 200 documents from oracle,
  although it shows clearly that the query returned 300 rows."

  Which really looks like a typo: if you have 100 rows from Oracle, how did
  you get 200 documents from Oracle?

  Are you perhaps doing this in two different jobs and deleting the first
  import before running the second?

  And if this is irrelevant, could you provide more details, like how you're
  indexing things? (I'm assuming DIH, but you don't state that anywhere.)
  If it *is* DIH, providing that configuration would help.

  Best
  Erick
 
  On Tue, Jul 6, 2010 at 11:19 AM, Xavier Rodriguez xee...@gmail.com
  wrote:
 
   Hi,

   I have Solr installed on a Tomcat application server. This Solr instance
   has some data indexed from a Postgres database. Now I need to add some
   entities from an Oracle database. When I run the full-import command,
   the documents indexed are only documents from Postgres. In fact, if I
   have 200 rows indexed from Postgres and 100 rows from Oracle, the
   full-import process only indexes 200 documents from oracle, although it
   shows clearly that the query returned 300 rows.

   I'm not doing a delta-import, simply a full import. I've tried to clean
   the index, reload the configuration, and manually remove
   dataimport.properties because it's the only metadata I found. Is there
   any other file to check or modify just to get all 300 rows indexed?

   Of course, I tried to search for one of the Oracle fields, with no
   results.

   Thanks a lot,

   Xavier Rodriguez.
  
 



Re: Nested table support ability

2010-06-23 Thread Govind Kanshi
Amit - unless you test, it would not be apparent. The key piece is, as Otis
mentioned, to flatten everything. This requires effort on your side to
actually create documents in a manner suitable for your searches: the
relationships need to be merged into the document. To avoid storing text
representations, you may want to store just an identifier and use the front
end to translate between the human-readable text and the stored identifier.
Taking your case further: rather than storing ADMIN, store just a
representation, maybe a smallint, with the customer information.
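To make the flattening concrete, a hypothetical denormalized document (field names invented; one Solr doc per customer, with child-table rows folded into multivalued fields):

  <add>
    <doc>
      <field name="customer_id">12345</field>
      <!-- a small code stored instead of the text ADMIN -->
      <field name="role_id">3</field>
      <!-- multivalued field: one value per child-table row -->
      <field name="order_id">A-1001</field>
      <field name="order_id">A-1002</field>
    </doc>
  </add>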

On Wed, Jun 23, 2010 at 11:30 AM, amit_ak amit...@mindtree.com wrote:


 Hi Otis, Thanks for the update.

 My parametric search has to span the customer table and 30 child tables.
 We have close to 1 million customers. Do you think Lucene/Solr is the right
 solution for such requirements, or would a database search be more optimal?

 Regards,
 Amit

 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Nested-table-support-ability-tp905253p916087.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Field Collapsing SOLR-236

2010-06-23 Thread Govind Kanshi
fieldType: 'analyzer without class or tokenizer & filter list' seems to
point to the config - you may want to correct it.
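For reference, the message means an <analyzer> element in schema.xml that neither names a class nor contains a <tokenizer> (plus optional <filter> entries). A minimal well-formed declaration looks like:

  <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>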


On Wed, Jun 23, 2010 at 3:09 PM, Rakhi Khatwani rkhatw...@gmail.com wrote:

 Hi,
    I checked out modules & lucene from the trunk.
 Performed a build using the following commands:
 ant clean
 ant compile
 ant example

 Which compiled successfully.


 I then put my existing index (using schema.xml from solr1.4.0/conf/solr/) in
 the multicore folder, configured solr.xml, and started the server.

 When I type in http://localhost:8983/solr

 I get the following error:
 org.apache.solr.common.SolrException: Plugin init failure for [schema.xml]
 fieldType: analyzer without class or tokenizer & filter list
 at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:168)
 at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:480)
 at org.apache.solr.schema.IndexSchema.<init>(IndexSchema.java:122)
 at org.apache.solr.core.CoreContainer.create(CoreContainer.java:429)
 at org.apache.solr.core.CoreContainer.load(CoreContainer.java:286)
 at org.apache.solr.core.CoreContainer.load(CoreContainer.java:198)
 at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:123)
 at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:86)
 at org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:97)
 at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
 at org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:662)
 at org.mortbay.jetty.servlet.Context.startContext(Context.java:140)
 at org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1250)
 at org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:517)
 at org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:467)
 at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
 at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:152)
 at org.mortbay.jetty.handler.ContextHandlerCollection.doStart(ContextHandlerCollection.java:156)
 at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
 at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:152)
 at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
 at org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:130)
 at org.mortbay.jetty.Server.doStart(Server.java:224)
 at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
 at org.mortbay.xml.XmlConfiguration.main(XmlConfiguration.java:985)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at org.mortbay.start.Main.invokeMain(Main.java:194)
 at org.mortbay.start.Main.start(Main.java:534)
 at org.mortbay.start.Main.start(Main.java:441)
 at org.mortbay.start.Main.main(Main.java:119)
 Caused by: org.apache.solr.common.SolrException: analyzer without class or tokenizer & filter list
 at org.apache.solr.schema.IndexSchema.readAnalyzer(IndexSchema.java:908)
 at org.apache.solr.schema.IndexSchema.access$100(IndexSchema.java:60)
 at org.apache.solr.schema.IndexSchema$1.create(IndexSchema.java:450)
 at org.apache.solr.schema.IndexSchema$1.create(IndexSchema.java:435)
 at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:142)
 ... 32 more


 Then I picked up an existing index (schema.xml from solr1.3/solr/conf), put
 it in the multicore folder, configured solr.xml, and restarted the server.

 Collapsing worked fine.

 Any pointers on which part of schema.xml (Solr 1.4) is causing this
 exception?

 Regards,
 Raakhi



 On Wed, Jun 23, 2010 at 1:35 PM, Rakhi Khatwani rkhatw...@gmail.com wrote:

 
  Oops, this is probably because I didn't check out the modules folder from
  the trunk. Doing that right now :)
 
  Regards
  Raakhi
 
  On Wed, Jun 23, 2010 at 1:12 PM, Rakhi Khatwani rkhatw...@gmail.com wrote:
 
  Hi,
     Patching did work, but when I build the trunk, I get the following
  exception:
 
  [SolrTrunk]# ant compile
  Buildfile: /testWorkspace/SolrTrunk/build.xml
 
  init-forrest-entities:
[mkdir] Created dir: /testWorkspace/SolrTrunk/build
[mkdir] Created dir: /testWorkspace/SolrTrunk/build/web
 
  compile-lucene:
 
  BUILD FAILED
  /testWorkspace/SolrTrunk/common-build.xml:207:
  /testWorkspace/modules/analysis/common does not exist.
 
  Regards,
  Raakhi
 
  On Wed, Jun 23, 2010 at 2:39 AM, Martijn v Groningen
  martijn.is.h...@gmail.com wrote:
 
  What exactly did not work? Patching, compiling or running it?
 
  On 22 June 2010 16:06, Rakhi Khatwani rkhatw...@gmail.com wrote:
   Hi,
   I tried checking out the latest code (rev 956715); the patch did not
   work on it.