Re: Question about Fuzzy search in Solr

2012-09-17 Thread Rahul Warawdekar
Thanks Jack.
We are using Solr 3.4.

On Mon, Sep 17, 2012 at 8:18 PM, Jack Krupansky wrote:

> That doc is out of date for 4.0. See the 4.0 Javadoc on FuzzyQuery for
> updated info. The tilde right operand is now an integer edit distance
> (number of times to insert char, delete char, change char, or transpose two
> adjacent chars to map index term to query term) that is limited to 2.
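>
> For example, against a 4.0 index (field name illustrative):
>
>   q=title:solr~1   (matches terms within one edit of "solr")
>   q=title:solr~2   (the maximum supported distance)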
>
> Be aware that if you use fuzzy query in 3.6/3.6.1 or earlier, it will
> change when you go to 4.0.
>
> -- Jack Krupansky
>
> -Original Message- From: Rafał Kuć
> Sent: Monday, September 17, 2012 7:15 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Question about Fuzzy search in Solr
>
>
> Hello!
>
> Is this what you are looking for
> https://lucene.apache.org/core/old_versioned_docs/versions/3_0_0/queryparsersyntax.html#Fuzzy%20Searches
> ?
>
> --
> Regards,
> Rafał Kuć
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch
>
> > Hi,
> >
> > I need to know how we can implement fuzzy searches using Solr.
> > Can someone provide any links to any relevant documentation ?
>


-- 
Thanks and Regards
Rahul A. Warawdekar


Re: Question about Fuzzy search in Solr

2012-09-17 Thread Rahul Warawdekar
Got it.
Thanks Rafał !

On Mon, Sep 17, 2012 at 6:37 PM, Rafał Kuć  wrote:

> Hello!
>
> There is no need to make any changes or include any additional component to
> have fuzzy search working in Solr.
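>
> For example, a fuzzy term query works out of the box (URL, core and
> field names illustrative; in 3.x the ~ operand is a 0..1 similarity):
>
>   http://localhost:8080/solr/select?q=name:aple~0.7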
>
> --
> Regards,
>  Rafał Kuć
>  Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch
>
> > Thanks.
> > Is any extra configuration needed on the Solr side to make this work ?
> > Any additional text files like synonyms.txt, any additional fields or any
> > changes in schema.xml or solrconfig.xml ?
>
> > On Mon, Sep 17, 2012 at 4:45 PM, Rafał Kuć  wrote:
>
> >> Hello!
> >>
> >> Is this what you are looking for
> >>
> >>
> https://lucene.apache.org/core/old_versioned_docs/versions/3_0_0/queryparsersyntax.html#Fuzzy%20Searches
> >> ?
> >>
> >> --
> >> Regards,
> >>  Rafał Kuć
> >>  Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch -
> ElasticSearch
> >>
> >> > Hi,
> >>
> >> > I need to know how we can implement fuzzy searches using Solr.
> >> > Can someone provide any links to any relevant documentation ?
> >>
> >>
>
>
>


-- 
Thanks and Regards
Rahul A. Warawdekar


Re: Question about Fuzzy search in Solr

2012-09-17 Thread Rahul Warawdekar
Thanks.
Is any extra configuration needed on the Solr side to make this work ?
Any additional text files like synonyms.txt, any additional fields or any
changes in schema.xml or solrconfig.xml ?

On Mon, Sep 17, 2012 at 4:45 PM, Rafał Kuć  wrote:

> Hello!
>
> Is this what you are looking for
>
> https://lucene.apache.org/core/old_versioned_docs/versions/3_0_0/queryparsersyntax.html#Fuzzy%20Searches
> ?
>
> --
> Regards,
>  Rafał Kuć
>  Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch
>
> > Hi,
>
> > I need to know how we can implement fuzzy searches using Solr.
> > Can someone provide any links to any relevant documentation ?
>
>


-- 
Thanks and Regards
Rahul A. Warawdekar


Re: DIH XML configs for multi environment

2012-07-11 Thread Rahul Warawdekar
http://wiki.eclipse.org/Jetty/Howto/Configure_JNDI_Datasource
http://docs.codehaus.org/display/JETTY/DataSource+Examples
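
For example, a minimal sketch of a JNDI DataSource in WEB-INF/jetty-env.xml
(Jetty 7/8 class names; driver, URL and credentials are placeholders, so
adjust per the links above for your Jetty version):

<Configure class="org.eclipse.jetty.webapp.WebAppContext">
  <New id="solrDS" class="org.eclipse.jetty.plus.jndi.Resource">
    <Arg>jdbc/solrDS</Arg>
    <Arg>
      <New class="org.apache.commons.dbcp.BasicDataSource">
        <Set name="driverClassName">com.mysql.jdbc.Driver</Set>
        <Set name="url">jdbc:mysql://dbhost:3306/mydb</Set>
        <Set name="username">user</Set>
        <Set name="password">secret</Set>
      </New>
    </Arg>
  </New>
</Configure>

DIH can then reference it with jndiName="java:comp/env/jdbc/solrDS".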


On Wed, Jul 11, 2012 at 2:30 PM, Pranav Prakash  wrote:

> That's cool. Is there something similar for Jetty as well? We use Jetty!
>
> *Pranav Prakash*
>
> "temet nosce"
>
>
>
> On Wed, Jul 11, 2012 at 1:49 PM, Rahul Warawdekar <
> rahul.warawde...@gmail.com> wrote:
>
> > Hi Pranav,
> >
> > If you are using Tomcat to host Solr, you can define your data source in
> > context.xml file under tomcat configuration.
> > You have to refer to this datasource with the same name in all the 3
> > environments from DIH data-config.xml.
> > This context.xml file will vary across 3 environments having different
> > credentials for dev, stag and prod.
> >
> > eg
> > DIH data-config.xml will refer to the datasource as listed below
> > <dataSource jndiName="java:comp/env/jdbc/yourDS"
> > type="JdbcDataSource" readOnly="true" />
> >
> > context.xml file which is located under "<TOMCAT_HOME>/conf" folder will
> > have the resource entry as follows
> > <Resource name="jdbc/yourDS" auth="Container"
> > type="" username="X" password="X"
> > driverClassName=""
> > url=""
> > maxActive="8"
> > />
> >
> > On Wed, Jul 11, 2012 at 1:31 PM, Pranav Prakash 
> wrote:
> >
> > > The DIH XML config file has to be specified dataSource. In my case, and
> > > possibly with many others, the logon credentials as well as mysql
> server
> > > paths would differ based on environments (dev, stag, prod). I don't
> want
> > to
> > > end up coming with three different DIH config files, three different
> > > handlers and so on.
> > >
> > > What is a good way to deal with this?
> > >
> > >
> > > *Pranav Prakash*
> > >
> > > "temet nosce"
> > >
> >
> >
> >
> > --
> > Thanks and Regards
> > Rahul A. Warawdekar
> >
>



-- 
Thanks and Regards
Rahul A. Warawdekar


Re: DIH XML configs for multi environment

2012-07-11 Thread Rahul Warawdekar
Hi Pranav,

If you are using Tomcat to host Solr, you can define your data source in
context.xml file under tomcat configuration.
You have to refer to this datasource with the same name in all the 3
environments from DIH data-config.xml.
This context.xml file will vary across 3 environments having different
credentials for dev, stag and prod.

eg
DIH data-config.xml will refer to the datasource as listed below

<dataSource jndiName="java:comp/env/jdbc/yourDS"
    type="JdbcDataSource" readOnly="true" />

context.xml file which is located under "<TOMCAT_HOME>/conf" folder will
have the resource entry as follows

<Resource name="jdbc/yourDS" auth="Container"
    type="" username="X" password="X"
    driverClassName=""
    url=""
    maxActive="8"
    />

On Wed, Jul 11, 2012 at 1:31 PM, Pranav Prakash  wrote:

> The DIH XML config file has to be specified dataSource. In my case, and
> possibly with many others, the logon credentials as well as mysql server
> paths would differ based on environments (dev, stag, prod). I don't want to
> end up coming with three different DIH config files, three different
> handlers and so on.
>
> What is a good way to deal with this?
>
>
> *Pranav Prakash*
>
> "temet nosce"
>



-- 
Thanks and Regards
Rahul A. Warawdekar


Re: Can't index sub-entities in DIH

2012-06-05 Thread Rahul Warawdekar
Hi,

One of the possibilities for this kind of issue to occur may be the case
sensitivity of column names in Oracle.
Can you apply a transformer and check the entity map which actually
contains the keys and their values ?
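
For example, DIH's built-in LogTransformer can log each row's map as it is
read (a sketch; the entity name and template are from your config):

<entity name="usuario" transformer="LogTransformer"
        logTemplate="row: ${usuario.idusuario}" logLevel="info"
        query="...">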
Also, please try specifying upper case column names for Oracle and check if
that works.
something like (a sketch based on your sub-entity; Oracle reports column
names in upper case)

<entity name="tipodocumento"
        query="SELECT nombre FROM tipodocumento WHERE ...">
  <field column="NOMBRE" name="nombre" />
</entity>

On Tue, Jun 5, 2012 at 9:57 AM, Rafael Taboada wrote:

> Hi Gora,
>
>
> > Your configuration files look fine. It would seem that something
> > is going wrong with the SELECT in Oracle, or with the JDBC
> > driver used to access Oracle. Could you try:
>
> * Manually doing the SELECT for the entity, and sub-entity
> >  to ensure that things are working.
> >
>
> The SELECTs are working OK.
>
>
>
> > * Check the JDBC settings.
> >
>
> I'm using the last version of jdbc6.jar for Oracle 11g. It seems the JDBC
> setting is OK because Solr brings data.
>
>
>
> > Sorry, I do not have access to Oracle so that I cannot try this
> > out myself.
> >
> > Also, have you checked the Solr logs for any error messages?
> > Finally, I just noticed that you have extra quotes in:
> > ...where usuario_idusuario = '${usuario.idusuario}'"
> > I doubt that is the cause of your problem, but you could try
> > removing them.
> >
>
> If I remove quotes, there is an error about this:
>
> SEVERE: Full Import failed:java.lang.RuntimeException:
> java.lang.RuntimeException:
> org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to
> execute query: SELECT nombre FROM tipodocumento WHERE idtipodocumento =
>  Processing Document # 1
> at
> org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:264)
> at
>
> org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:375)
> at
>
> org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:445)
> at
>
> org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:426)
> Caused by: java.lang.RuntimeException:
> org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to
> execute query: SELECT nombre FROM tipodocumento WHERE idtipodocumento =
>  Processing Document # 1
> at
>
> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:621)
> at
>
> org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:327)
> at
> org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:225)
> ... 3 more
> Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException:
> Unable to execute query: SELECT nombre FROM tipodocumento WHERE
> idtipodocumento =  Processing Document # 1
> at
>
> org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:72)
> at
>
> org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.<init>(JdbcDataSource.java:253)
> at
>
> org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:210)
> at
>
> org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:39)
> at
>
> org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:59)
> at
>
> org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:73)
> at
>
> org.apache.solr.handler.dataimport.EntityProcessorWrapper.pullRow(EntityProcessorWrapper.java:330)
> at
>
> org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:296)
> at
>
> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:683)
> at
>
> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:709)
> at
>
> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:619)
> ... 5 more
> Caused by: java.sql.SQLSyntaxErrorException: ORA-00936: missing expression
>
> at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:445)
> at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:396)
> at oracle.jdbc.driver.T4C8Oall.processError(T4C8Oall.java:879)
> at oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:450)
> at oracle.jdbc.driver.T4CTTIfun.doRPC(T4CTTIfun.java:192)
> at oracle.jdbc.driver.T4C8Oall.doOALL(T4C8Oall.java:531)
> at oracle.jdbc.driver.T4CStatement.doOall8(T4CStatement.java:193)
> at
> oracle.jdbc.driver.T4CStatement.executeForDescribe(T4CStatement.java:873)
> at
>
> oracle.jdbc.driver.OracleStatement.executeMaybeDescribe(OracleStatement.java:1167)
> at
>
> oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:1289)
> at
>
> oracle.jdbc.driver.OracleStatement.executeInternal(OracleStatement.java:1909)
> at oracle.jdbc.driver.OracleStatement.execute(OracleStatement.java:1871)
> at
>
> oracle.jdbc.driver.OracleStatementWrapper.execute(OracleStatementWrapper.java:318)
> at
>
> org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.<init>(JdbcDataSource.java:246)
> My config files using Oracle are:
>
>
> db-data-config.xml
> <dataSource type="JdbcDataSource" driver="oracle.jdbc.OracleDriver"
> url="jdbc:oracle:thin:@localhost:1521:solr" user="solr" password="s

Re: how to show DIH query sql in log file

2012-06-01 Thread Rahul Warawdekar
Hi,

Turn the Solr logging level to "FINE" for the DIH packages/classes and the
queries will show up in the log. You can change the level from the admin page:
http://<host>:<port>/solr/<corename>/admin/logging
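
For example, with the default JDK logging used by Solr on Tomcat, the
equivalent permanent setting would be (a sketch; add to logging.properties
and restart):

org.apache.solr.handler.dataimport.level = FINE
org.apache.solr.handler.dataimport.JdbcDataSource.level = FINE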

On Fri, Jun 1, 2012 at 9:34 AM, wangjing  wrote:

> how to show DIH query's sql in log file for troubleshooting?
>
> thanks.
>



-- 
Thanks and Regards
Rahul A. Warawdekar


Re: possible status codes from solr during a (DIH) data import process

2012-05-31 Thread Rahul Warawdekar
Hi,

Thats correct.
For failure, you have to check for the text *"Indexing failed. Rolled back
changes"* under the <lst name="statusMessages"> tag.
One more thing to note here is that there may be a time during the indexing
process where the indexing is complete but the index is not committed and
optimized yet.
You would need to check if the response listed below is present along with
the success message to term it as a complete success.

*<str name="Committed">2012-05-31 15:10:45</str>
<str name="Optimized">2012-05-31 15:10:45</str>*
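
A minimal bash check based on these strings could look like this (a sketch;
URL as in your script below):

STATUS=$(curl -s "http://${SERVER}:${PORT}/somecore/dataimport?command=status")
if echo "$STATUS" | grep -q "Indexing completed" && \
   echo "$STATUS" | grep -q "Committed"; then
  echo "import succeeded"
elif echo "$STATUS" | grep -q "Indexing failed"; then
  echo "import failed and was rolled back"
else
  echo "import still running or status unknown"
fi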

On Thu, May 31, 2012 at 3:42 PM, geeky2  wrote:

> hello all,
>
> i have been asked to write a small polling script (bash) to periodically
> check the status of an import on our Master.  our import times are small,
> but there are business reasons why we want to know the status of an import
> after a specified amount of time.
>
> i need to perform certain actions based on the "status" of the import, and
> therefore need to quantify which tags to check and their appropriate
> states.
>
> i am using the command from the DataImportHandler HTTP API to get the
> status
> of the import:
>
> OUTPUT=$(curl -v
> http://${SERVER}:${PORT}/somecore/dataimport?command=status)
>
>
>
>
> can someone tell me if i have these rules correct?
>
> 1) during an import - the status tag will have a busy state:
>
> example:
>
> <str name="status">busy</str>
>
> 2) at the completion of an import (regardless of failure or success) the
> status tag will have an "idle" state:
>
> example:
>
> <str name="status">idle</str>
>
>
> 3) to determine if an import failed or succeeded - you must interrogate the
> tags under <lst name="statusMessages"> and specifically look for :
>
> success:
> Indexing completed. Added/Updated: 603378 documents. Deleted 0
> documents.
>
> failure:
> Indexing completed. Added/Updated: 603378 documents. Deleted 0
> documents.
>
> thank you,
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/possible-status-codes-from-solr-during-a-DIH-data-import-process-tp3987110.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Thanks and Regards
Rahul A. Warawdekar


Re: Not able to use the highlighting feature! Want to return snippets of text

2012-05-21 Thread Rahul Warawdekar
Hi,

I believe, in your "colored" fragmentsBuilder definition, you have not
specified anything in your pre and post tags, and that may be the reason
that you are getting snippets of text without any highlighting.
Please refer http://wiki.apache.org/solr/HighlightingParameters and check
the "hl.fragmentsBuilder" section.
Try specifying the pre and post tags with information as mentioned below.
(same as the wiki link above)

<fragmentsBuilder name="colored"
    class="solr.highlight.ScoreOrderFragmentsBuilder">
  <lst name="defaults">
    <str name="hl.tag.pre"><![CDATA[
      <b style="background:yellow">,<b style="background:lawngreen">]]></str>
    <str name="hl.tag.post"><![CDATA[</b>]]></str>
  </lst>
</fragmentsBuilder>

On Mon, May 21, 2012 at 3:52 PM, 12rad  wrote:

> For the fragListBuilder
> it's
> <fragListBuilder name="simple"
>    default="true"
>    class="solr.highlight.SimpleFragListBuilder"/>
>
> fragment builder is
> <fragmentsBuilder name="colored"
>    class="solr.highlight.ScoreOrderFragmentsBuilder">
>  <lst name="defaults">
>    <str name="hl.tag.pre"></str>
>    <str name="hl.tag.post"></str>
>  </lst>
> </fragmentsBuilder>
>
>
> <fragmenter name="regex"
>    class="solr.highlight.RegexFragmenter">
>  <lst name="defaults">
>    <int name="hl.fragsize">70</int>
>    <float name="hl.regex.slop">0.5</float>
>    <str name="hl.regex.pattern">[-\w ,/\n\"']{20,200}</str>
>  </lst>
> </fragmenter>
>
>
> Thanks!
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Not-able-to-use-the-highlighting-feature-Want-to-return-snippets-of-text-Urgent-tp3985012p3985212.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Thanks and Regards
Rahul A. Warawdekar


Re: Not able to use the highlighting feature! Want to return snippets of text

2012-05-21 Thread Rahul Warawdekar
Hi,

Can you please provide the definitions of the following 3 objects from your
solrconfig.xml ?

<str name="hl.fragListBuilder">simple</str>
<str name="hl.fragmentsBuilder">colored</str>
<str name="hl.fragmenter">regex</str>


For eg,
the "simple" hl.fragListBuilder should be defined as mentioned below in
your solrconfig.xml
   


On Mon, May 21, 2012 at 2:06 PM, 12rad  wrote:

> The field I am trying to highlight is stored.
>
>
> <field name="text" type="text" omitNorms="false"
>    indexed="true" stored="true" multiValued="true" termVectors="true"
> termPositions="true"
>    termOffsets="true"/>
>
>
> In the searchHandler i've set the parameters as follows:
>
>   <str name="hl">on</str>
>   <str name="hl.fl">text</str>
>   5
>   1000
>   51
>   <bool name="hl.useFastVectorHighlighter">true</bool>
>   <str name="hl.fragmenter">regex</str>
>   <str name="hl.fragListBuilder">simple</str>
>   <str name="hl.fragmentsBuilder">colored</str>
>   1000
>   true
>   true
>   true
>
>
> I still don't see any highlighting. I've managed to get snippets of text
> but
> the actual word is not highlighted. I don't know where I am going wrong?
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Not-able-to-use-the-highlighting-feature-Want-to-return-snippets-of-text-Urgent-tp3985012p3985174.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Thanks and Regards
Rahul A. Warawdekar


Issue with DIH when database is down

2012-05-17 Thread Rahul Warawdekar
Hi,

I am using Solr 3.4 on Tomcat 6 and using DIH to index data from a MS SQL
Server 2008 database.

In case my database is down, or is refusing connections due to any reason,
DIH throws an exception as mentioned below

"org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to
execute query: ...

Caused by: com.microsoft.sqlserver.jdbc.SQLServerException: Connection reset
at
com.microsoft.sqlserver.jdbc.SQLServerConnection.terminate(SQLServerConnection.java:1368)"

But when the database is up and running again and the next indexing job runs,
it gives me the same error.
I need to restart Tomcat in order to successfully connect again to the
database.

My dataSource settings in data-config.xml are as follows

<dataSource type="JdbcDataSource"
    driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
    url="jdbc:sqlserver://<server>;databaseName=<db>"
    user="X" password="X" />
Has anyone come across this issue before ?
If yes, what is the resolution ?
Am I missing anything in the dataSource attributes (autoCommit=true)  ??
-- 
Thanks and Regards
Rahul A. Warawdekar


Solr request tracking

2012-05-16 Thread Rahul Warawdekar
Hi,

Is there any mechanism by which we can track and trend the incoming Solr
search requests ?
For example, logging all incoming Solr requests to a log file separate
from Tomcat's, and having a tool to trend the patterns ?


-- 
Thanks and Regards
Rahul A. Warawdekar


Re: how to limit solr indexing to specific number of rows

2012-05-03 Thread Rahul Warawdekar
Hi,

What is the error that you are getting ?
ROWNUM works fine with DIH, I have tried and tested it with Solr 3.1.

One thing that comes to my mind is the query that you are using to
implement ROWNUM.
Did you replace the "<" in the query with "&lt;" in data-config.xml ?
like "ROWNUM &lt;= 100" ?

On Thu, May 3, 2012 at 4:11 PM, srini  wrote:

> I am doing database import using solr DIH. I would like to limit the solr
> indexing to specific number. In other words If Solr reaches indexing 100
> records I want to database import to stop importing.
>
> Not sure if there is any particular setting that would tell solr that I
> only
> want to import 100 rows from database and index those 100 records.
>
> I tried to give a select query with ROWNUM<=100 (using oracle) in
> data-config.xml, but it gave an error. Any ideas!!!
>
> Thanks in Advance
> Srini
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/how-to-limit-solr-indexing-to-specific-number-of-rows-tp3960344.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Thanks and Regards
Rahul A. Warawdekar


Re: solr replication failing with error: Master at: is not available. Index fetch failed

2012-04-25 Thread Rahul Warawdekar
Hi,

Is the replication still failing or working fine with that change ?

On Tue, Apr 24, 2012 at 2:16 PM, geeky2  wrote:

> that was it!
>
> thank you.
>
> i did notice something else in the logs now ...
>
> what is the meaning or implication of the message, "Connection reset" ?
>
>
>
> 2012-04-24 12:59:19,996 INFO  [org.apache.solr.handler.SnapPuller]
> (pool-12-thread-1) Slave in sync with master.
> 2012-04-24 12:59:39,998 INFO  [org.apache.solr.handler.SnapPuller]
> (pool-12-thread-1) Slave in sync with master.
> *2012-04-24 12:59:59,997 SEVERE [org.apache.solr.handler.SnapPuller]
> (pool-12-thread-1) Master at:
> http://bogus:bogusport/somepath/somecore/replication/ is not available.
> Index fetch failed. Exception: Connection reset*
> 2012-04-24 13:00:19,998 INFO  [org.apache.solr.handler.SnapPuller]
> (pool-12-thread-1) Slave in sync with master.
> 2012-04-24 13:00:40,004 INFO  [org.apache.solr.handler.SnapPuller]
> (pool-12-thread-1) Slave in sync with master.
> 2012-04-24 13:00:59,992 INFO  [org.apache.solr.handler.SnapPuller]
> (pool-12-thread-1) Slave in sync with master.
> 2012-04-24 13:01:19,993 INFO  [org.apache.solr.handler.SnapPuller]
> (pool-12-thread-1) Slave in sync with master.
> 2012-04-24 13:01:39,992 INFO  [org.apache.solr.handler.SnapPuller]
> (pool-12-thread-1) Slave in sync with master.
> 2012-04-24 13:01:59,989 INFO  [org.apache.solr.handler.SnapPuller]
> (pool-12-thread-1) Slave in sync with master.
> 2012-04-24 13:02:19,990 INFO  [org.apache.solr.handler.SnapPuller]
> (pool-12-thread-1) Slave in sync with master.
> 2012-04-24 13:02:39,989 INFO  [org.apache.solr.handler.SnapPuller]
> (pool-12-thread-1) Slave in sync with master.
> 2012-04-24 13:02:59,991 INFO  [org.a
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/solr-replication-failing-with-error-Master-at-is-not-available-Index-fetch-failed-tp3932921p3936107.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Thanks and Regards
Rahul A. Warawdekar


Re: solr replication failing with error: Master at: is not available. Index fetch failed

2012-04-24 Thread Rahul Warawdekar
Hi,

In the Solr wiki, for replication, the master url is defined as follows
http://master_host:port/solr/corename/replication

This url does not contain "admin" in its path, whereas the master url
provided by you has an additional "admin" in it.
Not very sure if this might be the issue, but you can just try removing
"admin" and check if replication works.


On Tue, Apr 24, 2012 at 11:49 AM, geeky2  wrote:

> hello,
>
> thank you for the reply,
>
> yes - master has been indexed.
>
> ok - makes sense - the polling interval needs to change
>
> i did check the solr war file on both boxes (master and slave).  they are
> identical.  actually - if they were not indentical - this would point to a
> different issue altogether - since our deployment infrastructure - rolls
> the
> war file to the slaves when you do a deployment on the master.
>
> this has me stumped - not sure what to check next.
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/solr-replication-failing-with-error-Master-at-is-not-available-Index-fetch-failed-tp3932921p3935699.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Thanks and Regards
Rahul A. Warawdekar


Re: Solr with UIMA

2012-04-19 Thread Rahul Warawdekar
Hi Divakar,

Try making your updateRequestProcessorChain the default. Simply add
default="true" as follows and check if that works.

<updateRequestProcessorChain name="uima" default="true">
On Thu, Apr 19, 2012 at 12:01 PM, dsy99  wrote:

> Hi Chris,
> Have you been able to successfully integrate UIMA in Solr ?
>
> I too tried to integrate UIMA in Solr by following the instructions
> provided in the README, i.e. the following four steps:
>
> Step1. I set <lib> tags in solrconfig.xml appropriately to point to the jar
> files.
>
>   <lib dir="../../contrib/uima/lib" />
>
>
> Step2. modified my "schema.xml" adding the fields I wanted to hold
> metadata,
> specifying proper values for type, indexed, stored and multiValued options
> as follows:
>
> <field name="language" type="string" indexed="true" stored="true"
> required="false"/>
> <field name="concept" type="string" indexed="true" stored="true"
>  multiValued="true" required="false"/>
> <field name="sentence" type="text" indexed="true" stored="true"
>  multiValued="true" required="false" />
>
> Step3. modified my solrconfig.xml adding the following snippet:
>
> <updateRequestProcessorChain name="uima">
>  <processor
> class="org.apache.solr.uima.processor.UIMAUpdateRequestProcessorFactory">
>   <lst name="uimaConfig">
>    <lst name="runtimeParameters">
>     <str name="keyword_apikey">VALID_ALCHEMYAPI_KEY</str>
>     <str name="concept_apikey">VALID_ALCHEMYAPI_KEY</str>
>     <str name="lang_apikey">VALID_ALCHEMYAPI_KEY</str>
>     <str name="cat_apikey">VALID_ALCHEMYAPI_KEY</str>
>     <str name="entities_apikey">VALID_ALCHEMYAPI_KEY</str>
>     <str name="oc_licenseID">VALID_OPENCALAIS_KEY</str>
>    </lst>
>    <str
> name="analysisEngine">/org/apache/uima/desc/OverridingParamsExtServicesAE.xml</str>
>    <bool name="ignoreErrors">true</bool>
>    <lst name="analyzeFields">
>     <bool name="merge">false</bool>
>     <arr name="fields">
>      <str>text</str>
>     </arr>
>    </lst>
>    <lst name="fieldMappings">
>     <lst name="type">
>      <str name="name">org.apache.uima.alchemy.ts.concept.ConceptFS</str>
>      <lst name="mapping">
>       <str name="feature">text</str>
>       <str name="field">concept</str>
>      </lst>
>     </lst>
>     <lst name="type">
>      <str name="name">org.apache.uima.alchemy.ts.language.LanguageFS</str>
>      <lst name="mapping">
>       <str name="feature">language</str>
>       <str name="field">language</str>
>      </lst>
>     </lst>
>     <lst name="type">
>      <str name="name">org.apache.uima.SentenceAnnotation</str>
>      <lst name="mapping">
>       <str name="feature">coveredText</str>
>       <str name="field">sentence</str>
>      </lst>
>     </lst>
>    </lst>
>   </lst>
>  </processor>
>  <processor class="solr.LogUpdateProcessorFactory" />
>  <processor class="solr.RunUpdateProcessorFactory" />
> </updateRequestProcessorChain>
>
> Step 4: and finally created a new UpdateRequestHandler with the following:
>
> <requestHandler name="/update" class="solr.XmlUpdateRequestHandler">
>  <lst name="defaults">
>   <str name="update.chain">uima</str>
>  </lst>
> </requestHandler>
>
> Further I  indexed a word file called text.docx using the following
> command:
>
> curl
> "
> http://localhost:8983/solr/update/extract?literal.id=doc1&uprefix=attr_&fmap.content=attr_content&commit=true
> "
> -F "myfile=@UIMA_sample_test.docx"
>
> When I searched the file, I was not able to see the additional UIMA fields.
>
> Can you please help if you have been able to solve the problem.
>
>
> With Regds & Thanks
> Divakar
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Solr-with-UIMA-tp3863324p3923443.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Thanks and Regards
Rahul A. Warawdekar


Re: DataImportHandler w/ multivalued fields

2011-12-01 Thread Rahul Warawdekar
Hi Briggs,

By saying "multivalued fields are not getting indexed prperly", do you mean
to say that you are not able to search on those fields ?
Have you tried actually searching your Solr index for those multivalued
terms and make sure if it returns the search results ?

One possibility could be that the multivalued fields are getting indexed
correctly and are searchable.
However, since your schema.xml has a "raw_tag" field whose "stored"
attribute is set to false, you may not be able to see those fields.



On Thu, Dec 1, 2011 at 1:43 PM, Briggs Thompson  wrote:

> In addition, I tried a query like below and changed the column definition
> to
>
> <field column="raw_tag" name="raw_tag" splitBy=", " />
>
> and still no luck. It is indexing the full content now, but not as
> multivalued. It seems like the "splitBy" isn't working properly.
>
>select group_concat(freetags.raw_tag separator ', ') as raw_tag, site.*
> from site
> left outer join
>  (freetags inner join freetagged_objects)
> on (freetags.id = freetagged_objects.tag_id
>   and site.siteId = freetagged_objects.object_id)
> group  by site.siteId
>
> Am I doing something wrong?
> Thanks,
> Briggs Thompson
>
> On Thu, Dec 1, 2011 at 11:46 AM, Briggs Thompson <
> w.briggs.thomp...@gmail.com> wrote:
>
> > Hello Solr Community!
> >
> > I am implementing a data connection to Solr through the Data Import
> > Handler and non-multivalued fields are working correctly, but multivalued
> > fields are not getting indexed properly.
> >
> > I am new to DataImportHandler, but from what I could find, the entity is
> > the way to go for multivalued field. The weird thing is that data is
> being
> > indexed for one row, meaning first raw_tag gets populated.
> >
> >
> > Anyone have any ideas?
> > Thanks,
> > Briggs
> >
> > This is the relevant part of the schema:
> >
> > <field name="raw_tag" type="text" indexed="true"
> > stored="false" multivalued="true"/>
> > <field name="..." type="text" indexed="true"
> > stored="true" multivalued="true"/>
> >
> >
> > And the relevant part of data-import.xml:
> >
> > <document>
> > <entity name="site"
> >   query="select * from site ">
> > <!-- field mappings for the single-valued site columns elided -->
> > <entity name="tags"
> > query="select raw_tag, freetags.id,
> > freetagged_objects.object_id as siteId
> >from freetags
> >inner join freetagged_objects
> >on freetags.id=freetagged_objects.tag_id
> > where freetagged_objects.object_id='${site.siteId}'">
> > <field column="raw_tag" name="raw_tag" splitBy=", "/>
> > </entity>
> > </entity>
> > </document>
> >
>



-- 
Thanks and Regards
Rahul A. Warawdekar


Re: Architecture and Capacity planning for large Solr index

2011-11-21 Thread Rahul Warawdekar
Thanks Otis !
Please ignore my earlier email which does not have all the information.

My business requirements have changed a bit.
We now need one year rolling data in Production, with the following details
- Number of records -> 1.2 million
- Solr index size for these records comes to approximately 200 - 220
GB. (includes large attachments)
- Approx 250 users who will be searching the application, with a peak of
1 search request every 40 seconds.

I am planning to address this using Solr distributed search on a VMWare
virtualized environment as follows.

1. Whole index to be split up between 3 shards, with 3 masters and 6 slaves
(load balanced)

2. Master configuration for each server is as follows
- 4 CPUs
- 16 GB RAM
- 300 GB disk space

3. Slave configuration for each server is as follows
- 4 CPUs
- 16 GB RAM
- 150 GB disk space

4. I am planning to use SAN instead of local storage to store Solr index.

And my questions are as follows:
Will 3 shards serve the purpose here ?
Is SAN a good option for storing the Solr index, given the high index volume ?




On Mon, Nov 21, 2011 at 3:05 PM, Rahul Warawdekar <
rahul.warawde...@gmail.com> wrote:

> Thanks !
>
> My business requirements have changed a bit.
> We need one year rolling data in Production.
> The index size for the same comes to approximately 200 - 220 GB.
> I am planning to address this using Solr distributed search as follows.
>
> 1. Whole index to be split up between 3 shards, with 3 masters and 6
> slaves (load balanced)
> 2. Master configuration
>  will be 4 CPU
>
>
>
> On Tue, Oct 11, 2011 at 2:05 PM, Otis Gospodnetic <
> otis_gospodne...@yahoo.com> wrote:
>
>> Hi Rahul,
>>
>> This is unfortunately not enough information for anyone to give you very
>> precise answers, so I'll just give some rough ones:
>>
>> * best disk - SSD :)
>> * CPU - multicore, depends on query complexity, concurrency, etc.
>> * sharded search and failover - start with SolrCloud, there are a couple
>> of pages about it on the Wiki and
>> http://blog.sematext.com/2011/09/14/solr-digest-spring-summer-2011-part-2-solr-cloud-and-near-real-time-search/
>>
>> Otis
>> 
>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
>> Lucene ecosystem search :: http://search-lucene.com/
>>
>>
>> >
>> >From: Rahul Warawdekar 
>> >To: solr-user 
>> >Sent: Tuesday, October 11, 2011 11:47 AM
>> >Subject: Architecture and Capacity planning for large Solr index
>> >
>> >Hi All,
>> >
>> >I am working on a Solr search based project, and would highly appreciate
>> >help/suggestions from you all regarding Solr architecture and capacity
>> >planning.
>> >Details of the project are as follows
>> >
>> >1. There are 2 databases from which, data needs to be indexed and made
>> >searchable,
>> >- Production
>> >- Archive
>> >2. Production database will retain 6 months old data and archive data
>> every
>> >month.
>> >3. Archive database will retain 3 years old data.
>> >4. Database is SQL Server 2008 and Solr version is 3.1
>> >
>> >Data to be indexed contains a huge volume of attachments (PDF, Word,
>> excel
>> >etc..), approximately 200 GB per month.
>> >We are planning to do a full index every month (multithreaded) and
>> >incremental indexing on a daily basis.
>> >The Solr index size is coming to approximately 25 GB per month.
>> >
>> >If we were to use distributed search, what would be the best
>> configuration
>> >for Production as well as Archive indexes ?
>> >What would be the best CPU/RAM/Disk configuration ?
>> >How can I implement failover mechanism for sharded searches ?
>> >
>> >Please let me know in case I need to share more information.
>> >
>> >
>> >--
>> >Thanks and Regards
>> >Rahul A. Warawdekar
>> >
>> >
>> >
>>
>
>
>
> --
> Thanks and Regards
> Rahul A. Warawdekar
>
>


-- 
Thanks and Regards
Rahul A. Warawdekar


Re: Architecture and Capacity planning for large Solr index

2011-11-21 Thread Rahul Warawdekar
Thanks !

My business requirements have changed a bit.
We need one year rolling data in Production.
The index size for the same comes to approximately 200 - 220 GB.
I am planning to address this using Solr distributed search as follows.

1. Whole index to be split up between 3 shards, with 3 masters and 6 slaves
(load balanced)
2. Master configuration
 will be 4 CPU


On Tue, Oct 11, 2011 at 2:05 PM, Otis Gospodnetic <
otis_gospodne...@yahoo.com> wrote:

> Hi Rahul,
>
> This is unfortunately not enough information for anyone to give you very
> precise answers, so I'll just give some rough ones:
>
> * best disk - SSD :)
> * CPU - multicore, depends on query complexity, concurrency, etc.
> * sharded search and failover - start with SolrCloud, there are a couple
> of pages about it on the Wiki and
> http://blog.sematext.com/2011/09/14/solr-digest-spring-summer-2011-part-2-solr-cloud-and-near-real-time-search/
>
> Otis
> 
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/
>
>
> >
> >From: Rahul Warawdekar 
> >To: solr-user 
> >Sent: Tuesday, October 11, 2011 11:47 AM
> >Subject: Architecture and Capacity planning for large Solr index
> >
> >Hi All,
> >
> >I am working on a Solr search based project, and would highly appreciate
> >help/suggestions from you all regarding Solr architecture and capacity
> >planning.
> >Details of the project are as follows
> >
> >1. There are 2 databases from which, data needs to be indexed and made
> >searchable,
> >- Production
> >- Archive
> >2. Production database will retain 6 months old data and archive data
> every
> >month.
> >3. Archive database will retain 3 years old data.
> >4. Database is SQL Server 2008 and Solr version is 3.1
> >
> >Data to be indexed contains a huge volume of attachments (PDF, Word, excel
> >etc..), approximately 200 GB per month.
> >We are planning to do a full index every month (multithreaded) and
> >incremental indexing on a daily basis.
> >The Solr index size is coming to approximately 25 GB per month.
> >
> >If we were to use distributed search, what would be the best configuration
> >for Production as well as Archive indexes ?
> >What would be the best CPU/RAM/Disk configuration ?
> >How can I implement failover mechanism for sharded searches ?
> >
> >Please let me know in case I need to share more information.
> >
> >
> >--
> >Thanks and Regards
> >Rahul A. Warawdekar
> >
> >
> >
>



-- 
Thanks and Regards
Rahul A. Warawdekar


Re: Ordered proximity search

2011-11-04 Thread Rahul Warawdekar
Hi Thomas,

Do you always need the ordered proximity search by default ?
You may want to check SpanNearQuery at
http://www.lucidimagination.com/blog/2009/07/18/the-spanquery/

We are using the edismax query parser provided by Solr.
I had a similar requirement in our project, and here is how we
addressed it:

1. Wrote a customized query parser similar to edismax.
2. Identified the method in the code which takes care of "PhraseQuery" and
replaced it with a snippet of "SpanNearQuery" code.

Please check more on SpanNearQuery if that works for you.
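
For illustration, an ordered span query in Lucene looks like this (a
sketch; field name assumed; classes are from org.apache.lucene.search.spans
and org.apache.lucene.index):

SpanQuery q = new SpanNearQuery(
    new SpanQuery[] {
        new SpanTermQuery(new Term("text", "term1")),
        new SpanTermQuery(new Term("text", "term2"))
    },
    Integer.MAX_VALUE,  // slop: allow any distance between the two terms
    true);              // inOrder = true: term1 must appear before term2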



On Thu, Nov 3, 2011 at 2:11 PM, LT.thomas  wrote:

> Hi,
>
> By ordered I mean term1 will always come before term2 in the document.
>
> I have two documents:
> 1. "By ordered I mean term1 will always come before term2 in the document"
> 2. "By ordered I mean term2 will always come before term1 in the document"
>
> if I make the query:
>
> "term1 term2"~Integer.MAX_VALUE
>
> my results is: 2 documents
>
> How can I query to have one result (only if term1 come before term2):
> "By ordered I mean term1 will always come before term2 in the document"
>
> Thanks
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Ordered-proximity-search-tp3477946p3477946.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Thanks and Regards
Rahul A. Warawdekar


Issue with Shard configuration in solrconfig.xml (Solr 3.1)

2011-10-20 Thread Rahul Warawdekar
Hi,

I am trying to evaluate distributed search for my project by splitting up
our single index on 2 shards with Solr 3.1
When I query the first solr server by passing the "shards" parameter, I get
correct search results from both shards.
(
http://server1:8080/solr/test/select/?shards=server1:8080/solr/test,server2:8080/solr/test&q=solr&start=0&rows=20
)

I want to avoid the use of this shards parameter in the http url and specify
it in solrconfig.xml as follows.


server1:8080/solr/test,server2:8080/solr/test
..


After adding the shards parameter in solrconfig.xml, I get search results
only from the first shard and not from the from the second one.
Am I missing any configuration ?

Also, can the urls with the shard parameter be load balanced for a failover
mechanism ?



-- 
Thanks and Regards
Rahul A. Warawdekar


Architecture and Capacity planning for large Solr index

2011-10-11 Thread Rahul Warawdekar
Hi All,

I am working on a Solr search based project, and would highly appreciate
help/suggestions from you all regarding Solr architecture and capacity
planning.
Details of the project are as follows

1. There are 2 databases from which, data needs to be indexed and made
searchable,
- Production
- Archive
2. Production database will retain 6 months old data and archive data every
month.
3. Archive database will retain 3 years old data.
4. Database is SQL Server 2008 and Solr version is 3.1

Data to be indexed contains a huge volume of attachments (PDF, Word, excel
etc..), approximately 200 GB per month.
We are planning to do a full index every month (multithreaded) and
incremental indexing on a daily basis.
The Solr index size is coming to approximately 25 GB per month.

If we were to use distributed search, what would be the best configuration
for Production as well as Archive indexes ?
What would be the best CPU/RAM/Disk configuration ?
How can I implement failover mechanism for sharded searches ?

Please let me know in case I need to share more information.


-- 
Thanks and Regards
Rahul A. Warawdekar


Re: Trouble configuring multicore / accessing admin page

2011-09-28 Thread Rahul Warawdekar
Hi Joshua,

Can you try updating your solr.xml as follows:
Specify
'<core name="core0" instanceDir="core0" />' instead of
'<core name="core0" instanceDir="cores/core0" />'

Basically, remove the extra text "cores" from the instanceDir attribute of
the core element.

Just try and let us know if it works.

On Wed, Sep 28, 2011 at 3:40 PM, Joshua Miller wrote:

> Hello,
>
> I am trying to get SOLR working with multiple cores and have a problem
> accessing the admin page once I configure multiple cores.
>
> Problem:
> When accessing the admin page via http://solrhost:8080/solr/admin, I get a
> 404, "missing core name in path".
>
> Question:  when using the multicore option, is the standard admin page
> still available?
>
> Environment:
> - solr 1.4.1
> - Windows server 2008 R2
> - Java SE 1.6u27
> - Tomcat 6.0.33
> - Solr Experience:  none
>
> I have set -Dsolr.solr.home=c:\solr and within that I have a solr.xml with
> the following contents:
>
> <solr persistent="false">
>  <cores adminPath="/admin/cores">
>   <core name="core0" instanceDir="cores/core0" />
>   <core name="core1" instanceDir="cores/core1" />
>  </cores>
> </solr>
>
> I have copied the example/solr directory to c:\solr and have populated that
> directory with the cores/{core{0,1}} as well as the proper configs and data
> directories within.
>
> When I restart tomcat, it shows a couple of exceptions related to
> QueryElevationComponent and null pointers that I think are due to the DB not
> yet being available, but I see that the cores appear to initialize properly
> other than that.
>
> So the problem I'm looking to solve/clarify here is the admin page - should
> that remain available and usable when using the multicore configuration or
> am I doing something wrong?  Do I need to use the CoreAdminHandler type
> requests to manage multicore instead?
>
> Thanks,
> --
> Josh Miller
> Open Source Solutions Architect
> (425) 737-2590
> http://itsecureadmin.com/
>
>


-- 
Thanks and Regards
Rahul A. Warawdekar


Re: Solr stopword problem in Query

2011-09-27 Thread Rahul Warawdekar
Hi Isan,

The schema.xml seems OK to me.

Is "textForQuery" the only field you are searching in ?
Are you also searching on any other non text based fields ? If yes, please
provide schema description for those fields also.
Also, provide your solrconfig.xml file.


On Tue, Sep 27, 2011 at 1:12 AM, Isan Fulia wrote:

> Hi Rahul,
>
> I also tried searching "Coke Studio MTV" but no documents were returned.
>
> Here is the snippet of my schema file.
> <fieldType name="text" class="solr.TextField"
>   positionIncrementGap="100" autoGeneratePhraseQueries="true">
>
>  <analyzer type="index">
>    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>    <filter class="solr.StopFilterFactory"
>            ignoreCase="true"
>            words="stopwords_en.txt"
>            enablePositionIncrements="true"
>            />
>    <filter class="solr.WordDelimiterFilterFactory"
> generateWordParts="1" generateNumberParts="1" catenateWords="1"
> catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
>    <filter class="solr.LowerCaseFilterFactory"/>
>    <filter class="solr.SnowballPorterFilterFactory" language="English"
> protected="protwords.txt"/>
>  </analyzer>
>
>  <analyzer type="query">
>    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>    <filter class="solr.SynonymFilterFactory"
> synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
>    <filter class="solr.StopFilterFactory"
>            ignoreCase="true"
>            words="stopwords_en.txt"
>            enablePositionIncrements="true"
>            />
>    <filter class="solr.WordDelimiterFilterFactory"
> generateWordParts="1" generateNumberParts="1" catenateWords="0"
> catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
>    <filter class="solr.LowerCaseFilterFactory"/>
>    <filter class="solr.SnowballPorterFilterFactory" language="English"
> protected="protwords.txt"/>
>  </analyzer>
> </fieldType>
>
>
>
> <field name="..." type="string" indexed="true" stored="true"
>  multiValued="false"/>
> <field name="..." type="string" indexed="true" stored="true"
>  multiValued="false"/>
>
> <field name="textForQuery" type="text" indexed="true" stored="true"
>  multiValued="true" omitTermFreqAndPositions="true"/>
>
>
> Thanks,
> Isan Fulia.
>
>
> On 26 September 2011 21:19, Rahul Warawdekar  >wrote:
>
> > Hi Isan,
> >
> > Does your search return any documents when you remove the 'at' keyword
> and
> > just search for "Coke studio MTV" ?
> > Also, can you please provide the snippet of schema.xml file where you
> have
> > mentioned this field name and its "type" description ?
> >
> > On Mon, Sep 26, 2011 at 6:09 AM, Isan Fulia  > >wrote:
> >
> > > Hi all,
> > >
> > > I have a text field named* textForQuery* .
> > > Following content has been indexed into solr in field textForQuery
> > > *Coke Studio at MTV*
> > >
> > > when i fired the query as
> > > *textForQuery:("coke studio at mtv")* the results showed 0 documents
> > >
> > > After running the same query in debugMode i got the following results
> > >
> > > <str name="rawquerystring">textForQuery:("coke studio at mtv")</str>
> > > <str name="querystring">textForQuery:("coke studio at mtv")</str>
> > > <str name="parsedquery">PhraseQuery(textForQuery:"coke studio ?
> > mtv")</str>
> > > <str name="parsedquery_toString">textForQuery:"coke studio ?
> mtv"</str>
> > >
> > > Why did the query not match any document even when there is a
> document
> > > with value of textForQuery as *Coke Studio at MTV*?
> > > Is this because of the stopword *at* present in stopwordList?
> > >
> > >
> > >
> > > --
> > > Thanks & Regards,
> > > Isan Fulia.
> > >
> >
> >
> >
> > --
> > Thanks and Regards
> > Rahul A. Warawdekar
> >
>
>
>
> --
> Thanks & Regards,
> Isan Fulia.
>



-- 
Thanks and Regards
Rahul A. Warawdekar


Re: Solr stopword problem in Query

2011-09-26 Thread Rahul Warawdekar
Hi Isan,

Does your search return any documents when you remove the 'at' keyword and
just search for "Coke studio MTV" ?
Also, can you please provide the snippet of schema.xml file where you have
mentioned this field name and its "type" description ?

On Mon, Sep 26, 2011 at 6:09 AM, Isan Fulia wrote:

> Hi all,
>
> I have a text field named* textForQuery* .
> Following content has been indexed into solr in field textForQuery
> *Coke Studio at MTV*
>
> when i fired the query as
> *textForQuery:("coke studio at mtv")* the results showed 0 documents
>
> After running the same query in debugMode i got the following results
>
> <str name="rawquerystring">textForQuery:("coke studio at mtv")</str>
> <str name="querystring">textForQuery:("coke studio at mtv")</str>
> <str name="parsedquery">PhraseQuery(textForQuery:"coke studio ? mtv")</str>
> <str name="parsedquery_toString">textForQuery:"coke studio ? mtv"</str>
>
> Why did the query not match any document even when there is a document
> with value of textForQuery as *Coke Studio at MTV*?
> Is this because of the stopword *at* present in stopwordList?
>
>
>
> --
> Thanks & Regards,
> Isan Fulia.
>



-- 
Thanks and Regards
Rahul A. Warawdekar


Re: JdbcDataSource and threads

2011-09-23 Thread Rahul Warawdekar
I am using Solr 3.1.
But you can surely try the patch with 3.3.

On Fri, Sep 23, 2011 at 1:35 PM, Vazquez, Maria (STM) <
maria.vazq...@dexone.com> wrote:

> Thanks Rahul.
> Are you using 3.3 or 3.4? I'm on 3.3 right now
> I will try the patch today
> Thanks again,
> Maria
>
>
> -Original Message-
> From: Rahul Warawdekar [mailto:rahul.warawde...@gmail.com]
> Sent: Thursday, September 22, 2011 12:46 PM
> To: solr-user@lucene.apache.org
> Subject: Re: JdbcDataSource and threads
>
> Hi,
>
> Have you applied the patch that is provided with the Jira you mentioned
> ?
> https://issues.apache.org/jira/browse/SOLR-2233
>
> Please apply the patch and check if you are getting the same exceptions.
> It has worked well for me till now.
>
> On Thu, Sep 22, 2011 at 3:17 PM, Vazquez, Maria (STM) <
> maria.vazq...@dexone.com> wrote:
>
> > Hi!
> >
> > So as of 3.4 JdbcDataSource doesn't work with threads, correct?
> >
> >
> >
> > https://issues.apache.org/jira/browse/SOLR-2233
> >
> >
> >
> > I'm using Microsoft SQL Server, my data-config.xml has a lot of very
> > complex SQL queries and it takes a long time to index.
> >
> > I'm migrating from Lucene to Solr and the Lucene code uses threads so
> it
> > takes little time to index, now in Solr if I add threads=xx to my
> > rootEntity I get lots of errors about connections being closed.
> >
> >
> >
> > Thanks a lot,
> >
> > Maria
> >
> >
>
>
> --
> Thanks and Regards
> Rahul A. Warawdekar
>



-- 
Thanks and Regards
Rahul A. Warawdekar


Re: JdbcDataSource and threads

2011-09-22 Thread Rahul Warawdekar
Hi,

Have you applied the patch that is provided with the Jira you mentioned ?
https://issues.apache.org/jira/browse/SOLR-2233

Please apply the patch and check if you are getting the same exceptions.
It has worked well for me till now.

On Thu, Sep 22, 2011 at 3:17 PM, Vazquez, Maria (STM) <
maria.vazq...@dexone.com> wrote:

> Hi!
>
> So as of 3.4 JdbcDataSource doesn't work with threads, correct?
>
>
>
> https://issues.apache.org/jira/browse/SOLR-2233
>
>
>
> I'm using Microsoft SQL Server, my data-config.xml has a lot of very
> complex SQL queries and it takes a long time to index.
>
> I'm migrating from Lucene to Solr and the Lucene code uses threads so it
> takes little time to index, now in Solr if I add threads=xx to my
> rootEntity I get lots of errors about connections being closed.
>
>
>
> Thanks a lot,
>
> Maria
>
>


-- 
Thanks and Regards
Rahul A. Warawdekar


Re: How to get the fields that match the request?

2011-09-22 Thread Rahul Warawdekar
Hi,

Before considering highlighting to address this requirement, you also need
to consider the performance implications of highlighting for large text
fields.

On Thu, Sep 22, 2011 at 11:42 AM, Nicolas Martin wrote:

> yes, highlights can help to do that, but if you want to paginate your
> results, you can't use hl.
>
> It'd be great to have a scoring average by fields...
>
>
>
>
>
> On 22/09/2011 17:37, Tanner Postert wrote:
>
>> this would be useful to me as well.
>>
>> even when searching with q=test, I know it defaults to the default search
>> field, but it would be helpful to know what field(s) match the query term.
>>
>> On Thu, Sep 22, 2011 at 3:29 AM, Nicolas Martin wrote:
>>
>>
>>
>>> Hi everybody,
>>>
>>> I need your help to get more information in my solR query's response.
>>>
>>> i've got a simple input text which allows me to query several fields in
>>> the
>>> same query.
>>>
>>> So my query  looks like this
>>> "q=email:martyn+OR+name:martynn+OR+commercial:martyn ..."
>>>
>>> Is it possible in the response to know the fields where "martynn" has
>>> been
>>> found ?
>>>
>>> Thanks a Lot :-)
>>>
>>>
>>>
>>
>>
>
>


-- 
Thanks and Regards
Rahul A. Warawdekar


Re: Index not getting refreshed

2011-09-14 Thread Rahul Warawdekar
Hi Pawan,

Can you please share more details on the indexing mechanism (DIH, SolrJ,
or any other) ?
Please let us know the configuration details.


On Wed, Sep 14, 2011 at 12:48 PM, Pawan Darira wrote:

> Hi
>
> I am using Solr 3.2 on a live website. i get live user's data of about 2000
> per day. I do an incremental index every 8 hours. but my search results
> always show the same result with same sorting order. when i check the same
> search from corresponding db, it gives me different results always (as new
> data regularly gets added)
>
> please suggest what might be the issue. is there any cache related problem
> at SOLR level
>
> thanks
> pawan
>



-- 
Thanks and Regards
Rahul A. Warawdekar


Re: DIH delta last_index_time

2011-09-14 Thread Rahul Warawdekar
Hi Maria/Gora,

I see this as more of a problem with the timezones in which the Solr server
and the database server are located.
Is this true ?
If yes, one more possibility of handling this scenario would be to customize
DataImportHandler code as follows

1. Add one more configuration property named "dbTimeZone" at the entity
level in "data-config.xml" file
2. While saving the lastIndexTime in the properties file, save it according
to the timezone specified in the config so that it is in sync with the
database server time.

Basically customize the code so that all the time related updates to the
dataimport.properties file should be timezone specific.
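
e.g. (hypothetical attribute; it would only exist after the custom DIH
change described above):

<entity name="item" dbTimeZone="America/New_York" query="...">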


On Wed, Sep 14, 2011 at 4:31 AM, Gora Mohanty  wrote:

> On Wed, Sep 14, 2011 at 11:23 AM, Maria Vazquez
>  wrote:
> > Hi,
> > How do you handle the situation where the time on the server running Solr
> > doesn't match the time in the database?
>
> Firstly, why is that the case? NTP is pretty universal
> these days.
>
> > I'm using the last_index_time saved by Solr in the delta query, checking it
> > against the lastModifiedDate field in the database, but the times are not
> > in sync, so I might lose some changes.
> > Can we use something else other than last_index_time? Maybe something
> like
> > last_pk or something.
>
> One possible way is to edit dataimport.properties, manually or through
> a script, to put the last_index_time back to a "safe" value.
>
> Regards,
> Gora
>



-- 
Thanks and Regards
Rahul A. Warawdekar


Re: Solr: Return field names that contain search term

2011-09-12 Thread Rahul Warawdekar
Thanks Chris !

Will try out the second approach you suggested and share my findings.

On Mon, Sep 12, 2011 at 5:03 PM, Chris Hostetter
wrote:

>
> : > Would highly appreciate if someone can suggest other efficient ways to
> : > address this kind of a requirement.
>
> one approach would be to index each attachment as it's own document and
> search those.  you could then use things like the group collapsing
> features to return onlly the "main" type documents when multiple
> attachments match.
>
> similarly: you could still index each "main" document with a giant
> text field containing all of the attachment text, *and* you could indx
> each attachment as it's own document.  You would search on the main docs
> as you do now, but then your app could issue a secondary request searching
> for all  "attachment" docs that match on one of the main docIds in a
> special field, and use the results to note which attachment of each doc
> (if any) caused the match.
>
> -Hoss
>



-- 
Thanks and Regards
Rahul A. Warawdekar


Solr: Return field names that contain search term

2011-09-12 Thread Rahul Warawdekar
Hi,

I have a a query on Solr search as follows.

I am indexing an entity which includes a multivalued field using DIH.
This multivalued field contains content from multiple attachments for
a single entity.

Now, for eg., if I search for the term "solr", will I be able to know
which field contains this search term ?
And if it is a multivalued field, which field number in that
multivalued field contains the search term ?

Currently, to achieve this, I am using a workaround using the
highlighting feature.
I am indexing all the multiple attachments within a single entity and
document as dynamic fields "*_i".

While searching, I am highlighting on these dynamic fields (hl.fl=*_i)
and from the highlighitng section in the results, I am able to get the
attachment number which contains the search term.
But since this approach involves highlighting large attachments, the
search response times are very slow.

Would highly appreciate if someone can suggest other efficient ways to
address this kind of a requirement.

-- 
Thanks and Regards
Rahul A. Warawdekar


Re: FastVectorHighlighter with wildcard queries

2011-09-12 Thread Rahul Warawdekar
Hi Koji,

Thanks for the information !
I will try the patches provided by you.

On 9/8/11, Koji Sekiguchi  wrote:
> (11/09/09 6:16), Rahul Warawdekar wrote:
>> Hi,
>>
>> I am currently evaluating the FastVectorHighlighter in a Solr search based
>> project and have a couple of questions
>>
>> 1. Is there any specific reason why the FastVectorHighlighter does not
>> provide support for multiterm(wildcard) queries ?
>> 2. What are the other constraints when using FastVectorHighlighter ?
>>
>
> FVH used to have typical constraints:
>
> 1. supports only TermQuery and PhraseQuery (and
> BooleanQuery/DisjunctionMaxQuery that
> include TQ and PQ)
> 2. ignores word boundary
>
> But now for 1, FVH will support other queries:
>
> https://issues.apache.org/jira/browse/LUCENE-1889
>
> I believe it is close to being fixed. For 2, FVH in the latest
> trunk/3x pays regard to word and sentence boundaries through the
> BoundaryScanner:
>
> https://issues.apache.org/jira/browse/LUCENE-1824
>
> koji
> --
> Check out "Query Log Visualizer" for Apache Solr
> http://www.rondhuit-demo.com/loganalyzer/loganalyzer.html
> http://www.rondhuit.com/en/
>


-- 
Thanks and Regards
Rahul A. Warawdekar


FastVectorHighlighter with wildcard queries

2011-09-08 Thread Rahul Warawdekar
Hi,

I am currently evaluating the FastVectorHighlighter in a Solr search based
project and have a couple of questions

1. Is there any specific reason why the FastVectorHighlighter does not
provide support for multiterm(wildcard) queries ?
2. What are the other constraints when using FastVectorHighlighter ?

-- 
Thanks and Regards
Rahul A. Warawdekar


Re: Delta import issue

2011-07-12 Thread Rahul Warawdekar


On Tue, Jul 12, 2011 at 11:34 AM, PeterKerk  wrote:

> Hi Rahul,
>
> Not sure how I would do this "Try adding the primary key attribute to the
> root entity 'ad'"?
>
> In my entity ad I already have these fields (I left those out earlier for
> readability):
> <field column="id" name="id" />   <-- this is primary key of ads table
> <field column="..." name="..." />
> <field column="..." name="..." />
>
> Is that what you mean?
>
> And I'm using MSSQL2008
>
>
> Thanks!
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Delta-import-issue-tp3162581p3162809.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Thanks and Regards
Rahul A. Warawdekar


Re: Delta import issue

2011-07-12 Thread Rahul Warawdekar
Hi Peter,

Try adding the primary key attribute to the root entity 'ad' and check if
delta import works.
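
e.g. (a sketch showing just the added attribute; query shortened):

<entity name="ad" pk="id" query="select * from ads WHERE ...">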
By the way, which database are you using ?

On Tue, Jul 12, 2011 at 10:27 AM, PeterKerk  wrote:

>
> I'm having an issue with a delta import.
>
> I have the following in my data-config.xml:
> <entity name="ad"
>query="select * from ads WHERE approvedate > '1/1/1900' and
>query="select * from ads WHERE approvedate > '1/1/1900' and
> publishdate
> < getdate() AND depublishdate > getdate() and deletedate = '1/1/1900'"
>deltaImportQuery="select * from ads WHERE approvedate >
> '1/1/1900' and
> publishdate < getdate() AND depublishdate > getdate() and deletedate =
> '1/1/1900' and id='${dataimporter.delta.id}'"
>deltaQuery="select id from ads where updatedate >
> '${dataimporter.last_index_time}'">
> <entity name="photos"
>deltaImportQuery="select locpath as locpath FROM
> ad_photos where
> adid='${dataimporter.delta.id}'"
>deltaQuery="select locpath as locpath FROM ad_photos
> where createdate
> > '${dataimporter.last_index_time}'">
> <field column="locpath" name="locpath" />
> </entity>
> </entity>
Now, when I add a new photo to the ad_photos table, it's not indexed when I
> perform a delta import like so:
> http://localhost:8983/solr/i2m/dataimport?command=delta-import.
> When I do a FULL import I do see the new images.
>
>
> Here's the definition of ad_photos table:
>
> CREATE TABLE [dbo].[ad_photos](
>[id] [int] IDENTITY(1,1) NOT NULL,
>[adid] [int] NOT NULL,
>[locpath] [nvarchar](150) NOT NULL,
>[title] [nvarchar](50) NULL,
>[createdate] [datetime] NOT NULL,
>  CONSTRAINT [PK_ad_photos] PRIMARY KEY CLUSTERED
> (
>[id] ASC
> )WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY =
> OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
> ) ON [PRIMARY]
>
> GO
>
>
>
> What am I doing wrong?
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Delta-import-issue-tp3162581p3162581.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Thanks and Regards
Rahul A. Warawdekar


Solr Multithreading

2011-06-19 Thread Rahul Warawdekar
Hi,

I am currently working on a search based project which involves
indexing data from a SQL Server database including attachments using
DIH.
For indexing attachments (varbinary DB objects), I am using TikaEntityProcessor.

I am trying to use multithreading to speed up the indexing, but it
seems to fail when indexing attachments, even after applying a few Solr
fix patches.

My question is: is the current multithreading feature stable in Solr
3.1, or does it need further enhancements ?

-- 
Thanks and Regards
Rahul A. Warawdekar


Re: Issue while extracting content from MS Excel 2007 file using TikaEntityProcessor

2011-05-26 Thread Rahul Warawdekar
Hi Markus,

It is Tika.
I tried using Tika standalone.

On 5/26/11, Markus Jelsma  wrote:
> Can you rule out Tika or Solr by trying to parse the file with a stand-alone
> Tika?
>
>> Hi All,
>>
>> I am using Solr 3.1 for one of our search based applications.
>> We are using DIH to index our data and TikaEntityProcessor to index
>> attachments.
>> Currently we are running into an issue while extracting content from one
>> of
>> our MS Excel 2007 files, using TikaEntityProcessor.
>>
>> The issue is that the TikaEntityProcessor hangs without throwing any
>> exception,
>> which in turn causes the indexing to hang on the server.
>>
>> Has anyone faced a similar kind of issue in the past with
>> TikaEntityProcessor ?
>>
>> Also, does someone know of a way to just skip this type of behaviour for
>> that file and move to the next document to be indexed ?
>


-- 
Thanks and Regards
Rahul A. Warawdekar


Issue while extracting content from MS Excel 2007 file using TikaEntityProcessor

2011-05-26 Thread Rahul Warawdekar
Hi All,

I am using Solr 3.1 for one of our search based applications.
We are using DIH to index our data and TikaEntityProcessor to index
attachments.
Currently we are running into an issue while extracting content from one of
our MS Excel 2007 files, using TikaEntityProcessor.

The issue is that the TikaEntityProcessor hangs without throwing any exception,
which in turn causes the indexing to hang on the server.

Has anyone faced a similar kind of issue in the past with
TikaEntityProcessor ?

Also, does someone know of a way to just skip this type of behaviour for
that file and move to the next document to be indexed ?



-- 
Thanks and Regards
Rahul A. Warawdekar


Re: 2 index within the same Solr server ?

2011-03-29 Thread Rahul Warawdekar
Please refer
http://wiki.apache.org/solr/MultipleIndexes

On 3/29/11, Amel Fraisse  wrote:
> Hello every body,
>
> Is it possible to create 2 index within the same Solr server ?
>
> Thank you.
>
> Amel.
>


-- 
Thanks and Regards
Rahul A. Warawdekar


Query regarding search term count in Solr

2011-02-09 Thread Rahul Warawdekar
Hi All,

This is Rahul, and I am using Solr for one of my upcoming projects.
I had a query regarding search term count using Solr.
We have a requirement in one of our search based projects to filter
search results based on search term counts per document.

For eg,
if a user searches for something like "solr[4:9]", this query should return
only documents in which solr appears between 4 and 9 times (inclusively).
 if a user searches for something like "solr lucene[4:9]", this query should
return only documents in which the phrase "solr lucene" appears between 4
and 9 times (inclusively).

Is there any way from Solr to return results based on the search term and
phrase counts ?
If not, can it be customized by extending existing Solr/Lucene libraries ?


-- 
Thanks and Regards
Rahul A. Warawdekar