data-config.xml: delta-import unclear behaviour pre/postImportDeleteQuery with clean

2011-01-31 Thread manuel aldana
I am seeing some unclear behaviour when using clean together with 
pre/postImportDeleteQuery for delta-imports. The docs under 
http://wiki.apache.org/solr/DataImportHandler#Configuration_in_data-config.xml 
are not clear enough on this.


My observations are:
- preImportDeleteQuery is only executed if clean=true is set
- postImportDeleteQuery is only executed if clean=true is set
- if preImportDeleteQuery is omitted and clean=true, then the whole 
index is cleaned

=> a config with only postImportDeleteQuery won't work on its own

Is the above correct?

I don't need preImportDeleteQuery; only the post query is necessary. But to 
make the post query work I am duplicating it into preImportDeleteQuery so that 
clean=true doesn't delete the whole index. This looks more like a workaround 
than intended behaviour.
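
To illustrate the workaround (the entity name, SQL and delete query below are 
made-up placeholders, not my real config), data-config.xml roughly looks like:

<entity name="item"
        query="SELECT id, title FROM items"
        preImportDeleteQuery="source:old_feed"
        postImportDeleteQuery="source:old_feed">
    ...
</entity>

i.e. the Solr delete query that I really only need after the import is 
duplicated into preImportDeleteQuery, because with clean=true and no 
preImportDeleteQuery the default *:* delete wipes the whole index.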


Solr version is 1.4.1.

thanks.

--
 manuel aldana
 mail: ald...@gmx.de | man...@aldana-online.de
 blog: www.aldana-online.de



Re: deletedPkQuery is not actually deleting the records, please help

2011-01-31 Thread Stefan Matheis
Hey Prad,

Have you already had a look at your MySQL query log, to check whether the relevant
select query is executed? And if so, what results did it return?

Regards
Stefan

On Sat, Jan 29, 2011 at 12:50 AM, makeyourrules wrote:

>
> Hello,
> I am trying to delete some records from my index with delta-import using
> deletedPkQuery with the config below. The log prints the deleted documents
> and says the delta-import completed successfully, but when I search, my
> results still contain those deleted documents. I have already spent so much
> time researching this but couldn't find any solution. All my database updates
> are picked up via deltaQuery and deltaImportQuery, but not the deletes.
> Could anyone suggest a solution?
>
> URL:
> http://localhost:8983/dataimport?command=delta-import
>
> dataConfig.xml:
>dataSource="MyDatasource"
>query="SELECT col1, col2, col3, col4 FROM
> MyTable1"
>
>deltaImportQuery="SELECT MyTable1.col1,
> MyTable1.col2, MyTable1.col3,
> MyTable1.col4 FROM MyTable1 MyTable1
>where MyTable1.col1 =
> '${dataimporter.delta.bcol1}'"
>
>deltaQuery="SELECT bcol1 from MyTable2
>where LastModifiedTime >
> '${dataimporter.last_index_time}' and
> status in ('A','U')"
>
>deletedPkQuery="SELECT bcol1 from MyTable2
>where LastModifiedTime >
> '${dataimporter.last_index_time}' and
> status='D'">
>
>
>
>
>
>
>
>
>
> Log file:
>
>
>
> [2011/01/28 16:56:26.498] Completed ModifiedRowKey for Entity: item rows
> obtained : 0
> [2011/01/28 16:56:26.499] Completed DeletedRowKey for Entity: item rows
> obtained : 6
> [2011/01/28 16:56:26.499] Completed parentDeltaQuery for Entity: item
> [2011/01/28 16:56:32.563] Deleting stale documents
> .
> .
> [2011/01/28 16:58:00.319] Deleting document: BAAH
> [2011/01/28 17:06:50.537] Deleting document: BAAI
> [2011/01/28 17:07:28.470] Deleting document: BAAL
> [2011/01/28 17:08:13.187] Deleting document: BAAM
> [2011/01/28 17:08:27.011] Deleting document: BAAJ
> [2011/01/28 17:08:44.218] Deleting document: BAAK
> [2011/01/28 17:09:13.487] Delta Import completed successfully
> [2011/01/28 17:09:32.174] Import completed successfully
> [2011/01/28 17:09:32.175] start
>
> commit(optimize=true,waitFlush=false,waitSearcher=true,expungeDeletes=false)
> ..
> [2011/01/28 17:09:32.212] autowarming Searcher@a44b35 main from
> Searcher@f41f34 main
> [2011/01/28 17:09:32.215] Read dataimport.properties
> [2011/01/28 17:09:32.217] Wrote last indexed time to dataimport.properties
> [2011/01/28 17:09:33.791] Time taken = 0:13:45.366
>
>
> Any suggestions would be highly appreciated.
>
> Thanks,
> Prad.
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/DeletepkQuery-is-not-actually-deleting-the-records-please-help-tp2368463p2368463.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Solr for noSQL

2011-01-31 Thread Steven Noels
On Fri, Jan 28, 2011 at 1:30 AM, Jianbin Dai  wrote:

> Hi,
>
>
>
> Do we have data import handler to fast read in data from noSQL database,
> specifically, MongoDB I am thinking to use?
>
> Or a more general question, how does Solr work with noSQL database?
>


Can't say anything about MongoDB, but we have an integration of SOLR with
HBase inside Lily - www.lilyproject.org. It indeed uses the 'normal' SOLR
index update API rather than DIH, as we needed incremental updates. The
Indexer component we wrote maps from the Lily/HBase schema to SOLR, since we
also felt that both schemas shouldn't necessarily be identical.

Steven.
-- 
Steven Noels
http://outerthought.org/
Scalable Smart Data
Makers of Kauri, Daisy CMS and Lily


Re: Http Connection is hanging while deleteByQuery

2011-01-31 Thread shan2812

This is the log trace..

2011-01-31 10:07:18,837 ERROR (main)[SearchBusinessControllerImpl] Solr
connecting to url: http://10.145.10.154:8081/solr
2011-01-31 10:07:18,873 DEBUG (main)[DefaultHttpParams] Set parameter
http.useragent = Jakarta Commons-HttpClient/3.1
2011-01-31 10:07:18,880 DEBUG (main)[DefaultHttpParams] Set parameter
http.protocol.version = HTTP/1.1
2011-01-31 10:07:18,881 DEBUG (main)[DefaultHttpParams] Set parameter
http.connection-manager.class = class
org.apache.commons.httpclient.SimpleHttpConnectionManager
2011-01-31 10:07:18,881 DEBUG (main)[DefaultHttpParams] Set parameter
http.protocol.cookie-policy = default
2011-01-31 10:07:18,881 DEBUG (main)[DefaultHttpParams] Set parameter
http.protocol.element-charset = US-ASCII
2011-01-31 10:07:18,881 DEBUG (main)[DefaultHttpParams] Set parameter
http.protocol.content-charset = ISO-8859-1
2011-01-31 10:07:18,881 DEBUG (main)[DefaultHttpParams] Set parameter
http.method.retry-handler =
org.apache.commons.httpclient.DefaultHttpMethodRetryHandler@15299647
2011-01-31 10:07:18,882 DEBUG (main)[DefaultHttpParams] Set parameter
http.dateparser.patterns = [EEE, dd MMM  HH:mm:ss zzz, , dd-MMM-yy
HH:mm:ss zzz, EEE MMM d HH:mm:ss , EEE, dd-MMM- HH:mm:ss z, EEE,
dd-MMM- HH-mm-ss z, EEE, dd MMM yy HH:mm:ss z, EEE dd-MMM- HH:mm:ss
z, EEE dd MMM  HH:mm:ss z, EEE dd-MMM- HH-mm-ss z, EEE dd-MMM-yy
HH:mm:ss z, EEE dd MMM yy HH:mm:ss z, EEE,dd-MMM-yy HH:mm:ss z,
EEE,dd-MMM- HH:mm:ss z, EEE, dd-MM- HH:mm:ss z]
2011-01-31 10:07:18,882 DEBUG (main)[DefaultHttpParams] Set parameter
http.connection.timeout = 1
2011-01-31 10:07:18,882 DEBUG (main)[DefaultHttpParams] Set parameter
http.connection-manager.max-total = 10
2011-01-31 10:07:18,883 DEBUG (main)[HttpClient] Java version: 1.5.0_22
2011-01-31 10:07:18,883 DEBUG (main)[HttpClient] Java vendor: Sun
Microsystems Inc.
2011-01-31 10:07:18,883 DEBUG (main)[HttpClient] Java class path:
:/tui/cmd/test/cmdadapter/bulk-batchjob-loader/program/lib/activation-1.0.2.jar:/tui/cmd/test/cmdadapter/bulk-batchjob-loader/program/lib/apache-solr-cell-1.4.0.jar:/tui/cmd/test/cmdadapter/bulk-batchjob-loader/program/lib/apache-solr-clustering-1.4.0.jar:/tui/cmd/test/cmdadapter/bulk-batchjob-loader/program/lib/apache-solr-core-1.4.0.jar:/tui/cmd/test/cmdadapter/bulk-batchjob-loader/program/lib/apache-solr-dataimporthandler-1.4.0.jar:/tui/cmd/test/cmdadapter/bulk-batchjob-loader/program/lib/apache-solr-dataimporthandler-extras-1.4.0.jar:/tui/cmd/test/cmdadapter/bulk-batchjob-loader/program/lib/apache-solr-solrj-1.4.0.jar:/tui/cmd/test/cmdadapter/bulk-batchjob-loader/program/lib/castor-1.0.5-xml.jar:/tui/cmd/test/cmdadapter/bulk-batchjob-loader/program/lib/commons-cache-0.1-dev.jar:/tui/cmd/test/cmdadapter/bulk-batchjob-loader/program/lib/commons-codec-1.3.jar:/tui/cmd/test/cmdadapter/bulk-batchjob-loader/program/lib/commons-collections-3.2.jar:/tui/cmd/test/cmdadapter/bulk-batchjob-loader/program/lib/commons-configuration-1.3.jar:/tui/cmd/test/cmdadapter/bulk-batchjob-loader/program/lib/commons-dbcp-1.2.1.jar:/tui/cmd/test/cmdadapter/bulk-batchjob-loader/program/lib/commons-dbutils-1.0.jar:/tui/cmd/test/cmdadapter/bulk-batchjob-loader/program/lib/commons-fileupload-1.2.jar:/tui/cmd/test/cmdadapter/bulk-batchjob-loader/program/lib/commons-httpclient-3.1.jar:/tui/cmd/test/cmdadapter/bulk-batchjob-loader/program/lib/commons-lang-2.1.jar:/tui/cmd/test/cmdadapter/bulk-batchjob-loader/program/lib/commons-logging-1.0.4.jar:/tui/cmd/test/cmdadapter/bulk-batchjob-loader/program/lib/commons-pool-1.2.jar:/tui/cmd/test/cmdadapter/bulk-batchjob-loader/program/lib/dom4j-1.6.1.jar:/tui/cmd/test/cmdadapter/bulk-batchjob-loader/program/lib/google-api-translate-java-0.6.jar:/tui/cmd/test/cmdadapter/bulk-batchjob-loader/program/lib/jakarta-oro-2.0.6.jar:/tui/cmd/test/cmdadapter/bulk-batchjob-loader/program/lib/log4j-1.2.15.jar:/tui/cmd/test/cmdadapter/bulk-batchjob-loader/program/lib/mail-1.3.jar:/tui/cmd/test/cmdadapter/bulk-batchjob-loader/program/lib/mysql-connector-java-5.0.7-bin.jar:/tui/cmd/test/cmdadapter/bulk-batchjob-loader/program/lib/netrics-likeit-4.1.2.jar:/tui/cmd/test/cmdadapter/bulk-batchjob-loader/program/lib/ojdbc14-10.2.0.3.jar:/tui/cmd/test/cmdadapter/bulk-batchjob-loader/program/lib/oro-2.0.8.jar:/tui/cmd/test/cmdadapter/bulk-batchjob-loader/program/lib/slf4j-api-1.4.2.jar:/tui/cmd/test/cmdadapter/bulk-batchjob-loader/program/lib/slf4j-log4j12-1.4.2.jar:/tui/cmd/test/cmdadapter/bulk-batchjob-loader/program/lib/toplink-10.1.3.1.jar:/tui/cmd/test/cmdadapter/bulk-batchjob-loader/program/lib/vrp-business-cmd-4.3.24.jar:/tui/cmd/test/cmdadapter/bulk-batchjob-loader/program/lib/vrp-cmd-dataloader-4.2.18.jar:/tui/cmd/test/cmdadapter/bulk-batchjob-loader/program/lib/vrp-xml-crdif-2.0.0.jar:/tui/cmd/test/cmdadapter/bulk-batchjob-loader/program/lib/xerces-2.4.0.jar:/tui/cmd/test/cmdadapter/bulk-batchjob-loader/program/lib/xmlparserv2-10.1.3.jar:/tui/cmd/test/cmdadapter/bulk-batchjob-loader/program

Re: first search on index

2011-01-31 Thread Grijesh

Solr has HTTP caching enabled by default. Try to clear the cache before
querying; Shift+Refresh (F5) may clear it.

Due to the cache it is possible that old results are displayed after the index
has been changed.
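
If the HTTP cache turns out to be the problem, one option is to switch off Solr's
HTTP caching headers in solrconfig.xml. A minimal sketch (check your own
requestDispatcher settings before copying this):

<requestDispatcher handleSelect="true">
    <httpCaching never304="true" />
</requestDispatcher>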

-
Thanx:
Grijesh
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/first-search-on-index-tp2386558p2389115.html
Sent from the Solr - User mailing list archive at Nabble.com.


Patch for edismax Query Parser

2011-01-31 Thread Isan Fulia
Hi all,
I want to know how to apply the patch for the extended dismax query parser to
Solr 1.4.1.


-- 
Thanks & Regards,
Isan Fulia.


Re: Patch for edismax Query Parser

2011-01-31 Thread Erick Erickson
Do you know how to apply patches in general? Or is this specifically
about the edismax patch?

Quick response for the general "how to apply a patch" question:
1> get the source code for Solr
2> get to the point you can run "ant clean test" successfully.
3> apply the source patch
4> execute "ant dist".

You should now have a war file in your /dist

See: http://wiki.apache.org/solr/HowToContribute#Working_With_Patches

NOTE: I haven't applied that specific patch to 1.4.1, so I don't know what
gremlins
are hanging around.
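
Once the patch is applied and the new war deployed, you would typically enable
the parser per request with defType=edismax, or as a handler default -- a rough
sketch (the field names in qf are placeholders, and the parser name depends on
the version of the patch you apply):

<requestHandler name="/search" class="solr.SearchHandler">
    <lst name="defaults">
        <str name="defType">edismax</str>
        <str name="qf">title^2 description</str>
    </lst>
</requestHandler>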

Best
Erick

On Mon, Jan 31, 2011 at 7:12 AM, Isan Fulia wrote:

> Hi all,
> I want to know how to apply patch for extended dismax query parser on solr
> 1.4.1.
>
>
> --
> Thanks & Regards,
> Isan Fulia.
>


Re: SolrJ (Trunk) Invalid version or the data in not in 'javabin' format

2011-01-31 Thread Em

Hi,

I will give you feedback today. There occurred another issue with our current
Solr installation that I have to fix.

Thanks for your effort!

Regards
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrJ-Trunk-Invalid-version-or-the-data-in-not-in-javabin-format-tp2384421p2389343.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Patch for edismax Query Parser

2011-01-31 Thread Isan Fulia
specifically for edismax patch

On 31 January 2011 18:22, Erick Erickson  wrote:

> Do you know how to apply patches in general? Or is this specifically
> about the edismax patch?
>
> Quick response for the general "how to apply a patch" question:
> 1> get the source code for Solr
> 2> get to the point you can run "ant clean test" successfully.
> 3> apply the source patch
> 4> execute "ant dist".
>
> You should now have a war file in your /dist
>
> See: http://wiki.apache.org/solr/HowToContribute#Working_With_Patches
>
> NOTE: I haven't applied that specific patch to 1.4.1, so I don't know what
> gremlins
> are hanging around.
>
> Best
> Erick
>
> On Mon, Jan 31, 2011 at 7:12 AM, Isan Fulia  >wrote:
>
> > Hi all,
> > I want to know how to apply patch for extended dismax query parser on
> solr
> > 1.4.1.
> >
> >
> > --
> > Thanks & Regards,
> > Isan Fulia.
> >
>



-- 
Thanks & Regards,
Isan Fulia.


Re: Patch for edismax Query Parser

2011-01-31 Thread Erick Erickson
Have you tried it? What problems are you having?

Please review: http://wiki.apache.org/solr/UsingMailingLists

Erick

On Mon, Jan 31, 2011 at 8:10 AM, Isan Fulia wrote:

> specifically for edismax patch
>
> On 31 January 2011 18:22, Erick Erickson  wrote:
>
> > Do you know how to apply patches in general? Or is this specifically
> > about the edismax patch?
> >
> > Quick response for the general "how to apply a patch" question:
> > 1> get the source code for Solr
> > 2> get to the point you can run "ant clean test" successfully.
> > 3> apply the source patch
> > 4> execute "ant dist".
> >
> > You should now have a war file in your /dist
> >
> > See: http://wiki.apache.org/solr/HowToContribute#Working_With_Patches
> >
> > NOTE: I haven't applied that specific patch to 1.4.1, so I don't know
> what
> > gremlins
> > are hanging around.
> >
> > Best
> > Erick
> >
> > On Mon, Jan 31, 2011 at 7:12 AM, Isan Fulia  > >wrote:
> >
> > > Hi all,
> > > I want to know how to apply patch for extended dismax query parser on
> > solr
> > > 1.4.1.
> > >
> > >
> > > --
> > > Thanks & Regards,
> > > Isan Fulia.
> > >
> >
>
>
>
> --
> Thanks & Regards,
> Isan Fulia.
>


UpdateHandler-Bug or intended feature?

2011-01-31 Thread Em

Hi list,

I am not sure whether this behaviour is intended or not.

I am experimenting with the UpdateRequestProcessor feature of Solr (V: 1.4)
and something occurred that I find strange.

Well, when I send CSV data to the CSV update handler with some fields
specified that are not part of the schema, the input isn't passed on to the
UpdateRequestProcessor chain.

Here is some code from my UpdateRequestProcessor:

@Override
public void processAdd(AddUpdateCommand cmd) throws IOException {
    // pass the document down the chain first, then fail loudly so it is
    // obvious whether this processor was actually invoked
    super.processAdd(cmd);
    throw new IOException("HelloWorld");
}

Well, this processor truly makes no sense, but I wanted to see whether it
is called or not, and it seems like it never gets called at all.

My client gets back messages like "undefined field MyID" - yes, "MyID"
isn't specified.

For example:
If I want to build a field "hash" from "MyID" and remove "MyID" afterwards
from the input document, I will never get the chance to do so if the
processor isn't called at all.

Is this intended or am I doing something wrong here?

Regards
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/UpdateHandler-Bug-or-intended-feature-tp2389382p2389382.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Extracting contents of zipped files with Tika and Solr 1.4.1

2011-01-31 Thread Gary Taylor
Can anyone shed any light on this, and whether it could be a config 
issue?  I'm now using the latest SVN trunk, which includes the Tika 0.8 
jars.


When I send a ZIP file (containing two txt files, doc1.txt and doc2.txt) 
to the ExtractingRequestHandler, I get the following log entry 
(formatted for ease of reading) :


SolrInputDocument[
{
ignored_meta=ignored_meta(1.0)={
[stream_source_info, file, stream_content_type, 
application/octet-stream, stream_size, 260, stream_name, solr1.zip, 
Content-Type, application/zip]

},
ignored_=ignored_(1.0)={
[package-entry, package-entry]
},
ignored_stream_source_info=ignored_stream_source_info(1.0)={file},

ignored_stream_content_type=ignored_stream_content_type(1.0)={application/octet-stream}, 


ignored_stream_size=ignored_stream_size(1.0)={260},
ignored_stream_name=ignored_stream_name(1.0)={solr1.zip},
ignored_content_type=ignored_content_type(1.0)={application/zip},
docid=docid(1.0)={74},
type=type(1.0)={5},
text=text(1.0)={  doc2.txtdoc1.txt}
}
]

So, the data coming back from Tika when parsing a ZIP file does not 
include the file contents, only the names of the files contained 
therein.  I've tried forcing stream.type=application/zip in the CURL 
string, but that makes no difference.  If I specify an invalid 
stream.type then I get an exception response, so I know it's being used.


When I send one of those txt files individually to the 
ExtractingRequestHandler, I get:


SolrInputDocument[
{
ignored_meta=ignored_meta(1.0)={
[stream_source_info, file, stream_content_type, text/plain, 
stream_size, 30, Content-Encoding, ISO-8859-1, stream_name, doc1.txt]

},
ignored_stream_source_info=ignored_stream_source_info(1.0)={file},

ignored_stream_content_type=ignored_stream_content_type(1.0)={text/plain},

ignored_stream_size=ignored_stream_size(1.0)={30},
ignored_content_encoding=ignored_content_encoding(1.0)={ISO-8859-1},
ignored_stream_name=ignored_stream_name(1.0)={doc1.txt},
docid=docid(1.0)={74},
type=type(1.0)={5},
text=text(1.0)={The quick brown fox  }
}
]

and we see the file contents in the "text" field.

I'm using the following requestHandler definition in solrconfig.xml:


class="org.apache.solr.handler.extraction.ExtractingRequestHandler" 
startup="lazy">



text
true
ignored_


true
links
ignored_



Is there any further debug or diagnostic I can get out of Tika to help 
me work out why it's only returning the file names and not the file 
contents when parsing a ZIP file?


Thanks and kind regards,
Gary.



On 25/01/2011 16:48, Jayendra Patil wrote:

Hi Gary,

The latest Solr Trunk was able to extract and index the contents of the zip
file using the ExtractingRequestHandler.
The snapshot of Trunk we worked upon had the Tika 0.8 snapshot jars and
worked pretty well.

Tested again with sample url and works fine -
curl "
http://localhost:8080/solr/core0/update/extract?stream.file=C:/temp/extract/777045.zip&literal.id=777045&literal.title=Test&commit=true
"

You would probably need to drill down to the Tika Jars and
the apache-solr-cell-4.0-dev.jar used for Rich documents indexing.

Regards,
Jayendra





Re: Solr for noSQL

2011-01-31 Thread Estrada Groups
What are the advantages of using something like HBase over your standard Lucene 
index with Solr? It would seem to me like you'd be losing a lot of what Lucene 
has to offer!?!

Adam

On Jan 31, 2011, at 5:34 AM, Steven Noels  wrote:

> On Fri, Jan 28, 2011 at 1:30 AM, Jianbin Dai  wrote:
> 
>> Hi,
>> 
>> 
>> 
>> Do we have data import handler to fast read in data from noSQL database,
>> specifically, MongoDB I am thinking to use?
>> 
>> Or a more general question, how does Solr work with noSQL database?
>> 
> 
> 
> Can't say anything about MongoDB, but we have an integration of SOLR with
> HBase inside Lily - www.lilyproject.org. It indeed uses the 'normal' SOLR
> index update API rather than a DIH - as we had the need to have incremental
> updates. The Indexer component we wrote does mapping from Lily/HBase schema
> to SOLR, as we also felt the need that both schemas shouldn't necessarily be
> identical.
> 
> Steven.
> -- 
> Steven Noels
> http://outerthought.org/
> Scalable Smart Data
> Makers of Kauri, Daisy CMS and Lily


Re: UpdateHandler-Bug or intended feature?

2011-01-31 Thread Koji Sekiguchi

(11/01/31 22:20), Em wrote:


Hi list,

I am not sure whether this behaviour is intended or not.

I am experimenting with the UpdateRequestProcessor-feature of Solr (V: 1.4)
and there occured something I find strange.

Well, when I send csv-data to the CSV-UpdateHandler with some fields
specified that are not part of the Schema, the input isn't passed up to the
UpdateRequestProcessor-Chain.

Here is some code from my UpdateRequestProcessor:

@Override
public void processAdd(AddUpdateCommand cmd) throws IOException {
 super.processAdd(cmd);
 throw new IOException("HelloWorld");
 }

Well, this processor makes truely no sense, but I wanted to see whether it
is called or not and it seems like it won't get called anyway.



Are you sure your UpdateRequestProcessor is defined in solrconfig.xml
and that you set the name of your UpdateRequestProcessorChain in the
update.processor parameter when you call the CSVLoader?
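
For reference, the wiring usually looks something like this (the chain and
class names here are only placeholders for your own):

<updateRequestProcessorChain name="throwAway">
    <processor class="your.package.ThrowAwayUpdateProcessorFactory" />
    <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

<requestHandler name="/update/csv" class="solr.CSVRequestHandler" startup="lazy">
    <lst name="defaults">
        <str name="update.processor">throwAway</str>
    </lst>
</requestHandler>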

Koji
--
http://www.rondhuit.com/en/


Re: field=string with value: 0, 1 and 2

2011-01-31 Thread stockii

I found the problem.

DIH, or I think the JDBC driver, casts 0 and 1 to boolean if the field in the
database is of type tinyint(1).

I am using two fields, of type tinyint(1) and tinyint(2) -.-

-
--- System


One Server, 12 GB RAM, 2 Solr Instances, 7 Cores, 
1 Core with 31 Million Documents other Cores < 100.000

- Solr1 for Search-Requests - commit every Minute  - 4GB Xmx
- Solr2 for Update-Request  - delta every 2 Minutes - 4GB Xmx
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/field-string-with-value-0-1-and-2-tp2367038p2389508.html
Sent from the Solr - User mailing list archive at Nabble.com.


EmbeddedSolrServer and junit

2011-01-31 Thread dan sutton
Hi,

I have two cores, CoreA and CoreB. When updating content on CoreB, I use
SolrJ and EmbeddedSolrServer to query CoreA for information. However, when
I do this with my junit tests (which also use EmbeddedSolrServer to query)
I get this error:

SEVERE: Previous SolrRequestInfo was not closed!

junit.framework.AssertionFailedError
[junit] at 
org.apache.solr.request.SolrRequestInfo.setRequestInfo(SolrRequestInfo.java:45)

How should I write the junit tests to test a multi-core, with
EmbeddedSolrServer used in a component during querying?

Cheers,
Dan


Re: UpdateHandler-Bug or intended feature?

2011-01-31 Thread Em
Hi Koji,

following is the solrconfig:



throwAway



  


  

Do you see any mistake here?

Regards


Re: UpdateHandler-Bug or intended feature?

2011-01-31 Thread Koji Sekiguchi

(11/01/31 23:33), Em wrote:

Hi Koji,

following is the solrconfig:

 
 
 throwAway
 
 

   
 
 
   

Do you see any mistake here?

Regards



Hmm, Looks fine. Are you sure you create your update processor instance
in your factory and return it? (if it is null, processor chain simply
ignores your processor...)

Koji
--
http://www.rondhuit.com/en/


Re: Solr Indexing Performance

2011-01-31 Thread Tomás Fernández Löbbe
Well, I would say that the best way to be sure is to benchmark different
configurations.
As far as I know, such a big RAM buffer size is usually not recommended; the
default is 32 MB and you probably won't see any improvement using more than
128 MB.
The same goes for the mergeFactor: a larger merge factor is better for
indexing, but 50 sounds like a lot. Anyway, as I said before, the best
thing to do is benchmark different configurations and see which one works
better for you.

Have you tried assigning less memory to the JVM? That would leave more
memory available to the OS.
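
For example, a more conservative starting point for the benchmarks, in the
indexDefaults/mainIndex section of solrconfig.xml (the values are only a
suggestion to compare against, not a recommendation):

<ramBufferSizeMB>128</ramBufferSizeMB>
<mergeFactor>10</mergeFactor>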

Tomás

On Sun, Jan 30, 2011 at 1:54 AM, Darx Oman  wrote:

> Hi guys
>
>
>
> I'm running a solr instance (trunk)  in my dev. Server to test my
> configuration.  I'm doing a DIH full import to index 49 PDF files with
> their
> corresponding database records.  Both the PDF files and database are local
> in the server.
>
> *Server : *
>
> · Windows 2008 R2
>
> · MS SQL server 2008 R2
>
> · 16 core processor
>
> · 16 GB ram
>
> *Tomcat (7.0.5) : *
>
> · Set JAVA_OPTS = %JAVA_OPTS%  -Xms1024M  -Xmx8192M
>
> *Solrconfig:*
>
> · Main index configurations
>2048
>50
>
> *DIH configuration:*
>
> · 2 data sources defined  jdbcDataSource and BinFileDataSource
>
> · One main entity with 3 sub entities
>
> 
>
> 
>
> 
>
> 
>
> 
>
> · Total schema fields are 8, three of which are text type and
> multivalued.
>
> *My DIH import Status Messages:*
>
> · Total Requests made to DataSource = 99**
>
> · Total Rows Fetched = 2124**
>
> · Total DocumentsProcessed = 49**
>
> · Time Taken = *0:2:3:880***
>
> *
> Is this time reasonable or it can be improved?*
>


Re: deletedPkQuery is not actually deleting the records, please help

2011-01-31 Thread makeyourrules

Thanks for your reply Stefan. The MySQL log shows the query returning those
deleted records, and the Solr log also lists them, but for some reason they
are not actually getting deleted from the index.

[2011/01/28 16:58:00.319] Deleting document: BAAH
> [2011/01/28 17:06:50.537] Deleting document: BAAI
> [2011/01/28 17:07:28.470] Deleting document: BAAL
> [2011/01/28 17:08:13.187] Deleting document: BAAM
> [2011/01/28 17:08:27.011] Deleting document: BAAJ
> [2011/01/28 17:08:44.218] Deleting document: BAAK
> [2011/01/28 17:09:13.487] Delta Import completed successfully 

Any help would be highly appreciated.

Thanks,
Prad.
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/DeletepkQuery-is-not-actually-deleting-the-records-please-help-tp2368463p2389749.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Search for FirstName with first Char uppercase followed by * not giving result; getting result with all lowercase and *

2011-01-31 Thread Ahmet Arslan
> I had attached the Analysis report of the query George*

The attachment didn't arrive. But I think you are referring to the output of
analysis.jsp. It can be confusing because it does not do actual query parsing.
Instead you can look at the output of &debugQuery=on.

> When I indexed *George *it was also finally analyzed and
> stored as *george*

When you are using the wildcard operator you need to think about what is
indexed. In your case you need to lowercase your query on the client side:
instead of &q=George* you need to use &q=george*

In other words, analysis is not applied to wildcard and fuzzy query terms,
so the raw query term must match the terms as they are actually stored in
the index.



  


Re: UpdateHandler-Bug or intended feature?

2011-01-31 Thread Em
Okay, I added some logging to both the processor and its factory.
It turned out that there IS an updateProcessor returned and it is NOT null.
However, my logging call inside the processAdd method (on the first line, so
it HAS to run if the method is called) never gets called - so the exception
must be thrown before my processor does anything.

Looking into the CSVRequestHandler shows that its prepareFields() method
seems to be based on the header of the CSV file, not on the document itself.
However, I am currently reading more of the code to understand what really
happens, because everything works fine if the fields of the CSV are specified
- no matter whether I add fields with an UpdateRequestProcessor or not.

If you like, have a look around line 282 (prepareFields) in
CSVRequestHandler.

Regards

Am 31.01.2011 16:06, schrieb Koji Sekiguchi:
> (11/01/31 23:33), Em wrote:
>> Hi Koji,
>>
>> following is the solrconfig:
>>
>>  
>>  
>>  throwAway
>>  
>>  
>>
>>
>>  > class="solr.experiments.solr.update.processor.ThrowAwayUpdateProcessorFactory"/>
>>
>>  
>>
>>
>> Do you see any mistake here?
>>
>> Regards
>>
>
> Hmm, Looks fine. Are you sure you create your update processor instance
> in your factory and return it? (if it is null, processor chain simply
> ignores your processor...)
>
> Koji



Re: NOT operator not working

2011-01-31 Thread abhayd

thanks that helps
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/NOT-operator-not-working-tp2365831p2389803.html
Sent from the Solr - User mailing list archive at Nabble.com.


one column indexed, the other isnt

2011-01-31 Thread PeterKerk

I have the configuration below. Somehow the KVK field IS indexed but the
varstatement column isn't.

I have tried everything: reloaded schema.xml, reindexed... but somehow the
varstatement column remains 'false' even though I KNOW it is true.

The KVK value IS indexed correctly. What else can it be? I don't get any
errors when I do a full-import.



My database:
KVK            nvarchar(50)
varstatement   bit


schema.xml




data-config.xml






-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/one-column-indexed-the-other-isnt-tp2389819p2389819.html
Sent from the Solr - User mailing list archive at Nabble.com.


nested faceting ?

2011-01-31 Thread abhayd

hi 

We already have faceting on our site.

I am loading devices and accessories into the Solr index. deviceType indicates
whether a document is a device or an accessory.

All other attributes are the same for devices and accessories. When query results
come back I would like to display something like:

Devices
+Manufacturer (100)
  - Samsung (50)
  - Sharp (50)
Accessories
+Manufacturer(1000)
 -Samsung (500)
 -Apple(500)

What would my query look like in this case?
Is it possible with Solr, or do I need to implement this at the application level
by parsing the response from Solr?

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/nested-faceting-tp2389841p2389841.html
Sent from the Solr - User mailing list archive at Nabble.com.


Adding plug-in to Solr

2011-01-31 Thread McGibbney, Lewis John
Hello list,

I am attempting to port a plug-in to my Solr installation and would like to 
discuss best practice for doing so. The plug-in relates specifically to the 
query submitted through Solr; the idea is to provide some sort of query 
'refinement' mechanism relating to a specific domain. Some information about a 
similar type of plug-in can be found here:

http://wiki.apache.org/nutch/OntologyPlugin

My question really relates to which config files I need to be consulting when 
adding plug-ins to Solr, and I would like to ask for users' experience with this 
type of experiment.
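
From what I have read so far, I assume a query-side plug-in ends up being
registered in solrconfig.xml (with its jar dropped into the core's lib
directory) roughly like this - the class and names below are only my guess,
not an existing plug-in:

<searchComponent name="queryRefinement"
                 class="com.example.QueryRefinementComponent" />

<requestHandler name="/refine" class="solr.SearchHandler">
    <arr name="first-components">
        <str>queryRefinement</str>
    </arr>
</requestHandler>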

Any comments would be great

Lewis

Glasgow Caledonian University is a registered Scottish charity, number SC021474

Winner: Times Higher Education’s Widening Participation Initiative of the Year 
2009 and Herald Society’s Education Initiative of the Year 2009
http://www.gcu.ac.uk/newsevents/news/bycategory/theuniversity/1/name,6219,en.html


Re: UpdateHandler-Bug or intended feature?

2011-01-31 Thread Em
Here is what I found out:

The CSVRequestHandler gets its fields in line 240 and the following
ones. Those fieldnames come from the file's header
or from the specified params in the request.

The CSVRequestHandler calls prepareFields to create an array of
SchemaFields (see line 269) that will be
filled by schema-fields in line 282.
Here comes the problem: "MyID" does not exist as a schemaField, which
throws an Exception.
Ignoring the field "MyID" would not solve the problem, since it is
needed for my UpdateRequestProcessor.

Well, this does not seem to me like a bug, but more like an exotic
situation where two concepts collide with each other.
The CSVRequestHandler is intended to sweep all the unnecessary stuff
out of the input to avoid exceptions for unknown fields,
while my UpdateRequestProcessor needs such fields to work correctly.

I could imagine adding all expected fields to my schema.xml with
indexed and stored set to false, but this is dirty.
However, the more I think about a rewrite for my situation, the less sense
it makes, since the validation is definitely necessary.

It seems like I will end up with my first idea of adding all expected
fields to my schema.xml, if there are no other suggestions.

Thank you for your help!

Regards

Am 31.01.2011 16:58, schrieb Em:
> Okay, I added some Logging-Stuff to both the processor and its factory.
> It turned out that there IS an updateProcessor returned and it is NOT null.
> However, my logging-method inside the processAdd-Method (1st line, so it
> HAS to be called, if one calls the method) get never called - so the
> exception will definitly get called before my processor does something.
>
> Looking into the CSVRequestHandler shows that the CSVRequestHandler's
> prepareFields()-method seems to be based on the header of the CSV-file,
> not on the document itself. However, I am currently reading more of the
> code to understand what really happens, because everything works fine,
> if the fields of the csv are specified - no matter whether I add fields
> with an UpdateRequestProcessor or not.
>
> If you like, have a look around line 282 (prepareFields) in
> CSVRequestHandler.
>
> Regards
>
> Am 31.01.2011 16:06, schrieb Koji Sekiguchi:
>> (11/01/31 23:33), Em wrote:
>>> Hi Koji,
>>>
>>> following is the solrconfig:
>>>
>>>  
>>>  
>>>  throwAway
>>>  
>>>  
>>>
>>>
>>>  >> class="solr.experiments.solr.update.processor.ThrowAwayUpdateProcessorFactory"/>
>>>
>>>  
>>>
>>>
>>> Do you see any mistake here?
>>>
>>> Regards
>>>
>> Hmm, Looks fine. Are you sure you create your update processor instance
>> in your factory and return it? (if it is null, processor chain simply
>> ignores your processor...)
>>
>> Koji
>



Re: resetting stats

2011-01-31 Thread Ian Connor
Has there been any progress on this, or are there tools people might use to
capture the average or 90th-percentile query time for the last hour?

That would allow us to better match up slowness with other metrics like
CPU/IO/Memory to find bottlenecks in the system.

Thanks,
Ian.

On Wed, Mar 31, 2010 at 9:13 PM, Chris Hostetter
wrote:

>
> : Say I have 3 Cores names core0, core1, and core2, where only core1 and
> core2
> : have documents and caches.  If all my searches hit core0, and core0
> shards
> : out to core1 and core2, then the stats from core0 would be accurate for
> : errors, timeouts, totalTime, avgTimePerRequest, avgRequestsPerSecond,
> etc.
>
> Ahhh yes. (i see what you mean by "aggregating core" now ... i thought
> you ment a core just for aggregatign stats)
>
> *If* you are using distributed search, then you can gather stats from the
> core you use for collating/aggregating from the other shards, and
> reloading that core should be cheap.
>
> but if you aren't already using distributed searching, it would be a bad
> idea from a performance standpoint to add it just to take advantage of
> being able to reload the coordinator core (the overhead of searching one
> distributed shard vs doing the same query directly is usually very
> measurable, even on if the shard is the same Solr instance as your
> coordinator)
>
>
>
> -Hoss
>
>


-- 
Regards,

Ian Connor
1 Leighton St #723
Cambridge, MA 02141
Call Center Phone: +1 (714) 239 3875 (24 hrs)
Fax: +1(770) 818 5697
Skype: ian.connor


Re: EmbeddedSolrServer and junit

2011-01-31 Thread dan sutton
Hi,

I think I've found the cause:

In src/java/org/apache/solr/util/TestHarness.java, query(String handler,
SolrQueryRequest req) calls SolrRequestInfo.setRequestInfo(new
SolrRequestInfo(req, rsp)), which my component also calls in the same
thread, hence the error.

The fix was to override assertQ to call
queryAndResponse(String handler, SolrQueryRequest req) instead, which
does not set/clear SolrRequestInfo.

Regards,
Dan

On Mon, Jan 31, 2011 at 2:32 PM, dan sutton  wrote:
> Hi,
>
> I have 2 cores CoreA and CoreB, when updating content on CoreB, I use
> solrj and EmbeddedSolrServer to query CoreA for information, however
> when I do this with my junit tests (which also use EmbeddedSolrServer
> to query) I get this error
>
> SEVERE: Previous SolrRequestInfo was not closed!
>
> junit.framework.AssertionFailedError
> [junit]     at 
> org.apache.solr.request.SolrRequestInfo.setRequestInfo(SolrRequestInfo.java:45)
>
> How should I write the junit tests to test a multi-core, with
> EmbeddedSolrServer used in a component during querying?
>
> Cheers,
> Dan
>


Re: one column indexed, the other isnt

2011-01-31 Thread Erick Erickson
What is your schema definition for "varstatement"? Please include
the fieldType as well as the field definition.

How do you expect to convert from your bit type to whatever you've
defined in your schema for varstatement (which is boolean?)?

And lastly, how do you KNOW your actual select statement is
returning anything except "false"? Have you seen the results in
your SQL log in your DB or are you simply asserting that there
are true values in your DB? Because I've KNOWN more times
than I care to count that something HAS to be correct and
been wrong 

Best
Erick

On Mon, Jan 31, 2011 at 11:03 AM, PeterKerk  wrote:

>
> I have below configuration. Somehow the field KVK IS indexed and the
> varstatement column isnt.
>
> I have tried everything:  reloaded schema.xml, reindex...but somehow the
> varstatement column remains 'false' even though I KNOW it is true.
>
> The KVK value IS indexed correctly. What else can it be? I dont get any
> errors when I do a full-import..
>
>
>
> My database:
> KVK nvarchar(50)
> varstatementbit
>
>
> schema.xml
> 
> 
>
>
> data-config.xml
> 
>
>
>
>
> 
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/one-column-indexed-the-other-isnt-tp2389819p2389819.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: nested faceting ?

2011-01-31 Thread Erick Erickson
I don't think you'll be able to do this with your present schema, the
information
isn't available in the faceting response, you'd get something like
1100 and no way to know that 1,000
of them were accessories.

You could change the values in your index to something like
"accessories_manufacturer" and "device_manufacturer" and get at the
info that way.

If you're on trunk, grouping might help too.

I admit I'm on my way out the door so this is an "off the cuff" answer,
maybe
I'm overlooking the obvious (again!)

Best
Erick

On Mon, Jan 31, 2011 at 11:13 AM, abhayd  wrote:

>
> hi
>
> We already have faceting on our site.
>
> I am loading devices and accessories in solr index. deviceType indicates if
> its a device or accessory
>
> All other attributes are same for device and accessory. When query results
> come back I would like to display someting like
>
> Devices
> +Manucaturer (100)
>  - Samsung (50)
>  - Sharp (50)
> Accessories
> +Manufacturer(1000)
>  -Samsung (500)
>  -Apple(500)
>
> How would my query look like in this case?
> Is it possible with solr or do i need to implement this at applicaton level
> by parsing ersponse from SOLR?
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/nested-faceting-tp2389841p2389841.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Migration from Solr 1.2 to Solr 1.4

2011-01-31 Thread Vincent Chavelle
Hello,
I have a huge amount of data indexed in Solr and I would like to know the best
way to migrate it.
Can a simple cp of the data directory work?

Thanks you

Vincent Chavelle


CommonsHttpSolrServer and dynamic custom results filtering

2011-01-31 Thread Dave Troiano
Hi,

I'm implementing custom dynamic results filtering to improve fuzzy /
phonetic search support in my search application.  I use the
CommonsHttpSolrServer object to connect remotely to Solr.  I would like to
be able to index multiple fuzzy / phonetic match encodings, e.g. one of the
packaged phonetic encodings, my own phonetic encoding, my own or a packaged
q-gram encoding that will capture string overlap, etc., and then be able to
filter out the results I consider "false positives" in a dynamic, custom
way.  The general approaches I've seen for this are:

1. Use Solr's fuzzy queries.  I haven't been able to achieve acceptable
performance using fuzzy queries, and also the fuzzy queries lack the dynamic
flexibility above.  e.g. whether or not I filter a phonetic match from
results may depend on a lot of things (whether or not there were exact
matches on relevant entities, who the user is, etc), and I can't achieve
this flexibility with a fuzzy field query.

2. Create an RMI-based client/server setup so that I can use the
SolrIndexSearcher to pass in a custom Collector (as in Ch. 9 of Lucene in
Action, but with a custom Collector added).  A custom Collector seems like
exactly what I want, but I don't see a way to achieve this using any of the
packaged SolrServer implementations that support a remote setup like this.
I also worry about the stability of the remote object framework since it's
been moved over to contrib and it seems that there may be serialization
issues or other instability
(http://lucene.472066.n3.nabble.com/extending-SolrIndexSearcher-td472809.html).

3. Continue to use the CommonsHttpSolrServer object for querying my index,
but add in post-processing to dynamically filter results.  This seems doable
but unnatural and potentially inefficient given that I need to worry about
supporting pagination and facet counts in such a framework.

Is there an easier way to do custom dynamic results filtering (like via a
custom Collector) while still using CommonsHttpSolrServer?  Do people have
any other suggestions or insights about the approaches summarized above?

Thanks,
Dave



CUSTOM JSP FOR APACHE SOLR

2011-01-31 Thread JOHN JAIRO GÓMEZ LAVERDE


SOLR LUCENE
DEVELOPERS

Hi, I am new to Solr and I would like to make a custom search page for enterprise
users in JSP that takes the results from Apache Solr.

- Where can I find some useful examples for this topic?
- Is JSP the correct approach to solve my requirement?
- If not, what is the best solution to build a customized search page for my
users?

Thanks
from South America

JOHN JAIRO GOMEZ LAVERDE
Bogotá - Colombia
  

Re: Solr for noSQL

2011-01-31 Thread Upayavira


On Mon, 31 Jan 2011 08:40 -0500, "Estrada Groups"
 wrote:
> What are the advantages of using something like HBase over your standard
> Lucene index with Solr? It would seem to me like you'd be losing a lot of
> what Lucene has to offer!?!

I think Steven is saying that he has an indexer app that reads from
HBase and writes to a standard Solr by hitting its Rest API.

So, nothing funky, just a little app that reads from HBase and posts to
Solr.

Upayavira

> On Jan 31, 2011, at 5:34 AM, Steven Noels 
> wrote:
> 
> > On Fri, Jan 28, 2011 at 1:30 AM, Jianbin Dai  wrote:
> > 
> >> Hi,
> >> 
> >> 
> >> 
> >> Do we have data import handler to fast read in data from noSQL database,
> >> specifically, MongoDB I am thinking to use?
> >> 
> >> Or a more general question, how does Solr work with noSQL database?
> >> 
> > 
> > 
> > Can't say anything about MongoDB, but we have an integration of SOLR with
> > HBase inside Lily - www.lilyproject.org. It indeed uses the 'normal' SOLR
> > index update API rather than a DIH - as we had the need to have incremental
> > updates. The Indexer component we wrote does mapping from Lily/HBase schema
> > to SOLR, as we also felt the need that both schemas shouldn't necessarily be
> > identical.
> > 
> > Steven.
> > -- 
> > Steven Noels
> > http://outerthought.org/
> > Scalable Smart Data
> > Makers of Kauri, Daisy CMS and Lily
> 
--- 
Enterprise Search Consultant at Sourcesense UK, 
Making Sense of Open Source



Re: CUSTOM JSP FOR APACHE SOLR

2011-01-31 Thread Tomás Fernández Löbbe
Hi John, you can use whatever you want for building your application, using
Solr on the backend (JSP included). You should find all the information you
need on Solr's wiki page:
http://wiki.apache.org/solr/

including some client libraries to easily integrate your application with Solr:
http://wiki.apache.org/solr/IntegratingSolr

For fast prototyping you could use Velocity:
http://wiki.apache.org/solr/VelocityResponseWriter

Anyway, I recommend you to start with Solr's tutorial:
http://lucene.apache.org/solr/tutorial.html


Good luck,
Tomás

2011/1/31 JOHN JAIRO GÓMEZ LAVERDE 

>
>
> SOLR LUCENE
> DEVELOPERS
>
> Hi i am new to solr and i like to make a custom search page for enterprise
> users
> in JSP that takes the results of Apache Solr.
>
> - Where i can find some useful examples for that topic ?
> - Is JSP the correct approach to solve mi requirement ?
> - If not what is the best solution to build a customize search page for my
> users?
>
> Thanks
> from South America
>
> JOHN JAIRO GOMEZ LAVERDE
> Bogotá - Colombia
>


Re: deleteById throwing SocketTimeoutException

2011-01-31 Thread Ravi Kiran
I copied the whole index from our production box (which was having the
delete issue) onto a test server and tried deleting docs, and it works. The
only difference between the production server and the test server is that the
production server keeps getting select queries from users pretty much all the
time and the test server does not. I am totally baffled as to why the deletes
hang on the production system.

I would really appreciate it if somebody could tell me what happens inside
Solr when a delete request is issued!!! My head is totally numb from all the
debugging. :-) BTW I am using Solr 1.4.1.

Ravi Kiran Bhaskar

On Fri, Jan 28, 2011 at 4:23 PM, Ravi Kiran  wrote:

> Hello,
> We have a core with about 900K docs. Recently I have noticed that
> the deleteById query seems to always give me a SocketTimeoutException(stack
> trace is shown below). I cannot figure out why only deletion fails but not
> add/update. The SOLR client instance is created via spring wiring
> (configuration given below). Did anybody face the same issue ? How can I
> solve this issue ? Increasing the timeout did not help.
>
> Configuration
> --
>  class="org.apache.solr.client.solrj.impl.CommonsHttpSolrServer">
> 
> http://:8080/solr-admin/core
> 
> 
> 
> 
> 
> 
> 
> 
> 
>
>
> Code
> -
> String filename = delfile.getName();
> String id = filename.replace("_search.xml", "");
>
> log.debug("Deleting id " + id);
> UpdateResponse response = solrServer.deleteById(id);
> log.info("Deleting response for " + id + " is " +
> response);
>
> boolean success = Util.moveFile(delfile,
> delprocessedpath);
>
> /**
>  * Now delete old successfully processed files so that
> full reindex
>  * from processed and transformed folders will not
> process unwanted/deleted documents
>  */
> File transformedFile = new File(transformedpath +
> (filename.replace("_search.xml", "_fast.xml")));
> if(transformedFile.exists()) {
> log.info("Deleting archived Transformed file: " +
> transformedFile.getAbsolutePath());
> transformedFile.delete();
> }
>
> File processedFile = new File(processedpath+filename);
> if(processedFile.exists()) {
> log.info("Deleting archived Processed file: " +
> processedFile.getAbsolutePath());
> processedFile.delete();
> }
>
> Stack Trace
> --
> 2011-01-28 15:51:18,842-0500 ERROR
> [com.search.service.topics.feedprocessor.DeleteFeedProcessor] - Error
> deleting from Solr server for -AR2011011403385_search.xml
> org.apache.solr.client.solrj.SolrServerException:
> java.net.SocketTimeoutException: Read timed out
> at
> org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:472)
> at
> org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:243)
> at
> org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
> at
> org.apache.solr.client.solrj.SolrServer.deleteById(SolrServer.java:102)
> at
> com.search.service.topics.feedprocessor.DeleteFeedProcessor.processDelete(DeleteFeedProcessor.java:76)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at
> org.springframework.util.MethodInvoker.invoke(MethodInvoker.java:276)
> at
> org.springframework.scheduling.quartz.MethodInvokingJobDetailFactoryBean$MethodInvokingJob.executeInternal(MethodInvokingJobDetailFactoryBean.java:260)
> at
> org.springframework.scheduling.quartz.QuartzJobBean.execute(QuartzJobBean.java:86)
> at org.quartz.core.JobRunShell.run(JobRunShell.java:202)
> at
> org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:525)
> Caused by: java.net.SocketTimeoutException: Read timed out
> at java.net.SocketInputStream.socketRead0(Native Method)
> at java.net.SocketInputStream.read(SocketInputStream.java:129)
> at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
> at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
> at
> org.apache.commons.httpclient.HttpParser.readRawLine(HttpParser.java:78)
> at
> org.apache.commons.httpclient.HttpParser.readLine(HttpParser.java:106)
>  

Re: CUSTOM JSP FOR APACHE SOLR

2011-01-31 Thread Paul Libbrecht
Tomas,

I also know Velocity can be used and works well.
I would be interested in a simpler way to have the SOLR objects available in 
a JSP than writing a custom JSP processor as a request handler; indeed, this 
seems to be the way SolrJ is expected to be used per the wiki page.

Actually I migrated to Velocity (which I like less than JSP) just because I did 
not find an answer to this question.

paul


Le 31 janv. 2011 à 21:53, Tomás Fernández Löbbe a écrit :

> Hi John, you can use whatever you want for building your application, using
> Solr on the backend (JSP included). You should find all the information you
> need on Solr's wiki page:
> http://wiki.apache.org/solr/
> 
> including some client libraries to easy
> integrate your application with Solr:
> http://wiki.apache.org/solr/IntegratingSolr
> 
> for fast prototyping you could
> use Velocity:
> http://wiki.apache.org/solr/VelocityResponseWriter
> 
> Anyway, I recommend you
> to start with Solr's tutorial:
> http://lucene.apache.org/solr/tutorial.html
> 
> 
> Good luck,
> Tomás
> 
> 2011/1/31 JOHN JAIRO GÓMEZ LAVERDE 
> 
>> 
>> 
>> SOLR LUCENE
>> DEVELOPERS
>> 
>> Hi i am new to solr and i like to make a custom search page for enterprise
>> users
>> in JSP that takes the results of Apache Solr.
>> 
>> - Where i can find some useful examples for that topic ?
>> - Is JSP the correct approach to solve mi requirement ?
>> - If not what is the best solution to build a customize search page for my
>> users?
>> 
>> Thanks
>> from South America
>> 
>> JOHN JAIRO GOMEZ LAVERDE
>> Bogotá - Colombia
>> 



phrase, inidividual term, prefix, fuzzy and stemming search

2011-01-31 Thread cyang2010

My current project has the requirement to support search when the user inputs
any number of terms across a few index fields (movie title, actor, director).

In order to maximize results, I plan to support all the searches listed in
the subject: phrase, individual term, prefix, fuzzy and stemming.  Of
course, relevance scores in the right order are also important.

I have considered using the dismax query parser.  However, it does not support
prefix queries.  I am not sure if it supports fuzzy queries; my guess is it
does not.

Therefore, I still need to use the standard query parser.  For example, if
someone searches "deim moer" (a typo for demi moore), I compare the phrase and
terms with each searchable field (title, actor, director):


title_display: "deim moer"~30 actors: "deim moer"~30 directors: "deim
moer"~30<--  OR

title_display: deim<-- OR
actors: deim 
directors: deim 

title_display: deim*   <-- OR
actors: deim* 
directors: deim* 

title_display: deim~0.6   <-- OR
actors: deim~0.6 
directors: deim~0.6 

title_display: moer<-- OR
actors: moer 
directors: moer 

title_display: moer*   <-- OR
actors: moer* 
directors: moer* 

title_display: moer~0.6<-- OR
actors: moer~0.6 
directors: moer~0.6

The Solr relevance score is the sum over all those ORed clauses.  That way, I
can make sure the relevance scores come out in the right order.  For example,
an exact match ("deim moer") will match the phrase, term, prefix and fuzzy
queries all at the same time, and will therefore score higher than input that
only matches a term, a prefix or a fuzzy query.  At the same time, I can apply
a boost to a particular search field if the requirements need it.


Does this sound right to you?  Are there better ways to achieve the same thing?
My concern is that my query is not going to perform well, since it tries to do
too much.  But isn't that what people want (maximized results) when they just
type in a few search words?

Another question:  Can I combine the results of two queries?  For example,
first I query for phrase and term matches, then I query for prefix matches.
Can I just append the prefix-match results to the phrase/term results?  I
thought the two queries have different queryNorms, so the scores are not
comparable to each other and cannot be combined.  Is that correct?


Thanks.  I would love to hear your thoughts.


-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/phrase-inidividual-term-prefix-fuzzy-and-stemming-search-tp239p239.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: one column indexed, the other isnt

2011-01-31 Thread PeterKerk

Haha, I KNOW that to be very true: "I have done everything correct, it's this
stupid computer that doesn't understand me" ;)

Anyway:





The reason I'm astonished the correct value isn't returned is that the
correct KVK number IS returned.
So in this query: select KVK, varstatement FROM companies c INNER JOIN
aspnet_users au on au.companyid=c.id WHERE au.userid =
'${artist_owner.userid}'

I see that the correct KVK number is indexed, but the varstatement value
remains false even though in the DB it is true.

On top of that I have successfully used the boolean fieldtype for other
fields as well...


-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/one-column-indexed-the-other-isnt-tp2389819p2392732.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: one column indexed, the other isnt

2011-01-31 Thread Erick Erickson
On a very quick test, it looks like every integer value except 1 is
converted to "false" (I haven't looked at the underlying code, but
this sure makes sense).

So my guess is that what's being sent to Solr isn't what you think,
that is, the varstatement you get back is something other than 1. I
have no real clue how DIH would return a bit type; perhaps the actual
value transmitted to Solr is...er...different.

So I'd look at two things:
1> Can you coerce the select statement to ensure that an int is returned
for "varstatement"?
2> Look at the DIH debug console to see what you can see. This is a
little-advertised page, see solr/admin/dataimport.jsp

Sorry I can't be more help..
Erick

On Mon, Jan 31, 2011 at 5:49 PM, PeterKerk  wrote:

>
> Haha, I KNOW that to be very true: "I have done everything correct, its
> this
> stupid computer that doesnt understand me" ;)
>
> Anyway:
>
>  omitNorms="true"/>
>
> 
>
> The reason I'm  astonished the correct value isnt returned, is because the
> correct KVK number IS returned.
> So in this query: select KVK,varstatement FROM companies c INNER JOIN
> aspnet_users au on au.companyid=c.id WHERE au.userid =
> '${artist_owner.userid}'
>
> I see that the correct KVM number is indexed, but the varstatement value
> remains false even though in the DB it is true..
>
> On top of that I have successfully used the boolean fieldtype for other
> fields as well...
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/one-column-indexed-the-other-isnt-tp2389819p2392732.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


RE: match count per shard and across shards

2011-01-31 Thread Chris Hostetter

: Interesting idea. I must investigate if this is a possibility - eg. how often
: will a document be reindexed from one shard to another - this is actually a
: possibility as a consequence of the way we configure our shards :-/
: 
: Thanks for the input! I was still hoping for a way to get that info from
: Solr. The idea is the same: facet the Solr-shard position of each
: document... 

you could configure this field with a 'default' attribute in the 
schema.xml which is different per shard and then never worry about it -- 
whichever machine it is indexed on, it will get that value.
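
a rough sketch of what i mean, with an invented field name -- each shard's 
schema.xml carries its own default value:

<field name="shard_id" type="string" indexed="true" stored="true"
       default="shard01" />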

managing the different schema.xml's might be a pain (does system property 
substitution work on schema.xml? i can't remember) but the same thing 
could be done with a simple little UpdateProcessor.


-Hoss


Re: UpdateHandler-Bug or intended feature?

2011-01-31 Thread Chris Hostetter

: Well, this does not seem to me like a bug but more like an exotic
: situation where two concepts collidate with eachother.
: The CSVRequestHandler is intended to sweep all the unneccessary stuff
: out of the input to avoid exceptions for unknown fields
: while my UpdateRequestProcessor needs such fields to work correctly.

Agreed, this is an interesting edge case ... i don't actually see any 
reason why CSVRequestHandler needs the SchemaField for each field name -- 
all it ever seems to use it for is determining the field name, so it would 
probably be easy to rip out.

i think even if CSVRequestHandler has some reason for wanting the 
SchemaField object, it should gracefully handle the case where it can't be 
found (there's a version of the method it calls that returns null instead 
of throwing an exception) and just pass the fieldname=val pairs into 
the SolrInputDocument for the UpdateProcessor to deal with -- if there 
really is a problem (and nothing ever removes/maps that field) the 
underlying "add" code will eventually fail with the same exception.

Please feel free to open a Jira issue for this -- it would help in 
particular if you could mention the gist of your use case (why you include 
columns that don't map directly to fields and what your UpdateProcessor 
does with them) so people better understand the goal.

-Hoss


Re: Sending binary data as part of a query

2011-01-31 Thread Chris Hostetter

: I have successfully created a QueryComponent class that, assuming it
: has the integer bitset, can turn that into the necessary DocSetFilter
: to pass to the searcher, get back the facets, etc. That part all works
...
: What I'm unsure how to do is actually send this compressed bitset from
: a client to solr as part of the query. From what I can tell, the Solr
: API classes that are involved in handling binary data as part of a
: request assume that the data is a document to be added. For instance,
: extending ContentStreamHandlerBase requires implementing some kind of
: document loader and an UpdateRequestProcessorChain and a bunch of

that class should probably have been named ContentStreamUpdateHandlerBase 
or something like that -- it tries to encapsulate the logic that most 
RequestHandlers using ContentStreams (for updating) need to worry about.

Your QueryComponent (as used by SearchHandler) should be able to access 
the ContentStreams the same way that class does ... call 
req.getContentStreams().
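
On the server side that's roughly the following (sketch only; the component
class and the request-context handling are invented for illustration):

import java.io.IOException;
import java.io.InputStream;

import org.apache.solr.common.util.ContentStream;
import org.apache.solr.handler.component.QueryComponent;
import org.apache.solr.handler.component.ResponseBuilder;

public class BitsetQueryComponent extends QueryComponent {
  @Override
  public void prepare(ResponseBuilder rb) throws IOException {
    Iterable<ContentStream> streams = rb.req.getContentStreams();
    if (streams != null) {
      for (ContentStream cs : streams) {
        InputStream in = cs.getStream();
        try {
          // ... decompress the posted bytes into your int bitset here and
          // stash it (e.g. in the request context) for use in process() ...
        } finally {
          in.close();
        }
      }
    }
    super.prepare(rb);
  }
}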

Sending a binary stream from a remote client depends on how the client is 
implemented -- you can do it via HTTP using the POST body (with or w/o 
multi-part MIME) in any language you want. If you are using SolrJ you may 
again run into an assumption that using ContentStreams means you are doing 
an "Update", but that's just a vernacular thing ... something like a 
ContentStreamUpdateRequest should work just as well for a query (as long 
as you set the necessary params and/or request handler path).
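
And on the client, a rough SolrJ sketch -- the /bitsetQuery handler path and
the params are placeholders for whatever your component actually expects:

import java.io.ByteArrayInputStream;
import java.io.InputStream;

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;
import org.apache.solr.common.util.ContentStreamBase;
import org.apache.solr.common.util.NamedList;

public class BitsetQueryClient {
  public static void main(String[] args) throws Exception {
    SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");

    // however you build/compress your int bitset
    final byte[] compressedBitset = new byte[0];

    // wrap the bytes in a ContentStream so they go up in the POST body
    ContentStreamBase stream = new ContentStreamBase() {
      public InputStream getStream() {
        return new ByteArrayInputStream(compressedBitset);
      }
    };
    stream.setContentType("application/octet-stream");

    // despite the "Update" in the name, this just POSTs streams to a path
    ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/bitsetQuery");
    req.addContentStream(stream);
    req.setParam("q", "*:*");
    req.setParam("facet", "true");

    NamedList<Object> rsp = server.request(req);
    System.out.println(rsp);
  }
}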


-Hoss


Re: nested faceting ?

2011-01-31 Thread Chris Hostetter

: I am loading devices and accessories in solr index. deviceType indicates if
: its a device or accessory
: 
: All other attributes are same for device and accessory. When query results
: come back I would like to display someting like
: 
: Devices
: +Manucaturer (100)
:   - Samsung (50)
:   - Sharp (50)
: Accessories
: +Manufacturer(1000)
:  -Samsung (500)
:  -Apple(500)
: 
: How would my query look like in this case? 

what you are describing sounds like Pivot Faceting, which is currently 
under development...

https://issues.apache.org/jira/browse/SOLR-792

...the majority of it has been committed to the trunk, but the issue is 
still open while some of the kinks are worked out.

-Hoss


Re: Migration from Solr 1.2 to Solr 1.4

2011-01-31 Thread Chris Hostetter


: I have huge numbers of data indexed in solr and I would know the best way to
: migrate it ?
: A simple cp of the data directory can work ?

if you don't have any custom components, you can probably just use 
your entire solr home dir as is -- just change the solr.war.  (you can't 
just copy the data dir though, you need to use the same configs)

test it out, and note the "Upgrading" notes in the CHANGES.txt for the 
1.3, 1.4, and 1.4.1 releases for "gotchas" that you might want to watch 
out for.

-Hoss


Re: Http Connection is hanging while deleteByQuery

2011-01-31 Thread Ravi Kiran
Hello Shan,
 I was able to delete without hanging by making the
following changes to the solrconfig.xml in the mainIndex section and
reloading the core. BTW I am using 1.4.1... Hope you get your deletes working
as well. Let us know if it works for you or if you find any other solution.

 
 true



1

0
 


Ravi Kiran Bhaskar


On Mon, Jan 31, 2011 at 6:19 AM, shan2812  wrote:

>
> This is the log trace..
>
> 2011-01-31 10:07:18,837 ERROR (main)[SearchBusinessControllerImpl] Solr
> connecting to url: http://10.145.10.154:8081/solr
> 2011-01-31 10:07:18,873 DEBUG (main)[DefaultHttpParams] Set parameter
> http.useragent = Jakarta Commons-HttpClient/3.1
> 2011-01-31 10:07:18,880 DEBUG (main)[DefaultHttpParams] Set parameter
> http.protocol.version = HTTP/1.1
> 2011-01-31 10:07:18,881 DEBUG (main)[DefaultHttpParams] Set parameter
> http.connection-manager.class = class
> org.apache.commons.httpclient.SimpleHttpConnectionManager
> 2011-01-31 10:07:18,881 DEBUG (main)[DefaultHttpParams] Set parameter
> http.protocol.cookie-policy = default
> 2011-01-31 10:07:18,881 DEBUG (main)[DefaultHttpParams] Set parameter
> http.protocol.element-charset = US-ASCII
> 2011-01-31 10:07:18,881 DEBUG (main)[DefaultHttpParams] Set parameter
> http.protocol.content-charset = ISO-8859-1
> 2011-01-31 10:07:18,881 DEBUG (main)[DefaultHttpParams] Set parameter
> http.method.retry-handler =
> org.apache.commons.httpclient.DefaultHttpMethodRetryHandler@15299647
> 2011-01-31 10:07:18,882 DEBUG (main)[DefaultHttpParams] Set parameter
> http.dateparser.patterns = [EEE, dd MMM  HH:mm:ss zzz, , dd-MMM-yy
> HH:mm:ss zzz, EEE MMM d HH:mm:ss , EEE, dd-MMM- HH:mm:ss z, EEE,
> dd-MMM- HH-mm-ss z, EEE, dd MMM yy HH:mm:ss z, EEE dd-MMM- HH:mm:ss
> z, EEE dd MMM  HH:mm:ss z, EEE dd-MMM- HH-mm-ss z, EEE dd-MMM-yy
> HH:mm:ss z, EEE dd MMM yy HH:mm:ss z, EEE,dd-MMM-yy HH:mm:ss z,
> EEE,dd-MMM- HH:mm:ss z, EEE, dd-MM- HH:mm:ss z]
> 2011-01-31 10:07:18,882 DEBUG (main)[DefaultHttpParams] Set parameter
> http.connection.timeout = 1
> 2011-01-31 10:07:18,882 DEBUG (main)[DefaultHttpParams] Set parameter
> http.connection-manager.max-total = 10
> 2011-01-31 10:07:18,883 DEBUG (main)[HttpClient] Java version: 1.5.0_22
> 2011-01-31 10:07:18,883 DEBUG (main)[HttpClient] Java vendor: Sun
> Microsystems Inc.
> 2011-01-31 10:07:18,883 DEBUG (main)[HttpClient] Java class path:
>
> :/tui/cmd/test/cmdadapter/bulk-batchjob-loader/program/lib/activation-1.0.2.jar:/tui/cmd/test/cmdadapter/bulk-batchjob-loader/program/lib/apache-solr-cell-1.4.0.jar:/tui/cmd/test/cmdadapter/bulk-batchjob-loader/program/lib/apache-solr-clustering-1.4.0.jar:/tui/cmd/test/cmdadapter/bulk-batchjob-loader/program/lib/apache-solr-core-1.4.0.jar:/tui/cmd/test/cmdadapter/bulk-batchjob-loader/program/lib/apache-solr-dataimporthandler-1.4.0.jar:/tui/cmd/test/cmdadapter/bulk-batchjob-loader/program/lib/apache-solr-dataimporthandler-extras-1.4.0.jar:/tui/cmd/test/cmdadapter/bulk-batchjob-loader/program/lib/apache-solr-solrj-1.4.0.jar:/tui/cmd/test/cmdadapter/bulk-batchjob-loader/program/lib/castor-1.0.5-xml.jar:/tui/cmd/test/cmdadapter/bulk-batchjob-loader/program/lib/commons-cache-0.1-dev.jar:/tui/cmd/test/cmdadapter/bulk-batchjob-loader/program/lib/commons-codec-1.3.jar:/tui/cmd/test/cmdadapter/bulk-batchjob-loader/program/lib/commons-collections-3.2.jar:/tui/cmd/test/cmdadapter/bulk-batchjob-loader/program/lib/commons-configuration-1.3.jar:/tui/cmd/test/cmdadapter/bulk-batchjob-loader/program/lib/commons-dbcp-1.2.1.jar:/tui/cmd/test/cmdadapter/bulk-batchjob-loader/program/lib/commons-dbutils-1.0.jar:/tui/cmd/test/cmdadapter/bulk-batchjob-loader/program/lib/commons-fileupload-1.2.jar:/tui/cmd/test/cmdadapter/bulk-batchjob-loader/program/lib/commons-httpclient-3.1.jar:/tui/cmd/test/cmdadapter/bulk-batchjob-loader/program/lib/commons-lang-2.1.jar:/tui/cmd/test/cmdadapter/bulk-batchjob-loader/program/lib/commons-logging-1.0.4.jar:/tui/cmd/test/cmdadapter/bulk-batchjob-loader/program/lib/commons-pool-1.2.jar:/tui/cmd/test/cmdadapter/bulk-batchjob-loader/program/lib/dom4j-1.6.1.jar:/tui/cmd/test/cmdadapter/bulk-batchjob-loader/program/lib/google-api-translate-java-0.6.jar:/tui/cmd/test/cmdadapter/bulk-batchjob-loader/program/lib/jakarta-oro-2.0.6.jar:/tui/cmd/test/cmdadapter/bulk-batchjob-loader/program/lib/log4j-1.2.15.jar:/tui/cmd/test/cmdadapter/bulk-batchjob-loader/program/lib/mail-1.3.jar:/tui/cmd/test/cmdadapter/bulk-batchjob-loader/program/lib/mysql-connector-java-5.0.7-bin.jar:/tui/cmd/test/cmdadapter/bulk-batchjob-loader/program/lib/netrics-likeit-4.1.2.jar:/tui/cmd/test/cmdadapter/bulk-batchjob-loader/program/lib/ojdbc14-10.2.0.3.jar:/tui/cmd/test/cmdadapter/bulk-batchjob-loader/program/lib/oro-2.0.8.jar:/tui/cmd/test/cmdadapter/bulk-batchjob-loader/program/lib/slf4j-api-1.4.2.jar:/tui/cmd/test/cmdadapter/bulk-batchjob-loader/program/lib/slf4j-log4j12-1.4.2.jar:/tui/cmd/test/cmdadapter/bulk

Re: Solr for noSQL

2011-01-31 Thread Steven Noels
On Mon, Jan 31, 2011 at 9:38 PM, Upayavira  wrote:

>
>
> On Mon, 31 Jan 2011 08:40 -0500, "Estrada Groups"
>  wrote:
> > What are the advantages of using something like HBase over your standard
> > Lucene index with Solr? It would seem to me like you'd be losing a lot of
> > what Lucene has to offer!?!
>
> I think Steven is saying that he has an indexer app that reads from
> HBase and writes to a standard Solr by hitting its Rest API.
>
> So, nothing funky, just a little app that reads from HBase and posts to
> Solr.
>


We're doing something like offering a relational-database-like experience
(i.e. a schema language, storing typed data instead of byte[]s, secondary
indexing facilities), with some content management features (versioning,
blob storage), combined with SOLR as a search index (with mapping between
our schema and that of SOLR), the index being maintained incrementally and
through map/reduce (for reindexing). We keep multiple versions of the index
if you want, with state management and we do text extraction with Tika. All
this happens fully distributed, so you can play with different boxes serving
as HBase datanode, or index feeder, SOLR search node, etc etc.

All that sits behind a Java API that uses Avro underneath, and a REST
interface as well (searches go directly to SOLR). For future versions, we
will integrate a recommendation engine and some analytics tools as well.

So yes, we do more (or rather: different things) than what Lucene/SOLR does,
as we offer a full-featured data storage environment, stuffing your data in
HBase (which scales better than MySQL), and make it searchable through SOLR.

The 'funky app' you're referring to now sits at about 3 man-years of full-time
development, BTW. ;-)

Steven.
-- 
Steven Noels
http://outerthought.org/
Scalable Smart Data
Makers of Kauri, Daisy CMS and Lily


changing schema

2011-01-31 Thread Dennis Gearon
Anyone got a great little script for changing a schema?

i.e., after changing:
  database,
  the view in the database for data import
  the data-config.xml file
  the schema.xml file

I BELIEVE that I have to run:
  a delete command for the whole index *:*
  a full import and optimize

Does this all sound right?
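
A minimal SolrJ sketch of that sequence, assuming a stock DIH handler
registered at /dataimport (the URL and class name are just placeholders):

import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.request.QueryRequest;
import org.apache.solr.common.params.ModifiableSolrParams;

public class ReindexSketch {
  public static void main(String[] args) throws Exception {
    CommonsHttpSolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");

    // 1. wipe the old documents (redundant if full-import runs with
    //    clean=true, but harmless as a belt-and-braces step)
    solr.deleteByQuery("*:*");
    solr.commit();

    // 2. kick off a DIH full import with clean + optimize
    ModifiableSolrParams params = new ModifiableSolrParams();
    params.set("command", "full-import");
    params.set("clean", "true");
    params.set("optimize", "true");
    QueryRequest importReq = new QueryRequest(params);
    importReq.setPath("/dataimport");
    solr.request(importReq);
  }
}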

 Dennis Gearon


Signature Warning

It is always a good idea to learn from your own mistakes. It is usually a 
better 
idea to learn from others’ mistakes, so you do not have to make them yourself. 
from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'


EARTH has a Right To Life,
otherwise we all die.



Terms and termscomponent questions

2011-01-31 Thread openvictor Open
Dear Solr users,

I am currently using Solr and the TermsComponent to make an auto-suggest for my
website.

I have a field called p_field indexed and stored with type="text" in the
schema.xml. Nothing out of the ordinary.
I feed Solr a set of words separated by a comma and a space, such as (for
two documents):

Document 1:
word11, word12, word13. word14

Document 2:
word21, word22, word23. word24


When I use my newly designed field I get the following for the prefix "word1":
word11, word12, word13. word14 word11word12 word11word13 etc...
Is it normal to get the concatenation of words and not only the words
indexed? Did I miss something about Terms?

Thank you very much,
Best regards all,
Victor


Re: UpdateHandler-Bug or intended feature?

2011-01-31 Thread Em

Hi Hoss,

actually I thought this would be necessary for the SolrInputDocument to map
against a special FieldType, but this isn't true. The mapping happens
some time after the UpdateProcessor has finished its work.
So yes, there is no reason to force the CSVRequestHandler to throw an
Exception if the field does not exist.

I will register on Jira and open an issue for this today.

Regards


Chris Hostetter-3 wrote:
> 
> 
> : Well, this does not seem to me like a bug but more like an exotic
> : situation where two concepts collidate with eachother.
> : The CSVRequestHandler is intended to sweep all the unneccessary stuff
> : out of the input to avoid exceptions for unknown fields
> : while my UpdateRequestProcessor needs such fields to work correctly.
> 
> Agreed, this is an interesting edge case ... i don't actaully see any 
> reason why CSVRequestHandler needs the SchemaField for each field name -- 
> all it ever seems to use it for is determining hte field name, so it would 
> probably be easy to rip out.
> 
> i think even if CSVRequestHandler has some reason for wanting the 
> SchemaField object, it should gracefully handle the case where it can't be 
> found (there's a version of the method it calls that returns null instead 
> of throwing an exception) and just passing the fieldname=val pairs into 
> the SolrInputDocument for the UpdateProcessor to deal with -- if there 
> really is a problem (and nothing ever removes/maps that field) the 
> underlying "add" code will eventually fail with the same exception.
> 
> Please feel free to open a Jira issue for this -- it would help in 
> particular if you could mention the gist of your usecase (why you include 
> columns that don't map directly to fields and what your UpdateProcessor 
> does with them) so people better understand the goal.
> 
> -Hoss
> 
> 

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/UpdateHandler-Bug-or-intended-feature-tp2389382p2395656.html
Sent from the Solr - User mailing list archive at Nabble.com.