Re: DIH problem with multiple (types of) resources

2016-11-15 Thread Peter Blokland
hi,

On Tue, Nov 15, 2016 at 02:54:49AM +1100, Alexandre Rafalovitch wrote:

>> 
>> 
 
> Attribute names are case sensitive as far as I remember. Try
> 'dataSource' for the second definition.

oh wow... that's sneaky. in the old version the case didn't seem to matter,
but now it certainly does. thx :)
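DIH apparently does not reject a mis-cased attribute such as `datasource`. The entity silently ends up on another data source instead: here, the "web" BinURLDataSource, as the stack trace in the original message shows. You can catch this kind of mistake before running an import by scanning the config for attribute names that match the expected spelling only case-insensitively. The sketch below is a hypothetical helper, not part of Solr. It uses Python's standard-library `xml.etree.ElementTree` and checks only `dataSource`. The sample config is made up for illustration and is not a working DIH configuration.

```python
import xml.etree.ElementTree as ET

# Attribute names expected on <entity> elements, spelled exactly.
# Only "dataSource" is checked here; the set can be extended (assumption).
EXPECTED_ATTRS = {"dataSource"}


def find_miscased_attributes(config_xml: str) -> list[tuple[str, str, str]]:
    """Return (entity name, found attribute, expected attribute) for every
    <entity> attribute that matches an expected name only case-insensitively."""
    root = ET.fromstring(config_xml)
    lookup = {name.lower(): name for name in EXPECTED_ATTRS}
    problems = []
    for entity in root.iter("entity"):  # includes nested entities
        for attr in entity.attrib:
            expected = lookup.get(attr.lower())
            if expected is not None and attr != expected:
                problems.append((entity.get("name", "?"), attr, expected))
    return problems


# Hypothetical, deliberately minimal config: the second entity uses the
# wrong case for its data source attribute.
SAMPLE = """<dataConfig>
  <dataSource name="web" type="BinURLDataSource"/>
  <dataSource name="db" type="JdbcDataSource"/>
  <document>
    <entity name="page" dataSource="web" url="http://example.invalid/"/>
    <entity name="edition" datasource="db" query="select edition from editions"/>
  </document>
</dataConfig>"""

print(find_miscased_attributes(SAMPLE))  # [('edition', 'datasource', 'dataSource')]
```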

-- 
CUL8R, Peter.

www.desk.nl

Your excuse is: It is a layer 8 problem


DIH problem with multiple (types of) resources

2016-11-14 Thread Peter Blokland
hi,

I'm porting an old data-import configuration from 4.x to 6.3.0. a minimal config
is this :


  

  

  


  http://site/nl/${page.pid}"; format="text">

  



  


  




when I try to do a full import with this, I get :

2016-11-14 12:31:52.173 INFO  (Thread-68) [   x:meulboek] 
o.a.s.u.p.LogUpdateProcessorFactory [meulboek]  webapp=/solr path=/dataimport 
params={core=meulboek&optimize=false&indent=on&commit=true&clean=true&wt=json&command=full-import&_=1479122291861&verbose=true}
 status=0 QTime=11{deleteByQuery=*:* 
(-1550976769832517632),add=[ed99517c-ece9-40c6-9682-c9ec74173241 
(1550976769976172544), 9283532a-2395-43eb-bcb8-fd30c5ebfd08 
(1550976770348417024), 87b75d5c-a12a-4538-bc29-ceb13d6a9d1c 
(1550976770455371776), 476b5da3-3752-4867-bdb3-4264403c5c2d 
(1550976770787770368), 71cdaadb-62ba-4753-ad1b-01ba7fd75bfa 
(1550976770875850752), 02f41269-4a28-4001-aab9-7b1feb51e332 
(1550976770954493952), 6216ec48-2abd-465b-8d6b-60907c7f49db 
(1550976771047817216), 4317b308-dc88-47e1-9240-0d7d94646de6 
(1550976771136946176), 159ee092-2f72-45f6-970e-9dfd6d635bdf 
(1550976771221880832), bdfa48c4-23e2-483f-9b63-e0c5753d60a5 
(1550976771336175616)]} 0 1465
2016-11-14 12:31:52.173 ERROR (Thread-68) [   x:meulboek] 
o.a.s.h.d.DataImporter Full Import failed:java.lang.RuntimeException: 
java.lang.RuntimeException: 
org.apache.solr.handler.dataimport.DataImportHandlerException: Exception in 
invoking url null Processing Document # 11
at 
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:270)
at 
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:416)
at 
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:475)
at 
org.apache.solr.handler.dataimport.DataImporter.lambda$runAsync$0(DataImporter.java:458)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: 
org.apache.solr.handler.dataimport.DataImportHandlerException: Exception in 
invoking url null Processing Document # 11
at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:416)
at 
org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:329)
at 
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:232)
... 4 more
Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException: 
Exception in invoking url null Processing Document # 11
at 
org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:69)
at 
org.apache.solr.handler.dataimport.BinURLDataSource.getData(BinURLDataSource.java:89)
at 
org.apache.solr.handler.dataimport.BinURLDataSource.getData(BinURLDataSource.java:38)
at 
org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:59)
at 
org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:73)
at 
org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:244)
at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:475)
at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:414)
... 6 more
Caused by: java.net.MalformedURLException: no protocol: nullselect edition from editions
at java.net.URL.<init>(URL.java:593)
at java.net.URL.<init>(URL.java:490)
at java.net.URL.<init>(URL.java:439)
at 
org.apache.solr.handler.dataimport.BinURLDataSource.getData(BinURLDataSource.java:81)
... 12 more


note that this failure occurs with the second entity, and judging from this
line :

Caused by: java.net.MalformedURLException: no protocol: nullselect edition from editions

it seems solr tries to use the datasource named "web" (the BinURLDataSource)
instead of the configured "db" datasource (the JdbcDataSource). am I doing
something wrong, or is this a bug ? 

-- 
CUL8R, Peter.

www.desk.nl

Your excuse is: Communist revolutionaries taking over the server room and 
demanding all the computers in the building or they shoot the sysadmin. Poor 
misguided fools.


Re: ranged and boolean query

2010-11-23 Thread Peter Blokland
hi,

On Wed, Nov 17, 2010 at 04:39:00PM +0100, Peter Blokland wrote:

> i'm using solr and am trying to limit my resultset to documents
> that either have a publication date in the range * to now, or
> have no publication date set at all (field is not present). 
> however, using this :
> 
> (pubdate:[* TO NOW]) OR ( NOT pubdate:*)
> 
> gives me only the documents in the range * to now (reversing the
> two clauses has no effect). 

answering my own question : the above expression was a filter-query,
where the main query was (e.g.)

type:page

when only using the right-hand expression, this evaluates to

type:page NOT pubdate:*

which is a valid query. however, using the full expression seems to
make lucene evaluate 

NOT pubdate:*

on its own, as a purely negative query with nothing to subtract from,
which matches no documents. so, rewriting the filter-query as

(type:page AND pubdate:[* TO NOW]) OR (type:page NOT pubdate:*)

solved my problem... took me long enough...
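A commonly used variant avoids repeating the main query inside the filter. It anchors the negative clause with the match-all query `*:*`, so that inside the OR the clause means "all documents except those that have a pubdate". This is a general Solr idiom, not the rewrite tested in this message. The Python sketch below only builds that filter string and percent-encodes it together with `q`/`fq` request parameters. The helper name `dated_or_missing_fq` is hypothetical.

```python
from urllib.parse import urlencode


def dated_or_missing_fq(field: str, upper: str = "NOW") -> str:
    """Filter query for docs whose `field` is <= upper OR that lack `field`.

    The negative clause is anchored with *:* so that, inside the OR, it means
    "all documents minus those having the field" rather than a purely
    negative sub-query that matches nothing on its own.
    """
    return f"{field}:[* TO {upper}] OR (*:* -{field}:*)"


params = {"q": "type:page", "fq": dated_or_missing_fq("pubdate")}
print(params["fq"])       # pubdate:[* TO NOW] OR (*:* -pubdate:*)
print(urlencode(params))  # percent-encoded parameters for a select request
```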

-- 
CUL8R, Peter.

www.desk.nl --- Sent from my NetBSD-powered Talkie Toaster™


Re: ranged and boolean query

2010-11-18 Thread Peter Blokland
hi,

On Wed, Nov 17, 2010 at 05:00:04PM +0100, Peter Blokland wrote:
 
>>> pubdate:([* TO NOW] OR (NOT *))

i've gone back to the examples provided with solr 1.4.1. the
standard example has 19 documents, one of which has a date-field
called 'incubationdate_dt'. so the query 

incubationdate_dt:[* TO NOW]

is expected to return 1 document, which it does. the query

-incubationdate_dt:* 

is expected to return 18 documents, which it does. however,

incubationdate_dt:[* TO NOW] (-incubationdate_dt:*)

which should (imho) return all 19 documents, just returns the
one document that has such a field.
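A toy model of boolean matching reproduces the counts above. It assumes that a boolean query containing only prohibited (`-`/`NOT`) clauses matches nothing, because it has nothing to subtract from. It also assumes that Solr adds an implicit `*:*` only when the whole query is purely negative. That is why `-incubationdate_dt:*` alone returns 18 documents, while the same clause nested in parentheses contributes nothing. This is a simplified Python model written for this explanation, not Solr's actual code. Documents are plain dicts, and the date values are hypothetical.

```python
from datetime import datetime

# Toy index mirroring the counts above: 19 documents, only one of which has
# incubationdate_dt. The date values are hypothetical.
NOW = datetime(2010, 11, 18)
FIELD = "incubationdate_dt"
docs = [{"id": i} for i in range(19)]
docs[0][FIELD] = datetime(2006, 1, 17)
ALL = {d["id"] for d in docs}  # what *:* matches


def range_to_now(field):
    """field:[* TO NOW] -> ids of docs whose value is <= NOW."""
    return {d["id"] for d in docs if field in d and d[field] <= NOW}


def exists(field):
    """field:* -> ids of docs that have the field at all."""
    return {d["id"] for d in docs if field in d}


def boolean(should=(), must_not=()):
    """Simplified boolean query: union of SHOULD clauses minus MUST_NOT clauses.
    With no positive clause there is nothing to subtract from, so it is empty."""
    matched = set().union(*should)
    for clause in must_not:
        matched -= clause
    return matched


def top_level(should=(), must_not=()):
    """Top-level fix-up: a purely negative query is given *:* to subtract from."""
    if not should and must_not:
        should = [ALL]
    return boolean(should, must_not)


q1 = top_level(should=[range_to_now(FIELD)])                 # field:[* TO NOW]
q2 = top_level(must_not=[exists(FIELD)])                     # -field:*
nested_negative = boolean(must_not=[exists(FIELD)])          # (-field:*) nested
q3 = top_level(should=[range_to_now(FIELD), nested_negative])
anchored = boolean(should=[ALL], must_not=[exists(FIELD)])   # (*:* -field:*)
q4 = top_level(should=[range_to_now(FIELD), anchored])

print(len(q1), len(q2), len(q3), len(q4))  # 1 18 1 19
```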

can anyone confirm whether or not this is expected behavior, and
if so, why ?

-- 
CUL8R, Peter.

www.desk.nl --- Sent from my NetBSD-powered Talkie Toaster™


Re: ranged and boolean query

2010-11-17 Thread Peter Blokland
hi,

On Wed, Nov 17, 2010 at 10:54:48AM -0500, Ken Stanley wrote:

> > pubdate:([* TO NOW] OR (NOT *))
 
> Instead of using NOT, try simply prefixing the field name with a minus
> sign. This tells SOLR to exclude the field. Otherwise, the word NOT
> would be treated as a term, and would be applied against your default
> field (which may or may not affect your results). So instead of
> (pubdate:[* TO NOW]) OR ( NOT pubdate:*), you would write (pubdate:[*
> TO NOW]) OR ( -pubdate:*).

tried that, it gives me exactly the same result... I can't really
figure out what's going on.

-- 
CUL8R, Peter.

www.desk.nl --- Sent from my NetBSD-powered Talkie Toaster™


ranged and boolean query

2010-11-17 Thread Peter Blokland
hi.

i'm using solr and am trying to limit my resultset to documents
that either have a publication date in the range * to now, or
have no publication date set at all (field is not present). 
however, using this :

(pubdate:[* TO NOW]) OR ( NOT pubdate:*)

gives me only the documents in the range * to now (reversing the
two clauses has no effect). using only 

NOT pubdate:*

gives me the correct set of documents (those not having a pubdate).
any reason the OR does not work in this case ?

ps: also tried it like this :

pubdate:([* TO NOW] OR (NOT *))

which gives the same result.


-- 
CUL8R, Peter.

www.desk.nl --- Sent from my NetBSD-powered Talkie Toaster™


Re: Solr PHP PECL Extension going to Stable Release - Wishing for Any New Features?

2010-10-12 Thread Peter Blokland
hi,

On Mon, Oct 11, 2010 at 01:03:07AM -0400, Israel Ekpo wrote:
 
> If you are using Solr via PHP and would like to see any new features in the
> extension please feel free to send me a note.

I'm currently testing a setup with Solr via PHP, and was wondering if
support for the ExtractingRequestHandler is planned ? It may be that I 
missed something in the documentation, but for now it looks like I need
to build my own POSTs to the /solr/update/extract handler.
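A hand-built request to the extract handler is fairly small. The sketch below is in Python rather than PHP and uses only the standard `urllib.request` to show the shape of such a request. It builds the request but does not send it. It assumes a local Solr at `http://localhost:8983/solr`. It also assumes the document's unique key is passed as `literal.id` together with `commit=true`. These are the usual ExtractingRequestHandler parameters, but check them against your own solrconfig.xml. The id and file bytes are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request

SOLR_BASE = "http://localhost:8983/solr"  # assumption: default local Solr


def build_extract_request(body: bytes, doc_id: str, content_type: str) -> Request:
    """Build (but do not send) a POST of raw file bytes to /update/extract."""
    params = urlencode({"literal.id": doc_id, "commit": "true"})
    return Request(
        f"{SOLR_BASE}/update/extract?{params}",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )


req = build_extract_request(b"%PDF-1.4 ...", "doc-1", "application/pdf")
print(req.get_method(), req.full_url)
# POST http://localhost:8983/solr/update/extract?literal.id=doc-1&commit=true
# To actually send it: urllib.request.urlopen(req).read()
```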

-- 
CUL8R, Peter.

www.desk.nl --- Sent from my NetBSD-powered Talkie Toaster™


TikaEntityProcessor and metadata

2010-10-07 Thread Peter Blokland
hi,

I'm using Solr to index documents both through the combination of
DataImportHandler/TikaEntityProcessor and through Solr's ExtractingRequestHandler.
The latter gives me the option of dynamically mapping metadata to
fields using "uprefix='attr_'" in the configuration. Is it possible
to do the same thing from DIH _without_ exhaustively mapping all
(possible) fields myself ?
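The behaviour in question can be stated simply: with `uprefix`, any extracted field name that is not defined in the schema gets the prefix, so it can land in a dynamic field such as `attr_*`. The Python sketch below models only that renaming rule, to show what an equivalent DIH-side mapping would have to do. A plain set stands in for the schema. The schema fields and metadata values are made-up examples. The sketch ignores the handler's other mapping options, such as `fmap.*`.

```python
def apply_uprefix(metadata: dict[str, str], schema_fields: set[str],
                  uprefix: str = "attr_") -> dict[str, str]:
    """Prefix every metadata key that is not a known schema field,
    mimicking the effect of uprefix (simplified model, not Solr code)."""
    mapped = {}
    for name, value in metadata.items():
        target = name if name in schema_fields else uprefix + name
        mapped[target] = value
    return mapped


schema = {"id", "title", "author"}  # hypothetical schema fields
meta = {"title": "Annual report", "author": "A. Writer", "producer": "LibreOffice"}
print(apply_uprefix(meta, schema))
# {'title': 'Annual report', 'author': 'A. Writer', 'attr_producer': 'LibreOffice'}
```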

-- 
CUL8R, Peter.

www.desk.nl --- Sent from my NetBSD-powered Talkie Toaster™