Re: ExtractingRequestHandler and Solr 3.1

2011-04-14 Thread Liam O'Boyle
Hi Grant,

After comparing my solrconfig.xml with the one used by the example, the key
difference is that I didn't have <str name="captureAttr">true</str> in the
defaults for the ERH.  Commenting out this line in the example configuration
causes the example to display the same behaviour as I'm seeing.

I've added the option back in and it all works as expected, but this seems to
be a change in the configuration.  I didn't have captureAttr enabled because I
don't have it enabled in my 1.4 production environment (I'm just checking
the upgrade process at the moment) and this problem doesn't happen for me
there.  Is the change deliberate?
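
(For reference, the defaults block that restores the old behaviour looks
like this, reusing the field mappings from my configuration quoted below:

  <lst name="defaults">
    <str name="captureAttr">true</str>
    <str name="fmap.content_type">attr_source_content_type</str>
    <str name="lowernames">true</str>
    <str name="uprefix">ignored_</str>
  </lst>

captureAttr appears to control whether Tika's attribute metadata goes into
separate fields rather than into the extracted content, which matches the
behaviour described above.)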

Thanks,
Liam

On 13 April 2011 23:25, Grant Ingersoll grant.ingers...@gmail.com wrote:


 On Apr 13, 2011, at 12:06 AM, Liam O'Boyle wrote:

  Afternoon,
 
  After an upgrade to Solr 3.1 which has largely been very smooth and
  painless, I'm having a minor issue with the ExtractingRequestHandler.
 
  The problem is that it's inserting metadata into the extracted
  content, as well as mapping it to a dynamic field.  Previously the
  same configuration only mapped it to a dynamic field and I'm not sure
  how it's managing to add it into my content as well.
 
  The requestHandler configuration is as follows:
 
  <requestHandler name="/update/extract"
      startup="lazy"
      class="solr.extraction.ExtractingRequestHandler">
    <lst name="defaults">
      <!-- All the main content goes into "text"... if you need to return
           the extracted text or do highlighting, use a stored field. -->
      <str name="fmap.content_type">attr_source_content_type</str>
      <str name="lowernames">true</str>
      <str name="uprefix">ignored_</str>
    </lst>
  </requestHandler>
 
  The schema has a dynamic field for attr_*: <dynamicField name="attr_*"
  type="textgen" indexed="true" stored="true" multiValued="true" />.
 
  The request being submitted is (reformatted for readability, extracted
  from the catalina log)
 
  literal.ib_extension=blarg
  literal.ib_date=2010-09-09T21:41:30Z
  literal.ib_custom2=custom2
  resource.name=test.txt
  literal.ib_custom3=custom3
  literal.ib_authorid=1
  literal.ib_custom1=custom1
  literal.ib_custom6=custom6
  literal.ib_custom7=custom7
  literal.ib_custom4=custom4
  literal.ib_linkid=1
  literal.ib_custom5=custom5
  literal.ib_tags=foo
  literal.ib_tags=bar
  literal.ib_tags=blarg
  commit=true
  literal.ib_permissionid=1
  literal.ib_filters=1
  literal.ib_filters=2
  literal.ib_filters=3
  literal.ib_description=My+Description
  literal.ib_title=My+Title
  json.nl=map
  wt=json
  literal.ib_realid=1
  literal.ib_custom9=custom9
  literal.ib_id=fb1
  fmap.content=ib_content
  literal.ib_custom8=custom8
  literal.ib_type=foobar
  uprefix=attr_
  literal.ib_clientid=1
 
  After indexing, the ib_content field contains the contents of the
  file, prefixed with "stream_content_type application/octet-stream
  stream_size 971 Content-Encoding UTF-8 Content-Type text/plain
  resourceName test.txt".  These have all been mapped to the dynamic
  field, so I have attr_content_encoding, attr_source_content_type,
  attr_stream_content_type and attr_stream_size all with their correct
  values as well.
 
  There are no copyField parameters to add content from attr_* fields
  into anything else and I've had no luck tracking down where this is
  coming from.  Has there been some option added which controls this
  behaviour?



 I'm not aware of anything changing here, other than we upgraded Tika.  Can
 you isolate the problem and share the test?  I tried it on trunk (I can get
 3.1.0 if needed, but they should be the same in regards to the ERH) using
 the examples on the http://wiki.apache.org/solr/ExtractingRequestHandlerpage 
 and I don't see the behavior.

 --
 Grant Ingersoll
 http://www.lucidimagination.com/

 Search the Lucene ecosystem docs using Solr/Lucene:
 http://www.lucidimagination.com/search




-- 
Liam O'Boyle

IntelligenceBank Pty Ltd
Level 1, 31 Coventry Street Southbank, Victoria 3006, Australia
P:   +613 8618 7810   F:   +613 8618 7899   M: +61 403 88 66 44

*Awarded 2010 Best New Business and Business of the Year - Business3000
Awards*

This email and any attachments are confidential and may contain legally
privileged information or copyright material. If you are not an intended
recipient, please contact us at once by return email and then delete both
messages. We do not accept liability in connection with transmission of
information using the internet.


ExtractingRequestHandler and Solr 3.1

2011-04-13 Thread Liam O'Boyle
Afternoon,

After an upgrade to Solr 3.1 which has largely been very smooth and
painless, I'm having a minor issue with the ExtractingRequestHandler.

The problem is that it's inserting metadata into the extracted
content, as well as mapping it to a dynamic field.  Previously the
same configuration only mapped it to a dynamic field and I'm not sure
how it's managing to add it into my content as well.

The requestHandler configuration is as follows:

  <requestHandler name="/update/extract"
      startup="lazy"
      class="solr.extraction.ExtractingRequestHandler">
    <lst name="defaults">
      <!-- All the main content goes into "text"... if you need to return
           the extracted text or do highlighting, use a stored field. -->
      <str name="fmap.content_type">attr_source_content_type</str>
      <str name="lowernames">true</str>
      <str name="uprefix">ignored_</str>
    </lst>
  </requestHandler>

The schema has a dynamic field for attr_*: <dynamicField name="attr_*"
type="textgen" indexed="true" stored="true" multiValued="true" />.

The request being submitted is (reformatted for readability, extracted
from the catalina log)

literal.ib_extension=blarg
literal.ib_date=2010-09-09T21:41:30Z
literal.ib_custom2=custom2
resource.name=test.txt
literal.ib_custom3=custom3
literal.ib_authorid=1
literal.ib_custom1=custom1
literal.ib_custom6=custom6
literal.ib_custom7=custom7
literal.ib_custom4=custom4
literal.ib_linkid=1
literal.ib_custom5=custom5
literal.ib_tags=foo
literal.ib_tags=bar
literal.ib_tags=blarg
commit=true
literal.ib_permissionid=1
literal.ib_filters=1
literal.ib_filters=2
literal.ib_filters=3
literal.ib_description=My+Description
literal.ib_title=My+Title
json.nl=map
wt=json
literal.ib_realid=1
literal.ib_custom9=custom9
literal.ib_id=fb1
fmap.content=ib_content
literal.ib_custom8=custom8
literal.ib_type=foobar
uprefix=attr_
literal.ib_clientid=1
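
(For reference, a request like this could be issued roughly as follows; the
host, port and upload field name are illustrative:

  curl "http://localhost:8983/solr/update/extract?literal.ib_id=fb1&fmap.content=ib_content&uprefix=attr_&commit=true&wt=json" \
       -F "file=@test.txt"

with the remaining literal.* parameters appended in the same way.)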

After indexing, the ib_content field contains the contents of the
file, prefixed with "stream_content_type application/octet-stream
stream_size 971 Content-Encoding UTF-8 Content-Type text/plain
resourceName test.txt".  These have all been mapped to the dynamic
field, so I have attr_content_encoding, attr_source_content_type,
attr_stream_content_type and attr_stream_size all with their correct
values as well.

There are no copyField parameters to add content from attr_* fields
into anything else and I've had no luck tracking down where this is
coming from.  Has there been some option added which controls this
behaviour?

Cheers,
Liam


Re: Solr and Permissions

2011-04-12 Thread Liam O'Boyle
ManifoldCF sounds like it might be the right solution, so long as it's
not secretly building a filter query in the back end, otherwise it
will hit the same limits.

In the meantime, I have made a minor improvement to my filter query;
it now scans the permitted IDs and attempts to build the filter
using ranges (e.g. instead of 1 OR 2 OR 3 it will filter using [1 TO
3]), which will hopefully keep me going for now.
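
(As an illustration, the change is roughly from one clause per permitted ID
to collapsed range clauses:

  fq=id:(1 OR 2 OR 3 OR 7 OR 8)
  fq=id:([1 TO 3] OR [7 TO 8])

where contiguous runs of IDs become ranges; the IDs here are illustrative.)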

Liam

On 12 March 2011 01:46, go canal goca...@yahoo.com wrote:
 Thank you Jan, I will take a look at ManifoldCF.
 So it seems that the solution is basically to implement something outside of
 Solr for permission control.
 thanks,
 canal




 
 From: Jan Høydahl jan@cominvent.com
 To: solr-user@lucene.apache.org
 Sent: Fri, March 11, 2011 4:17:22 PM
 Subject: Re: Solr and Permissions

 Hi,

 Talk to the ManifoldCF guys - they have successfully implemented support for
 document-level security for many repositories including CMSs/ECMs and may have
 some hints for you to write your own Authority connector against your system,
 which will fetch the ACL for the document and index it with the document 
 itself.
 This eliminates long query-time filters.

 Re-indexing content for which ACLs have changed is a very common way of doing
 this, and you should not worry too much about performance implications before
 there is a real issue. In the real world, you don't change folder permissions
 very often, and that will be a cost you'll have to live with. If you worry
 that this lag between repository state and index state may cause people to
 see content they are not entitled to, it is possible to do late-binding
 filtering of the result set as well, but I would avoid that if possible.
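
 (The index-the-ACL approach sketched above usually means a multiValued
 field of permission tokens in schema.xml, e.g.:

   <field name="acl" type="string" indexed="true" stored="false"
          multiValued="true"/>

 plus a query-time filter such as fq=acl:(group_editors OR group_staff) for
 the requesting user's groups.  Field and token names here are illustrative,
 not ManifoldCF's actual schema.)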

 --
 Jan Høydahl, search solution architect
 Cominvent AS - www.cominvent.com

 On 11. mars 2011, at 06.48, go canal wrote:

 To be fair, I think there is a slight difference between a content management
 system and a search engine.

 Access control at a per-document level or a per-type level, support for
 dynamic role changes, etc. are more like content management use cases,
 whereas a search solution like Solr focuses on a different set of use cases.
 But in the real world, any content management system needs full-text search,
 so the question is how to support search with permission control.

 JackRabbit integrates with Lucene/Tika; this could be one solution, but I do
 not know its performance and scalability.  CouchDB also integrates with
 Lucene/Tika - another option?

 I have yet to see a search engine that provides the sort of content
 management features we are discussing here (Solr, ElasticSearch?).

 Then the last option is probably to build an application that combines a
 document repository with all the necessary content management features and
 Solr for search capability, handling the permissions outside Solr?
 thanks,
 canal




 
 From: Liam O'Boyle liam.obo...@intelligencebank.com
 To: solr-user@lucene.apache.org
 Cc: go canal goca...@yahoo.com
 Sent: Fri, March 11, 2011 2:28:19 PM
 Subject: Re: Solr and Permissions

 As Canal points out, grouping into types is not always possible.

 In our case, permissions are not on a per-type level, but either on a
 per-folder level (of which there can be hundreds) or per item in some cases
 (of which there can be... any number at all).

 Reindexing is also too slow to really be an option; some of the items use
 Tika to extract content, which means that we need to re-extract the content
 (variable length of time; the average is about half a second, but on some
 documents it will sit there until the connection times out).  Querying it,
 modifying it, then resubmitting it without rerunning content extraction is
 still faster, but involves sending even more data over the network; either
 way is relatively slow.

 Liam

 On 11 March 2011 16:24, go canal goca...@yahoo.com wrote:

 I have similar requirements.

 Content type is one solution, but there are also other use cases where this
 is not enough.

 Another requirement is that when the access permission is changed, we need
 to update the field - my understanding is that we cannot, unless we re-index
 the whole document again. Am I correct?
 thanks,
 canal




 
 From: Sujit Pal sujit@comcast.net
 To: solr-user@lucene.apache.org
 Sent: Fri, March 11, 2011 10:39:27 AM
 Subject: Re: Solr and Permissions

 How about assigning content types to documents in the index, and map
 users to a set of content types they are allowed to access? That way you
 will pass in fewer parameters in the fq.
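
  (As a sketch, with hypothetical type names, that filter might look like:

    fq=content_type:(report OR invoice OR memo)

  rather than one clause per document.)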

 -sujit

 On Fri, 2011-03-11 at 11:53 +1100, Liam O'Boyle wrote:
 Morning,

 We use solr to index a range of content to which, within our application,
 access is restricted by a system of user groups and permissions.  In
 order
 to ensure that search results don't reveal information about items which
 the
 user doesn't have access to, we need

Re: New PHP API for Solr (Logic Solr API)

2011-03-10 Thread Liam O'Boyle
How about the Solr PHP Client (http://code.google.com/p/solr-php-client/)?
 We use this and have been quite happy with it, and it seems that it
addresses all of the concerns you expressed.

What advantages does yours offer?

Liam

On 8 March 2011 17:02, Burak burak...@gmail.com wrote:

 On 03/07/2011 12:43 AM, Stefan Matheis wrote:

 Burak,

 what's wrong with the existing PHP-Extension
 (http://php.net/manual/en/book.solr.php)?

 I think "wrong" is not the appropriate word here. But if I had to summarize
 why I wrote this API:

 * Not everybody is enthusiastic about adding another item to an already
 long list of server dependencies. I just wanted a pure PHP option.
 * I am not a C programmer either so the ability to understand the source
 code and modify it according to my needs is another advantage.
 * Yes, a PECL package would be faster. However, in 99% of the cases, after
 everything is said, coded, and byte-code cached, my biggest bottlenecks end
 up being the database and network.
 * Last of all, choice is what open source means to me.

 Burak













Solr and Permissions

2011-03-10 Thread Liam O'Boyle
Morning,

We use Solr to index a range of content to which, within our application,
access is restricted by a system of user groups and permissions.  In order
to ensure that search results don't reveal information about items which the
user doesn't have access to, we need to somehow filter the results; this
needs to be done within Solr itself, rather than after retrieval, so that
the facet and result counts are correct.

Currently we do this by creating a filter query which specifies all of the
items which may be allowed to match (e.g. id: (foo OR bar OR blarg OR ...)),
but this has definite scalability issues - we're starting to run into
trouble, as this can be a set of ORs of potentially unlimited size (and
practically, we're hitting the low thousands sometimes).  While we can
adjust maxBooleanClauses upwards, I understand that this has performance
implications...

So, has anyone had to implement something similar in the past?  Any
suggestions for a more scalable approach?  Any advice on safe and sensible
limits on how far I can push maxBooleanClauses?
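
(For reference, the limit is raised in solrconfig.xml; the shipped default
is 1024, and the 4096 below is just an example value:

  <maxBooleanClauses>4096</maxBooleanClauses>

Note that it applies to every BooleanQuery in the core, not only these
filters.)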

Thanks for your advice,

Liam


Re: Solr and Permissions

2011-03-10 Thread Liam O'Boyle
As Canal points out, grouping into types is not always possible.

In our case, permissions are not on a per-type level, but either on a
per-folder level (of which there can be hundreds) or per item in some cases
(of which there can be... any number at all).

Reindexing is also too slow to really be an option; some of the items use
Tika to extract content, which means that we need to re-extract the content
(variable length of time; the average is about half a second, but on some
documents it will sit there until the connection times out).  Querying it,
modifying it, then resubmitting it without rerunning content extraction is
still faster, but involves sending even more data over the network; either
way is relatively slow.

Liam

On 11 March 2011 16:24, go canal goca...@yahoo.com wrote:

 I have similar requirements.

 Content type is one solution, but there are also other use cases where this
 is not enough.

 Another requirement is that when the access permission is changed, we need
 to update the field - my understanding is that we cannot, unless we re-index
 the whole document again. Am I correct?
  thanks,
 canal




 
 From: Sujit Pal sujit@comcast.net
 To: solr-user@lucene.apache.org
 Sent: Fri, March 11, 2011 10:39:27 AM
 Subject: Re: Solr and Permissions

 How about assigning content types to documents in the index, and map
 users to a set of content types they are allowed to access? That way you
 will pass in fewer parameters in the fq.

 -sujit

 On Fri, 2011-03-11 at 11:53 +1100, Liam O'Boyle wrote:
  Morning,
 
  We use solr to index a range of content to which, within our application,
  access is restricted by a system of user groups and permissions.  In
 order
  to ensure that search results don't reveal information about items which
 the
  user doesn't have access to, we need to somehow filter the results; this
  needs to be done within Solr itself, rather than after retrieval, so that
  the facet and result counts are correct.
 
  Currently we do this by creating a filter query which specifies all of
 the
  items which may be allowed to match (e.g. id: (foo OR bar OR blarg OR
 ...)),
  but this has definite scalability issues - we're starting to run into
  issues, as this can be a set of ORs of potentially unlimited size (and
  practically, we're hitting the low thousands sometimes).  While we can
  adjust maxBooleanClauses upwards, I understand that this has performance
  implications...
 
  So, has anyone had to implement something similar in the past?  Any
  suggestions for a more scalable approach?  Any advice on safe and
 sensible
  limits on how far I can push maxBooleanClauses?
 
  Thanks for your advice,
 
  Liam









Re: How to Update Value of One Field of a Document in Index?

2010-09-10 Thread Liam O'Boyle
Hi Savannah,

You can only reindex the entire document; if you only have the ID,
then do a search to retrieve the rest of the data, then reindex.  This
assumes that all of the fields you need to index are stored (so that
you can retrieve them) and not just indexed.
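
(A rough sketch of that flow, with an illustrative URL and document id:
first fetch the stored fields,

  http://localhost:8983/solr/select?q=id:doc1&fl=*

then change the field in the returned document and re-post the complete
document to /update, followed by a commit.)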

Liam

On Fri, Sep 10, 2010 at 3:29 PM, Savannah Beckett
savannah_becket...@yahoo.com wrote:

 I use Nutch to crawl and index to Solr.  My code is working.  Now, I want
 to update the value of one of the fields of a document in the Solr index
 after the document was already indexed, and I have only the document id.
 How do I do that?

 Thanks.





Date faceting +1MONTH problem

2010-09-09 Thread Liam O'Boyle
Evening,

I'm trying to break down the data over a year into facets by month; to avoid
overlap, I'm using -1MILLI on the start and end dates and using a gap of
+1MONTH.

However, it seems like February completely breaks my monthly cycles, leading
to incorrect counts further down the line; facets that are after February
only go to the 28th of the month, and items in the other two or three days
get pushed into the next facet.  What's the correct way to do this?

An example is shown below, the facet periods go 2008-12-31, 2009-01-31,
2009-02-28 and then from then on only hit 28.

[2008-12-31T23:59:59.999Z] = 0
[2009-01-31T23:59:59.999Z] = 0
[2009-02-28T23:59:59.999Z] = 0
[2009-03-28T23:59:59.999Z] = 0
[2009-04-28T23:59:59.999Z] = 0
[2009-05-28T23:59:59.999Z] = 0
[2009-06-28T23:59:59.999Z] = 0
[2009-07-28T23:59:59.999Z] = 0
[2009-08-28T23:59:59.999Z] = 13
[2009-09-28T23:59:59.999Z] = 6
[2009-10-28T23:59:59.999Z] = 2
[2009-11-28T23:59:59.999Z] = 7
[gap] = +1MONTH
[end] = 2009-12-28T23:59:59.999Z

Thanks for your help,

Liam


Re: Date faceting +1MONTH problem

2010-09-09 Thread Liam O'Boyle
Hi Chris,

Yes, I saw the facet.range.include feature and briefly tried to implement it
before realising that it was Solr 3.1 only :)  I agree that it seems like
the best solution to the problem.

Reindexing with a +1MILLI hack had occurred to me and I guess that's what
I'll do in the meantime; it just seemed like something that people must have
run into before!  I suppose it depends on the granularity of your
timestamps; all of my values are actually just dates, so I've been putting
them in as the date with T00:00:00.000Z, which makes the overlap problem
very obvious.
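
(Concretely, that would mean indexing e.g. 2009-03-15T00:00:00.001Z instead
of 2009-03-15T00:00:00.000Z, so no value sits exactly on a bucket boundary
and the query-time parameters can use clean month boundaries:

  facet.date.start=2009-01-01T00:00:00Z
  facet.date.end=2010-01-01T00:00:00Z
  facet.date.gap=+1MONTH

with no -1MILLI adjustment; the dates here are illustrative.)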

If anyone else has come across a solution for this, feel free to suggest
another approach, otherwise it's reindexing time.

Cheers,
Liam


On Fri, Sep 10, 2010 at 8:38 AM, Chris Hostetter
hossman_luc...@fucit.orgwrote:

 : I'm trying to break down the data over a year into facets by month; to
 avoid
 : overlap, I'm using -1MILLI on the start and end dates and using a gap of
 : +1MONTH.
 :
 : However, it seems like February completely breaks my monthly cycles,
 leading

 Yep.

 Everything you posted makes sense to me in how DateMath works - Jan 31 @
 23:59.999 + 1 MONTH results in Feb 28 @ 23:59.999 ... at which point
 adding 1 MONTH to that results in Mar 28 @ ... because there is no
 context of what the initial starting point was.

 It's not a situation I've ever personally run into ... one workaround
 would be to use a +1MILLI fudge factor at indexing time, instead of a
 -1MILLI fudge factor at query time ... that shouldn't have this problem.

 If you'd like to open a bug to track this, I think it might be possible to
 fix this behavior (there are some features in the Java calendaring code
 that make things like Jan 31 + 2 Months do the right thing) but
 personally I think working on SOLR-1896 (combined with the new
 facet.range.include param) is a more effective use of time so
 we can eliminate the need for this type of hack completely in future Solr
 releases.

 -Hoss

 --
 http://lucenerevolution.org/  ...  October 7-8, Boston
 http://bit.ly/stump-hoss  ...  Stump The Chump!




Re: Date Facets

2010-02-24 Thread Liam O'Boyle
In response to myself,

The problem occurs because the date ranges are inclusive.  I can fix
this by making facet.date.gap = +1MONTH-1SECOND, but is there a way to
specify that the upper bound is exclusive, rather than inclusive?
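
(That is, the parameters from the quoted message below become:

  facet.date.start=2000-01-01T00:00:00Z
  facet.date.end=2000-12-31T23:59:59Z
  facet.date.gap=+1MONTH-1SECOND

which keeps each bucket's upper bound just short of the next one.)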

Liam

On Wed, 2010-02-24 at 16:54 +1100, Liam O'Boyle wrote:
 Afternoon,
 
 I have a strange problem occurring with my date faceting.  I seem to
 have more results in my facets than in my actual result set.
 
 The query filters by date to show results for one year, i.e.
 ib_date:[2000-01-01T00:00:00Z TO 2000-12-31T23:59:59Z], then uses date
 faceting to break up the dates by month, using the following
 parameters
 
 facet=true
 facet.date=ib_date
 facet.date.start=2000-01-01T00:00:00Z
 facet.date.end=2000-12-31T23:59:59Z
 facet.date.gap=+1MONTH
 
 However, I end up with more numbers in the facets than there are
 documents in the response, including facets for dates that aren't
 matched. See below for a summary of the results pulled out
 through /solr/select.
 
  <result name="response" numFound="4" start="0">
    <doc>
      <date name="ib_date">2000-12-01T00:00:00Z</date>
    </doc>
    <doc>
      <date name="ib_date">2000-08-01T00:00:00Z</date>
    </doc>
    <doc>
      <date name="ib_date">2000-06-01T00:00:00Z</date>
    </doc>
    <doc>
      <date name="ib_date">2000-11-01T00:00:00Z</date>
    </doc>
  </result>
  <lst name="facet_counts">
    <lst name="facet_queries"/>
    <lst name="facet_fields"/>
    <lst name="facet_dates">
      <lst name="ib_date">
        <int name="2000-01-01T00:00:00Z">0</int>
        <int name="2000-02-01T00:00:00Z">0</int>
        <int name="2000-03-01T00:00:00Z">0</int>
        <int name="2000-04-01T00:00:00Z">0</int>
        <int name="2000-05-01T00:00:00Z">1</int>
        <int name="2000-06-01T00:00:00Z">1</int>
        <int name="2000-07-01T00:00:00Z">1</int>
        <int name="2000-08-01T00:00:00Z">1</int>
        <int name="2000-09-01T00:00:00Z">0</int>
        <int name="2000-10-01T00:00:00Z">1</int>
        <int name="2000-11-01T00:00:00Z">2</int>
        <int name="2000-12-01T00:00:00Z">1</int>
        <str name="gap">+1MONTH</str>
        <date name="end">2001-01-01T00:00:00Z</date>
      </lst>
    </lst>
  </lst>
 
 Is there something I'm missing here?
 
 Thanks,
 Liam




Date Facets

2010-02-23 Thread Liam O'Boyle
Afternoon,

I have a strange problem occurring with my date faceting.  I seem to
have more results in my facets than in my actual result set.

The query filters by date to show results for one year, i.e.
ib_date:[2000-01-01T00:00:00Z TO 2000-12-31T23:59:59Z], then uses date
faceting to break up the dates by month, using the following parameters

facet=true
facet.date=ib_date
facet.date.start=2000-01-01T00:00:00Z
facet.date.end=2000-12-31T23:59:59Z
facet.date.gap=+1MONTH

However, I end up with more numbers in the facets than there are
documents in the response, including facets for dates that aren't
matched. See below for a summary of the results pulled out
through /solr/select.

<result name="response" numFound="4" start="0">
  <doc>
    <date name="ib_date">2000-12-01T00:00:00Z</date>
  </doc>
  <doc>
    <date name="ib_date">2000-08-01T00:00:00Z</date>
  </doc>
  <doc>
    <date name="ib_date">2000-06-01T00:00:00Z</date>
  </doc>
  <doc>
    <date name="ib_date">2000-11-01T00:00:00Z</date>
  </doc>
</result>
<lst name="facet_counts">
  <lst name="facet_queries"/>
  <lst name="facet_fields"/>
  <lst name="facet_dates">
    <lst name="ib_date">
      <int name="2000-01-01T00:00:00Z">0</int>
      <int name="2000-02-01T00:00:00Z">0</int>
      <int name="2000-03-01T00:00:00Z">0</int>
      <int name="2000-04-01T00:00:00Z">0</int>
      <int name="2000-05-01T00:00:00Z">1</int>
      <int name="2000-06-01T00:00:00Z">1</int>
      <int name="2000-07-01T00:00:00Z">1</int>
      <int name="2000-08-01T00:00:00Z">1</int>
      <int name="2000-09-01T00:00:00Z">0</int>
      <int name="2000-10-01T00:00:00Z">1</int>
      <int name="2000-11-01T00:00:00Z">2</int>
      <int name="2000-12-01T00:00:00Z">1</int>
      <str name="gap">+1MONTH</str>
      <date name="end">2001-01-01T00:00:00Z</date>
    </lst>
  </lst>
</lst>

Is there something I'm missing here?

Thanks,
Liam




Re: Upgrading Tika in Solr

2010-02-17 Thread Liam O'Boyle
I just copied in the newer .jars and got rid of the old ones and
everything seemed to work smoothly enough.
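
(For anyone repeating this: with the stock layout, that amounted to
something like

  cp /path/to/new-tika/*.jar $SOLR_HOME/contrib/extraction/lib/

after deleting the old Tika jars from the same directory.  The path and
layout here are illustrative and depend on your Solr build.)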

Liam

On Tue, 2010-02-16 at 13:11 -0500, Grant Ingersoll wrote:
 I've got a task open to upgrade to 0.6.  Will try to get to it this week.  
 Upgrading is usually pretty trivial.
 
 
 On Feb 14, 2010, at 12:37 AM, Liam O'Boyle wrote:
 
  Afternoon,
  
   I've got a large collection of documents which I'm attempting to add to
  a Solr index using Tika via the ExtractingRequestHandler, but there are
  a large number that it has problems with (PDFs, PPTX and XLS documents
  mainly).  
  
  I've tried them with the most recent stand alone version of Tika and it
  handles most of the failing documents correctly.  I tried using a recent
  nightly build of Solr, but the same problems seem to occur.
  
  Are there instructions somewhere on installing a more recent Tika build
  into Solr?
  
  Thanks,
  Liam
  
  
 
 --
 Grant Ingersoll
 http://www.lucidimagination.com/
 
 Search the Lucene ecosystem using Solr/Lucene: 
 http://www.lucidimagination.com/search
 




Upgrading Tika in Solr

2010-02-13 Thread Liam O'Boyle
Afternoon,

I've got a large collection of documents which I'm attempting to add to
a Solr index using Tika via the ExtractingRequestHandler, but there are
a large number that it has problems with (PDFs, PPTX and XLS documents
mainly).  

I've tried them with the most recent stand-alone version of Tika and it
handles most of the failing documents correctly.  I tried using a recent
nightly build of Solr, but the same problems seem to occur.

Are there instructions somewhere on installing a more recent Tika build
into Solr?

Thanks,
Liam