[jira] Assigned: (CONNECTORS-16) JCIFS connector's document fingerprinting feature is not general enough

2010-03-18 Thread Karl Wright (JIRA)

 [ 
https://issues.apache.org/jira/browse/CONNECTORS-16?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wright reassigned CONNECTORS-16:
-

Assignee: Karl Wright

 JCIFS connector's document fingerprinting feature is not general enough
 ---

 Key: CONNECTORS-16
 URL: https://issues.apache.org/jira/browse/CONNECTORS-16
 Project: Lucene Connector Framework
 Issue Type: Improvement
 Components: Framework agents process, Framework crawler agent, GTS 
 connector, JCIFS connector, LiveLink connector, Lucene/SOLR connector, 
 Meridio connector, RSS connector, SharePoint connector, Web connector
 Reporter: Karl Wright
 Assignee: Karl Wright
 Priority: Minor

 The JCIFS connector has a feature, called fingerprinting, which allows it 
 to classify documents according to the ability of the back end to index 
 their content.  At the moment, the fingerprinter recognizes only PDFs, 
 Microsoft Office files, and text files as indexable.  One could imagine, 
 though, that different SOLR plugins, etc. might have more capability than 
 that.  Also, other connectors could potentially benefit from similar 
 technology, specifically any connector that deals with binary documents.

 One approach to solving this problem would be to remove the feature entirely 
 and allow whatever pipeline exists in SOLR to determine indexability after 
 the fact.  The reason this feature was added at MetaCarta, however, is that 
 it may be possible to exclude a useless document without having to fetch 
 the whole thing, and (at least for MetaCarta clients) the number of 
 gigantic unindexable files was a big concern.

 Another approach might be to tie the functionality in with the output 
 connector interface, so that an output connector would (somehow) determine 
 the applicability of a document.  This would require some care to make it 
 possible to fingerprint without having to download the entire document, but 
 would otherwise have the correct overall structure.
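
 As a rough illustration of the second approach, the sketch below shows one 
 way an output connector might decide applicability from only the leading 
 bytes of a document.  Everything in it is an assumption made for the sake 
 of the example: the names FingerprintSketch, IndexabilityCheck, 
 MagicByteCheck, readPrefix, and shouldFetch are hypothetical and are not 
 taken from the framework, the 512-byte prefix size is arbitrary, and the 
 text test is a crude heuristic.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Illustrative sketch only: every name here is hypothetical and none of it
// is taken from the connector framework's actual API.
class FingerprintSketch {

    /** Hypothetical hook an output connector could expose to the crawler. */
    interface IndexabilityCheck {
        /** How many leading bytes the check wants to inspect. */
        int prefixLength();

        /** Decide from the leading bytes alone whether the back end can index the document. */
        boolean isIndexable(byte[] prefix);
    }

    /** Example check: accepts PDF, OLE2 (legacy Microsoft Office), and text-like content. */
    static class MagicByteCheck implements IndexabilityCheck {
        private static final byte[] PDF = {'%', 'P', 'D', 'F', '-'};
        private static final byte[] OLE2 = {
            (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
            (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
        };

        public int prefixLength() {
            return 512; // arbitrary size chosen for this sketch
        }

        public boolean isIndexable(byte[] prefix) {
            return startsWith(prefix, PDF) || startsWith(prefix, OLE2) || looksLikeText(prefix);
        }

        private static boolean startsWith(byte[] data, byte[] magic) {
            if (data.length < magic.length) {
                return false;
            }
            for (int i = 0; i < magic.length; i++) {
                if (data[i] != magic[i]) {
                    return false;
                }
            }
            return true;
        }

        // Crude heuristic: non-empty, and no control bytes other than tab, LF, CR.
        private static boolean looksLikeText(byte[] data) {
            if (data.length == 0) {
                return false;
            }
            for (byte b : data) {
                int v = b & 0xFF;
                if (v < 0x20 && v != '\t' && v != '\n' && v != '\r') {
                    return false;
                }
            }
            return true;
        }
    }

    /** Read at most max bytes from the stream; fewer if the stream ends first. */
    static byte[] readPrefix(InputStream in, int max) throws IOException {
        byte[] buf = new byte[max];
        int total = 0;
        while (total < max) {
            int n = in.read(buf, total, max - total);
            if (n < 0) {
                break;
            }
            total += n;
        }
        return Arrays.copyOf(buf, total);
    }

    /** Ask the check whether the rest of the document is worth fetching. */
    static boolean shouldFetch(InputStream in, IndexabilityCheck check) throws IOException {
        return check.isIndexable(readPrefix(in, check.prefixLength()));
    }

    public static void main(String[] args) throws IOException {
        IndexabilityCheck check = new MagicByteCheck();
        byte[] pdf = "%PDF-1.4\n".getBytes(StandardCharsets.US_ASCII);
        byte[] binary = {0x7F, 'E', 'L', 'F', 0x00, 0x01};
        System.out.println("pdf:    " + shouldFetch(new ByteArrayInputStream(pdf), check));
        System.out.println("binary: " + shouldFetch(new ByteArrayInputStream(binary), check));
    }
}
```

 The point of the split is that the repository connector only has to supply 
 a stream, while the decision about what counts as indexable lives with 
 whatever output connector implements the check.  A different back end 
 could supply a more permissive check without the JCIFS side changing.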

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (CONNECTORS-24) SOLR connector needs the ability to ingest metadata

2010-03-18 Thread Karl Wright (JIRA)

 [ 
https://issues.apache.org/jira/browse/CONNECTORS-24?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wright resolved CONNECTORS-24.
---

Resolution: Fixed

Oops, I'd forgotten that this was actually already done.


 SOLR connector needs the ability to ingest metadata
 ---

 Key: CONNECTORS-24
 URL: https://issues.apache.org/jira/browse/CONNECTORS-24
 Project: Lucene Connector Framework
 Issue Type: Improvement
 Components: Lucene/SOLR connector
 Reporter: Karl Wright

 The SOLR connector is pretty bare-bones at the moment, and even lacks the 
 ability to transmit metadata to SOLR.
