[jira] [Created] (TIKA-730) WriteOutContentHandler concatenates title tag and body text.

2011-09-25 Thread Raimund Merkert (JIRA)
WriteOutContentHandler concatenates title tag and body text.


 Key: TIKA-730
 URL: https://issues.apache.org/jira/browse/TIKA-730
 Project: Tika
  Issue Type: Bug
  Components: general, parser
Affects Versions: 0.9
Reporter: Raimund Merkert


I just noticed that the WriteOutContentHandler concatenates strings that it 
should not concatenate. I noticed this in case of a title tag which was 
combined with the first text in a body, e.g.: 
<title>a</title><body>b</body>
results in "ab" and not "a b" (or something else with a break). Interestingly, 
"ab" does get broken into separate words. 

I'm not aware of a better way to extract text only with an out-of-the-box tika.

I've added a small unit test here:
{code}
package tika;

import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.StringWriter;
import java.nio.charset.Charset;

import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.sax.WriteOutContentHandler;
import org.junit.Assert;
import org.junit.Test;

public class WriteOutContentHandler_JUnit {

    // XHTML document whose title text is "title" and whose body text is "a".
    private static final String HTML = "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">"
            + "<html xmlns=\"http://www.w3.org/1999/xhtml\"><head><title>title</title></head>"
            + "<body>a</body></html>";

    // Parses the given markup and returns everything written by the handler.
    public static String processStream(String str) throws Exception {
        InputStream in = new ByteArrayInputStream(
                str.getBytes(Charset.forName("UTF-8")));

        AutoDetectParser parser = new AutoDetectParser();
        ParseContext context = new ParseContext();
        Metadata m = new Metadata();
        StringWriter out = new StringWriter();
        WriteOutContentHandler ctHandler = new WriteOutContentHandler(out);

        try {
            parser.parse(in, ctHandler, m, context);
            return out.toString();
        } finally {
            out.flush();
        }
    }

    @Test
    public void testParse() throws Exception {
        String data = processStream(HTML);
        data = data.trim();
        System.err.println("Extracted:\n" + data);
        // Fails while the title and body text are concatenated without a break.
        Assert.assertFalse(data.equals("titlea"));
    }
}
{code}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (TIKA-730) WriteOutContentHandler concatenates title tag and body text.

2011-09-25 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13114295#comment-13114295
 ] 

Uwe Schindler commented on TIKA-730:


WriteOutContentHandler should be used behind a BodyContentHandler. The 
XHTMLContentHandler that feeds it only inserts ignorableWhitespace between 
tags in the body, not in the head. The <head> section is not intended to pass 
through WriteOutContentHandler, so use a BodyContentHandler as the input to 
WriteOutContentHandler.
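
The idea behind this advice can be sketched with the JDK's SAX classes alone, 
without Tika on the classpath: a handler that only collects character events 
while the parser is inside the body never sees the title text. The class 
BodyOnlyTextCollector and its extractBodyText method are hypothetical names 
made up for this sketch and are not Tika's implementation. With Tika itself, 
the wiring would presumably be 
new BodyContentHandler(new WriteOutContentHandler(out)).

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of Tika: collects text found inside <body> only.
class BodyOnlyTextCollector extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();
    private int bodyDepth = 0; // greater than 0 while the parser is inside <body>

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (bodyDepth > 0 || "body".equals(qName)) {
            bodyDepth++;
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (bodyDepth > 0) {
            bodyDepth--;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Text outside <body> (for example the title) is dropped here.
        if (bodyDepth > 0) {
            text.append(ch, start, length);
        }
    }

    // Parses well-formed XHTML and returns only the body text.
    static String extractBodyText(String xhtml) {
        try {
            BodyOnlyTextCollector collector = new BodyOnlyTextCollector();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xhtml)), collector);
            return collector.text.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse input", e);
        }
    }

    public static void main(String[] args) {
        String doc = "<html><head><title>title</title></head><body>a</body></html>";
        System.out.println(extractBodyText(doc));
    }
}
```

Because the title never reaches the collector, there is nothing left for the 
body text to be concatenated with.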






[jira] [Updated] (TIKA-593) Tika network server

2011-09-25 Thread Chris A. Mattmann (JIRA)

 [ 
https://issues.apache.org/jira/browse/TIKA-593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris A. Mattmann updated TIKA-593:
---

Affects Version/s: 0.10
Fix Version/s: (was: 0.10)
   1.0
 Assignee: Chris A. Mattmann  (was: Maxim Valyanskiy)

- pushing to 1.0. assign to me. I'd like to shepherd this through with CXF. 
I'll make time in the next week.

> Tika network server
> ---
>
> Key: TIKA-593
> URL: https://issues.apache.org/jira/browse/TIKA-593
> Project: Tika
>  Issue Type: New Feature
>  Components: general
>Affects Versions: 0.10
>Reporter: Jukka Zitting
>Assignee: Chris A. Mattmann
> Fix For: 1.0
>
>
> It would be cool to be able to run Tika as a network service that accepts a 
> binary document as input and produces the extracted content (as XHTML, text, 
> or just metadata) as output. A bit like TIKA-169, but without the dependency 
> to a servlet container.
> I'd like to be able to set up and run such a server like this:
> $ java -jar tika-app.jar --port 1234
> We should also add a NetworkParser class that acts as a local client for such 
> a service. This way a lightweight client could use the full set of Tika 
> parsing functionality even with just the tika-core jar within its classpath.





[jira] [Updated] (TIKA-715) Some parsers produce non-well-formed XHTML SAX events

2011-09-25 Thread Chris A. Mattmann (JIRA)

 [ 
https://issues.apache.org/jira/browse/TIKA-715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris A. Mattmann updated TIKA-715:
---

Affects Version/s: 0.10
Fix Version/s: (was: 0.10)
   1.0

- pushing out: rolling 0.10 RC today.

> Some parsers produce non-well-formed XHTML SAX events
> -
>
> Key: TIKA-715
> URL: https://issues.apache.org/jira/browse/TIKA-715
> Project: Tika
>  Issue Type: Bug
>  Components: parser
>Affects Versions: 0.10
>Reporter: Michael McCandless
> Fix For: 1.0
>
> Attachments: TIKA-715.patch
>
>
> With TIKA-683 I committed simple, commented out code to
> SafeContentHandler, to verify that the SAX events produced by the
> parser have valid (matched) tags.  Ie, each startElement("foo") is
> matched by the closing endElement("foo").
> I only did basic nesting test, plus checking that <p> is never
> embedded inside another <p>; we could strengthen this further to check
> that all tags only appear in valid parents...
> I was able to use this to fix issues with the new RTF parser
> (TIKA-683), but I was surprised that some other parsers failed the new
> asserts.
> It could be these are relatively minor offenses (eg closing a table
> w/o closing the tr) and we need not do anything here... but I think
> it'd be cleaner if all our parsers produced matched, well-formed XHTML
> events.
> I haven't looked into any of these... it could be they are easy to fix.
> Failures:
> {noformat}
> testOutlookHTMLVersion(org.apache.tika.parser.microsoft.OutlookParserTest)  
> Time elapsed: 0.032 sec  <<< ERROR!
> java.lang.AssertionError: end tag=body with no startElement
>   at 
> org.apache.tika.sax.SafeContentHandler.verifyEndElement(SafeContentHandler.java:224)
>   at 
> org.apache.tika.sax.SafeContentHandler.endElement(SafeContentHandler.java:275)
>   at 
> org.apache.tika.sax.XHTMLContentHandler.endDocument(XHTMLContentHandler.java:210)
>   at 
> org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:242)
>   at 
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
>   at 
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
>   at 
> org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:129)
>   at 
> org.apache.tika.parser.microsoft.OutlookParserTest.testOutlookHTMLVersion(OutlookParserTest.java:158)
> testParseKeynote(org.apache.tika.parser.iwork.IWorkParserTest)  Time elapsed: 
> 0.116 sec  <<< ERROR!
> java.lang.AssertionError: mismatched elements open=tr close=table
>   at 
> org.apache.tika.sax.SafeContentHandler.verifyEndElement(SafeContentHandler.java:226)
>   at 
> org.apache.tika.sax.SafeContentHandler.endElement(SafeContentHandler.java:275)
>   at 
> org.apache.tika.sax.XHTMLContentHandler.endElement(XHTMLContentHandler.java:252)
>   at 
> org.apache.tika.sax.XHTMLContentHandler.endElement(XHTMLContentHandler.java:287)
>   at 
> org.apache.tika.parser.iwork.KeynoteContentHandler.endElement(KeynoteContentHandler.java:136)
>   at 
> org.apache.tika.sax.ContentHandlerDecorator.endElement(ContentHandlerDecorator.java:136)
>   at 
> com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.endElement(AbstractSAXParser.java:601)
>   at 
> com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanEndElement(XMLDocumentFragmentScannerImpl.java:1782)
>   at 
> com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2938)
>   at 
> com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:648)
>   at 
> com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:140)
>   at 
> com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:511)
>   at 
> com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:808)
>   at 
> com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:737)
>   at 
> com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:119)
>   at 
> com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1205)
>   at 
> com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:522)
>   at javax.xml.parsers.SAXParser.parse(SAXParser.java:395)
>   at javax.xml.parsers.SAXParser.parse(SAXParser.java:198)
>   at 
> org.apache.tika.parser.iwork.IWorkPackageParser.parse(IWorkPackageParser.java:190)
>   at 
> org.apache.tika.parser.iwork.IWorkParserTest.testParseKeynote(IWorkParserTest.ja
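
The kind of check described in the issue can be sketched with a stack of open 
element names. This is a minimal illustration under stated assumptions, not 
the code in SafeContentHandler: the class TagBalanceChecker and its check 
method are hypothetical, events are passed as plain strings ("tr" opens, 
"/tr" closes), and the messages are only modeled on the failures quoted 
above.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical checker, not Tika's SafeContentHandler: tracks open elements on a stack.
class TagBalanceChecker {
    private final Deque<String> open = new ArrayDeque<>();

    void startElement(String name) {
        open.push(name);
    }

    void endElement(String name) {
        if (open.isEmpty()) {
            throw new IllegalStateException("end tag=" + name + " with no startElement");
        }
        String expected = open.pop();
        if (!expected.equals(name)) {
            throw new IllegalStateException(
                    "mismatched elements open=" + expected + " close=" + name);
        }
    }

    void endDocument() {
        if (!open.isEmpty()) {
            throw new IllegalStateException("unclosed element=" + open.peek());
        }
    }

    // Returns null when the event sequence is matched, otherwise the error message.
    static String check(String... events) {
        TagBalanceChecker checker = new TagBalanceChecker();
        try {
            for (String event : events) {
                if (event.startsWith("/")) {
                    checker.endElement(event.substring(1));
                } else {
                    checker.startElement(event);
                }
            }
            checker.endDocument();
            return null;
        } catch (IllegalStateException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // A table closed while its row is still open, as in the Keynote failure.
        System.out.println(check("table", "tr", "/table"));
    }
}
```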

[jira] [Updated] (TIKA-565) Improved OSGi bundling

2011-09-25 Thread Chris A. Mattmann (JIRA)

 [ 
https://issues.apache.org/jira/browse/TIKA-565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris A. Mattmann updated TIKA-565:
---

Affects Version/s: 0.10
Fix Version/s: (was: 0.10)
   1.0

- pushing out: rolling .10 RC today

> Improved OSGi bundling
> --
>
> Key: TIKA-565
> URL: https://issues.apache.org/jira/browse/TIKA-565
> Project: Tika
>  Issue Type: Improvement
>  Components: packaging
>Affects Versions: 0.10
>Reporter: Jukka Zitting
>Assignee: Jukka Zitting
> Fix For: 1.0
>
> Attachments: core-bundle-fix.diff
>
>
> I'd like to add proper integration tests for tika-bundle and expose the Tika 
> facade object as a service so other bundles could access it easily like this:
> @Reference
> private Tika tika;
> It would also be nice to allow other OSGi bundles to expose their Parser 
> implementations as pluggable services and have the Tika bundle automatically 
> pick up and use them along with all the embedded parsers it contains.





[jira] [Updated] (TIKA-539) Encoding detection is too biased by encoding in meta tag

2011-09-25 Thread Chris A. Mattmann (JIRA)

 [ 
https://issues.apache.org/jira/browse/TIKA-539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris A. Mattmann updated TIKA-539:
---

Affects Version/s: 0.10
Fix Version/s: (was: 0.10)
   1.0

- pushing out: rolling 0.10 RC today.

> Encoding detection is too biased by encoding in meta tag
> 
>
> Key: TIKA-539
> URL: https://issues.apache.org/jira/browse/TIKA-539
> Project: Tika
>  Issue Type: Bug
>  Components: metadata, parser
>Affects Versions: 0.8, 0.9, 0.10
>Reporter: Reinhard Schwab
>Assignee: Ken Krugler
> Fix For: 1.0
>
> Attachments: TIKA-539.patch, TIKA-539_2.patch
>
>
> If the encoding in the meta tag is wrong, that encoding is still detected,
> even if the right encoding was already set in the metadata (which can come 
> from the HTTP response header).
> Test code to reproduce:
> static String content = "<html><head>\n"
>   + "<meta http-equiv=\"content-type\" content=\"application/xhtml+xml; charset=iso-8859-1\" />"
>   + "</head><body>Über den Wolken</body></html>\n";
>   /**
>* @param args
>* @throws IOException
>* @throws TikaException
>* @throws SAXException
>*/
>   public static void main(String[] args) throws IOException, SAXException,
>   TikaException {
>   Metadata metadata = new Metadata();
>   metadata.set(Metadata.CONTENT_TYPE, "text/html");
>   metadata.set(Metadata.CONTENT_ENCODING, "UTF-8");
>   System.out.println(metadata.get(Metadata.CONTENT_ENCODING));
>   InputStream in = new 
> ByteArrayInputStream(content.getBytes("UTF-8"));
>   AutoDetectParser parser = new AutoDetectParser();
>   BodyContentHandler h = new BodyContentHandler(1);
>   parser.parse(in, h, metadata, new ParseContext());
>   System.out.print(h.toString());
>   System.out.println(metadata.get(Metadata.CONTENT_ENCODING));
>   }
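
Why a wrong charset in the meta tag matters can be shown with the JDK alone. 
The sketch below assumes nothing about Tika's detection logic; the class 
CharsetMismatchDemo and the method decodeUtf8AsLatin1 are hypothetical names. 
It decodes the UTF-8 bytes of the test string as ISO-8859-1, which is what 
happens when the declared charset wins over the correct one.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: shows the effect of trusting a wrong charset declaration.
class CharsetMismatchDemo {
    // Encodes the text as UTF-8, then decodes those bytes as ISO-8859-1.
    static String decodeUtf8AsLatin1(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String original = "\u00DCber den Wolken"; // "Über den Wolken"
        String garbled = decodeUtf8AsLatin1(original);
        // The two-byte UTF-8 sequence for U+00DC turns into two separate characters.
        System.out.println(original.length() + " chars -> " + garbled.length() + " chars");
        System.out.println("unchanged: " + garbled.equals(original));
    }
}
```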





[jira] [Updated] (TIKA-605) Tika GDAL parser

2011-09-25 Thread Chris A. Mattmann (JIRA)

 [ 
https://issues.apache.org/jira/browse/TIKA-605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris A. Mattmann updated TIKA-605:
---

Attachment: TIKA-605.Mattmann.092511.patch.txt

- totally incomplete patch, but attaching so I can clean my local workspace, 
and get back to this later. Need to get GDAL bindings jar up on Maven Central 
too.

> Tika GDAL parser
> 
>
> Key: TIKA-605
> URL: https://issues.apache.org/jira/browse/TIKA-605
> Project: Tika
>  Issue Type: New Feature
>  Components: parser
> Environment: indep. of env.
>Reporter: Chris A. Mattmann
>Assignee: Chris A. Mattmann
>  Labels: gdal, integration, tika
> Attachments: TIKA-605.Mattmann.092511.patch.txt
>
>
> Leverage the GDAL toolkit and its Java SWIG bindings to create a Tika parser 
> around GDAL. See here: 
> http://trac.osgeo.org/gdal/browser/trunk/gdal/swig/java/apps/gdalinfo.java





[NOTICE] 0.10 RC likely this evening PDT

2011-09-25 Thread Mattmann, Chris A (388J)
Hey Guys,

I'll spin the 0.10 RC this evening, PDT. 

Headed to Disneyland with the fam right now :-)

Cheers,
Chris

++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattm...@nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++



[jira] [Updated] (TIKA-605) Tika GDAL parser

2011-09-25 Thread Chris A. Mattmann (JIRA)

 [ 
https://issues.apache.org/jira/browse/TIKA-605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris A. Mattmann updated TIKA-605:
---

Fix Version/s: 1.0

- i'll try and get this in for 1.0

> Tika GDAL parser
> 
>
> Key: TIKA-605
> URL: https://issues.apache.org/jira/browse/TIKA-605
> Project: Tika
>  Issue Type: New Feature
>  Components: parser
> Environment: indep. of env.
>Reporter: Chris A. Mattmann
>Assignee: Chris A. Mattmann
>  Labels: gdal, integration, tika
> Fix For: 1.0
>
> Attachments: TIKA-605.Mattmann.092511.patch.txt
>
>
> Leverage the GDAL toolkit and its Java SWIG bindings to create a Tika parser 
> around GDAL. See here: 
> http://trac.osgeo.org/gdal/browser/trunk/gdal/swig/java/apps/gdalinfo.java





[jira] [Commented] (TIKA-605) Tika GDAL parser

2011-09-25 Thread Chris A. Mattmann (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13114338#comment-13114338
 ] 

Chris A. Mattmann commented on TIKA-605:


The other tricky thing about this is that GDAL seems to have its own MIME 
identification system, which is based on the file name or a glob pattern. So, 
when I used TikaInputStream.getFile(), which returns a temp file (with a temp 
file name), GDAL complained that it didn't understand that file type. I think 
I can specifically request a file extension for the temp file, or if I can't, 
then I'll update TikaInputStream.getFile() to allow this.
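
One way to give a name-sensitive library a usable file name is to spool the 
stream to a temp file created with an explicit suffix. The sketch below uses 
only java.nio.file and is not how TikaInputStream works internally; the class 
TempFileWithExtension, the spool method, the "tika-gdal-" prefix and the 
".tif" suffix are all assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper: spools a stream to a temp file whose name ends with a chosen suffix.
class TempFileWithExtension {
    static Path spool(InputStream in, String suffix) {
        try {
            // createTempFile appends the suffix verbatim, so pass it with the dot.
            Path tmp = Files.createTempFile("tika-gdal-", suffix);
            // The temp file already exists, so the copy has to replace it.
            Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
            return tmp;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = spool(new ByteArrayInputStream(new byte[] {1, 2, 3}), ".tif");
        System.out.println(tmp.getFileName());
        Files.delete(tmp); // the caller owns the temp file and cleans it up
    }
}
```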







Re: [PROPOSAL] Any23 to join the incubator

2011-09-25 Thread Mattmann, Chris A (388J)
Hi All,

OK, since the chatter about this proposal has died down and since 
I've agreed to champion it, I'll call a formal VOTE tomorrow afternoon
and let it run through the rest of the week. The Tika PMC has not 
registered any objections to sponsoring the proposal, so I will go 
ahead and update it to reflect Tika PMC as the sponsor and we will 
look forward to helping to shepherd and mentor Any23 through 
the Incubator.

Thanks for your input!

Cheers,
Chris

On Sep 22, 2011, at 7:43 AM, Simone Tripodi wrote:

> Hi Lewis!
> thanks a lot for your interest on Any23 and welcome aboard!! I'm going
> to put you in the initial committers list!
> All the best, have a nice day!
> Simo
> 
> http://people.apache.org/~simonetripodi/
> http://www.99soft.org/
> 
> 
> 
> On Thu, Sep 22, 2011 at 4:39 PM, lewis john mcgibbney
>  wrote:
>> Hi everyone,
>> 
>> Further to the previous threads on this topic, I would like to express my
>> interest in becoming a committer for the project. Coming from an academic
>> background I am working extensively with the mapping of static legislative
>> document resources to RDF datasets and then using these datasets across
>> platforms such as Kasabi [1], and various projects closely linked to Jena,
>> E.g. Joseki and Fuseki. Also I've found other tools such as eyeball really
>> helpful during my journey.
>> 
>> I was voted in by the Apache Nutch PMC around three months ago as PMC member
>> and Committer, and was thankfully directed to this thread by Chris Mattmann.
>> The idea of extending the functionality of Any23 as a Nutch plugin is
>> something which interests me, and which could also benefit academic/research
>> users of Nutch such as myself. At this stage I don't have a strong opinion
>> on whether Any23 should be a sub-project of Tika, but think it is very
>> encouraging that it seems like a probable direction the project is/could
>> move towards.
>> 
>> Thanks very much.
>> 
>> Lewis
>> 
>> [1]
>> http://beta.kasabi.com/dataset/wombra-scottish-technical-standards-section-6-energy
>> 
>> --
>> *Lewis*
>> 
> 
> 





[VOTE] Apache Tika 0.10 release rc #1

2011-09-25 Thread Mattmann, Chris A (388J)
Hi Folks,

A first release candidate for the Tika 0.10 release is available at:

http://people.apache.org/~mattmann/apache-tika-0.10/rc1/

The release candidate is a zip archive of the sources in:

http://svn.apache.org/repos/asf/tika/tags/0.10/

The SHA1 checksum of the archive is 355d0b2fa0de232672e4760941ea0dcf641a82ad.
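
For anyone checking the archive before voting, a minimal sketch of the 
comparison using the JDK's MessageDigest follows. The class name Sha1Check and 
the command-line arguments (path to the downloaded archive, then the published 
checksum) are assumptions for this example, not part of the release tooling.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for comparing a downloaded file against a published SHA-1.
class Sha1Check {
    // Returns the SHA-1 digest of the data as lowercase hex.
    static String sha1Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(data)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is a required algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to the downloaded archive; args[1]: published checksum.
        String actual = sha1Hex(Files.readAllBytes(Paths.get(args[0])));
        System.out.println(actual.equalsIgnoreCase(args[1])
                ? "checksum OK"
                : "checksum MISMATCH: " + actual);
    }
}
```

Reading the whole file into memory keeps the sketch short; a large archive 
would be better fed to the digest in chunks.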

A staged Maven repository is available at:

https://repository.apache.org/content/repositories/orgapachetika-100/

Please vote on releasing this package as Apache Tika 0.10.
The vote is open for the next 72 hours and passes if a majority of at
least three +1 Tika PMC votes are cast.

[ ] +1 Release this package as Apache Tika 0.10
[ ] -1 Do not release this package because...

Cheers,
Chris




Re: [VOTE] Apache Tika 0.10 release rc #1

2011-09-25 Thread Oleg Tikhonov
In favor of releasing the Tika 0.10, +1


