Re: svn commit: r1658847 - /tika/trunk/tika-server/pom.xml

2015-02-11 Thread Nick Burch

On Wed, 11 Feb 2015, tpalsul...@apache.org wrote:

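+  <pluginRepositories>
+    <pluginRepository>
+      <id>miredot</id>
+      <name>MireDot Releases</name>
+      <url>http://nexus.qmino.com/content/repositories/miredot</url>
+    </pluginRepository>
+  </pluginRepositories>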


I'm not sure we're allowed to have other repositories defined in our poms? 
We're certainly strongly encouraged not to, as part of the Maven Central 
sync rules:

http://maven.apache.org/guides/mini/guide-central-repository-upload.html#FAQ_and_common_mistakes

Is there an alternative?

Nick


Re: svn commit: r1658847 - /tika/trunk/tika-server/pom.xml

2015-02-11 Thread Konstantin Gribov
Maybe moving both the pluginRepository and the plugin execution into a profile would help?

-- 
Best regards,
Konstantin Gribov

Wed Feb 11 2015 at 14:21:12, Nick Burch :

> On Wed, 11 Feb 2015, tpalsul...@apache.org wrote:
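> > +  <pluginRepositories>
> > +    <pluginRepository>
> > +      <id>miredot</id>
> > +      <name>MireDot Releases</name>
> > +      <url>http://nexus.qmino.com/content/repositories/miredot</url>
> > +    </pluginRepository>
> > +  </pluginRepositories>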
>
> I'm not sure we're allowed to have other repositories defined in our poms?
> We're certainly strongly encouraged not to, as part of the Maven Central
> sync rules:
> http://maven.apache.org/guides/mini/guide-central-repository-upload.html#FAQ_and_common_mistakes
>
> Is there an alternative?
>
> Nick
>


RE: svn commit: r1658847 - /tika/trunk/tika-server/pom.xml

2015-02-11 Thread Allison, Timothy B.
I'm working behind a proxy and getting a new proxy error ("proxy 
unacknowledged") with r1658847 on tika-server package.

-Original Message-
From: Nick Burch [mailto:apa...@gagravarr.org] 
Sent: Wednesday, February 11, 2015 6:18 AM
To: dev@tika.apache.org
Subject: Re: svn commit: r1658847 - /tika/trunk/tika-server/pom.xml

On Wed, 11 Feb 2015, tpalsul...@apache.org wrote:
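> +  <pluginRepositories>
> +    <pluginRepository>
> +      <id>miredot</id>
> +      <name>MireDot Releases</name>
> +      <url>http://nexus.qmino.com/content/repositories/miredot</url>
> +    </pluginRepository>
> +  </pluginRepositories>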

I'm not sure we're allowed to have other repositories defined in our poms? 
We're certainly strongly encouraged not to, as part of the Maven Central 
sync rules:
http://maven.apache.org/guides/mini/guide-central-repository-upload.html#FAQ_and_common_mistakes

Is there an alternative?

Nick


[jira] [Resolved] (TIKA-1544) empty lines are not preserved

2015-02-11 Thread Tim Allison (JIRA)

 [ 
https://issues.apache.org/jira/browse/TIKA-1544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Allison resolved TIKA-1544.
---
   Resolution: Fixed
Fix Version/s: 1.8

added fix similar to that proposed by [~almson] on TIKA-1309 in r1658947.

> empty lines are not preserved
> -
>
> Key: TIKA-1544
> URL: https://issues.apache.org/jira/browse/TIKA-1544
> Project: Tika
>  Issue Type: Bug
>Affects Versions: 1.6
> Environment: Windows 8, Java 1.8
>Reporter: mortee
>Priority: Minor
> Fix For: 1.8
>
> Attachments: preserve_new_lines_in_rtf.patch, testRTFNewlines.rtf
>
>
> I'm trying to extract the text content from RTF documents. The files contain 
> empty lines (two or more consecutive paragraph-end marks), on which the 
> further processing relies to tell apart different parts of the text. But 
> unfortunately Tika (with --text switch) eliminates all those empty lines, 
> instead of preserving them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (TIKA-1309) RTF TextExtractor ignores consecutive linebreaks

2015-02-11 Thread Tim Allison (JIRA)

 [ 
https://issues.apache.org/jira/browse/TIKA-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Allison resolved TIKA-1309.
---
   Resolution: Fixed
Fix Version/s: 1.8

Fixed on duplicate TIKA-1544 in r1658947.  Thank you for the patch, [~almson]!

> RTF TextExtractor ignores consecutive linebreaks
> 
>
> Key: TIKA-1309
> URL: https://issues.apache.org/jira/browse/TIKA-1309
> Project: Tika
>  Issue Type: Bug
>  Components: parser
>Affects Versions: 1.5, 1.6
>Reporter: Aleksandr Dubinsky
> Fix For: 1.8
>
> Attachments: 0001-fix-RTF-ignores-consecutive-newlines.patch, test.rtf
>
>   Original Estimate: 0h
>  Remaining Estimate: 0h
>
> RTF files (such as those produced by WordPad) often encode consecutive 
> linebreaks as consecutive \par commands. However, 
> org.apache.tika.parser.rtf.TextExtractor ignores the second \par. Solution is 
> simple, see attached patch.
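
As a rough illustration of the behaviour described above, here is a toy converter that turns every \par control word into a line break. It is a hypothetical sketch only: it is not the code in org.apache.tika.parser.rtf.TextExtractor and not the attached patch, and the names ParDemo, toText and keepEmptyParagraphs are made up for the example. With the flag off, a run of \par collapses to a single newline, which mimics the reported symptom; with it on, each \par yields its own newline.

```java
class ParDemo {

    /** True if the control word at index i is exactly \par (and not, say, \pard). */
    static boolean isParAt(String s, int i) {
        if (!s.startsWith("\\par", i)) {
            return false;
        }
        int end = i + 4;
        return end >= s.length() || !Character.isLetter(s.charAt(end));
    }

    /**
     * Converts a simplified RTF body to text. Only \par is interpreted;
     * everything else is copied through unchanged.
     */
    static String toText(String body, boolean keepEmptyParagraphs) {
        StringBuilder out = new StringBuilder();
        boolean previousWasPar = false;
        int i = 0;
        while (i < body.length()) {
            if (isParAt(body, i)) {
                // Dropping the newline for a repeated \par loses the empty line.
                if (keepEmptyParagraphs || !previousWasPar) {
                    out.append('\n');
                }
                previousWasPar = true;
                i += 4;
                // A single space after a control word is a delimiter, not text.
                if (i < body.length() && body.charAt(i) == ' ') {
                    i++;
                }
            } else {
                out.append(body.charAt(i));
                previousWasPar = false;
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String body = "one\\par\\par two";
        System.out.println(toText(body, false));
        System.out.println("----");
        System.out.println(toText(body, true));
    }
}
```

A real RTF parser also has to deal with groups, escapes and other control words, so this only shows the newline-per-\par idea.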





[jira] [Commented] (TIKA-1544) empty lines are not preserved

2015-02-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14316211#comment-14316211
 ] 

Hudson commented on TIKA-1544:
--

SUCCESS: Integrated in tika-trunk-jdk1.7 #485 (See 
[https://builds.apache.org/job/tika-trunk-jdk1.7/485/])
TIKA-1544 consecutive new lines not preserved in rtf (tallison: 
http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1658947)
* 
/tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/rtf/TextExtractor.java
* 
/tika/trunk/tika-parsers/src/test/java/org/apache/tika/parser/rtf/RTFParserTest.java
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testRTFNewlines.rtf


> empty lines are not preserved
> -
>
> Key: TIKA-1544
> URL: https://issues.apache.org/jira/browse/TIKA-1544
> Project: Tika
>  Issue Type: Bug
>Affects Versions: 1.6
> Environment: Windows 8, Java 1.8
>Reporter: mortee
>Priority: Minor
> Fix For: 1.8
>
> Attachments: preserve_new_lines_in_rtf.patch, testRTFNewlines.rtf
>
>
> I'm trying to extract the text content from RTF documents. The files contain 
> empty lines (two or more consecutive paragraph-end marks), on which the 
> further processing relies to tell apart different parts of the text. But 
> unfortunately Tika (with --text switch) eliminates all those empty lines, 
> instead of preserving them.





[jira] [Created] (TIKA-1548) System property added while catching exception on parsing PDF encrypted doc

2015-02-11 Thread David Pilato (JIRA)
David Pilato created TIKA-1548:
--

 Summary: System property added while catching exception on parsing 
PDF encrypted doc
 Key: TIKA-1548
 URL: https://issues.apache.org/jira/browse/TIKA-1548
 Project: Tika
  Issue Type: Bug
  Components: parser
Affects Versions: 1.7
 Environment: Mac OS 10.10.2
java version "1.7.0_60"
Reporter: David Pilato


I'm using Tika 1.7. I'm parsing an encrypted PDF document, which raises an 
exception. So far, so good.

My concern is that afterwards a new System property, {{sun.font.fontmanager}}, 
is set to {{sun.font.CFontManager}}. 

Code to reproduce the error:

{code:java}
@Test
public void testSystem() {
    Properties props = System.getProperties();
    // The property is not set before parsing.
    assertThat(props.get("sun.font.fontmanager"), nullValue());
    try {
        tika().parseToString(new URL("https://github.com/elasticsearch/elasticsearch-mapper-attachments/raw/master/src/test/resources/org/elasticsearch/index/mapper/xcontent/encrypted.pdf"));
    } catch (Throwable e) {
        // Expected: the document is encrypted, so parsing fails.
    }
    // Fails with Tika 1.7: the property is set after the failed parse.
    assertThat(props.get("sun.font.fontmanager"), nullValue());
}
{code}
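
To see exactly which properties appear during a call, rather than checking one known key, a snapshot-and-diff helper along these lines might help. This is a minimal sketch using only java.util; the class name SystemPropertyDiff and its methods are hypothetical, and the System.setProperty call in main merely stands in for the parse call above.

```java
import java.util.Properties;
import java.util.Set;
import java.util.TreeSet;

class SystemPropertyDiff {

    /** Copies the current system properties so later changes do not affect the copy. */
    static Properties snapshot() {
        Properties copy = new Properties();
        copy.putAll(System.getProperties());
        return copy;
    }

    /** Returns the names of system properties that exist now but not in the snapshot. */
    static Set<String> addedSince(Properties before) {
        Set<String> added = new TreeSet<>(System.getProperties().stringPropertyNames());
        added.removeAll(before.stringPropertyNames());
        return added;
    }

    public static void main(String[] args) {
        Properties before = snapshot();
        // Placeholder for the code under suspicion, e.g. the parseToString call.
        System.setProperty("example.hypothetical.property", "x");
        System.out.println("Added: " + addedSince(before));
    }
}
```

Taking the snapshot as a copy matters here: System.getProperties() returns the live object, so holding on to it (as the test above does) only allows checks against the current state.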


With Tika 1.7:

{code}
[2015-02-11 16:43:36,166][INFO ][org.apache.pdfbox.pdfparser.PDFParser] 
Document is encrypted
[2015-02-11 16:43:36,837][ERROR][org.apache.pdfbox.filter.FlateFilter] 
FlateFilter: stop reading corrupt stream due to a DataFormatException
[2015-02-11 16:43:36,837][ERROR][org.apache.pdfbox.filter.FlateFilter] 
FlateFilter: stop reading corrupt stream due to a DataFormatException
[2015-02-11 16:43:36,838][ERROR][org.apache.pdfbox.filter.FlateFilter] 
FlateFilter: stop reading corrupt stream due to a DataFormatException
[2015-02-11 16:43:36,838][ERROR][org.apache.pdfbox.filter.FlateFilter] 
FlateFilter: stop reading corrupt stream due to a DataFormatException
[2015-02-11 16:43:36,839][ERROR][org.apache.pdfbox.filter.FlateFilter] 
FlateFilter: stop reading corrupt stream due to a DataFormatException
[2015-02-11 16:43:36,840][ERROR][org.apache.pdfbox.filter.FlateFilter] 
FlateFilter: stop reading corrupt stream due to a DataFormatException
[2015-02-11 16:43:36,840][ERROR][org.apache.pdfbox.filter.FlateFilter] 
FlateFilter: stop reading corrupt stream due to a DataFormatException
[2015-02-11 16:43:36,841][ERROR][org.apache.pdfbox.filter.FlateFilter] 
FlateFilter: stop reading corrupt stream due to a DataFormatException
[2015-02-11 16:43:36,841][ERROR][org.apache.pdfbox.filter.FlateFilter] 
FlateFilter: stop reading corrupt stream due to a DataFormatException
[2015-02-11 16:43:36,842][ERROR][org.apache.pdfbox.filter.FlateFilter] 
FlateFilter: stop reading corrupt stream due to a DataFormatException

java.lang.AssertionError: 
Expected: null
 but: was "sun.font.CFontManager"
 
at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:8)
at 
org.elasticsearch.plugin.mapper.attachments.test.TikaSystemTest.testSystem(TikaSystemTest.java:41)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:68)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:47)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:50)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222)
at org.junit.runners.ParentRunner.run(ParentRunner.java:300)
at org.junit.runner.JUnitCore.run(JUnitCore.java:157)
at 
com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:74)
at 
com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:211)
at 
com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:67)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:134)
{code}


With Ti

Re: svn commit: r1658847 - /tika/trunk/tika-server/pom.xml

2015-02-11 Thread Tyler Palsulich
Hi All,
Responses inline.

On Wed, Feb 11, 2015 at 7:35 AM, Allison, Timothy B. 
wrote:

> I'm working behind a proxy and getting a new proxy error ("proxy
> unacknowledged") with r1658847 on tika-server package.
>

That seems odd... Would adding another pluginRepository cause that?


>
> -Original Message-
> From: Nick Burch [mailto:apa...@gagravarr.org]
> Sent: Wednesday, February 11, 2015 6:18 AM
> To: dev@tika.apache.org
> Subject: Re: svn commit: r1658847 - /tika/trunk/tika-server/pom.xml
>
> On Wed, 11 Feb 2015, tpalsul...@apache.org wrote:
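> > +  <pluginRepositories>
> > +    <pluginRepository>
> > +      <id>miredot</id>
> > +      <name>MireDot Releases</name>
> > +      <url>http://nexus.qmino.com/content/repositories/miredot</url>
> > +    </pluginRepository>
> > +  </pluginRepositories>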
>
> I'm not sure we're allowed to have other repositories defined in our poms?
> We're certainly strongly encouraged not to, as part of the Maven Central
> sync rules:
>
> http://maven.apache.org/guides/mini/guide-central-repository-upload.html#FAQ_and_common_mistakes
>

I see your point. But, Miredot (and its parent company, qmino) is fairly
established. Maybe we can reach out to the company and see if they could
publish to Central? Since it's not OSS, I'm not sure if/how that would work.


> Is there an alternative?
>

As in an alternate documentation generating service? There is Apiary, but
the generated docs wouldn't be Apache hosted. Miredot still seems to be the
best -- formatting, usability, etc.


>
> Nick
>


Re: svn commit: r1658847 - /tika/trunk/tika-server/pom.xml

2015-02-11 Thread Sergey Beryozkin

On 11/02/15 16:03, Tyler Palsulich wrote:

Hi All,
Responses inline.

On Wed, Feb 11, 2015 at 7:35 AM, Allison, Timothy B. 
wrote:


I'm working behind a proxy and getting a new proxy error ("proxy
unacknowledged") with r1658847 on tika-server package.



That seems odd... Would adding another pluginRepository cause that?




-Original Message-
From: Nick Burch [mailto:apa...@gagravarr.org]
Sent: Wednesday, February 11, 2015 6:18 AM
To: dev@tika.apache.org
Subject: Re: svn commit: r1658847 - /tika/trunk/tika-server/pom.xml

On Wed, 11 Feb 2015, tpalsul...@apache.org wrote:

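+  <pluginRepositories>
+    <pluginRepository>
+      <id>miredot</id>
+      <name>MireDot Releases</name>
+      <url>http://nexus.qmino.com/content/repositories/miredot</url>
+    </pluginRepository>
+  </pluginRepositories>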


I'm not sure we're allowed to have other repositories defined in our poms?
We're certainly strongly encouraged not to, as part of the Maven Central
sync rules:

http://maven.apache.org/guides/mini/guide-central-repository-upload.html#FAQ_and_common_mistakes



I see your point. But, Miredot (and its parent company, qmino) is fairly
established. Maybe we can reach out to the company and see if they could
publish to Central? Since it's not OSS, I'm not sure if/how that would work.

Is their plugin even open source? If not, then IMHO it is not right 
to get it included (it appears to be a fine project but...).



Is there an alternative?



As in an alternate documentation generating service? There is Apiary, but
the generated docs wouldn't be Apache hosted. Miredot still seems to be the
best -- formatting, usability, etc.

Swagger?
We have a CXF demo. The only thing I don't fancy about Swagger is that 
one has to add a lot of Swagger annotations.


Cheers. Sergey





Nick






--
Sergey Beryozkin

Talend Community Coders
http://coders.talend.com/

Blog: http://sberyozkin.blogspot.com


[jira] [Commented] (TIKA-1548) System property added while catching exception on parsing PDF encrypted doc

2015-02-11 Thread Tim Allison (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14316458#comment-14316458
 ] 

Tim Allison commented on TIKA-1548:
---

Thank you for raising this. I'll look into it. [~tilman], any ideas?

> System property added while catching exception on parsing PDF encrypted doc
> ---
>
> Key: TIKA-1548
> URL: https://issues.apache.org/jira/browse/TIKA-1548
> Project: Tika
>  Issue Type: Bug
>  Components: parser
>Affects Versions: 1.7
> Environment: Mac OS 10.10.2
> java version "1.7.0_60"
>Reporter: David Pilato
>
> I'm using Tika 1.7. I'm parsing an encrypted PDF document, which raises an 
> exception. So far, so good.
> My concern is that afterwards a new System property, {{sun.font.fontmanager}}, 
> is set to {{sun.font.CFontManager}}. 
> Code to reproduce the error:
> {code:java}
> @Test
> public void testSystem() {
> Properties props = System.getProperties();
> assertThat(props.get("sun.font.fontmanager"), nullValue());
> try {
> tika().parseToString(new 
> URL("https://github.com/elasticsearch/elasticsearch-mapper-attachments/raw/master/src/test/resources/org/elasticsearch/index/mapper/xcontent/encrypted.pdf"));
> } catch (Throwable e) {
> }
> assertThat(props.get("sun.font.fontmanager"), nullValue());
> }
> {code}
> With Tika 1.7:
> {code}
> [2015-02-11 16:43:36,166][INFO ][org.apache.pdfbox.pdfparser.PDFParser] 
> Document is encrypted
> [2015-02-11 16:43:36,837][ERROR][org.apache.pdfbox.filter.FlateFilter] 
> FlateFilter: stop reading corrupt stream due to a DataFormatException
> [2015-02-11 16:43:36,837][ERROR][org.apache.pdfbox.filter.FlateFilter] 
> FlateFilter: stop reading corrupt stream due to a DataFormatException
> [2015-02-11 16:43:36,838][ERROR][org.apache.pdfbox.filter.FlateFilter] 
> FlateFilter: stop reading corrupt stream due to a DataFormatException
> [2015-02-11 16:43:36,838][ERROR][org.apache.pdfbox.filter.FlateFilter] 
> FlateFilter: stop reading corrupt stream due to a DataFormatException
> [2015-02-11 16:43:36,839][ERROR][org.apache.pdfbox.filter.FlateFilter] 
> FlateFilter: stop reading corrupt stream due to a DataFormatException
> [2015-02-11 16:43:36,840][ERROR][org.apache.pdfbox.filter.FlateFilter] 
> FlateFilter: stop reading corrupt stream due to a DataFormatException
> [2015-02-11 16:43:36,840][ERROR][org.apache.pdfbox.filter.FlateFilter] 
> FlateFilter: stop reading corrupt stream due to a DataFormatException
> [2015-02-11 16:43:36,841][ERROR][org.apache.pdfbox.filter.FlateFilter] 
> FlateFilter: stop reading corrupt stream due to a DataFormatException
> [2015-02-11 16:43:36,841][ERROR][org.apache.pdfbox.filter.FlateFilter] 
> FlateFilter: stop reading corrupt stream due to a DataFormatException
> [2015-02-11 16:43:36,842][ERROR][org.apache.pdfbox.filter.FlateFilter] 
> FlateFilter: stop reading corrupt stream due to a DataFormatException
> java.lang.AssertionError: 
> Expected: null
>  but: was "sun.font.CFontManager"
>  
>   at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
>   at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:8)
>   at 
> org.elasticsearch.plugin.mapper.attachments.test.TikaSystemTest.testSystem(TikaSystemTest.java:41)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:68)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:47)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:50)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:300)
>   at org.junit.runner.JUnitCore.run(JUnitCore.java:157)
>   at 
> com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:74)
>   at 
> com.int

Re: Build failure of OSGi bundle, Gsoc2015.

2015-02-11 Thread Abhinav Gupta
Hello Tyler,

Thanks for the link, it makes more sense now :) I wasn't aware of the
archives at marklogic.

I have attached the mvn version [0].

Deleting the files of \.m2 didn't really help.

But executing "mvn -Dmaven.test.skip=true install" does work, as Chris had
suggested on Feb 8th (I had missed the earlier replies) and I got a
successful build.

Also, I'm interested in participating in GSoC 2015, and I would like to work
on TIKA-1456. Could you please guide me on how to participate, fix patches
and write a proposal?

[0] : http://pastebin.com/MZPq2dji

On Tue, Feb 10, 2015 at 11:36 PM, Tyler Palsulich 
wrote:

> HI Abhinav,
>
> Whoops, yes, I forgot to add the link to the previous thread! My mistake --
>
> http://apache.markmail.org/search/?q=Error%20installation%20(Build%20Failure%20at%20OSGi%20Bundle)#query:Error%20installation%20(Build%20Failure%20at%20OSGi%20Bundle)%20list%3Aorg.apache.lucene.tika-dev+page:1+mid:36lcp3wrobvy2e3x+state:results
>
> Feel free to reply directly to this thread, instead of starting a new one.
> That way, we can keep context in the conversation.
>
> There is some other information (like OS and Java version) that is printed
> when you run `mvn --version`. So, I'm curious what the rest of the output
> is.
>
> I know you tried building with the `mvn -U` option, but can you try
> deleting `C:\Documents and Settings\{user}\.m2` then running `mvn clean
> install`? I don't know if that will actually help in your situation, but
> it's worth a shot.
>
> Hope that helps,
> Tyler
>
> On Tue, Feb 10, 2015 at 11:30 AM, Abhinav Gupta <
> abhinavgupta2...@gmail.com>
> wrote:
>
> > Hi Tyler,
> >
> > Yes I tried clearing the Maven cache by running "mvn -U clean install"
> but
> > that didn't help. The version of Apache Maven is 3.2.5.
> >
> > Initially I was trying to build the source that I got from the svn
> > repository, so I suppose that would be trunk. But when the build was
> > unsuccessful, I even tried to build the source from one of the download
> > mirrors.
> >
> > Are you referring to the archives by mentioning other thread ? Since I'm
> > new to the open source and there wasn't a link for the other threads that
> > you mentioned.
> >
> > I've attached the log [0] :).
> >
> > Thank you very much,
> >
> > Sincerely,
> > Abhinav.
> >
> > [0] : http://pastebin.com/Rwj9fNDV
> >
>


Re: Error installation (Build Failure at OSGi Bundle)

2015-02-11 Thread Abhinav Gupta
Hi Myrna,

Thanks for the help :) As Bryan had suggested I'm able to execute "ant
junit-system-mini" and "ant junit-all".

I am new to open source and somehow I had managed to miss the earlier
replies, which I only recently realized while going through the archives at
MarkLogic.

Also I intend to participate in gsoc2015. Apache Derby interests me and I
would like to work on Derby-6791. Could you please guide me on how to
participate, fix patches and write a proposal ?

Thank you very much for your time :)

Regards,
Abhinav.

On Sun, Feb 8, 2015 at 4:28 AM, Abhinav Gupta 
wrote:

> Hello everyone,
>
> I was installing tika and I got a build failure. The error occurred while
> installing the OSGi bundle. I am not very sure on how to solve this.
> I have attached the complete log of "*mvn clean install*"
>
> Thank you very much for your time.
>
> Abhinav
>


Re: Build failure of OSGi bundle, Gsoc2015.

2015-02-11 Thread Tyler Palsulich
Can you try installing Java 1.7? I have a hunch that Java 1.8 is what is
causing the tests to fail.

I'm not as experienced with GSoC. But feel free to comment on the issue
that you're interested in. Then, you'll want to come up with a thorough
description of how the Parser will fit into Tika, what exactly the Parser
will do, and your plan of attack for implementing it.

Tyler

On Wed, Feb 11, 2015 at 1:08 PM, Abhinav Gupta 
wrote:

> Hello Tyler,
>
> Thanks for the link, it makes more sense now :) I wasn't aware of the
> archives at marklogic.
>
> I have attached  the mvn version [0].
>
> Deleting the files of \.m2 didn't really help.
>
> But executing "mvn -Dmaven.test.skip=true install" does work, as Chris had
> suggested on Feb 8th (I had missed the earlier replies) and I got a
> successful build.
>
> Also I'm interested in participating in gsoc2015. And I would like to work
> on Tika-1456. Could you please guide me on how to participate, fix patches
> and write a proposal ?
>
> [0] : http://pastebin.com/MZPq2dji
>
> On Tue, Feb 10, 2015 at 11:36 PM, Tyler Palsulich 
> wrote:
>
> > HI Abhinav,
> >
> > Whoops, yes, I forgot to add the link to the previous thread! My mistake
> --
> >
> >
> http://apache.markmail.org/search/?q=Error%20installation%20(Build%20Failure%20at%20OSGi%20Bundle)#query:Error%20installation%20(Build%20Failure%20at%20OSGi%20Bundle)%20list%3Aorg.apache.lucene.tika-dev+page:1+mid:36lcp3wrobvy2e3x+state:results
> >
> > Feel free to reply directly to this thread, instead of starting a new
> one.
> > That way, we can keep context in the conversation.
> >
> > There is some other information (like OS and Java version) that is
> printed
> > when you run `mvn --version`. So, I'm curious what the rest of the output
> > is.
> >
> > I know you tried building with the `mvn -U` option, but can you try
> > deleting `C:\Documents and Settings\{user}\.m2` then running `mvn clean
> > install`? I don't know if that will actually help in your situation, but
> > it's worth a shot.
> >
> > Hope that helps,
> > Tyler
> >
> > On Tue, Feb 10, 2015 at 11:30 AM, Abhinav Gupta <
> > abhinavgupta2...@gmail.com>
> > wrote:
> >
> > > Hi Tyler,
> > >
> > > Yes I tried clearing the Maven cache by running "mvn -U clean install"
> > but
> > > that didn't help. The version of Apache Maven is 3.2.5.
> > >
> > > Initially I was trying to build the source that I got from the svn
> > > repository, so I suppose that would be trunk. But when the build was
> > > unsuccessful, I even tried to build the source from one of the download
> > > mirrors.
> > >
> > > Are you referring to the archives by mentioning other thread ? Since
> I'm
> > > new to the open source and there wasn't a link for the other threads
> that
> > > you mentioned.
> > >
> > > I've attached the log [0] :).
> > >
> > > Thank you very much,
> > >
> > > Sincerely,
> > > Abhinav.
> > >
> > > [0] : http://pastebin.com/Rwj9fNDV
> > >
> >
>


Re: Error installation (Build Failure at OSGi Bundle)

2015-02-11 Thread Tyler Palsulich
Wrong thread/list?

Cheers,
Tyler

On Wed, Feb 11, 2015 at 1:14 PM, Abhinav Gupta 
wrote:

> Hi Myrna,
>
> Thanks for the help :) As Bryan had suggested I'm able to execute "ant
> junit-system-mini" and "ant junit-all".
>
> I am new to open source and somehow I had managed to miss the earlier
> replies, which I only recently realized while going through the archives at
> MarkLogic.
>
> Also I intend to participate in gsoc2015. Apache Derby interests me and I
> would like to work on Derby-6791. Could you please guide me on how to
> participate, fix patches and write a proposal ?
>
> Thank you very much for your time :)
>
> Regards,
> Abhinav.
>
> On Sun, Feb 8, 2015 at 4:28 AM, Abhinav Gupta 
> wrote:
>
> > Hello everyone,
> >
> > I was installing tika and I got a build failure. The error occurred while
> > installing the OSGi bundle. I am not very sure on how to solve this.
> > I have attached the complete log of "*mvn clean install*"
> >
> > Thank you very much for your time.
> >
> > Abhinav
> >
>


[jira] [Commented] (TIKA-1548) System property added while catching exception on parsing PDF encrypted doc

2015-02-11 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14316723#comment-14316723
 ] 

Tilman Hausherr commented on TIKA-1548:
---

Sorry, no. We're not setting that one. It isn't in our code.

> System property added while catching exception on parsing PDF encrypted doc
> ---
>
> Key: TIKA-1548
> URL: https://issues.apache.org/jira/browse/TIKA-1548
> Project: Tika
>  Issue Type: Bug
>  Components: parser
>Affects Versions: 1.7
> Environment: Mac OS 10.10.2
> java version "1.7.0_60"
>Reporter: David Pilato
>
> I'm using Tika 1.7. I'm parsing an encrypted PDF document, which raises an 
> exception. So far, so good.
> My concern is that afterwards a new System property, {{sun.font.fontmanager}}, 
> is set to {{sun.font.CFontManager}}. 
> Code to reproduce the error:
> {code:java}
> @Test
> public void testSystem() {
> Properties props = System.getProperties();
> assertThat(props.get("sun.font.fontmanager"), nullValue());
> try {
> tika().parseToString(new 
> URL("https://github.com/elasticsearch/elasticsearch-mapper-attachments/raw/master/src/test/resources/org/elasticsearch/index/mapper/xcontent/encrypted.pdf"));
> } catch (Throwable e) {
> }
> assertThat(props.get("sun.font.fontmanager"), nullValue());
> }
> {code}
> With Tika 1.7:
> {code}
> [2015-02-11 16:43:36,166][INFO ][org.apache.pdfbox.pdfparser.PDFParser] 
> Document is encrypted
> [2015-02-11 16:43:36,837][ERROR][org.apache.pdfbox.filter.FlateFilter] 
> FlateFilter: stop reading corrupt stream due to a DataFormatException
> [2015-02-11 16:43:36,837][ERROR][org.apache.pdfbox.filter.FlateFilter] 
> FlateFilter: stop reading corrupt stream due to a DataFormatException
> [2015-02-11 16:43:36,838][ERROR][org.apache.pdfbox.filter.FlateFilter] 
> FlateFilter: stop reading corrupt stream due to a DataFormatException
> [2015-02-11 16:43:36,838][ERROR][org.apache.pdfbox.filter.FlateFilter] 
> FlateFilter: stop reading corrupt stream due to a DataFormatException
> [2015-02-11 16:43:36,839][ERROR][org.apache.pdfbox.filter.FlateFilter] 
> FlateFilter: stop reading corrupt stream due to a DataFormatException
> [2015-02-11 16:43:36,840][ERROR][org.apache.pdfbox.filter.FlateFilter] 
> FlateFilter: stop reading corrupt stream due to a DataFormatException
> [2015-02-11 16:43:36,840][ERROR][org.apache.pdfbox.filter.FlateFilter] 
> FlateFilter: stop reading corrupt stream due to a DataFormatException
> [2015-02-11 16:43:36,841][ERROR][org.apache.pdfbox.filter.FlateFilter] 
> FlateFilter: stop reading corrupt stream due to a DataFormatException
> [2015-02-11 16:43:36,841][ERROR][org.apache.pdfbox.filter.FlateFilter] 
> FlateFilter: stop reading corrupt stream due to a DataFormatException
> [2015-02-11 16:43:36,842][ERROR][org.apache.pdfbox.filter.FlateFilter] 
> FlateFilter: stop reading corrupt stream due to a DataFormatException
> java.lang.AssertionError: 
> Expected: null
>  but: was "sun.font.CFontManager"
>  
>   at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
>   at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:8)
>   at 
> org.elasticsearch.plugin.mapper.attachments.test.TikaSystemTest.testSystem(TikaSystemTest.java:41)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:68)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:47)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:50)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:300)
>   at org.junit.runner.JUnitCore.run(JUnitCore.java:157)
>   at 
> com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:74)
>   at 
> com.int

[jira] [Commented] (TIKA-1548) System property added while catching exception on parsing PDF encrypted doc

2015-02-11 Thread Tyler Palsulich (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14316787#comment-14316787
 ] 

Tyler Palsulich commented on TIKA-1548:
---

I'm not seeing any mentions in Tika, either ({{grep -R "CFontManager" .}}).

> System property added while catching exception on parsing PDF encrypted doc
> ---
>
> Key: TIKA-1548
> URL: https://issues.apache.org/jira/browse/TIKA-1548
> Project: Tika
>  Issue Type: Bug
>  Components: parser
>Affects Versions: 1.7
> Environment: Mac OS 10.10.2
> java version "1.7.0_60"
>Reporter: David Pilato
>
> I'm using Tika 1.7. I'm parsing an encrypted PDF document, which raises an 
> exception. So far, so good.
> My concern is that afterwards a new System property, {{sun.font.fontmanager}}, 
> is set to {{sun.font.CFontManager}}. 
> Code to reproduce the error:
> {code:java}
> @Test
> public void testSystem() {
>     Properties props = System.getProperties();
>     // The property must not be set before parsing.
>     assertThat(props.get("sun.font.fontmanager"), nullValue());
>     try {
>         tika().parseToString(new URL(
>             "https://github.com/elasticsearch/elasticsearch-mapper-attachments/raw/master/src/test/resources/org/elasticsearch/index/mapper/xcontent/encrypted.pdf"));
>     } catch (Throwable e) {
>         // Expected: parsing the encrypted PDF fails.
>     }
>     // Fails with Tika 1.7: the property is now "sun.font.CFontManager".
>     assertThat(props.get("sun.font.fontmanager"), nullValue());
> }
> {code}
> With Tika 1.7:
> {code}
> [2015-02-11 16:43:36,166][INFO ][org.apache.pdfbox.pdfparser.PDFParser] 
> Document is encrypted
> [2015-02-11 16:43:36,837][ERROR][org.apache.pdfbox.filter.FlateFilter] 
> FlateFilter: stop reading corrupt stream due to a DataFormatException
> [2015-02-11 16:43:36,837][ERROR][org.apache.pdfbox.filter.FlateFilter] 
> FlateFilter: stop reading corrupt stream due to a DataFormatException
> [2015-02-11 16:43:36,838][ERROR][org.apache.pdfbox.filter.FlateFilter] 
> FlateFilter: stop reading corrupt stream due to a DataFormatException
> [2015-02-11 16:43:36,838][ERROR][org.apache.pdfbox.filter.FlateFilter] 
> FlateFilter: stop reading corrupt stream due to a DataFormatException
> [2015-02-11 16:43:36,839][ERROR][org.apache.pdfbox.filter.FlateFilter] 
> FlateFilter: stop reading corrupt stream due to a DataFormatException
> [2015-02-11 16:43:36,840][ERROR][org.apache.pdfbox.filter.FlateFilter] 
> FlateFilter: stop reading corrupt stream due to a DataFormatException
> [2015-02-11 16:43:36,840][ERROR][org.apache.pdfbox.filter.FlateFilter] 
> FlateFilter: stop reading corrupt stream due to a DataFormatException
> [2015-02-11 16:43:36,841][ERROR][org.apache.pdfbox.filter.FlateFilter] 
> FlateFilter: stop reading corrupt stream due to a DataFormatException
> [2015-02-11 16:43:36,841][ERROR][org.apache.pdfbox.filter.FlateFilter] 
> FlateFilter: stop reading corrupt stream due to a DataFormatException
> [2015-02-11 16:43:36,842][ERROR][org.apache.pdfbox.filter.FlateFilter] 
> FlateFilter: stop reading corrupt stream due to a DataFormatException
> java.lang.AssertionError: 
> Expected: null
>  but: was "sun.font.CFontManager"
>  
>   at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
>   at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:8)
>   at 
> org.elasticsearch.plugin.mapper.attachments.test.TikaSystemTest.testSystem(TikaSystemTest.java:41)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:68)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:47)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:50)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:300)
>   at org.junit.runner.JUnitCore.run(JUnitCore.java:157)
>   at 
> com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:74)
>