[ 
https://issues.apache.org/jira/browse/TIKA-3523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17539958#comment-17539958
 ] 

Dan Coldrick commented on TIKA-3523:
------------------------------------

[~tallison] 

That error is probably because the command needs to be in quotes. Look for a 
path such as C:\Program Files\xxxxxx: "Program Files" contains a space, so the 
parameter is being split into "C:\Program" and "Files\xxxxx". Quoted, it stays whole:

"C:\Program Files\xxxx"
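
A minimal sketch of the splitting described above, in Python for brevity. The path is the placeholder from this comment, and the assumption that the launcher tokenizes its command line on whitespace is mine, not something confirmed by the ticket:

```python
import subprocess

# Placeholder path from the comment above; the real path is elided as "xxxx".
path = r"C:\Program Files\xxxx"

# Assumption: an unquoted command line is tokenized on whitespace,
# so the single path ends up as two separate arguments.
naive_args = path.split()

# Keeping the path as one list element avoids the split entirely.
safe_args = [path]

# On Windows, Python's subprocess module joins an argument list into a
# command line with this helper; it wraps arguments containing spaces
# in double quotes.
quoted = subprocess.list2cmdline(safe_args)

print(naive_args)
print(quoted)
```

Here naive_args holds two broken pieces, while quoted is the path wrapped in double quotes, which is the form shown above.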

Not interested in this ticket but randomly saw the error.

> A replacement for enableFileUrl or Support for Google Cloud
> -----------------------------------------------------------
>
>                 Key: TIKA-3523
>                 URL: https://issues.apache.org/jira/browse/TIKA-3523
>             Project: Tika
>          Issue Type: Wish
>          Components: tika-server
>    Affects Versions: 2.0.0
>            Reporter: Fatih Pazarbasi
>            Priority: Minor
>
> Hello,
> I have a setup where users upload their files to a cloud bucket, and I forward 
> the fileUrl to run OCR on them in a serverless cloud instance. I do it this 
> way so that users never contact the Tika server directly and I keep a copy of 
> what they sent for processing. They also have nothing to do with the 
> unprocessed response.
> Now that enableFileUrl has been removed, I have to download the files from 
> the cloud bucket to the backend instance and then put them to the /tika 
> server again.
> I tried the following config.xml to work around the situation, but it was in 
> vain.
>   For the made-up URL: 
> [https://firebasestorage.googleapis.com/v0/b/abcd-efgh.appspot.com/o/somefilethatdoesnotexist.pdf|https://firebasestorage.googleapis.com/v0/b/abcd-efgh.appspot.com/o/]
> {code:xml}
> <fetchers> 
>  <fetcher class="org.apache.tika.pipes.fetcher.fs.FileSystemFetcher"> 
>   <params> 
>    <name>fsf</name> 
>    <basePath>https://firebasestorage.googleapis.com/v0/b/abcd-efgh.appspot.com/o</basePath> 
>   </params> 
>  </fetcher> 
> </fetchers> 
> <emitters> 
>  <emitter class="org.apache.tika.pipes.emitter.fs.FileSystemEmitter"> 
>   <params> 
>    <name>fse</name> 
>    <basePath>gs://abcd-efgh.appspot.com/users</basePath> 
>   </params> 
>  </emitter> 
> </emitters> 
> <server> 
>  <params> 
>   <enableUnsecureFeatures>true</enableUnsecureFeatures> 
>  </params> 
> </server> 
> <pipes> 
>  <params> 
>   <tikaConfig>/path/to/tika-config.xml</tikaConfig> 
>  </params> 
> </pipes>{code}
> {code:java}
> headers: {
>   Accept: 'text/plain',
>   'User-Agent': 'Firebase Functions',
>   fetcherName: 'fsf',
>   fetchKey: 'somefilethatdoesnotexist.pdf',
> },{code}
> It doesn't support the gs:// Google Storage bucket either. I have all the 
> necessary permissions, but that didn't help. I'm using a dockerized version of 
> Tika server, so the file system does not seem to be my concern...
>   
>  In the golden times of 1.2x, I was simply using:
>   
> {code:java}
> headers: {
>   Accept: 'text/plain',
>   'User-Agent': 'Firebase Functions',
>   fileUrl: 'https://firebasestorage.googleapis.com/v0/b/abcd-efgh.appspot.com/o/somefilethatdoesnotexist.pdf',
> },{code}
>  
>   
>  Am I missing something? If not, my wish is this: could you please make it so 
> that fetcherName is the definitive first part of the old fileUrl and fetchKey 
> is the specific pointer to a file?
> This way I would have some control over the URLs that are sent to Tika 
> server, unlike with enableFileUrl, and I could have my cake and eat it too, 
> without creating extra traffic on the backend by downloading from the bucket 
> and uploading to Tika. 



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
