I agree with George. Files larger than 2GB are a serious pain. However, some 
users insist on giving us these files, and we would rather take the data than 
reject it. 

Even before you near 2GB, it is likely that something in your system will 
reject the upload. You must ensure that all the pieces of your installation 
are configured to allow large request bodies. This includes (sample settings 
follow the list):
* Apache -- LimitRequestBody 0 
* Tomcat -- maxPostSize="0"
* Cocoon -- see http://sourceforge.net/mailarchive/message.php?msg_id=28478227
* Any security software you are running
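
For reference, the first two look roughly like this (file locations vary by 
distribution; in the Tomcat 6/7 of this era a maxPostSize of 0 disables the 
limit, while newer Tomcats want a negative value instead):

    # Apache: httpd.conf or the relevant vhost -- 0 means "no limit"
    LimitRequestBody 0

    <!-- Tomcat: server.xml, on the HTTP Connector (other attributes
         omitted); 0 disables the POST size limit here -->
    <Connector port="8080" protocol="HTTP/1.1" maxPostSize="0" />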

Even once everything is set correctly, users with poor bandwidth will have 
trouble transferring large files directly through HTTP.

Our normal process is for end users to supply the metadata and upload a dummy 
file. This way, the user ensures the correct metadata is associated with the 
correct file, which matters when they are sending us several files at once. 
Once the dummy file is in place, the user sends us the large file out-of-band. 
Depending on its size, we may have them upload it via a third-party site like 
WeTransfer.com, or we may open an FTP server for them.
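
Concretely, the hand-off looks something like this (the host, account, and 
filenames are made up):

    # depositor uploads a zero-byte stand-in through the normal web UI,
    # named like the real file so we can match them up later
    truncate -s 0 dataset.tar.gz

    # the real file then comes to us out-of-band, e.g. over an SFTP drop
    # we open for the depositor
    sftp depositor@dropbox.example.org
    sftp> put dataset.tar.gz incoming/dataset.tar.gz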

Once we have the file in hand, we either use the import/export tools OR make 
the appropriate changes directly in the database. Both processes are ugly and 
need improvement, but here are our current instructions:
http://wiki.datadryad.org/Large_File_Technology
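
For the database route, the gist is to copy the real file over the dummy's 
slot in the assetstore and then fix up its bitstream row. A rough sketch, 
assuming the classic DSpace schema (column names vary a little by version; 
the id, size, and checksum below are placeholders):

    -- locate the dummy bitstream (12345 is a placeholder id)
    SELECT internal_id, store_number
      FROM bitstream WHERE bitstream_id = 12345;

    -- after copying the real file over the assetstore path derived from
    -- internal_id (e.g. [assetstore]/12/34/56/12345...), update the row
    UPDATE bitstream
       SET size_bytes = 161061273600,          -- real size in bytes
           checksum = 'md5-of-the-real-file',
           checksum_algorithm = 'MD5'
     WHERE bitstream_id = 12345;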

--- Ryan Scherle
--- Data Repository Architect
--- Dryad Digital Repository

On Aug 30, 2012, at 12:20 PM, George S Kozak wrote:

> Bill:
>  
> Normally, at Cornell, we discourage files that are greater than 2GB.  The 
> problem isn’t that DSpace can’t handle them; the problem is the time it 
> takes to upload the file at submission and for a user to download it later. 
> A lot of times, people’s browsers just time out.
>  
> That said, we do have some large files, and what I usually do with them is 
> submit them using the item import application.  I upload the file to the 
> DSpace server using SFTP (as a background job).  Once it’s on the server, I 
> create an import directory with the needed Dublin Core and then run the 
> batch importer. 
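
(For anyone unfamiliar with the importer George mentions: a minimal import 
package looks roughly like the sketch below, per the stock DSpace 1.x 
command-line importer. The handle, e-mail address, and paths are 
illustrative.)

    import_dir/
      item_000/
        dublin_core.xml  # <dublin_core><dcvalue element="title">...</dcvalue></dublin_core>
        contents         # one bitstream filename per line, e.g. bigdata.tar.gz
        bigdata.tar.gz   # the large file, already SFTP'd to the server

    [dspace]/bin/dspace import --add \
        --eperson=admin@example.edu \
        --collection=1813/12345 \
        --source=/path/to/import_dir \
        --mapfile=/tmp/import.map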
>  
> George Kozak
> Digital Library Specialist
> Cornell University Library Information Technologies (CUL-IT)
> 501 Olin Library
> Cornell University
> Ithaca, NY 14853
> 607-255-8924
>  
> From: Ingram, William A [mailto:wingr...@illinois.edu] 
> Sent: Thursday, August 30, 2012 11:54 AM
> To: dspace-tech@lists.sourceforge.net
> Subject: [Dspace-tech] Ingesting large data set
>  
> I apologize if a similar question has been answered in a prior thread. 
>  
> We have a student needing to submit a 150 GB data set into DSpace. Is this 
> even possible? Are there any tips or workarounds I should try?
>  
> Cheers,
> Bill
