Some sort of system call is presumably required to list the files in the directory, so storing those names once and then calling the File constructor on each stored filename likely adds a slight overhead. That being said, I agree that this overhead is probably minuscule.
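To make the listing cost concrete, here is a minimal sketch comparing the two ways of getting File objects for a directory's entries. The class and method names (`ListingComparison`, `viaListFiles`, `namesOnly`) are hypothetical. It assumes OpenJDK behavior: `listFiles()` is itself built on `list()`, so both paths make one directory-listing call, and `new File(parent, child)` only normalizes and joins strings without touching the disk, as Sean describes.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class ListingComparison {

    // Approach A: one directory listing; the JDK wraps every entry in a File.
    static List<File> viaListFiles(File dir) {
        File[] files = dir.listFiles();
        // listFiles() returns null if dir is not a directory or cannot be read.
        return files == null ? new ArrayList<File>() : new ArrayList<File>(Arrays.asList(files));
    }

    // Approach B: the same single listing, but only the entry names are kept.
    static List<String> namesOnly(File dir) {
        String[] names = dir.list();
        return names == null ? new ArrayList<String>() : new ArrayList<String>(Arrays.asList(names));
    }

    public static void main(String[] args) throws IOException {
        File dir = Files.createTempDirectory("listing-demo").toFile();
        dir.deleteOnExit(); // registered first, so it is deleted after its children
        for (int i = 0; i < 3; i++) {
            File f = new File(dir, "note" + i + ".txt");
            f.createNewFile();
            f.deleteOnExit();
        }

        List<File> eager = viaListFiles(dir);

        // Rebuild File objects from the stored names only when needed.
        // new File(parent, child) just joins and normalizes strings, which is
        // why it also succeeds for paths that do not exist.
        List<File> rebuilt = new ArrayList<File>();
        for (String name : namesOnly(dir)) {
            rebuilt.add(new File(dir, name));
        }

        // Listing order is filesystem-dependent, so sort before comparing.
        Collections.sort(eager);
        Collections.sort(rebuilt);
        System.out.println("same paths: " + eager.equals(rebuilt));
        System.out.println("missing file constructed, exists? "
                + new File(dir, "does-not-exist.txt").exists());
    }
}
```

Both approaches pay for the listing once; the only difference is whether the File wrappers are built up front or on demand.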
--
Karthik Sarma
UCLA Medical Scientist Training Program Class of 20??
Member, UCLA Medical Imaging & Informatics Lab
Member, CA Delegation to the House of Delegates of the American Medical Association
[email protected]
gchat: [email protected]
linkedin: www.linkedin.com/in/ksarma


On Tue, May 7, 2013 at 12:44 PM, Finan, Sean <[email protected]> wrote:

> I don't think that File instantiation is slower than the AE process,
> and Tim is talking about tens of thousands of files in the directory tree.
>
> The only filesystem call that should exist in any new File(..) is a
> normalize(..) or resolve(..) on the passed parameter(s), which should just
> be string manipulation and no actual I/O calls, native or otherwise. In
> other words, new File(..) should be fast.
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of Karthik Sarma
> Sent: Tuesday, May 07, 2013 3:26 PM
> To: [email protected]
> Subject: Re: files vs strings in collection reader
>
> Hmm, without having actually reviewed the code in cTAKES (I'm not on my
> work computer), my understanding of the "correct" way of doing this is to
> use the listFiles method on the directory File to get an array of Files;
> this should be implemented natively by the JVM and could be faster than
> individual initialization.
>
>
> On Tue, May 7, 2013 at 12:17 PM, Tim Miller <[email protected]> wrote:
>
> > The FilesInDirectoryCollectionReader creates an ArrayList of
> > java.io.File objects when it is initialized. For large datasets (~50k
> > files) this is substantial time overhead and probably memory as well.
> > Seems like it would be more efficient to use Strings instead of Files
> > there and just create the File object when getNext() is called. It is
> > pretty easy to implement; any downside to making this switch?
> > Tim
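Tim's proposal, keeping path Strings and constructing each File only when getNext() is called, can be sketched as a plain Java class. This is a hypothetical illustration, not the cTAKES FilesInDirectoryCollectionReader, and it does not use the UIMA API: the class name `LazyPathReader` and its `size`, `hasNext`, and `getNext` methods are made up here, the traversal is recursive because the thread mentions a directory tree, and file contents are assumed to be UTF-8.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical class illustrating "keep Strings, create the File in getNext()".
// Not the cTAKES FilesInDirectoryCollectionReader; does not use UIMA.
public class LazyPathReader {
    private final List<String> paths = new ArrayList<String>();
    private int index = 0;

    public LazyPathReader(File rootDir) {
        collect(rootDir);
    }

    // Walk the tree once, retaining only path Strings. The File objects that
    // listFiles() returns are short-lived and can be garbage-collected.
    private void collect(File dir) {
        File[] children = dir.listFiles();
        if (children == null) {
            return; // not a directory, or not readable
        }
        for (File child : children) {
            if (child.isDirectory()) { // one filesystem query per entry
                collect(child);
            } else {
                paths.add(child.getPath());
            }
        }
    }

    public int size() {
        return paths.size();
    }

    public boolean hasNext() {
        return index < paths.size();
    }

    // The File is constructed only when its document is requested.
    public String getNext() throws IOException {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        File file = new File(paths.get(index++));
        byte[] bytes = Files.readAllBytes(file.toPath());
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        File root = Files.createTempDirectory("lazy-reader-demo").toFile();
        File sub = new File(root, "sub");
        sub.mkdir();
        File a = new File(root, "a.txt");
        File b = new File(sub, "b.txt");
        Files.write(a.toPath(), "note A".getBytes(StandardCharsets.UTF_8));
        Files.write(b.toPath(), "note B".getBytes(StandardCharsets.UTF_8));
        // Register parents first so deleteOnExit removes children before them.
        root.deleteOnExit();
        sub.deleteOnExit();
        a.deleteOnExit();
        b.deleteOnExit();

        LazyPathReader reader = new LazyPathReader(root);
        System.out.println("documents found: " + reader.size());
        while (reader.hasNext()) {
            System.out.println(reader.getNext());
        }
    }
}
```

One design consequence worth noting: in OpenJDK a java.io.File mostly wraps its path String plus a few small fields, so keeping Strings instead of Files likely saves one small wrapper object per file rather than the path text itself. Whether that matters for ~50k files is something to measure rather than assume.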
