Listing the files in the directory presumably requires some sort of system
call, and there is presumably slight overhead in storing those names once
and then calling the File constructor on each stored filename. That being
said, I agree that the overhead there is likely minuscule.
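
For concreteness, here is a rough sketch of the Strings-then-Files idea
Tim proposed. It is not the cTAKES reader: the class name LazyFileIterator
is made up for illustration, and it assumes a single flat directory rather
than a directory tree. It lists the names once with File.list() and only
builds a File when the next entry is requested.

```java
import java.io.File;
import java.util.Arrays;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical helper (not part of cTAKES): stores file names as Strings
// and creates each File object only when it is asked for.
class LazyFileIterator implements Iterator<File> {
    private final File directory;
    private final String[] names;
    private int index = 0;

    LazyFileIterator(File directory) {
        this.directory = directory;
        // File.list() returns null if the path is not a directory
        // or if an I/O error occurs, so fall back to an empty array.
        String[] listed = directory.list();
        this.names = (listed == null) ? new String[0] : listed;
        // list() guarantees no particular order; sort for determinism.
        Arrays.sort(this.names);
    }

    @Override
    public boolean hasNext() {
        return index < names.length;
    }

    @Override
    public File next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        // The File is built here, at the point of use.
        return new File(directory, names[index++]);
    }
}
```

Whether this actually saves measurable time or memory over holding Files
up front would need to be profiled on a large directory; the sketch only
shows the shape of the change.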







--
Karthik Sarma
UCLA Medical Scientist Training Program Class of 20??
Member, UCLA Medical Imaging & Informatics Lab
Member, CA Delegation to the House of Delegates of the American Medical
Association
[email protected]
gchat: [email protected]
linkedin: www.linkedin.com/in/ksarma


On Tue, May 7, 2013 at 12:44 PM, Finan, Sean <
[email protected]> wrote:

> I don't think that File instantiation is slower than the AE process,
> and Tim is talking about tens of thousands of files in the directory tree.
>
> The only filesystem call that should exist in any new File(..) is a
> normalize(..) or resolve(..) on the passed parameter(s), which should just
> be string manipulation and no actual io calls, native or otherwise.  In
> other words, new File(..) should be fast.
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of Karthik
> Sarma
> Sent: Tuesday, May 07, 2013 3:26 PM
> To: [email protected]
> Subject: Re: files vs strings in collection reader
>
> Hmm, without having actually reviewed the code in cTAKES (I'm not on my
> work computer), my understanding of the "correct" way of doing this is to
> use the listFiles method on the directory File to get an array of Files;
> this should be implemented natively by the JVM and could be faster than
> individual initialization.
>
> On Tue, May 7, 2013 at 12:17 PM, Tim Miller <
> [email protected]> wrote:
>
> > The FilesInDirectoryCollectionReader creates an arraylist of
> > java.io.File objects when it is initialized. For large datasets (~50k
> > files) this is substantial time overhead and probably memory as well.
> > Seems like it would be more efficient to use Strings instead of Files
> > there and just open the File object when getNext() is called. It is
> > pretty easy to implement, any downside to making this switch?
> > Tim
> >
>
