Hi Gabe. I’d add basic logging before you start each image and after you 
complete it, to see how long each one takes in each of the problem tests. That 
will show you just how slow it is on your problem platforms.

Then you can add more logging to expose the problems and start to address them 
once you see where the bottlenecks are.

I wonder if there is a method to load the EXIF data out of the files without 
opening them completely.  That would seem like the ideal approach.
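
To make that idea concrete, here is a rough sketch, assuming the files are 
JPEGs. As far as I know the Exif data lives in an APP1 segment near the start 
of the file, so you could read only the first chunk of each file and walk the 
segment markers yourself. This is not ImageIO, the name find_exif_payload is 
hypothetical, and parsing the TIFF directory inside the payload to get at 
DateTimeDigitized (tag 0x9004) is not shown.

```c
#include <stddef.h>
#include <string.h>

/* Looks for the Exif APP1 segment in the first `len` bytes of a JPEG file.
   On success returns 1 and stores the offset and size of the TIFF-formatted
   Exif payload (the bytes after "Exif\0\0"); returns 0 otherwise. */
static int find_exif_payload(const unsigned char *buf, size_t len,
                             size_t *payload_off, size_t *payload_len)
{
    if (len < 4 || buf[0] != 0xFF || buf[1] != 0xD8)
        return 0;                        /* not a JPEG: missing SOI marker */

    size_t pos = 2;
    while (pos + 4 <= len) {
        if (buf[pos] != 0xFF)
            return 0;                    /* lost sync with the marker stream */
        unsigned char marker = buf[pos + 1];
        if (marker == 0xFF) { pos++; continue; }          /* fill byte */
        if (marker == 0x01 || (marker >= 0xD0 && marker <= 0xD7)) {
            pos += 2;                    /* standalone marker, no length field */
            continue;
        }
        if (marker == 0xDA || marker == 0xD9)
            return 0;                    /* image data or end reached: no Exif */

        /* Segment length is big-endian and includes its own two bytes. */
        size_t seg_len = ((size_t)buf[pos + 2] << 8) | buf[pos + 3];
        if (seg_len < 2)
            return 0;                    /* malformed segment */
        if (marker == 0xE1 && seg_len >= 8 && pos + 2 + seg_len <= len
            && memcmp(buf + pos + 4, "Exif\0\0", 6) == 0) {
            *payload_off = pos + 10;     /* skip marker, length, "Exif\0\0" */
            *payload_len = seg_len - 8;
            return 1;
        }
        pos += 2 + seg_len;              /* skip to the next marker */
    }
    return 0;
}
```

If the segment is not fully contained in the bytes you read, the function 
returns 0, and you would have to read a larger chunk or fall back to ImageIO.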

Cheers,
Alex Zavatone

> On Jan 7, 2023, at 12:36 PM, Gabriel Zachmann <z...@cs.uni-bremen.de> wrote:
> 
> Hi Alex, hi everyone,
> 
> thanks a lot for the many suggestions!
> And sorry for following up on this so late!
> I hope you are still willing to engage in this discussion.
> 
> Yes, Alex, I agree that the main question is:
> how can I get the metadata of a large number of images (say, 100k-300k) 
> *without* actually loading the whole image files?
> (For your reference: I am interested in the date tags embedded in the EXIF 
> dictionary, and those dates will be read just once per image, then cached in 
> a dictionary containing filename & dates, and that dictionary will get stored 
> on disk for future use by the app.)
> 
>> CGImageSourceRef imageSourceRef = CGImageSourceCreateWithURL((CFURLRef)imageUrl, NULL);
> 
> I have tried this:
> 
>   for ( NSString* filename in imagefiles ) 
>   {
>     NSURL * imgurl = [NSURL fileURLWithPath: filename isDirectory: NO];
>     CGImageSourceRef sourceref = CGImageSourceCreateWithURL( (__bridge CFURLRef) imgurl, NULL );
>   }
> 
> This takes 1 minute for around 300k images stored on my internal SSD.
> That would be OK.
> 
> However! .. if performed on a folder stored on an external hard disk, I get 
> the following timings:
> 
>          - 20 min for 150k images (45 GB)
>          - 12 min for 150k images (45 GB), second time
>          - 150 sec for 25k images (18 GB)
>          - 170 sec for 25k images (18 GB), with the lines below (*)
>          - 80 sec for 22k images (3 GB)
>          - 80 sec for 22k images (3 GB), with the lines below (*)
> 
> All experiments were done on different folders on the same hard disk, WD 
> MyPassport Ultra, 1 TB, USB-A connector to Macbook Air M2.
> Timings with the same number of files/GB refer to the same folder, respectively.
> 
> (*): these were timings where I added the following lines to the loop:
> 
>        CFDictionaryRef fileProps = CGImageSourceCopyPropertiesAtIndex( sourceref, 0, NULL );
>        bool success = CFDictionaryGetValueIfPresent( fileProps, kCGImagePropertyExifDictionary, (const void **) & exif_dict );
>        CFDictionaryGetValueIfPresent( exif_dict, kCGImagePropertyExifDateTimeDigitized, (const void **) & dateref );
>        iso_date = [isoDateFormatter_ dateFromString: (__bridge NSString * _Nonnull)(dateref) ];
>        [datesAndTimes_ addObject: iso_date ];
> 
> (Plus some error checking, which I omit here.)
> 
> First of all, we can see that the vast majority of time is spent on 
> CGImageSourceCreateWithURL().
> Second, there seem to be some caching effects, although I have a hard time 
> understanding that, but that is not the point.
> Third, the durations are not linear; I guess it might have something to do 
> with the sizes of the files, too, but again, didn't investigate further.
> 
> So, it looks to me like CGImageSourceCreateWithURL() really loads the 
> complete image file.
> 
> I don't see why Ole Begemann (ref'ed in Alex' post) can claim his approach 
> does not load the whole image.
> 
> 
> Some people suggested parallelizing the whole task, using 
> dispatch_queue_create or NSOperationQueue.
> (Thanks Steve, Gary, Jack!)
> Before restructuring my code for that, I would like to better understand why 
> you think that will speed up things.
> The code above pretty much does no computations, so most of the time is, I 
> guess, spent on waiting for the data to arrive from hard disk.
> So, why would several threads loading those images in parallel help 
> here? In my thinking, they will just compete for the same resource, i.e., 
> the hard disk.
> 
> 
> I also googled quite a bit, to no avail.
> 
> Any and all hints, suggestions, and insights will be highly appreciated!
> Best, Gab
> 
>> if (!imageSourceRef)
>> return;
>> 
>> CFDictionaryRef props = CGImageSourceCopyPropertiesAtIndex(imageSourceRef, 0, NULL);
>> CFRelease(imageSourceRef); // the source is no longer needed once the properties are copied
>> 
>> NSDictionary *properties = (NSDictionary*)CFBridgingRelease(props);
>> 
>> if (!properties) {
>> return;
>> }
>> 
>> // Keep the NSNumber objects and the int values under distinct names.
>> NSNumber *heightNumber = [properties objectForKey:@"PixelHeight"];
>> NSNumber *widthNumber = [properties objectForKey:@"PixelWidth"];
>> int height = 0;
>> int width = 0;
>> 
>> if (heightNumber) {
>> height = [heightNumber intValue];
>> }
>> if (widthNumber) {
>> width = [widthNumber intValue];
>> }
>> 
>> 
>> Or this link by Ole Begemann?
>> 
>> https://oleb.net/blog/2011/09/accessing-image-properties-without-loading-the-image-into-memory/
>> 
>> I love these questions.  I find out more about iOS programming by 
>> researching other people’s problems than the ones that I’m currently faced 
>> with.
>> 
>> Hopefully some of these will help.
>> 
>> Cheers,
>> Alex Zavatone
> 

_______________________________________________

Cocoa-dev mailing list (Cocoa-dev@lists.apple.com)

Please do not post admin requests or moderator comments to the list.
Contact the moderators at cocoa-dev-admins(at)lists.apple.com

Help/Unsubscribe/Update your Subscription:
https://lists.apple.com/mailman/options/cocoa-dev/archive%40mail-archive.com
