Re: [geometry] IO Modules

Matt Juntunen Mon, 25 Jan 2021 04:40:07 -0800

Hello,

I have two main goals for the IO modules here:


  1.  Provide a simple, high-level API (i.e. IO3D) for reading and writing 
geometry with a minimum of fuss.
  2.  Provide a low-level, extensible API specific to each data format that can 
be used to access additional format-specific information while reading and 
provide greater control over the output while writing.

So, there are actually two different APIs in question here. Users could use the 
high-level API when only the geometry itself is of interest and the low-level 
API when additional metadata is required. Useful examples of this metadata are 
the object and group names from the OBJ format (which can be used to store 
separate geometries in a single file) and the facet attribute bytes in binary 
STL files (which are sometimes used to store color information or other 
values). This information does not map directly to any data structures in 
commons-geometry but it is certainly useful to be able to access it (I will 
want to do so in my day job, for instance).

> Such customization could also be handled at the application level through
> a (handler-specific) property file.

I'd rather not deal with configuration files and keep things simple and 
lightweight.

> Then the case for the "enum" is moot (IIUC).

Yes, it might be. I would like to allow format names to be mapped to more than 
one file extension, though.

> User-code should be in charge of associating input (e.g. file name) with how 
> to handle it (e.g. the instantiation of the read handler).

This would be the case for the low-level API, but I want the high-level API to 
be able to handle this itself, based on its configuration. I want to be able to 
call 'IO3D.read(Paths.get("cube.obj"))' just as I might call 'ImageIO.read(new 
File("image.png"))'.
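
To make the extension-based lookup concrete, here is a minimal sketch of 
what the high-level API could do internally. It is not the current code: 
FormatRegistry, register and formatFor are hypothetical names used only for 
illustration. It shows one format name mapped to several file extensions 
and the format being resolved from a Path.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

/** Hypothetical registry for illustration; not part of commons-geometry. */
final class FormatRegistry {
    /** Maps a lower-case file extension to a format name. */
    private final Map<String, String> formatByExtension = new HashMap<>();

    /** Associate one format name with any number of file extensions. */
    void register(String formatName, List<String> extensions) {
        for (String ext : extensions) {
            formatByExtension.put(ext.toLowerCase(Locale.ROOT), formatName);
        }
    }

    /** Resolve the format name from the extension of the given path, if known. */
    Optional<String> formatFor(Path path) {
        Path name = path.getFileName();
        if (name == null) {
            return Optional.empty();
        }
        String fileName = name.toString();
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return Optional.empty(); // no extension to look up
        }
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return Optional.ofNullable(formatByExtension.get(ext));
    }

    public static void main(String[] args) {
        FormatRegistry registry = new FormatRegistry();
        registry.register("obj", Arrays.asList("obj"));
        registry.register("txt", Arrays.asList("txt", "text"));
        System.out.println(registry.formatFor(Paths.get("cube.obj")));  // Optional[obj]
        System.out.println(registry.formatFor(Paths.get("cube.TEXT"))); // Optional[txt]
        System.out.println(registry.formatFor(Paths.get("cube")));      // Optional.empty
    }
}
```

A real manager would map the resolved format name to read/write handlers; 
callers with unusual extensions would still pass the format name explicitly.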

Regards,
Matt J

________________________________
From: Gilles Sadowski <gillese...@gmail.com>
Sent: Saturday, January 23, 2021 9:40 AM
To: Commons Developers List <dev@commons.apache.org>
Subject: Re: [geometry] IO Modules

Hi.

On Fri, 22 Jan 2021 at 03:38, Matt Juntunen
<matt.juntu...@hotmail.com> wrote:
>
> Hi Gilles,
>
> > Really, the main point is to separate format (contents) from filename 
> > (container).
>
> This makes sense. What would you think of the approach below?

I have no strong objections, as I do not grasp all the requirements.
[Maybe, IO-related stuff is always bound to be messy (cf. "java.io" vs
"java.nio").]

> This would separate the format name from the file extension(s) and provide an 
> enum containing default format information and handlers. Usage of the enum 
> would be optional since there would still be overloads that accept a simple 
> format name string.

It reminds me of a discussion concerning "Bloom filters", about identifiers
for a hash function that could be user-defined.
IIRC, one idea (proposed by Alex) was to maintain a text file of (unique)
identifiers.

> For the BoundaryIOManager methods that accept a Path or URL, the format would 
> still be determined by the file extension.

I'm uncomfortable with having that kind of assumption in a low-level library
(bad reminiscence of M$-DOS days).  User-code should be in charge of
associating input (e.g. file name) with how to handle it (e.g. the instantiation
of the read handler).

> If users want to use a non-standard file extension, they can open the IO 
> stream themselves and use the read/write methods that accept an IO stream and 
> format string name or Format instance.

What is "standard"/"non-standard"?  You use "txt", but the most standard
meaning of this extension is that the contents is ASCII-encoded...
And "csv" is also not sufficient to convey that the contents are actually much
more constrained than a comma-separated list of strings.

Couldn't a file be used to define which read/writer the library should
instantiate, and to which extension it could be associated?
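
As a rough illustration of that idea (an assumption about how it might 
look, not an existing feature), a properties file could associate each 
extension with a handler class name. The "ext." key prefix, the HandlerConfig 
class and the com.example class names are all hypothetical; the sketch only 
parses the mapping and does not load any class.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

/** Hypothetical configuration parser for illustration only. */
final class HandlerConfig {
    private static final String PREFIX = "ext.";

    /** Parse "ext.<extension>=<handler class name>" entries into a sorted map. */
    static Map<String, String> parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Map<String, String> handlerByExtension = new TreeMap<>();
        for (String key : props.stringPropertyNames()) {
            if (key.startsWith(PREFIX)) {
                String ext = key.substring(PREFIX.length());
                handlerByExtension.put(ext, props.getProperty(key).trim());
            }
        }
        return handlerByExtension;
    }

    public static void main(String[] args) {
        String config =
            "ext.obj=com.example.ObjReadHandler\n" +
            "ext.text=com.example.TextReadHandler\n";
        // The class names above are placeholders and are never instantiated here.
        System.out.println(parse(config));
    }
}
```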

>
>     interface Format {
>         String getName();
>         List<String> getFileExtensions();
>     }
>
>     class BoundaryIOManager {
>         void register(BoundaryFormat fmt, BoundaryReadHandler rh, BoundaryWriteHandler wh) {
>             register(fmt.getName(), fmt.getFileExtensions(), rh, wh);
>         }
>         void register(String formatName, List<String> extensions, BoundaryReadHandler rh, BoundaryWriteHandler wh) {...}
>
>         // ...
>
>         void write(BoundarySource src, OutputStream out, Format fmt) {
>             write(src, out, fmt.getName());
>         }
>         void write(BoundarySource src, OutputStream out, String formatName) {...}
>
>         // similar read methods ...
>     }
>
>     enum StandardFormat3D implements Format {
>         OBJ(...),
>         TXT(...),
>         CSV(...);
>
>         public String getName() {...}
>         public List<String> getFileExtensions() {...}
>         public BoundaryReadHandler3D readHandler() { (execute a supplier function)... }
>         public BoundaryWriteHandler3D writeHandler() { (execute a supplier function)... }
>     }
>
> > The "enum" is for natively supported formats to allow for simple API, while 
> > "hiding" the actual implementations (as in "RandomSource" from "Commons 
> > RNG").
>
> I'd prefer to not hide the format-specific classes, at least not completely.

Then the case for the "enum" is moot (IIUC).

> For example, the OBJ file format can contain a lot more information than just 
> pure geometry, such as object names (more than one geometry can be contained 
> in a single file), material information (for use in rendering), free-form 
> curve definitions, etc. This information is not used to produce 
> BoundarySource3D or Mesh instances but it can be accessed easily by extending 
> AbstractOBJParser or PolygonOBJParser. Also, additional information such as 
> comments and object names can be included in output files if the OBJWriter 
> class is used directly, as opposed to IO3D or BoundaryIOManager3D. It seems 
> like a waste to completely hide this functionality.

I agree to not waste functionality.  But how is the additional content
handled currently?  It seems that it is simply discarded, and someone
wanting to retrieve it would then have to discard the current functionality,
which only returns a "BoundarySource3D".
Sorry if I'm missing something because of my not having read the code
but this makes me think that a parser generator would have allowed
for extending the support of a given format.

>
> Another reason to keep these classes public is that they may need to be 
> accessed in order to configure them. For example, the txt, csv, and obj 
> formats use a default format pattern for writing floating point numbers as 
> text. If this needs to be modified, for example to increase or decrease the 
> number of minimum fraction digits, then the format-specific type will need to 
> be accessed. The code below shows how to set a custom decimal format for OBJ 
> files (using the current code).
>
>     OBJBoundaryWriteHandler3D wh = new OBJBoundaryWriteHandler3D();
>     wh.setDecimalFormatPattern("0.0##");

Such customization could also be handled at the application level through
a (handler-specific) property file.
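
For reference, "0.0##" reads as a java.text.DecimalFormat pattern: at least
one integer digit, at least one fraction digit, at most three.  Whether the
write handler actually passes the string to DecimalFormat is an assumption
here; the sketch below only shows what the pattern itself does
("PatternDemo" is a made-up name).

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

/** Illustrative demo of the "0.0##" pattern; not code from the library. */
final class PatternDemo {
    /** Format a value with the given pattern, using locale-neutral symbols. */
    static String format(String pattern, double value) {
        DecimalFormat fmt =
            new DecimalFormat(pattern, DecimalFormatSymbols.getInstance(Locale.ROOT));
        return fmt.format(value);
    }

    public static void main(String[] args) {
        System.out.println(format("0.0##", 1.0));     // 1.0
        System.out.println(format("0.0##", 0.25));    // 0.25
        System.out.println(format("0.0##", 1.23456)); // 1.235
    }
}
```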

It would be interesting to ask for more opinions about how to handle
configurations and files (posting a message to "[All]").

>
>     IO3D.getDefaultManager().registerWriteHandler("obj", wh);
>
> One additional question that I thought of while looking at your example code: 
> what is our convention for class names that contain acronyms or other 
> sequences of capitalized letters? In other words, should it be OBJWriter or 
> ObjWriter?

I'd say "Obj..." (because only initials should be capitalized).
But "ObjWriter" in Java code could mean anything...
Perhaps "ObjFormatWriter"?

Best,
Gilles

>> [...]

