Re: Platform dependent code placement (was: Re: repo layout again)
On 3/9/06, Oliver Deakin [EMAIL PROTECTED] wrote:
> Time to resurrect this thread again :)

We'll have to try to kill it properly this time. ;-)

> With the work that Mark and I have been doing in HARMONY-183/155/144/171 we will be at a point soon where all the shared code has been taken out of the native-src/win.IA32 and native-src/linux.IA32 directories and combined into native-src/shared. Once completed we will be in a good position to reorganise the code into whatever layout we choose, and refactor the makefiles/scripts to use gmake/ant across both platforms.
>
> I don't think previous posts on this thread really reached a conclusion, so I'll reiterate the previous suggestions:
>
> 1) Hierarchy of source - two suggestions put forward so far:
>
> - Keep architecture and OS names solely confined to directory names. So, for example, we could have:
>
>     src\main\native\
>       shared\
>       unix\
>       windows\
>       windows_x86\
>       solaris_x86\
>
> All windows_x86 specific code will be contained under that directory, any generic windows code will be under windows\, and code common to all platforms will be under shared\ (or whatever name). So when looking for a source/header file on, for example, windows x86 the compiler would first look in windows_x86, then windows, then shared.
>
> - Alternatively, have directory names as above, but also allow the OS and arch to be mixed into file names. To quote Andrey's previous mail [1]:
>
>     Files in the source tree are selected for compilation based on the OS or ARCH attribute values which may (or may not) appear in a file or directory name. Some examples are:
>     src\main\native\solaris\foo.cpp means file is applicable for whatever system running Solaris;
>     src\main\native\win\foo_ia32.cpp file is applicable only for Windows / IA32;
>     src\main\native\foo_ia32_em64t.cpp file can be compiled for whatever OS on either IA32 or EM64T architecture, but nothing else.
>
> Files will be selected using a regex expression involving the OS and arch descriptors. This is intended to cut down duplication between source directories.

Won't some modules have another level after native? There are currently more sub-directories in native-src/linux.IA32 and native-src/win.IA32 than there are modules.

> Personally I prefer the first system as it is simple to maintain, keeps file names consistent and concise, and allows developers to easily keep track of function location. For example, as Graeme pointed out in [2], the developer will always know that hyfile_open() is defined in hyfile.c.
>
> In addition, I don't believe that the second system will give us much of a decrease in the number of duplicated files. For example, if a piece of code is unique to only linux and windows on x86, will the file be named foo_linux_windows_x86.c? How will the build scripts be able to determine whether this means all linux platforms plus windows_x86, or windows and linux only on x86? In these cases we will either end up duplicating foo_x86.c in the windows and linux directories or creating an extra directory called x86 which contains foo_windows_linux.c. Potentially we will get either similar amounts of duplication or more directories than with the first method, and because there is no hard rule on the layout (you can choose directory or file names to include OS/arch) there is no guarantee where a developer will choose to put their code in these situations.

I don't think we should worry so much. I think we should simply make it as complicated as it needs to be for what we have today and let it evolve when a clear requirement to change comes along. That means for today, we might just have:

    linux
    windows
    shared

We shouldn't even split by arch until we know we have to; most of the current code should be usable on most architectures without changes, or at least be easily fixable without duplicating entire files. (Thread .asm/.s files being an exception.) We can decide what to do when something concrete comes up. If nothing else, it is much easier to reason about a concrete example than to beat an issue to death when we are all probably envisioning different future situations.

> 2) Build tools - there have been two previous suggestions:
>
> - Use gmake and VPATH to complement the first layout described above. This could lead to platform independent makefiles stored in the shared\ directory of each module that include platform specifics (such as build file lists, compiler flags etc) from a centralised set of resources.
>
> - Alternatively, use Ant to select the set of files to be compiled by employing regex expressions. This sits well with the second layout described above
Re: Platform dependent code placement (was: Re: repo layout again)
Time to resurrect this thread again :)

With the work that Mark and I have been doing in HARMONY-183/155/144/171 we will be at a point soon where all the shared code has been taken out of the native-src/win.IA32 and native-src/linux.IA32 directories and combined into native-src/shared. Once completed we will be in a good position to reorganise the code into whatever layout we choose, and refactor the makefiles/scripts to use gmake/ant across both platforms.

I don't think previous posts on this thread really reached a conclusion, so I'll reiterate the previous suggestions:

1) Hierarchy of source - two suggestions put forward so far:

- Keep architecture and OS names solely confined to directory names. So, for example, we could have:

    src\main\native\
      shared\
      unix\
      windows\
      windows_x86\
      solaris_x86\

All windows_x86 specific code will be contained under that directory, any generic windows code will be under windows\, and code common to all platforms will be under shared\ (or whatever name). So when looking for a source/header file on, for example, windows x86 the compiler would first look in windows_x86, then windows, then shared.

- Alternatively, have directory names as above, but also allow the OS and arch to be mixed into file names. To quote Andrey's previous mail [1]:

    Files in the source tree are selected for compilation based on the OS or ARCH attribute values which may (or may not) appear in a file or directory name. Some examples are:
    src\main\native\solaris\foo.cpp means file is applicable for whatever system running Solaris;
    src\main\native\win\foo_ia32.cpp file is applicable only for Windows / IA32;
    src\main\native\foo_ia32_em64t.cpp file can be compiled for whatever OS on either IA32 or EM64T architecture, but nothing else.

Files will be selected using a regex expression involving the OS and arch descriptors. This is intended to cut down duplication between source directories.

Personally I prefer the first system as it is simple to maintain, keeps file names consistent and concise, and allows developers to easily keep track of function location. For example, as Graeme pointed out in [2], the developer will always know that hyfile_open() is defined in hyfile.c.

In addition, I don't believe that the second system will give us much of a decrease in the number of duplicated files. For example, if a piece of code is unique to only linux and windows on x86, will the file be named foo_linux_windows_x86.c? How will the build scripts be able to determine whether this means all linux platforms plus windows_x86, or windows and linux only on x86? In these cases we will either end up duplicating foo_x86.c in the windows and linux directories or creating an extra directory called x86 which contains foo_windows_linux.c. Potentially we will get either similar amounts of duplication or more directories than with the first method, and because there is no hard rule on the layout (you can choose directory or file names to include OS/arch) there is no guarantee where a developer will choose to put their code in these situations.

2) Build tools - there have been two previous suggestions:

- Use gmake and VPATH to complement the first layout described above. This could lead to platform independent makefiles stored in the shared\ directory of each module that include platform specifics (such as build file lists, compiler flags etc) from a centralised set of resources.

- Alternatively, use Ant to select the set of files to be compiled by employing regex expressions. This sits well with the second layout described above (although it could also be applied to the first), and a regex expression has been described by Nikolay in [3].

I prefer the use of gmake here. We can use generic makefiles across platforms, and pointing the compiler at the right files in the first layout above is as easy as setting VPATH to, for example, windows_x86:windows:shared.
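The lookup described here can be pictured with a small C sketch of the search order behind a VPATH such as windows_x86:windows:shared: directories are tried left to right and the first one holding the requested file wins, so a more specific copy shadows the shared one. This is a simplified model, not gmake's implementation: gmake also looks in the current directory first and accepts blanks (and semicolons on Windows) as separators, which the sketch ignores. The exists_fn callback, the demo_tree table, and the file names hysignal.c and hystr.c are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Caller-supplied check: does `file` exist in directory `dir`?
   A real tool would stat() the joined path; a callback lets the lookup
   order be shown without touching the file system. */
typedef int (*exists_fn)(const char *dir, const char *file);

/* Walks a colon-separated search list from left to right and copies the
   first directory containing `file` into `out`.
   Returns 1 if found, 0 otherwise. */
int vpath_resolve(const char *vpath, const char *file, exists_fn exists,
                  char *out, size_t outlen) {
    const char *p = vpath;
    while (*p != '\0') {
        const char *end = strchr(p, ':');
        size_t len = end ? (size_t)(end - p) : strlen(p);
        if (len > 0 && len < outlen) {   /* skip empty or oversized entries */
            memcpy(out, p, len);
            out[len] = '\0';
            if (exists(out, file))
                return 1;
        }
        if (end == NULL)
            break;
        p = end + 1;
    }
    return 0;
}

/* Hypothetical module layout as {directory, file} pairs. */
static const char *const demo_tree[][2] = {
    {"windows_x86", "hysignal.c"},
    {"windows", "hyfile.c"},
    {"shared", "hyfile.c"},
    {"shared", "hystr.c"},
};

/* exists_fn implementation backed by demo_tree. */
static int demo_exists(const char *dir, const char *file) {
    for (size_t i = 0; i < sizeof demo_tree / sizeof demo_tree[0]; i++)
        if (strcmp(demo_tree[i][0], dir) == 0 &&
            strcmp(demo_tree[i][1], file) == 0)
            return 1;
    return 0;
}
```

With this table, hyfile.c resolves to windows (the Windows copy shadows the shared one), while hystr.c falls through to shared.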
I think that complex regex expressions will be harder to maintain (and initially understand!).

Opinions? Once we agree on ideas, perhaps we could put together a Wiki/website(?) page describing layout, tools and a list of OS/arch names to use.

Oliver Deakin
IBM United Kingdom Limited

[1] http://mail-archives.apache.org/mod_mbox/incubator-harmony-dev/200602.mbox/[EMAIL PROTECTED]
[2] http://mail-archives.apache.org/mod_mbox/incubator-harmony-dev/200602.mbox/[EMAIL PROTECTED]
[3] http://mail-archives.apache.org/mod_mbox/incubator-harmony-dev/200602.mbox/[EMAIL PROTECTED]
Re: Platform dependent code placement (was: Re: repo layout again)
Mark Hindess [EMAIL PROTECTED] wrote on 02/23/2006 04:06:16 AM:
> snip
>> I'd suggest that a file is considered platform dependent if it contains any of the magic platform keywords (like ia32, linux, etc.) in its full name. A directory name may or may not contain a leading name. For example, a file */linux/*.c should be considered as linux specific as well. Another example: a file */*_linux_solaris_*/*.c is considered as shared between linux and solaris, but not applicable for win, etc.
>
> I have a few concerns about this plan. First, that we'll end up renaming relatively simple file names like foo_linux_solaris.c to unmanageable things like foo_linux_solaris_aix_plan9_freebsd_osx_openbsd_ecos.c.
> snip

Andrey's earlier statement about allowing the component to choose the names for specializations sounds exactly right. If you're developing the JIT you would want to split along processor lines (e.g. /ia32, /ppc), whereas the file-system interface will likely follow operating-system lines (e.g. /win32, /linux, /posix).

I'm not convinced about embedding the axis of specialization (OS/ARCH) in filenames. It seems like every new component that comes along could demand a token in the filename.

Based on our experience with J9 we've also seen real value in keeping file names consistent (e.g. function foo() lives in file bar.c). This helps developers form a mental map of where a given piece of functionality resides, and ultimately makes navigating a large codebase easier. For example, if a function hyfile_open() is always defined in hyfile.c, then your task is simply navigating to the correct version of hyfile.c in the directory tree. If you play tricks with the filename by appending suffixes, you become more dependent on external tools like grep or ctags to locate the right file.

My vote is for consistent file names, in directories whose names are selected by the component owner. A list of 'blessed' OS and ARCH values would go a long way to helping component owners select the right directory name.

> snip
> Second, if we define everything in terms of high level concepts, such as os and arch, then we will be losing the important information about the real distinction. This would make it harder for new developers to understand the choices and reasoning embodied in the code. To avoid this, we should really be defining (and using) the actual concept that is important in making the distinction about which code to pick up.
> snip

Regardless of where you start defining configuration flags (OS and ARCH seem like a good start to me), a few simple techniques can make your life easier:

i) Choose names that are unlikely to conflict with system headers by adopting a suitable prefix. For example, if you like HY_ARCH or HY_OS, your code would read like:

    #define HY_ARCH_IA32 1
    #define HY_OS_LINUX  1   /* this flag is turned on */
    #define HY_OS_WIN32  0   /* this flag is turned off */

ii) Produce a list of the blessed names, and the pattern for declaring new ones. This helps anyone new to the project understand what exists already.

iii) Consider using #define with values (either one or zero) so that you can use #if tests in the code. We've found this is slightly cleaner than the #ifdef(FLAGX) or defined(FLAGX) flavours. For example:

    #if HY_ARCH_IA32 && HY_OS_LINUX
    /* do something */
    #else
    /* do something else */
    #endif

my $0.02

Graeme Johnson
J9 VM Team, IBM Canada.
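A minimal C sketch of points i) and iii), assuming a Linux/IA32 build: every flag is always defined to 1 or 0, so flags can be combined with && in a plain #if. The HY_ names follow the examples above; hy_platform_name() and hy_is_platform() are hypothetical helpers added only to make the selection observable.

```c
#include <string.h>

/* Hypothetical configuration flags. A real build would pass them on the
   command line (e.g. -DHY_OS_LINUX=1); these defaults assume Linux/IA32.
   Every flag is always defined, to 1 (on) or 0 (off). */
#ifndef HY_ARCH_IA32
#define HY_ARCH_IA32 1
#endif
#ifndef HY_OS_LINUX
#define HY_OS_LINUX 1
#endif
#ifndef HY_OS_WIN32
#define HY_OS_WIN32 0
#endif

/* Because the flags carry values, they combine in ordinary #if tests. */
static const char *hy_platform_name(void) {
#if HY_ARCH_IA32 && HY_OS_LINUX
    return "linux_ia32";
#elif HY_OS_WIN32
    return "win32";
#else
    return "other";
#endif
}

/* Returns nonzero if `name` is the platform this file was compiled for. */
int hy_is_platform(const char *name) {
    return strcmp(name, hy_platform_name()) == 0;
}
```

One trade-off worth knowing: inside #if, an identifier that was never defined silently evaluates to 0, so a misspelled flag switches code off instead of failing the build; GCC's -Wundef warning reports such cases.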
Re: Platform dependent code placement (was: Re: repo layout again)
On 22/02/06, Andrey Chernyshev [EMAIL PROTECTED] wrote:
> On 2/23/06, Matt Benson [EMAIL PROTECTED] wrote:
>> Are these just sample names? Could there be
>>   shared/foo_linux.c
>>   whatever/bar_linux.c
>>   foo_ia32/bar.c
>>   bar_linux/baz.c
>>   baz_linux_ia32/more.c
>
> Yes, they could. The pattern for identifying architecture or OS dependence for a file is like [\W_]${attr}[\W_], where ${attr} stands for either a specific OS or architecture.
>
>> If so, will a directory always have no more than one leading name, i.e. not OS or architecture?
>
> I'd suggest that a file is considered platform dependent if it contains any of the magic platform keywords (like ia32, linux, etc.) in its full name. A directory name may or may not contain a leading name. For example, a file */linux/*.c should be considered as linux specific as well. Another example: a file */*_linux_solaris_*/*.c is considered as shared between linux and solaris, but not applicable for win, etc.

I have a few concerns about this plan.

First, that we'll end up renaming relatively simple file names like foo_linux_solaris.c to unmanageable things like foo_linux_solaris_aix_plan9_freebsd_osx_openbsd_ecos.c.

Second, if we define everything in terms of high level concepts, such as os and arch, then we will be losing the important information about the real distinction. This would make it harder for new developers to understand the choices and reasoning embodied in the code. To avoid this, we should really be defining (and using) the actual concept that is important in making the distinction about which code to pick up. For example, defining properties for concepts like "os provides a getcwd function", "os provides a clock_gettime function", "processor provides a fast implementation of a quuxo operation", etc.

Third, fixing my second problem means we end up implementing autoconf (and associated tools) in ant. We might also end up having to implement new versions of code management tools so that people can find the right bit of code to edit.

Fourth, if we don't use conventional tools and appropriate, meaningful code layout, then we will be raising the bar significantly for new developers considering contributing to the project.

Regards,
 Mark.

--
Mark Hindess [EMAIL PROTECTED]
IBM Java Technology Centre, UK.
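The idea of naming the capability rather than the platform is the convention autoconf uses: a configure check such as AC_CHECK_FUNCS([clock_gettime]) defines HAVE_CLOCK_GETTIME when the function exists. The C sketch below shows what code keyed on such a flag might look like; the flag default and hy_time_millis() are illustrative assumptions, not existing Harmony code.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/* Hypothetical capability flag, named after what the platform provides
   rather than which platform it is. An autoconf-style configure step
   would define it after probing for the function. */
#ifndef HAVE_CLOCK_GETTIME
#define HAVE_CLOCK_GETTIME 1
#endif

/* Millisecond timestamp: monotonic when clock_gettime() is available and
   succeeds, otherwise coarse wall-clock time from ISO C time(). */
long long hy_time_millis(void) {
#if HAVE_CLOCK_GETTIME
    struct timespec ts;
    if (clock_gettime(CLOCK_MONOTONIC, &ts) == 0)
        return (long long)ts.tv_sec * 1000LL + ts.tv_nsec / 1000000L;
#endif
    return (long long)time(NULL) * 1000LL;
}
```

A reader of this code sees why the branch exists (clock_gettime is or is not available) rather than having to know which operating systems happen to provide it.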
Re: Platform dependent code placement (was: Re: repo layout again)
On 2/21/06, Matt Benson [EMAIL PROTECTED] wrote:
> I have tried to reconstruct the gist of this discussion from the archives (wasn't paying enough attention the first time through), without much luck. :) Since the discussion has evolved this far, I wonder if anyone could restate the Ant-specific part of the problem in concise terms, with the example directory structure and desired selection...? In case I might tersify the expression at all, I'd like to help Harmony in this small way as I've not yet found time to do more...

Hi Matt,

Thanks for your attention to this. I'd like to have a selector in an Ant FileSet which would select file names based on a regular expression. The regexp needs to be matched against a string consisting of the path relative to the base dir of the fileset, plus the file name. For example, suppose we have a set of files like this:

    shared\test_linux_ia32.c
    shared\test_shared.c
    shared\test_win.c
    shared\test_win_ia32.c
    test_ia32\test1.c
    test_linux\test2.c
    test_win_ia32\test4.c

Then, for the linux/ia32 configuration the selector should take:

    shared\test_linux_ia32.c
    shared\test_shared.c
    test_ia32\test1.c
    test_linux\test2.c

Ideally, I'd wish to do that with code something like this:

    <cc>
      <fileset dir="." includes="**/*.c">
        <and>
          <or>
            <filenameregex expression="[\W_]${env.OS}[\W_]"/>
            <not>
              <filenameregex expression="[\W_](win|linux|solaris)[\W_]"/>
            </not>
          </or>
          <or>
            <filenameregex expression="[\W_]${env.ARCH}[\W_]"/>
            <not>
              <filenameregex expression="[\W_](ia32|sparc|ipf)[\W_]"/>
            </not>
          </or>
        </and>
      </fileset>
    </cc>

The above logic exactly describes the layout of platform dependent code that I suggested for Harmony. I've tried to use the standard filename and containsregexp selectors, but they didn't appear suitable for that purpose.

Thank you,
Andrey Chernyshev
Intel Middleware Products Division

> -Matt
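The selection logic above can also be checked outside Ant. The C sketch below is one possible reading of the rule, not code from the thread: a keyword counts when it appears in the relative path bounded by non-alphanumeric characters (which is what [\W_] matches), a file passes the OS test if it names the configured OS or no known OS, passes the ARCH test likewise, and is compiled only if both pass. The KNOWN_OS/KNOWN_ARCH lists come from the regexes above; treating the start and end of the path as delimiters is an assumption, since the literal [\W_]...[\W_] pattern needs a character on each side.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Attribute values taken from the regexes in this thread; a real build
   would use the project's agreed list of blessed names. */
static const char *const KNOWN_OS[] = {"win", "linux", "solaris", NULL};
static const char *const KNOWN_ARCH[] = {"ia32", "sparc", "ipf", NULL};

/* 1 if `word` occurs in `path` bounded on each side by a non-alphanumeric
   character ([\W_]) or by the start/end of the path. */
static int has_token(const char *path, const char *word) {
    size_t n = strlen(word);
    for (const char *p = strstr(path, word); p != NULL; p = strstr(p + 1, word)) {
        int left_ok = (p == path) || !isalnum((unsigned char)p[-1]);
        int right_ok = !isalnum((unsigned char)p[n]);
        if (left_ok && right_ok)
            return 1;
    }
    return 0;
}

/* Rules (1)/(2): the path names the wanted value, or names no known value. */
static int applicable(const char *path, const char *wanted,
                      const char *const *known) {
    if (has_token(path, wanted))
        return 1;
    for (size_t i = 0; known[i] != NULL; i++)
        if (has_token(path, known[i]))
            return 0;
    return 1;
}

/* Rule (3): compile the file only if both the OS and ARCH rules accept it. */
int file_selected(const char *path, const char *os, const char *arch) {
    return applicable(path, os, KNOWN_OS) && applicable(path, arch, KNOWN_ARCH);
}
```

For os = "linux" and arch = "ia32", the seven example paths above give exactly the four files listed. Note that test_win_ia32\test4.c names ia32 but is still rejected, because the OS test fails on win. Also, win inside windows is not a token, so a directory spelled windows\ would need its own entry in KNOWN_OS.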
Re: Platform dependent code placement (was: Re: repo layout again)
--- Andrey Chernyshev [EMAIL PROTECTED] wrote:
> On 2/21/06, Matt Benson [EMAIL PROTECTED] wrote:
> [SNIP]
>> wonder if anyone could restate the Ant-specific part of the problem in concise terms, with the example directory structure and desired selection...? In case I might tersify the expression at all, I'd like to help Harmony in this small way as I've not yet found time to do more...
>
> Hi Matt,
>
> Thanks for your attention to this. I'd like to have a selector in an Ant FileSet which would select file names based on a regular expression. The regexp needs to be matched against a string consisting of the path relative to the base dir of the fileset, plus the file name. For example, suppose we have a set of files like this:
>
>     shared\test_linux_ia32.c
>     shared\test_shared.c
>     shared\test_win.c
>     shared\test_win_ia32.c
>     test_ia32\test1.c
>     test_linux\test2.c
>     test_win_ia32\test4.c
>
> Then, for the linux/ia32 configuration the selector should take:
>
>     shared\test_linux_ia32.c
>     shared\test_shared.c
>     test_ia32\test1.c
>     test_linux\test2.c

Are these just sample names? Could there be

    shared/foo_linux.c
    whatever/bar_linux.c
    foo_ia32/bar.c
    bar_linux/baz.c
    baz_linux_ia32/more.c

If so, will a directory always have no more than one leading name, i.e. not OS or architecture?

Thanks,
Matt
Re: Platform dependent code placement (was: Re: repo layout again)
On 2/23/06, Matt Benson [EMAIL PROTECTED] wrote:
> Are these just sample names? Could there be
>     shared/foo_linux.c
>     whatever/bar_linux.c
>     foo_ia32/bar.c
>     bar_linux/baz.c
>     baz_linux_ia32/more.c

Yes, they could. The pattern for identifying architecture or OS dependence for a file is like [\W_]${attr}[\W_], where ${attr} stands for either a specific OS or architecture.

> If so, will a directory always have no more than one leading name, i.e. not OS or architecture?

I'd suggest that a file is considered platform dependent if it contains any of the magic platform keywords (like ia32, linux, etc.) in its full name. A directory name may or may not contain a leading name. For example, a file */linux/*.c should be considered as linux specific as well. Another example: a file */*_linux_solaris_*/*.c is considered as shared between linux and solaris, but not applicable for win, etc.

Thank you,
Andrey Chernyshev
Intel Middleware Products Division

> Thanks,
> Matt
Re: Platform dependent code placement (was: Re: repo layout again)
--- Andrey Chernyshev [EMAIL PROTECTED] wrote:
> On 2/23/06, Matt Benson [EMAIL PROTECTED] wrote:
>> Are these just sample names? Could there be
>>     shared/foo_linux.c
>>     whatever/bar_linux.c
>>     foo_ia32/bar.c
>>     bar_linux/baz.c
>>     baz_linux_ia32/more.c
>
> Yes, they could. The pattern for identifying architecture or OS dependence for a file is like [\W_]${attr}[\W_], where ${attr} stands for either a specific OS or architecture.
>
>> If so, will a directory always have no more than one leading name, i.e. not OS or architecture?
>
> I'd suggest that a file is considered platform dependent if it contains any of the magic platform keywords (like ia32, linux, etc.) in its full name. A directory name may or may not contain a leading name. For example, a file */linux/*.c should be considered as linux specific as well. Another example: a file */*_linux_solaris_*/*.c is considered as shared between linux and solaris, but not applicable for win, etc.

Ah... I hadn't extrapolated the linux_solaris possibility. The reason I asked my last question (i.e. will there always be foo_os, foo_arch, foo_os_arch as opposed to foo_bar_os, foo_bar_arch, foo_bar_os_arch) is to learn more about how to differentiate between foo_ia32 and foo_win_ia32. The linux/ia32 combination can't just blindly include any file/dir with ia32 in the name, or it could pick up e.g. foo_win_ia32... Can you confirm there would be no reason for foo_bar_(os_arch|os|arch)?

-Matt
Re: Platform dependent code placement (was: Re: repo layout again)
Hello, team,

I've tried to simplify the construction below, which is a sample of Andrey's Ant script, and ended up with the following regular expression, which matches a string containing a particular OS identifier or a string without any OS identifier:

    .*([_\\W]${env.OS}[_\\W].*$|.*(?<!.*[\\W_](win|lin|sol)[\\W_].*)$)

This works fine with the regex package from the HARMONY-39 contribution, but fails to compile with Sun's classes (PatternSyntaxException: Look-behind group does not have an obvious maximum length; I would appreciate it if someone could point me to the place in any regex specification stating that this is valid behavior). From the compatibility point of view this enhancement is no good, but as a hint on how to implement negative assertions: regex negative look-behind/look-ahead is the solution.

    <propertyregex property="OS.match" input="@{file}" regexp="[\W_]${env.OS}[\W_]" override="yes" defaultValue="no" select="yes"/>
    <propertyregex property="OS.any.match" input="@{file}" regexp="[\W_](win|linux|solaris)[\W_]" override="yes" defaultValue="no" select="yes"/>

    <istrue value="${OS.match}"/>
    <not>
      <istrue value="${OS.any.match}"/>
    </not>

Thank you.
Nikolay Kuznetsov
Intel Middleware Products Division
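POSIX regular expressions have no look-ahead or look-behind at all, so the portable route is the one the propertyregex snippet above takes: evaluate two plain patterns and combine the results with ordinary boolean logic. A minimal C sketch of the OS half of the rule using regcomp()/regexec() follows; the helper names are hypothetical, [^[:alnum:]] stands in for [\W_], and the extra ^ and $ alternatives are an assumption so that a keyword at the very start or end of the path also matches.

```c
#include <regex.h>
#include <stdio.h>

/* 1 if POSIX extended regex `pattern` matches somewhere in `s`, 0 if not,
   -1 if the pattern does not compile. */
static int regex_found(const char *pattern, const char *s) {
    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    int rc = regexec(&re, s, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}

/* OS half of the rule, using two plain regexes instead of a look-behind:
   "names this OS" OR NOT "names any known OS".
   `os` is assumed to be a plain lowercase word with no regex
   metacharacters. A pattern that fails to compile counts as "reject". */
int os_applicable(const char *path, const char *os) {
    char own[128];
    snprintf(own, sizeof own, "(^|[^[:alnum:]])%s([^[:alnum:]]|$)", os);
    int os_match = regex_found(own, path);
    int any_match = regex_found(
        "(^|[^[:alnum:]])(win|linux|solaris)([^[:alnum:]]|$)", path);
    return os_match == 1 || any_match == 0;
}
```

The ARCH half would be the same function with the (ia32|sparc|ipf) list, and a file is selected when both halves return 1.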
Re: Platform dependent code placement (was: Re: repo layout again)
On 2/17/06, Tim Ellison [EMAIL PROTECTED] wrote:
> do you have any examples (i.e. snippets of a non-trivial Ant script) that show what it would end up like? I'm trying to figure out in my own head whether it would be a few general regex selectors, or a load of them! I think you may be able to do it with just a few, right?

Actually not as few as I would wish. The selectors themselves are 4 lines, plus ~15 lines of logic ops to combine them. However, the rest of the code, which does the necessary conversions, is big. Finally, the code snippet that worked for me is like this:

    <!-- Translate fileset into plain string first -->
    <pathconvert property="source.files" pathsep=",">
      <path>
        <fileset dir="${basedir}" includes="**/*.c"/>
      </path>
      <map from="${basedir}" to=""/>
    </pathconvert>

    <!-- Start from an empty list; an unset property would otherwise be left unexpanded as literal text -->
    <var name="filtered.files" value=""/>

    <!-- Filter plain string using regex -->
    <for list="${source.files}" param="file">
      <sequential>
        <propertyregex property="OS.match" input="@{file}" regexp="[\W_]${env.OS}[\W_]" override="yes" defaultValue="no" select="yes"/>
        <propertyregex property="OS.any.match" input="@{file}" regexp="[\W_](win|linux|solaris)[\W_]" override="yes" defaultValue="no" select="yes"/>
        <propertyregex property="ARCH.match" input="@{file}" regexp="[\W_]${env.ARCH}[\W_]" override="yes" defaultValue="no" select="yes"/>
        <propertyregex property="ARCH.any.match" input="@{file}" regexp="[\W_](ia32|sparc|ipf)[\W_]" override="yes" defaultValue="no" select="yes"/>
        <if>
          <and>
            <or>
              <istrue value="${OS.match}"/>
              <not><istrue value="${OS.any.match}"/></not>
            </or>
            <or>
              <istrue value="${ARCH.match}"/>
              <not><istrue value="${ARCH.any.match}"/></not>
            </or>
          </and>
          <then>
            <var name="filtered.files" value="${filtered.files},@{file}"/>
          </then>
        </if>
      </sequential>
    </for>

    <!-- Convert string back to fileset -->
    <path id="filtered.files.path">
      <filelist dir="." files="${filtered.files}"/>
    </path>
    <pathtofileset name="filtered.files.fileset" pathrefid="filtered.files.path" dir="${basedir}"/>

    <!-- Call C compile task with the resulting fileset -->
    <cc objdir="out">
      <fileset refid="filtered.files.fileset"/>
    </cc>

It doesn't look too small because the original Ant fileset regexp selectors don't work with directory names, hence one has to apply some magic (convert the fileset to a string, filter it using a regexp, and then convert it back to files). I imagine that the above code would look much simpler if we wrote a custom regexp selector class. The reason why it doesn't look too simple is that we allow an arbitrary delimiter like [\W_] for the OS/ARCH attributes, and we catch partially shared file names like linux_solaris. I haven't yet tried to implement the same logic with make; would it be much simpler?

Thanks,
Andrey.

> Regards,
> Tim
>
> Using the names consistently will definitely help, but choosing whether to create a separate copy of the file in a platform-specific sub-directory, or to use #define's within a file in a shared-family sub-directory, will likely come down to a case by case decision. For example, 32-bit vs. 64-bit code may be conveniently #ifdef'ed in some .c files, but a .h file that defines pointer types etc. may need different versions of the entire file to keep things readable.
>
> Yes, I agree. This is why I suggest keeping both selection mechanisms: sometimes #define is more efficient, and sometimes dir/filename is much clearer. Finally, I'd suggest that the platform dependent code can be organized in 3 different ways:
> (1) Explicitly, via defining the appropriate file list. For example, an Ant xml file may choose either one or another fileset, depending on the current OS and ARCH property values. This approach is most convenient, for example, whenever third-party code is compiled or the file names cannot be changed for some reason.
>
> Ant ?! ;-) or platform-specific makefile #includes?
>
> Let's consider both for now :)
>
> There will be files that it makes sense to share for sure (like vmi.h and jni.h etc.) but they should be stable-API types that can be refreshed across the boundary as required if necessary.
>
> Agreed. I think it would be great if we can keep our inter-component interfaces (like vmi.h) platform independent.
>
> Thank you,
> Andrey Chernyshev
> Intel Middleware Products Division
>
> Hence, the most efficient (in terms of code sharing and readability) code placement would require maximum flexibility, while preserving some well-defined rules. The scheme based on file dir/name matching seems to be flexible enough. How does the above proposal sound?
>
> Cool, perhaps
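The conversion loop in the Ant script above (flatten the fileset into a comma-separated string, test each entry, append the survivors, turn the string back into files) has the following shape in plain C. This is a hypothetical helper for illustration only: filter_file_list, keep_fn and the ends_with_c example predicate are invented names, and a real predicate would apply the OS/ARCH rule. Joining with a separator only between entries means the result never starts with a comma, unlike the script's append-to-the-end pattern.

```c
#include <stddef.h>
#include <string.h>

/* Decides whether one source path should be compiled. */
typedef int (*keep_fn)(const char *path);

/* Splits a comma-separated file list (the shape <pathconvert pathsep=",">
   produces), keeps the entries accepted by `keep`, and joins them back
   with commas into `out`, with no leading comma. Entries of 512 chars or
   more are skipped. Returns the number kept, or -1 if `out` is too small. */
int filter_file_list(const char *csv, keep_fn keep, char *out, size_t outlen) {
    char entry[512];
    size_t used = 0;
    int kept = 0;
    if (outlen == 0)
        return -1;
    out[0] = '\0';
    while (*csv != '\0') {
        const char *end = strchr(csv, ',');
        size_t len = end ? (size_t)(end - csv) : strlen(csv);
        if (len > 0 && len < sizeof entry) {
            memcpy(entry, csv, len);
            entry[len] = '\0';
            if (keep(entry)) {
                size_t need = len + (kept > 0 ? 1 : 0);
                if (used + need >= outlen)
                    return -1;
                if (kept > 0)
                    out[used++] = ',';
                memcpy(out + used, entry, len + 1);  /* copies the '\0' too */
                used += len;
                kept++;
            }
        }
        if (end == NULL)
            break;
        csv = end + 1;
    }
    return kept;
}

/* Example predicate (hypothetical): keep only paths ending in ".c". */
static int ends_with_c(const char *path) {
    size_t n = strlen(path);
    return n >= 2 && strcmp(path + n - 2, ".c") == 0;
}
```

Filtering "a.c,b.h,c.c" with ends_with_c keeps two entries and writes "a.c,c.c".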
Re: Platform dependent code placement (was: Re: repo layout again)
Tim Ellison wrote:
> Andrey Chernyshev wrote:
> snip
>> On the other hand, having separate source trees like linux32.sparc, solaris64.sparc, win.IA32 for each specific platform combination may lead to huge code duplication. We may need to be able to share code across certain, but not all, platform combinations.
>
> Agreed. The existing code layout for the classlib natives is certainly not a viable way to scale across multiple platforms. (The 'in-house' mechanism for managing multi-platform code is particular to IBM so not of great interest here; suffice to say that the win.IA32 and linux.IA32 trees in classlib/trunk/native-code are the product of that mechanism with some manual tidy-up.)

Also agree. The current layout will not scale well when we move to a broader range of platforms.

>> To address that issue, I can suggest a pretty straightforward scheme for platform-dependent code placement which looks as follows:
>> 1. There is a fixed set of attributes which denotes a specific target configuration. As a starter set, we may have OS (for operating system) and, say, ARCH (for architecture) attributes. This set can be extended later, but, as was suggested, let's not cross that bridge until we come to it.
>
> Yes, the principal distinction is probably on OS and ARCH.
>
>> 2. Files in the source tree are selected for compilation based on the OS or ARCH attribute values which may (or may not) appear in a file or directory name. Some examples are:
>> src\main\native\solaris\foo.cpp - means file is applicable for whatever system running Solaris;
>
> yep (that was foo.c, right ;-) -- only teasing)
>
>> src\main\native\win\foo_ia32.cpp - file is applicable only for Windows / IA32;
>
> why has the ARCH flipped onto the file name? why not win_ia32 ?
>
>> src\main\native\foo_ia32_em64t.cpp - file can be compiled for whatever OS on either IA32 or EM64T architecture, but nothing else.
>
> I agree with the approach, but left wondering why it is not something like:
>
>     src\main\native\
>       common\
>       unix\
>       windows\
>       zos\
>       solaris\
>       solaris_x86\
>       solaris_sparc\
>       windows_ifp\
>
> i.e. a taxonomy covering families of code (common, unix-like, windows-like) and increasingly specific discriminators.

The idea is good; however, I think including both the OS and arch in the directory name is preferable. It is just as simple a convention, gives the coder an at-a-glance view of which OS/arch combinations have platform specific code associated with them, and keeps the actual source filenames consistent across platforms. Was there a particular reason for attaching the architecture to the filename and not the directory, Andrey?

>> The formal file selection rule may look like:
>> (1) File is applicable for the given OS value if its pathname contains regexp [\W_]${OS}[\W_], or the pathname doesn't contain any OS value;
>> (2) File is applicable for the given ARCH value if its pathname contains regexp [\W_]${ARCH}[\W_], or the pathname doesn't contain any ARCH value;
>> (3) File is selected for compilation if it satisfies both the (1) and (2) criteria.
>
> If we restrict the OS and ARCH identifiers to directories then it will allow us to use the gmake VPATH functionality to select the right file, so compiling on solaris x86 will have a VPATH='solaris_x86:solaris:unix:common' and so on.

I agree that is a perfect scenario to use VPATH for. I think this would probably be a simpler solution than using Ant (as suggested later) and also would not require a JVM to build the native code.

>> One can see that this naming convention gives developers enough freedom to lay out their code in the most convenient way (actually, experience shows that the meaning of "convenient" may differ significantly depending on the component type :). On the other hand, it gives a well defined (and hopefully intuitive enough) rule showing whether a particular file is picked up by the compiler or not, depending on the configuration.
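The VPATH value for solaris x86 can be derived mechanically from the configured OS and ARCH. A small C sketch of that ordering, most specific directory first, is below; the family argument (unix, windows, ...) and the rule of not repeating it when it equals the OS are assumptions, since the thread does not say how a build would choose the family.

```c
#include <stdio.h>
#include <string.h>

/* Builds a gmake VPATH value, most specific directory first:
   <os>_<arch> : <os> : <family> : common.
   `family` (e.g. "unix") is a hypothetical input; when it equals `os`
   (e.g. windows) it is not repeated. Returns the length written, or -1
   if `buf` is too small. */
int hy_build_vpath(char *buf, size_t len, const char *os, const char *arch,
                   const char *family) {
    int n;
    if (strcmp(os, family) == 0)
        n = snprintf(buf, len, "%s_%s:%s:common", os, arch, os);
    else
        n = snprintf(buf, len, "%s_%s:%s:%s:common", os, arch, os, family);
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}
```

For solaris/x86 in the unix family this produces solaris_x86:solaris:unix:common, matching the value quoted in the thread.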
I like the idea -- if we agree to use gmake throughout then I think we get this functionality 'for free'. In addition to the above, the code could also be selected for compilation by means of #define directives in C/C++ files (this is convenient when most of a file is platform-independent, with the exception of just a few lines of code). The build system could set up the OS and ARCH attributes as appropriate defines for the C/C++ code. For example, for the Windows/IA32 config, the following defines could be set:

    #define OS WIN
    #define WIN
    #define ARCH IA32
    #define IA32

Then the platform-dependent code sections may look like:

    #ifdef WIN
    ...
    #endif

which is essentially the same as:

    #if OS == WIN
    ...
    #endif

It is important that OS/ARCH (or whatever additional) attribute names and values are used consistently in the file names and
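One detail of the #define scheme matters in practice: if WIN is defined as an empty macro, as in the list above, then `#if OS == WIN` expands to `#if ==` and will not compile. A minimal sketch that keeps both styles working, assuming hypothetical numeric codes (OS_WIN, ARCH_IA32, ...) that the build would pass as, say, -DOS=OS_LINUX, and following the suggestion of labelling every #else/#endif:

```c
/* Hypothetical numeric codes: each value gets a distinct non-zero number
   so that "#if OS == OS_WIN" is a valid integer comparison. */
#define OS_WIN      1
#define OS_LINUX    2
#define OS_SOLARIS  3

#define ARCH_IA32   1
#define ARCH_EM64T  2
#define ARCH_SPARC  3

/* Normally supplied by the build system (e.g. -DOS=OS_LINUX);
   the defaults only keep this sketch self-contained. */
#ifndef OS
#define OS OS_WIN
#endif /* !OS */
#ifndef ARCH
#define ARCH ARCH_IA32
#endif /* !ARCH */

/* Convenience flags so that "#ifdef WIN" style tests also work. */
#if OS == OS_WIN
#define WIN 1
#endif /* OS == OS_WIN */

#if ARCH == ARCH_IA32
#define IA32 1
#endif /* ARCH == ARCH_IA32 */

/* Returns a short name for the OS this file was compiled for. */
const char *os_name(void)
{
#ifdef WIN
    return "win";
#elif OS == OS_LINUX
    return "linux";
#elif OS == OS_SOLARIS
    return "solaris";
#else  /* no known OS */
#error "unsupported OS value"
#endif /* WIN / OS_LINUX / OS_SOLARIS */
}
```

Compiled with no flags this reports "win"; with -DOS=OS_LINUX it reports "linux" and WIN stays undefined, so `#ifdef WIN` and `#if OS == OS_WIN` always agree.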
Re: Platform dependent code placement (was: Re: repo layout again)
src\main\native\win\foo_ia32.cpp - the file is applicable only to Windows / IA32; Why has the ARCH flipped onto the file name? Why not win_ia32? Well, let's see - if I have a file which is shared between Windows and Linux, but is IA32-specific, then I'll have to duplicate it in the win_ia32 and linux_ia32 dirs. This means having both ARCH and OS in a directory name isn't always convenient. Another case: if I have only one file in my component which is IA32-specific, it could be more convenient just to rename it from foo.c to foo_ia32.c and keep it at the same location, rather than move it to some other directory. One problem that comes up here is that every additional directory may need to be registered appropriately as a source search path in development / debugging tools (you'd face this if you try to debug with MSVC, for example). I just thought that giving people the freedom to choose between naming files or directories will help them work in the most convenient way. src\main\native\foo_ia32_em64t.cpp - the file can be compiled for any OS on either the IA32 or EM64T architecture, but nothing else. I agree with the approach, but I am left wondering why it is not something like:

    src\main\native\
        common\
        unix\
        windows\
        zos\
        solaris\
        solaris_x86\
        solaris_sparc\
        windows_ifp\

i.e. a taxonomy covering families of code (common, unix-like, windows-like) and increasingly specific discriminators. Well, this directory structure does fit the scheme I proposed; it is a particular case of it. Some people will probably also want to play with the file names within a single directory in the same style: foo_solaris.c, foo_solaris_sparc.c, ... I guess if a component contains only 3 platform-dependent C files, someone would be frustrated at having to create 3 different directories for them.
The formal file selection rule may look like: (1) A file is applicable to the given OS value if its pathname contains the regexp [\W_]${OS}[\W_], or if the pathname doesn't contain any OS value; (2) A file is applicable to the given ARCH value if its pathname contains the regexp [\W_]${ARCH}[\W_], or if the pathname doesn't contain any ARCH value; (3) A file is selected for compilation if it satisfies both criteria (1) and (2). If we restrict the OS and ARCH identifiers to directories, then it will allow us to use the gmake VPATH functionality to select the right file, so compiling on solaris x86 will have VPATH='solaris_x86:solaris:unix:common' and so on. I see. Possibly vpath (small letters) could address the filenames? Perhaps something like this should work:

    vpath %solaris%.c
    vpath %.c solaris:unix:common

I like the idea -- if we agree to use gmake throughout then I think we get this functionality 'for free'. I guess the same could be done relatively easily for Ant as well, with the help of filesets and containsregex selectors. Using the names consistently will definitely help, but choosing whether to create a separate copy of the file in a platform-specific sub-directory, or to use #defines within a file in a shared-family sub-directory, will likely come down to a case-by-case decision. For example, 32-bit vs. 64-bit code may be conveniently #ifdef'ed in some .c files, but a .h file that defines pointer types etc. may need different versions of the entire file to keep things readable. Yes, I agree. This is why I suggest keeping both selection mechanisms - sometimes #define is more efficient, and sometimes dir/filename is much clearer. Finally, I'd suggest that the platform-dependent code can be organized in 3 different ways: (1) Explicitly, via defining the appropriate file list. For example, an Ant xml file may choose one fileset or another, depending on the current OS and ARCH property values.
This approach is most convenient, for example, whenever third-party code is compiled or the file names cannot be changed for some reason. Ant ?! ;-) or platform-specific makefile #includes? Let's consider both for now :) There will be files that it makes sense to share for sure (like vmi.h and jni.h etc.) but they should be stable-API types that can be refreshed across the boundary as required. Agreed. I think it would be great if we could keep our inter-component interfaces (like vmi.h) platform-independent. Thank you, Andrey Chernyshev Intel Middleware Products Division Hence, the most efficient (in terms of code sharing and readability) code placement would require maximum flexibility, while preserving some well-defined rules. The scheme based on file dir/name matching seems to be flexible enough. How does the above proposal sound? Cool, perhaps we can discuss whether it should be gmake + vpath or Ant. Thanks for resurrecting this thread. Regards, Tim Maybe in some components we
Re: Platform dependent code placement (was: Re: repo layout again)
The idea is good; however, I think including both the OS and arch in the directory name is preferable. It is just as simple a convention, gives the coder an at-a-glance view of which OSes/archs have platform-specific code associated with them, and keeps the actual source filenames consistent across platforms. Was there a particular reason for attaching the architecture to the filename and not the directory, Andrey? I think there is no particular reason except convenience. People may not wish to create an extra directory and go there each time because of a few platform-dependent files. Also, as I mentioned in my previous message, every extra source directory may additionally complicate the setup of debugging tools and IDEs. I agree that is a perfect scenario to use VPATH for. I think this would probably be a simpler solution than using Ant (as suggested later) and also would not require you to have a JVM to build the native code. We have lots of Java code in the source base, right? :) Therefore one will need a JVM anyway to be able to build something runnable. This is a tricky one. I think in most cases the difference between 32/64-bit code should be minor and mostly confined to header defines, as Tim suggests. For this, #ifdefs will be sufficient. Let's not forget about the different OSes and architectures. For example, I'd expect significant differences in the implementations of AWT on Win/X11 and of the JIT compiler on IA32/Sparc/IPF. The whole design of the code could be different, not just the implementations of certain functions or classes. Using #defines only could be almost a nightmare in this case... I would simply suggest that we adopt a policy of always marking all #else and #endif lines clearly to indicate which condition they relate to. However, there may be instances where using #ifdefs obfuscates the code.
I think most of the time this will be a judgement call on the part of the coder - if you look at a piece of code and cannot tell what the preprocessor is going to give you on a particular platform, you're probably looking at a candidate for code separation. I agree, this seems to be a good criterion for choosing between defines and dir/file names. Thank you, Andrey Chernyshev Intel Middleware Products Division Finally, I'd suggest that the platform-dependent code can be organized in 3 different ways: (1) Explicitly, via defining the appropriate file list. For example, an Ant xml file may choose one fileset or another, depending on the current OS and ARCH property values. This approach is most convenient, for example, whenever third-party code is compiled or the file names cannot be changed for some reason. Ant ?! ;-) or platform-specific makefile #includes? (2) Via the file path naming convention. This is the preferred approach and works well whenever distinct files for different platforms can be identified. yep (modulo discussion of filenames vs. dir names to enable vpath) (3) By means of preprocessor directives. This could be convenient if only a few lines of code need to vary across platforms. However, preprocessor directives make the code less readable, hence they should be used with care. In terms of the build process, this means that the code has to pass all 3 stages of filtering before it is selected for compilation. I like it. Let's just discuss what tools do the selection -- but I agree with the approach. The point is that components in Harmony could be very different, especially if we take into account that they may belong to either the Class Libraries or the VM world. There will be files that it makes sense to share for sure (like vmi.h and jni.h etc.) but they should be stable-API types that can be refreshed across the boundary as required.
Hence, the most efficient (in terms of code sharing and readability) code placement would require maximum flexibility, while preserving some well-defined rules. The scheme based on file dir/name matching seems to be flexible enough. How does the above proposal sound? Sounds good :) It makes a lot of sense to organise the code in a way that promotes reuse across platforms. +1 from me -- Oliver Deakin IBM United Kingdom Limited Cool, perhaps we can discuss whether it should be gmake + vpath or Ant. Thanks for resurrecting this thread. Regards, Tim Maybe in some components we would want to include a window manager family too, though let's cross that bridge... I had a quick hunt round for a recognized standard or convention for OS and CPU family names, but it seems there are enough subtle differences around that we should just define them ourselves. My VM's config script maintains CPU type, OS name, and word size as three independent values. These are combined in various ways in the source code and support scripts depending on the
Re: Platform dependent code placement (was: Re: repo layout again)
Andrey Chernyshev wrote: src\main\native\win\foo_ia32.cpp - the file is applicable only to Windows / IA32; Why has the ARCH flipped onto the file name? Why not win_ia32? Well, let's see - if I have a file which is shared between Windows and Linux, but is IA32-specific, then I'll have to duplicate it in the win_ia32 and linux_ia32 dirs. This means having both ARCH and OS in a directory name isn't always convenient. Another case: if I have only one file in my component which is IA32-specific, it could be more convenient just to rename it from foo.c to foo_ia32.c and keep it at the same location, rather than move it to some other directory. One problem that comes up here is that every additional directory may need to be registered appropriately as a source search path in development / debugging tools (you'd face this if you try to debug with MSVC, for example). I just thought that giving people the freedom to choose between naming files or directories will help them work in the most convenient way. src\main\native\foo_ia32_em64t.cpp - the file can be compiled for any OS on either the IA32 or EM64T architecture, but nothing else. I agree with the approach, but I am left wondering why it is not something like:

    src\main\native\
        common\
        unix\
        windows\
        zos\
        solaris\
        solaris_x86\
        solaris_sparc\
        windows_ifp\

i.e. a taxonomy covering families of code (common, unix-like, windows-like) and increasingly specific discriminators. Well, this directory structure does fit the scheme I proposed; it is a particular case of it. Some people will probably also want to play with the file names within a single directory in the same style: foo_solaris.c, foo_solaris_sparc.c, ... Ah, I see. I hadn't appreciated that you can mix-n-match the directory name and file name encodings. I guess if a component contains only 3 platform-dependent C files, someone would be frustrated at having to create 3 different directories for them.
The formal file selection rule may look like: (1) A file is applicable to the given OS value if its pathname contains the regexp [\W_]${OS}[\W_], or if the pathname doesn't contain any OS value; (2) A file is applicable to the given ARCH value if its pathname contains the regexp [\W_]${ARCH}[\W_], or if the pathname doesn't contain any ARCH value; (3) A file is selected for compilation if it satisfies both criteria (1) and (2). If we restrict the OS and ARCH identifiers to directories, then it will allow us to use the gmake VPATH functionality to select the right file, so compiling on solaris x86 will have VPATH='solaris_x86:solaris:unix:common' and so on. I see. Possibly vpath (small letters) could address the filenames? Perhaps something like this should work:

    vpath %solaris%.c
    vpath %.c solaris:unix:common

I like the idea -- if we agree to use gmake throughout then I think we get this functionality 'for free'. I guess the same could be done relatively easily for Ant as well, with the help of filesets and containsregex selectors. Do you have any examples (i.e. snippets of a non-trivial Ant script) that show what it would end up like? I'm trying to figure out in my own head whether it would be a few general regex selectors, or a load of them! I think you may be able to do it with just a few, right? Regards, Tim Using the names consistently will definitely help, but choosing whether to create a separate copy of the file in a platform-specific sub-directory, or to use #defines within a file in a shared-family sub-directory, will likely come down to a case-by-case decision. For example, 32-bit vs. 64-bit code may be conveniently #ifdef'ed in some .c files, but a .h file that defines pointer types etc. may need different versions of the entire file to keep things readable. Yes, I agree. This is why I suggest keeping both selection mechanisms - sometimes #define is more efficient, and sometimes dir/filename is much clearer.
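The directory-only alternative relies on gmake's VPATH search: for each source name, the directories in VPATH are tried left to right and the first one that contains the file wins. A minimal C sketch of that lookup order, assuming a hypothetical in-memory list of existing paths in place of the real file system (make also checks the plain name in the current directory first; that step is omitted here):

```c
#include <stdio.h>
#include <string.h>

/* Resolve `file` against a colon-separated search path, most specific
   directory first, as in VPATH='solaris_x86:solaris:unix:common'.
   `tree` lists the paths that "exist" (a stand-in for the file system).
   Returns 1 and writes the chosen path to `out`, or 0 if nothing matches. */
int resolve_vpath(const char *vpath, const char *file,
                  const char *const tree[], size_t ntree,
                  char *out, size_t outlen)
{
    const char *dir = vpath;

    while (*dir != '\0') {
        size_t dirlen = strcspn(dir, ":");   /* length of this entry */
        char candidate[256];

        snprintf(candidate, sizeof candidate, "%.*s/%s",
                 (int)dirlen, dir, file);
        for (size_t i = 0; i < ntree; i++) {
            if (strcmp(tree[i], candidate) == 0) {
                snprintf(out, outlen, "%s", candidate);
                return 1;                    /* first match wins */
            }
        }
        dir += dirlen;
        if (*dir == ':')
            dir++;                           /* skip the separator */
    }
    return 0;
}
```

For example, with unix/foo.c and common/foo.c both present, looking up foo.c with 'solaris_x86:solaris:unix:common' yields unix/foo.c: the more specific copy shadows the common one without any per-file bookkeeping, which is the property the directory-only layout relies on.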
Finally, I'd suggest that the platform-dependent code can be organized in 3 different ways: (1) Explicitly, via defining the appropriate file list. For example, an Ant xml file may choose one fileset or another, depending on the current OS and ARCH property values. This approach is most convenient, for example, whenever third-party code is compiled or the file names cannot be changed for some reason. Ant ?! ;-) or platform-specific makefile #includes? Let's consider both for now :) There will be files that it makes sense to share for sure (like vmi.h and jni.h etc.) but they should be stable-API types that can be refreshed across the boundary as required. Agreed. I think it would be great if we could keep our inter-component interfaces (like vmi.h) platform-independent. Thank you, Andrey Chernyshev Intel Middleware Products Division Hence, the most